Assembling the Complete Protein Collection: The Serono Secretome

Human genome sequencing opened up a far more complete view of the rich variety of human secreted proteins, the majority of which had not been studied in depth. The number of secreted proteins encoded by the human genome has been a matter of extensive debate, with estimates ranging up to 8,000 (Kramer and Cohen 2004), many of which will not be druggable as defined by Hopkins (Hopkins and Groom 2002). We therefore set out to assemble a collection of 2,000-3,000 secreted proteins, our requirement being that any protein in the collection should be purified and characterized by SDS-PAGE. The proteins are produced by transient transfection of human cells, which ensures that their glycosylation patterns resemble the natural pattern as closely as possible. Typical expression levels of the most highly expressed proteins are 500 µg, which is sufficient to determine an early biological activity in vitro. To assemble such a collection, we followed a variety of approaches (see Fig. 17.4).

1. Signal sequence trapping. Early work to identify secreted proteins focused on constructing cDNA libraries enriched for secreted proteins using a technique called signal sequence trapping (Tashiro et al. 1993). The alternative to this biological selection of sequence tags was to recognize signal sequences by bioinformatics. This was the focus of the Genset Signal Tag collections, first produced in 1997 (WO200037491). These sequences (almost 1,000 of them) have been expressed and purified, allowing us to test the proteins in parallel in high-content cell biology assays to find new biological activities.

Fig. 17.4. The secretome. Our collection of secreted proteins comes from a variety of sources - from early database searching using signal sequences, through increasingly complex bioinformatics sources. It is also clear that there are interesting proteins yet to be found, even more than 10 years after the start of EST sequencing and 5 years after the completion of the human genome sequence.

2. Multiple alignment searches in families of proteins. It is clear that the majority of cytokines have similar three-dimensional folds. The chemokines mentioned above have a three-stranded beta sheet and a carboxy-terminal alpha helix. Most of the interleukins and growth factors are built from a bundle of four alpha helices. A BLAST search with the sequence of a single protein does not always find other members of the same structural class: interleukin (IL)-2 will not find IL-3, and IL-4 will not find IL-5, even though all are four-helix bundles (see Wells et al. 1994). More sensitive approaches therefore use multiple sequence alignments, in which a profile is built from the average of the members of a family and the profile, rather than a single sequence, is used to search (programs such as PSI-BLAST). ZymoGenetics identified a whole range of post-genomic cytokines, including many new TNF family members and four-helix bundles (Foster et al. 2004). These include IL-20, IL-22, IL-24 and IL-31 and, most importantly, the B-cell modulator TACI (transmembrane activator and CAML inhibitor; Gross et al. 2000). A fusion protein, TACI-Fc, is now in clinical development for autoimmune diseases.

3. Using the three-dimensional structures of proteins - threading. More complex analysis of the proteins embedded in the human genome was possible using threading. In this technology, protein sequences are compared to all the known three-dimensional protein folds, so the search for new proteins is based on shape rather than sequence alone (Miller et al. 1996). We have been collaborating with Inpharmatica since 2001 and have found almost 200 putative new open reading frames coding for new proteins and annotations. This technology requires extremely large computing power if the whole of the genome is used, and its development has been possible because of the availability of massively parallel computing systems.

4. Beyond the genome - new approaches to finding new secreted proteins. Although the human genome sequence has been available now for 5 years, there are still new secreted proteins to be found. First, we know that splice variants exist in large numbers (Graveley 2001), and that some have interesting biological activities. Second, whole genome scans, in which we look at differences in single nucleotide polymorphisms (SNPs) across large populations, are now relatively inexpensive. Using these approaches we can find SNPs linked with disease in regions where no known protein is expressed, allowing the discovery of new disease-associated proteins. Approaches such as genomic tiling arrays (Bertone et al. 2006) can identify novel open reading frames in these areas, some of which have encoded previously unidentified secreted proteins amongst the non-coding RNAs (Washietl et al. 2005).
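The profile idea behind the multiple alignment searches in approach 2 can be illustrated with a small sketch. This is not PSI-BLAST, which iterates its search and handles gaps and sequence weighting; it is a minimal position-specific scoring profile built from an ungapped toy alignment. The function names `build_profile` and `best_hit`, the uniform background frequency, the pseudocount and the example sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for col in range(length):
        # Gap characters and unknown letters are simply not counted.
        counts = Counter(seq[col] for seq in aligned if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile, sequence):
    """Slide the profile along a sequence (no gaps); return (score, start)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        best = max(best, (score, start))
    return best


# Toy family alignment (made-up sequences, not real cytokines).
family = ["ACDEF", "ACDEY", "ACDQF", "GCDEF"]
profile = build_profile(family)
score, start = best_hit(profile, "MMMACDEFMMM")
print(f"best window starts at {start} with score {score:.2f}")
```

Because each column pools the residues seen across the whole family, a sequence that differs from every individual member at several positions can still score well against the profile, which is why profile searches can pick up relatives that a single-sequence search misses.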

Putting all this together, we have been able to assemble a collection of over 2,000 secreted proteins. This compares with an earlier collection of over a thousand genes, the Secreted Protein Discovery Initiative (Clark et al. 2003). To be part of our secretome collection, the cDNAs were inserted into a standard vector, and we purified the protein on a highly parallel automated system. In comparison with earlier approaches, we have concentrated on the quality of the protein as an entry criterion into our collection. Work with combinatorial chemistry has taught us that approaches that fail to check whether the protein is actually being produced generally result in a lot of time lost following up on false positives.
