The sequencing of the human genome marked the beginning of a collective scientific expedition to understand complex organisms. Genes, however, merely contain the instructions that determine which proteins (the ultimate biological effector molecules) populate the cell. Untangling the multifaceted networks that regulate complex organisms and their diseases will require innovative technologies to globally monitor genes, transcripts, and proteins. Transcript measurement technologies are well developed, high throughput, and broadly accessible. Next-generation sequencers, for example, can detect and quantify up to twenty thousand transcripts from multiplexed samples in less than one week. This capability has had a transformative impact on modern biology and medicine. For numerous reasons, protein analysis is considerably less mature and markedly less accessible, so protein levels are often inferred by proxy from transcript measurements. Transcriptomics is, however, no surrogate for proteomics, for two reasons: first, recent large-scale analyses show only modest protein/RNA correlation (R = 0.4-0.6); second, post-translational modifications (PTMs) can be detected only at the protein level.
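To put the correlation range in perspective: under a simple linear model, R = 0.4-0.6 means that transcript levels account for only about R^2 = 16-36% of the variance in protein levels. The minimal Python sketch below computes a Pearson R for paired transcript and protein measurements. The gene values are hypothetical and purely illustrative; real analyses typically use log-transformed abundances and may prefer a rank (Spearman) correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical log2 abundances for the same five genes (illustrative only).
mrna = [10.0, 12.5, 8.0, 15.0, 11.0]
protein = [20.0, 21.0, 19.5, 22.0, 23.0]

r = pearson_r(mrna, protein)
# R^2 is the fraction of protein-level variance explained by a linear fit on mRNA.
print(f"R = {r:.2f}, R^2 = {r * r:.2f}")
```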
Fig. 1. Stem cell phosphoproteomics.
We used quantitative proteomics to compare phosphorylation events between pluripotent cell types (embryonic stem (ES) cells and induced pluripotent stem (iPS) cells) and differentiated cells (newborn foreskin fibroblast (NFF) cells). We then used the Group-based Prediction System (GPS) to match differentially regulated sites to their likely cognate kinases. The figure shows a phylogenetic tree depicting kinases with increased activity in pluripotent cells (red) and NFF cells (blue). Note that entire branches of the kinase tree are associated with a particular cell type. For example, CMGC kinases show increased activity in pluripotent cells, while CAM kinases are active in differentiated cells. (From Phanstiel DH, Brumbaugh J, Wenger CD, Tian S, Probasco MD, Bailey DJ, Swaney DL, Tervo MA, Bolin JM, Ruotti V, Stewart S, Thomson JA, Coon JJ. Proteomic and phosphoproteomic comparison of human ES and iPS cells. Nature Methods, 2011.)
To confront these limitations and, more broadly, to continue the grand journey launched by the Human Genome Project, the Coon group seeks to develop next-generation protein measurement technologies and an integrated informatics platform that combines protein data with gene- and transcript-level measurements. These technologies are developed in the context of several driving biomedical projects ranging from basic to translational, e.g., the yeast environmental stress response, the maintenance of pluripotency, and IgA nephropathy pathogenesis, among others.
AIM 1. To make the pace, depth, and reproducibility of protein and PTM quantification commensurate with transcriptomic technologies. Current proteomic technologies lack the throughput, sensitivity, and reproducibility of RNA-based methods. Complete coverage of specific protein pathways or functional groups (e.g., all ~500 kinases or all ~1,400 transcription factors) is not typical. Likewise, the overlap of identifications between replicate experiments is low (35-60%). This lack of completeness and reproducibility limits the quantity and quality of biological conclusions that can be drawn from proteomic experiments. The Coon group is developing novel technologies to address each of these fundamental limitations.
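Replicate overlap can be defined in several ways. The minimal Python sketch below assumes the Jaccard index, meaning proteins identified in both runs divided by proteins identified in either run. The protein identifiers are hypothetical placeholders, not data from any experiment.

```python
def replicate_overlap(run_a, run_b):
    """Jaccard overlap of identifications from two replicate runs.

    Assumes overlap = |identified in both| / |identified in either|;
    other studies may divide by the smaller run instead.
    """
    a, b = set(run_a), set(run_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical protein IDs identified in two replicate injections.
run_1 = {"P1", "P2", "P3", "P4"}
run_2 = {"P3", "P4", "P5"}

overlap = replicate_overlap(run_1, run_2)
print(f"Replicate overlap: {overlap:.0%}")  # prints: Replicate overlap: 40%
```

Two shared identifications out of five distinct ones gives 40%, within the 35-60% range cited above; under a different definition the same runs would score differently, so the definition should be stated when comparing studies.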
AIM 2. To develop computational methods for integrating gene-, transcript-, and protein-level data. Mature, broadly accessible computational methods for integrating these massive datasets, which span multiple layers of measurement, do not yet exist. We will develop such methods and integrate them with existing publicly available information. Ultimately, we aim to eliminate the tedious and often overwhelming task of assimilating massive datasets, so that non-expert researchers can easily and quickly derive biological meaning.