Kleensang et al. (2013) Pathways of Toxicity, a report from t4 - The Transatlantic Think-Tank for Toxicology.
This quotation also referenced Kholodenko’s paper in Science Signaling, Computational Approaches for Analyzing Information Flow in Biological Networks, in which the authors write that “the life of a biologist has changed”…
Just 20 years ago, the standard modus operandi was one research team working on one protein or gene for a lifetime of research.
… Mathematical and computational modeling methods are playing major roles in both tasks, especially as the ability to generate data has outpaced our ability to interpret them. The greatest strides have been made in the first task. The second is lagging behind but is moving into the limelight as new analysis methods are developed.
… Historically, traditional hypothesis-driven experiments formed the early basis of information about signal transduction pathways by piecing together components in tedious trial-and-error type experimental approaches that assume that signaling inputs are related to outputs by a linear path of signal transduction. Two experimental mainstays that have rapidly and dramatically enhanced the ability to map the components of interaction networks are the genetic yeast two-hybrid (Y2H) system and mass spectrometry (MS)–based proteomics.
False-positive results and the absence of known protein-protein interactions (PPIs) that depend on contextual information (such as stimulus-specific phosphorylation, which may or may not occur within budding yeast) remain limitations of the method. However, as the above example shows, these shortcomings are being addressed. The abundant data that have been and continue to be accumulated are of great utility, particularly when combined with other types of interaction data. An additional limitation of the Y2H approach is that it cannot reveal dynamic changes in PPIs; thus, the resulting graphs of binary interactions do not reveal how signaling information flows, which hinders the reconstruction of directed pathways.
MS analysis of cells fractionated into defined subcellular signaling structures—such as the centrosome, the mitotic spindle, and the kinetochore, for example—has identified many of the critical protein components of these subcellular complexes. Focused isolation approaches—particularly the use of tandem affinity purification (TAP)–tagged proteins, combined with stable isotope labeling by amino acids in cell culture (SILAC)—are sensitive, specific, and accurate methods for identifying proteins that interact with particular signaling molecules. A caveat of most MS-based methods is that the experiments generally do not reveal which interactions are direct and which are mediated through an additional component or components.
Perhaps the best approach to defining network topology is the weighted collection of the available evidence for specific protein-protein interactions, including the results of both the low-throughput and high-throughput approaches. Fortunately, a number of online databases and Web-based tools that enable the construction and analysis of weighted collections exist, including BIND, BioGRID, MINT, and DIP. Some tools, such as STRING and iRefWeb, attempt to merge the information from different databases, as well as text-based searches from the literature into a single Web resource. Similarly, resources for pathways and pathway models are constantly being improved. Some examples are the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, Pathway Commons, the Science Signaling Database of Cell Signaling, the Pathway Interactome Database, and BioModels.
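To make the idea of a "weighted collection of the available evidence" concrete, here is a minimal sketch. It assumes each source assigns a confidence between 0 and 1 to an interaction and that sources are independent, so they can be combined with a noisy-OR rule. The protein names, the scores, and the combination rule itself are illustrative assumptions, not the method of the paper or of any database named above.

```python
from math import prod


def combined_confidence(scores):
    """Noisy-OR combination: the probability that at least one source is
    correct, assuming the sources are independent (an assumption)."""
    return 1.0 - prod(1.0 - s for s in scores)


# Hypothetical per-source confidence scores for three candidate interactions.
# Protein names and numbers are invented purely for illustration.
evidence = {
    ("P1", "P2"): [0.5, 0.5],        # e.g. one Y2H hit plus one MS pull-down
    ("P2", "P3"): [0.9],             # e.g. a single low-throughput study
    ("P1", "P3"): [0.2, 0.1, 0.1],   # e.g. several weak high-throughput hits
}

# One weight per edge of the interaction network.
weighted_edges = {
    pair: combined_confidence(scores) for pair, scores in evidence.items()
}

# List the edges from best supported to least supported.
for pair, weight in sorted(weighted_edges.items(), key=lambda kv: -kv[1]):
    print(pair, round(weight, 3))
```

Several weak observations add up under this rule, but they do not overtake one strong observation, which is roughly the behaviour you would want when mixing low-throughput and high-throughput results.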
The scale of most of these data collections is enormous, and the amount of potentially valuable information in them is huge. However, they are complicated by the presence of misidentified peptides and proteins, incomplete representation of known network components, and an inherent bias against proteins that are present in low abundance… [None] of these approaches enables the reconstruction of network models that enable dynamic simulations.
High-throughput data usually suffer from low information content—that is, the observable results contain little information about the unknown parameters that caused them. A “local” perturbation that is initially confined to a particular network node can propagate and cause widespread “global” changes in the network and thereby mask immediate connections and routes. This issue is particularly pertinent to large omics data sets, because even in response to a single local perturbation, the omics snapshots of the cellular state arise from a plethora of interactions spreading through cell networks.
From there on in it gets quite dense, but engrossing if you’re interested in understanding “systems biology”, and although encased in computational terminology, many of the ideas are quite intuitive. As the paper points out, all of the life sciences are grappling with this sudden data deluge. Wonder is a matter of personal taste, but this is fascinating to me:
One of the challenges of mechanistic modeling is a combinatorial explosion in the number of emerging different species and distinct states of cellular networks that include scaffolds and proteins with several posttranslational modification sites. These multiple docking and modification sites generate a variety of heterogeneous protein complexes, and each of these complexes can be involved in many parallel reactions. Even initial steps in signal transduction that include receptors and adaptor proteins may generate hundreds of thousands and millions of distinct states, referred to as “micro-states” of a network.
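The arithmetic behind that explosion is easy to sketch. Assuming, for illustration only, that every docking or modification site on a protein behaves independently and can sit in a fixed number of states, the number of micro-states is simply the product of the per-site state counts. The site counts below are hypothetical and are not taken from the paper.

```python
from itertools import product
from math import prod


def count_microstates(states_per_site):
    """Number of distinct micro-states when sites are independent."""
    return prod(states_per_site)


def enumerate_microstates(states_per_site):
    """Brute-force listing of every micro-state; only feasible for tiny cases."""
    return list(product(*(range(n) for n in states_per_site)))


# Sanity check on a tiny hypothetical scaffold with three sites.
small = [2, 2, 3]
assert len(enumerate_microstates(small)) == count_microstates(small)

# A hypothetical receptor whose sites each have three states:
# unmodified, phosphorylated, or phosphorylated with an adaptor bound.
for n_sites in (5, 10, 15):
    print(n_sites, "sites:", count_microstates([3] * n_sites), "micro-states")
```

Each extra site multiplies the total rather than adding to it, which is why even the first steps of a signaling cascade can outrun any attempt to write down one equation per state.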
… Many topological motifs enriched in transcriptional networks have also been found in RTK, mitogenic [Ras to ERK], and survival [PI3K] signaling networks. Although different underlying biochemistry results in distinct kinetic equations that describe signaling or gene networks, the control circuitry remains similar.
Having said this, some of the field’s problems are particularly gruesome:
It is difficult or impossible to compare networks composed of different types of signaling components. Protein networks are typically interaction networks or modification networks. Like transcriptional networks, they often have a time component or compare different conditions. However, it is difficult to compare transcriptional and protein networks directly, even when frequent parallel sampling is available.
The time delay between the production of mRNA and its encoded protein can vary between genes and, hence, tends to destroy protein and gene regulatory network comparisons based on temporal correlations. In addition, because high-throughput proteomics identifies proteins on the basis of isolated peptide fragments, rather than coverage of the complete protein sequence, matching proteins to splice variants easily seen in transcriptomics experiments becomes ambiguous. Metabolites are direct outputs of protein activities and, therefore, would be expected to map organically onto protein networks. However, metabolic networks model physical fluxes of metabolites, whereas protein networks model flow of information or abstract influences. These relations cannot be directly superimposed, and also, the mathematical methods for analyzing these networks are fundamentally different.
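The point about time delays can be illustrated with a toy example. The sketch below uses synthetic data: an oscillating "mRNA" signal and a "protein" signal that is an exact copy delayed by six time steps. The period, the delay, and the sampling scheme are all assumptions made for illustration. Correlating the two series sample for sample shows almost no relationship, while shifting by the right lag recovers a perfect one.

```python
from math import pi, sin, sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(mrna, protein, lag):
    """Correlate mRNA at time t with protein at time t + lag."""
    n = len(mrna)
    return pearson(mrna[: n - lag], protein[lag:])


# Synthetic, hypothetical time courses: the protein follows its mRNA
# with a fixed delay of DELAY sampling steps.
PERIOD, DELAY, N = 24, 6, 48
mrna = [sin(2 * pi * t / PERIOD) for t in range(N)]
protein = [sin(2 * pi * (t - DELAY) / PERIOD) for t in range(N)]

zero_lag = lagged_correlation(mrna, protein, 0)
best_lag = max(range(12), key=lambda k: lagged_correlation(mrna, protein, k))

print("correlation at zero lag:", round(zero_lag, 3))
print("lag with the highest correlation:", best_lag)
```

In this toy case a single lag fixes everything. In real data, as the quotation says, the delay differs from gene to gene, so no single shift aligns a whole transcriptional network with its protein counterpart.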
It seems that we are stranded with a treasure chest of information and many keys that do not fit.
The results from genome sequencing projects have not only destroyed the one gene–one protein–one function dogma, but also bluntly shown us that our mere 20,000 genes perform the huge diversity of biological tasks by combinatorial cooperation whose complexity simply stuns the human mind.
And lastly, an interesting little fact…
Although several thousand proteins are estimated to be susceptible to a drug, currently used drugs only target ~270 human proteins, a number that has not changed in almost a decade.