Singapore, Nov 25, 2009: In 1991, Mr Stephen Fodor, Mr Lubert Stryer and other researchers published an article describing light-directed, spatially addressable parallel chemical synthesis, a process for synthesizing a desired set of peptides directly onto a small glass slide through repeated rounds of light-directed chemical coupling of amino acids. The adaptation of this process for synthesizing thousands of oligonucleotides of unique sequence onto a glass slide eventually led to the invention and commercialization of high-throughput DNA microarray technology in the early 1990s. This revolutionary technology allowed scientists to perform global transcriptional profiling experiments that involve simultaneously measuring the expression level of thousands of genes in multiple samples representing various experimental conditions.
From this high-dimensional dataset, genes that are differentially expressed between experimental conditions are identified, with the underlying assumption that the observed phenotypic differences may be explained by the identified set of genes. With the completion of the sequencing of the human genome, it was widely speculated that the ability to simultaneously measure the transcription levels of all putative genes would alter the pace at which we gain knowledge of the underlying mechanisms of various biological processes and human diseases.
Microarray technology soon expanded its application to profiling other biological entities such as single nucleotide polymorphisms (SNPs), genomic copy number, microRNAs, and transcription factor binding sites. The predominant trend in microarray-based studies has been unilateral profiling of each of these biological entities. Very few studies attempt to interpret profiling data of one type of biological entity within the context of another.
However, the declining number of new Food and Drug Administration (FDA) approved biomarkers reported over the last decade suggests that an understanding of human disease cannot be achieved by studying these entities separately.
The molecular mechanisms that drive a particular biological or disease process are mediated by the intricate interplay of many biological entities such as genes, mRNAs, proteins, and metabolites.
While transcriptional profiling experiments have yielded great knowledge and insight, a comprehensive understanding of a biological system cannot be achieved by mRNA profiling alone. This notion is illustrated by the simple fact that proteins, not mRNAs, are the functional units of the cell. Proteins are enzymes that catalyze biochemical reactions key to cell metabolism, structural components that help maintain cell shape, and interacting biological entities that relay information through signaling pathways.
Thus, it can be argued that ascertaining differences in protein expression between experimental conditions may be a more direct path to understanding a biological system. Despite the wealth of knowledge expected to be gained from global profiling of the proteome, the number of proteomics profiling experiments performed is low relative to transcriptional profiling experiments. This can be attributed to the fact that proteomics profiling experiments are more costly, more technically challenging, and lower in throughput. However, with recent advances in technologies such as antibody-based microarrays and substantial improvements in liquid chromatography mass spectrometry (LC-MS) platforms, global profiling of proteins has become one of the fastest growing areas of research.
It is clear that unilateral profiling of mRNAs or proteins would produce incomplete and even misleading interpretations of the biological system. Although the expression level of many proteins is controlled at the transcriptional level, post-transcriptional regulation processes that affect translation initiation or protein stability can also affect levels of protein expression. Thus, mRNA expression analysis will undoubtedly identify changes that are not reflected at the protein level and that therefore may not have any biological consequences.
Integrative analysis of transcriptional and proteomics profiling data would allow the identification of genes that are regulated at the transcription level as well as post-transcriptional regulatory mechanisms that are important for the biological or disease process under study. Thus, convergence of transcriptional and proteomics profiling is necessary to gain a comprehensive understanding of the underlying molecular network of a biological system.
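As a rough illustration of the idea, the Python sketch below labels each gene according to whether a change between two conditions shows up in the mRNA data, the protein data, or both. It is only a minimal sketch under stated assumptions: the function name, the label names, and the cutoff of 1.0 on the log2 fold change are hypothetical choices made for this example, not an established method.

```python
def classify_regulation(mrna_lfc, protein_lfc, threshold=1.0):
    """Label genes by where a change between conditions is observed.

    mrna_lfc and protein_lfc map a gene symbol to its log2 fold change.
    The threshold is an illustrative assumption, not a recommended value.
    """
    labels = {}
    # Only genes measured in both datasets can be compared.
    for gene in mrna_lfc.keys() & protein_lfc.keys():
        m, p = mrna_lfc[gene], protein_lfc[gene]
        m_changed = abs(m) >= threshold
        p_changed = abs(p) >= threshold
        if m_changed and p_changed:
            # Both levels change: concordant only if the direction agrees.
            labels[gene] = "transcriptional" if m * p > 0 else "discordant"
        elif p_changed:
            # Protein changes without a matching mRNA change.
            labels[gene] = "post-transcriptional"
        elif m_changed:
            # mRNA changes that are not reflected at the protein level.
            labels[gene] = "mRNA-only"
        else:
            labels[gene] = "unchanged"
    return labels
```

Genes labeled "mRNA-only" correspond to the transcript changes described above that are not reflected at the protein level, while "post-transcriptional" marks candidates for regulation of translation or protein stability.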
Despite the prevalence of high-throughput technology available for profiling studies, there are surprisingly few scientific publications that integrate heterogeneous data such as mRNA and protein expression to produce a more comprehensive understanding of the biological system. The large number of transcriptional microarray profiling experiments and the growing number of protein profiling experiments whose results are available in various public databases suggest that the ability to generate complex global data is not the bottleneck to integrative data analysis.
Instead, the lack of bioinformatics solutions that allow researchers to identify the linkages and concordance between mRNA and protein expression has been suggested to be a key impediment to integrative data analysis. Unlike gene expression microarrays, where the identity of the mRNAs being measured is often known, mass spectrometry-based proteomics profiling experiments generate peptide fragment patterns of unknown sequence. Protein identification thus relies on informatics software applications to correlate fragment patterns of interest with known peptide fragment patterns in various databases.
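To give a feel for what correlating fragment patterns involves, the following Python sketch scores an observed list of peak positions (m/z values) against a small library of reference patterns by counting shared peaks. This is a deliberately simplified stand-in, not how any particular search application works: the function names, the 0.5 tolerance, and the scoring rule are assumptions made for illustration only.

```python
def shared_peak_count(observed, reference, tolerance=0.5):
    """Count observed peaks lying within `tolerance` of any reference peak."""
    return sum(
        any(abs(obs - ref) <= tolerance for ref in reference)
        for obs in observed
    )


def best_match(observed, library, tolerance=0.5):
    """Return (entry_name, score) for the best-scoring library entry.

    `library` maps a hypothetical entry name to its list of reference peaks.
    The score is the shared peak count divided by the larger of the two
    peak-list lengths, so it falls between 0 and 1.
    Assumes the library and every peak list are non-empty.
    """
    scores = {
        name: shared_peak_count(observed, peaks, tolerance)
        / max(len(observed), len(peaks))
        for name, peaks in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Production tools use far more elaborate scoring and statistical validation; the point here is only that identification is a matching problem whose outcome depends on the database and the tolerance chosen.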
Once proteins of interest are identified, correlation of proteomics data to transcription data can then be made. Currently, analysis of different types of data is performed in separate software applications, with each application containing analytical and visualization tools optimized for the specific data type. However, this practice can lead to a decreased power to detect any concordance that may exist between the datasets. For example, different statistical tests may be used in the applications, resulting in an artificially low overlap between the genes and proteins that are differentially expressed in the two datasets. Faulty semantic mapping of biological information between different software applications can also contribute to a reduced concordance.
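The two pitfalls described above can be made concrete with a short Python sketch that applies one decision rule to both data types and reports any protein identifiers that fail to map to a gene. It is a minimal sketch under assumptions: the function names and the cutoff of 4.0 are hypothetical, and thresholding Welch's t statistic directly is a simplification of a full test with p-values and multiple-testing correction.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups of replicate measurements."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        # No within-group variation: avoid dividing by zero.
        return 0.0 if diff == 0 else diff * float("inf")
    return diff / se


def significant_ids(measurements, t_cutoff):
    """IDs whose |t| meets the cutoff; the same rule serves both data types."""
    return {
        feature_id
        for feature_id, (group_a, group_b) in measurements.items()
        if abs(welch_t(group_a, group_b)) >= t_cutoff
    }


def concordance(mrna, protein, protein_to_gene, t_cutoff=4.0):
    """Overlap of significant genes and proteins after identifier mapping."""
    sig_genes = significant_ids(mrna, t_cutoff)
    sig_proteins = significant_ids(protein, t_cutoff)
    # Proteins without a mapping are reported rather than silently dropped.
    unmapped = {p for p in sig_proteins if p not in protein_to_gene}
    mapped = {protein_to_gene[p] for p in sig_proteins - unmapped}
    shared = sig_genes & mapped
    union = sig_genes | mapped
    jaccard = len(shared) / len(union) if union else 0.0
    return {"shared": shared, "unmapped": unmapped, "jaccard": jaccard}
```

Surfacing the unmapped identifiers matters because a significant protein that cannot be linked to its gene lowers the apparent overlap for reasons that have nothing to do with biology.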
Combining tools for analysis and visualization of heterogeneous data into a single software application will greatly alleviate the challenges involved in integrative data analysis.
Ms Pam Tangvoranuntakul is the Bioinformatics Product Manager at Agilent Technologies. She received her PhD in biomedical sciences from the University of California, San Diego. Her current role focuses on building bioinformatics solutions to address the challenges of multi-omics data analysis.