20 Jun 2013, BioSpectrum Bureau, BioSpectrum
Singapore: Scientists at A*Star's Genome Institute of Singapore (GIS) have developed a method to quickly cut through noise present in data and generate a unified and simplified analysis of high-throughput biological data from, for example, patient samples.
The technique, known as a pre-whitening matched filter, is well known in electrical engineering and widely used in cell phones and radar. This is the first time, however, that computational scientists, led by Dr Shyam Prabhakar, associate director, Integrated Genomics, GIS, have adapted it to the analysis of high-throughput DNA sequencing data, with surprisingly accurate results.
High-throughput DNA sequencing has revolutionized the study of molecular biology and human disease. The technology has yielded major insights into cancer, infectious diseases, Parkinson's disease and many developmental disorders.
Dr Prabhakar and his team at the GIS discovered that, with the pre-whitening matched filter technique, the results were uniformly better than those of existing algorithms across a whole range of analysis tasks. In essence, the technique was applied to accurately detect segments of the genome that stood out from the rest of the sequence data. This was possible because, as lead author Dr Vibhor Kumar quickly realized, the mathematics underlying the solution to all these analysis problems was the same.
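For readers unfamiliar with the engineering idea, the following is a minimal textbook sketch of a pre-whitening matched filter in Python. It is not the GIS team's algorithm or code. It assumes, purely for illustration, that the background noise is Gaussian and follows a first-order autoregressive (AR(1)) model with a known correlation `rho`, and that the feature being sought has a known bell-shaped `template`. The function names and all parameter values are hypothetical. The sketch first "whitens" the data so that the noise becomes uncorrelated, applies the same transformation to the template, and then slides the whitened template along the whitened track to score every position.

```python
import math
import random


def whiten(x, rho):
    """Remove AR(1) correlation: z[t] = (x[t] - rho * x[t-1]) / sqrt(1 - rho^2).

    The sample before the start of x is taken to be 0.
    """
    scale = math.sqrt(1.0 - rho * rho)
    return [(x[t] - rho * (x[t - 1] if t > 0 else 0.0)) / scale
            for t in range(len(x))]


def matched_filter_scores(track, template, rho):
    """Score each window of the track against the template after whitening both.

    The template is padded with one trailing zero so that the whitening
    filter's response to the template's last sample is kept. Scores are
    scaled so that noise-only windows have roughly unit variance.
    """
    w = whiten(list(template) + [0.0], rho)
    norm = math.sqrt(sum(v * v for v in w))
    z = whiten(track, rho)
    n = len(w)
    return [sum(w[k] * z[i + k] for k in range(n)) / norm
            for i in range(len(z) - n + 1)]


def simulate_track(template, start, amplitude, n, rho, seed=0):
    """Simulated data (an assumption, not real sequencing data):
    unit-variance AR(1) noise with one scaled copy of the template added."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0)]
    innovation_sd = math.sqrt(1.0 - rho * rho)
    for _ in range(1, n):
        noise.append(rho * noise[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    for k, s in enumerate(template):
        noise[start + k] += amplitude * s
    return noise


if __name__ == "__main__":
    rho = 0.8
    # Bell-shaped feature, 21 positions wide (illustrative choice).
    template = [math.exp(-0.5 * ((t - 10) / 3.0) ** 2) for t in range(21)]
    track = simulate_track(template, start=1200, amplitude=8.0, n=2000, rho=rho)
    scores = matched_filter_scores(track, template, rho)
    peak = max(range(len(scores)), key=scores.__getitem__)
    print("highest-scoring window starts at position", peak)
```

In this toy setting the highest score should fall close to where the feature was planted. Real sequencing data are read counts rather than Gaussian values, and their noise structure has to be estimated rather than assumed, so the sketch illustrates only the general principle.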
"Our work fits into the pattern of applying engineering solutions to data analytics problems, and we are excited about using our approach to uncover important features of human disease," said Dr Prabhakar. "This discovery will make it a lot easier for scientists to make biological inferences from high-throughput DNA data, particularly in the context of clinical samples from patients."
GIS executive director Professor Ng Huck Hui said, "This is a classic work of high performance computational biology that provides an analytical solution for a complex big data era. With this development, Dr Prabhakar's team brings us one big leap further and faster in scientific high-throughput sequencing work."
Dr Rob Mitra, Alvin Goldfarb distinguished professor of computational biology and associate professor, Department of Genetics at the Washington University School of Medicine, said, "This work provides an elegant solution to a ubiquitous problem: separating the signal from the noise in deep-sequencing datasets. The DFilter algorithm represents a significant advance because it is widely applicable and because it is more accurate than existing algorithms. DFilter can be used to analyze virtually any sequence-tag analysis of DNA binding (e.g. ChIP-Seq, DNASE-Seq, or FAIRE-Seq), and since it uses the mathematically optimal linear discriminant, it was able to outperform all of the existing tools that were developed specifically for each type of assay."