By Lauren Ward

Computer algorithms—including some once used to sort out the stars—are revealing the secrets of the new “universe” unlocked by the Human Genome Project.

Leading this exciting new search are interdisciplinary teams of researchers from Carnegie Mellon University.

Because of the Human Genome Project, the computer has joined the laboratory beaker in the hunt for cures to diseases like cancer, stroke and heart disease. That shift is moving Carnegie Mellon, with its unparalleled computer science program, to the forefront of biomedical research.

That’s because researchers now need to make sense of a “universe” of 25,000 separate genes, up to 300,000 different proteins and the countless ways our cells communicate with one another.

So Carnegie Mellon scientists are designing computational programs to decipher our biology. In simplest terms, computational biology is an interdisciplinary field that develops and applies algorithms, the step-by-step procedures that computer programs carry out, to understand biological processes. At Carnegie Mellon, teams of scientists have turned powerful algorithms initially used to probe the cosmos into sophisticated laboratory tools.

They’re also using algorithms developed in linguistics to unravel the mysteries of gene expression. And they’re inventing many other computational methods that should speed biomedical research and drug discovery.

“One of the major challenges and opportunities in the post-genomic era is combining expertise from computer science and biology to answer questions that could not even be envisaged just a few years ago,” said Professor Robert Murphy, Ph.D., director of the Merck Computational Biology and Chemistry Program.

Carnegie Mellon has the assets to answer these questions. A top-ranking institution in computer science, it continues to increase its portfolio of research grants, recruit talented faculty and build on 15 years of experience in educating computational biologists.

Because proteins dictate when cells divide, respond to their environment and carry out life-sustaining processes, a critical focus of computational biology is understanding how genes make proteins and how proteins interact with one another. This research area, called proteomics, will help scientists understand why disease undermines these activities.

“The proteome of every cell is made of thousands of proteins with different and distinct locations, abundances, interactions and biochemical activities. Clearly, the more completely we can describe the composition and the dynamics of the proteome, the better we will understand the biology of the cell in health and disease,” said Murphy.

Murphy has designed machine-learning methods to automatically analyze digital images of fluorescently labeled proteins inside living cells. This new science—location proteomics—is superior to the human eye in objectively locating proteins within cells over space and time. Diseases like cancer often involve mishaps in protein movement, so knowing where proteins should normally be within cells is critical to designing targeted therapies, according to Murphy, whose novel work recently garnered a multimillion-dollar federal grant.

Another technology, developed by Jonathan Minden, associate professor of biological sciences, and William Eddy, professor of statistics, was inspired by astrophysics software. Combining a new lab method with sophisticated software, Difference Gel Electrophoresis (DIGE) is revolutionizing the detection of subtle protein differences between diseased cells and healthy cells. Spotting these differences will help scientists identify disease markers and drug targets. A unique tool, DIGE is widely commercialized by Amersham Biosciences, a leading biotech company.

In a recent advance, Eddy and Minden have developed a completely new computational method that enables DIGE to spot differences in low-abundance proteins that could play a major role in causing cancer. Minden is using the modified DIGE to study proteins in samples taken from patients with leukemia. The new method should also assist many other fields of biomedical image analysis, said Minden.

Identifying protein structures that interact with drugs is another way that computational biologists at Carnegie Mellon hope to advance drug discovery. Using standard technologies, researchers now take six months to a year to determine a protein’s structure. But Michael Erdmann, professor of computer science and robotics, and Gordon Rule, professor of biological sciences, have developed a tool that could reduce this process to a matter of weeks. PEPMORPH uses a high-speed computational approach to analyze limited data and generate structural “sketches” of newly discovered or previously uncharacterized proteins.

“By providing a high-throughput computational tool, our approach should reduce the need to perform costly, time-intensive studies,” said Rule.

Location proteomics, DIGE and PEPMORPH are central components of an interdisciplinary $3.5 million grant from the Commonwealth of Pennsylvania to advance cancer diagnosis and treatment.

At Carnegie Mellon, linguistics tools also show great promise for advancing our understanding of disease. Through a $9 million, multi-institutional grant from the National Science Foundation, Raj Reddy, Simon University Professor of Computer Science and Robotics, and Judith Klein-Seetharaman, assistant professor of pharmacology at the University of Pittsburgh and research scientist at Carnegie Mellon’s Language Technologies Institute (LTI), have initiated the development of computational biolinguistics. Collaborators include LTI director and computer science professor Jaime Carbonell and others at Carnegie Mellon, the Center for Computational Biology and Bioinformatics at the University of Pittsburgh, and MIT.

The work is based on the premise that the building blocks of proteins—called amino acids—form sequences that can be read to understand their structure, dynamics and function, just as in languages, where sequences of letters fall into patterns that make them understandable.
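To make the language analogy concrete, here is a minimal sketch in Python that treats an amino-acid sequence as text and counts its overlapping three-letter "words," much as a linguist might count letter patterns in a sentence. This is only an illustration of the general idea, not the method used by the Carnegie Mellon researchers. The function name `ngram_counts` and the toy sequence are hypothetical and were made up for this example.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int = 3) -> Counter:
    """Count overlapping n-letter 'words' (n-grams) in an amino-acid sequence.

    Each letter is assumed to be a one-letter amino-acid code.
    A sequence shorter than n yields an empty Counter.
    """
    sequence = sequence.upper()
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Hypothetical toy sequence, invented for illustration (not a real protein).
toy_sequence = "MKTLLVAGLLKTLLA"

counts = ngram_counts(toy_sequence, n=3)

# List every three-letter pattern that occurs more than once.
repeated = sorted(word for word, count in counts.items() if count > 1)
print(repeated)
```

Recurring patterns such as these are the kind of raw material that language-style statistical models could work from, in the same way that frequent letter combinations help make written text predictable.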

“The computational biolinguistics project promises to provide novel views and approaches to solving these challenges that would not be obvious without thinking of them in terms of an analogy between language and biology,” said Klein-Seetharaman.

Carnegie Mellon’s reputation in computational biology continues to grow, thanks to its steady recruitment of talented faculty.

At the vanguard is Dannie Durand, an associate professor of biological sciences and computer science who joined Carnegie Mellon in 2000. Durand carries out computational research to understand gene evolution.

“Carnegie Mellon is a wonderful place for computational biology because it is easy to collaborate with faculty and students in different fields, and unusual approaches are appreciated here.”
