Just as powerful new computers and computing techniques are helping scientists reach insights in biology through the field known as bioinformatics, a multidisciplinary group of Rensselaer researchers has begun laying the foundations for “cheminformatics” by applying those tools to chemistry. The Rensselaer Exploratory Center for Cheminformatics Research (RECCR), one of six cheminformatics centers funded in 2005 by the National Institutes of Health (NIH), has been growing: 13 faculty members from seven science and engineering departments now work together to uncover basic truths about chemistry.
“Our goal is to provide better ways of predicting the function of molecules based on their structures,” says Curt Breneman, center director and professor of chemistry and chemical biology. Advances in the generation, mining, and analysis of chemical information are crucial to the development of new drug therapies and to modern methods of bioinformatics and molecular medicine, he says.
The RECCR was born from collaborations among three researchers: Breneman in chemistry; Kristin Bennett, professor of mathematics; and Mark Embrechts, associate professor of decision sciences and engineering systems. The vision was to bring together experts in fields such as machine learning, data mining, and predictive modeling who were interested in applying these techniques to chemistry. Center membership also includes other specialists and application scientists who either generate chemical data or use the models and other methods the center develops.
The combination has produced a rich mixture of research projects, such as “Mining Complex Patterns,” “Elucidation of the Structural Basis of Protein Kinetic Stability,” and “Statistical Models for Protein Folding Pathways.” In 2007, the center introduced two new Web-based tools, both of which are freely available through the RECCR Web site to researchers around the world.
WebPDB was funded by NIH to complement the widely used Protein Data Bank (PDB), a repository of protein structures that was established at Brookhaven National Laboratory and is now managed by the Research Collaboratory for Structural Bioinformatics (RCSB). Although proteins are dynamic molecules, the PDB archives them as static structures captured under one set of experimental conditions, using data derived from such techniques as X-ray crystallography and nuclear magnetic resonance. Breneman explains that when a crystallographer can’t resolve all of the amino acid chains, those not observed during the experiment are simply left out of the archived structure. His group developed WebPDB to use the known amino acid sequences of proteins to reconstruct the missing portions of the three-dimensional structures.
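The first step in any such reconstruction is to work out which parts of the known sequence are absent from the deposited structure. The sketch below illustrates only that bookkeeping step, under stated assumptions: the function name `find_missing_segments`, the 1-based numbering, and the example sequence are hypothetical, and nothing here is taken from WebPDB itself, whose method the article does not describe.

```python
from typing import List, Set, Tuple


def find_missing_segments(sequence: str, observed: Set[int]) -> List[Tuple[int, int, str]]:
    """Return (start, end, residues) for each run of 1-based sequence
    positions that has no observed coordinates in the structure."""
    segments = []
    start = None  # first position of the gap currently being tracked
    for pos in range(1, len(sequence) + 1):
        if pos not in observed:
            if start is None:
                start = pos
        elif start is not None:
            # An observed residue closes the open gap.
            segments.append((start, pos - 1, sequence[start - 1:pos - 1]))
            start = None
    if start is not None:
        # The gap runs to the end of the chain.
        segments.append((start, len(sequence), sequence[start - 1:]))
    return segments


# Hypothetical 16-residue chain in which only positions 3-8 and 12-16
# were resolved in the experiment.
sequence = "MKTAYIAKQRQISFVK"
observed = set(range(3, 9)) | set(range(12, 17))
gaps = find_missing_segments(sequence, observed)
print(gaps)
```

Each reported gap is a candidate for rebuilding; how the missing coordinates are actually modeled is a separate and much harder problem that this sketch does not attempt.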
In addition, when proteins are crystallized, they can carry other chemical species with them, such as ions. WebPDB removes these extras and isolates the protein itself. It also generates a set of descriptors for use in making predictions about a protein’s functions under different conditions. The descriptors include information on structure, electrostatic charges, and solubility in fats.
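To give a sense of what a descriptor is, the sketch below computes two very crude numbers from a sequence alone: a net formal charge and a mean hydropathy score on the published Kyte-Doolittle scale. These are stand-ins chosen for illustration. WebPDB's descriptors are derived from three-dimensional structures and are not specified in this article, and the function name `simple_descriptors` is hypothetical.

```python
# Kyte-Doolittle hydropathy values; higher means more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def simple_descriptors(sequence: str) -> dict:
    """Compute illustrative sequence-level descriptors.

    Net charge counts Lys/Arg as +1 and Asp/Glu as -1. Histidine and the
    chain termini are ignored, which is a simplifying assumption.
    """
    positive = sum(sequence.count(residue) for residue in "KR")
    negative = sum(sequence.count(residue) for residue in "DE")
    mean_hydropathy = sum(KYTE_DOOLITTLE[r] for r in sequence) / len(sequence)
    return {
        "length": len(sequence),
        "net_charge": positive - negative,
        "mean_hydropathy": mean_hydropathy,
    }


# Hypothetical peptide used only to exercise the function.
descriptors = simple_descriptors("MKTAYIAKQRQISFVK")
print(descriptors)
```

A vector of numbers like this, computed the same way for every protein, is what lets a statistical model compare proteins and relate their properties to observed behavior.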
The second tool, the RECCR Online Modeling System (ROMS), is a general-purpose Web-based machine learning system. Users upload a data set through the Web client, then generate a model and visualize its performance. “Machine learning is where we try to learn from data,” says Bennett. “We’re trying to figure out some way to predict the activity of a molecule without actually testing it.” Machine learning allows molecules to be screened in silico, before more costly and time-consuming laboratory assays are run, and predictive models can guide the design of more effective assay experiments.
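The idea of screening in silico can be sketched in a few lines. The example below assumes each molecule is described by a short vector of numeric descriptors and predicts the activity of an untested molecule by averaging the measured activities of its nearest tested neighbors. This k-nearest-neighbor approach is a deliberately simple stand-in and is not a description of the algorithms ROMS uses; all function names, molecule names, and numbers are hypothetical.

```python
import math


def predict_activity(query, training, k=3):
    """Average the measured activity of the k training molecules whose
    descriptor vectors lie closest (Euclidean distance) to `query`."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    return sum(activity for _, activity in nearest) / len(nearest)


def rank_candidates(candidates, training, k=3):
    """Score every untested molecule and sort by predicted activity,
    highest first, so the most promising ones can be assayed first."""
    scored = [(name, predict_activity(vector, training, k))
              for name, vector in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical training set: (descriptor vector, measured activity).
training = [
    ((0.0, 0.0), 1.0),
    ((0.1, 0.0), 1.2),
    ((1.0, 1.0), 5.0),
    ((0.9, 1.1), 5.4),
    ((1.1, 0.9), 4.6),
]

# Hypothetical untested molecules, described by the same two descriptors.
candidates = {"mol_A": (1.0, 0.9), "mol_B": (0.05, 0.1)}

ranking = rank_candidates(candidates, training, k=2)
print(ranking)
```

In this toy data set, mol_A sits among the high-activity training molecules and is ranked first, so it would be the one sent for a real assay.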