DESCRIPTION OF THE PROJECT
Drug Discovery and Semi-Supervised Learning - A presentation by Curt M. Breneman

      The techniques developed in this research form a new framework for the automated discovery of novel pharmaceuticals and materials.

      The basic idea is to use large existing pharmaceutical databases as input to a new type of structure/activity correlation methodology. This methodology computes a large set of new and traditional molecular descriptors, which are then used to build improved Quantitative Structure-Activity Relationship (QSAR) models that characterize and predict important biological responses.
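      The descriptor-to-activity step can be illustrated with a minimal sketch. The text does not state which model form its QSAR models take, so this example assumes the simplest one: a linear model fitted by ridge (L2-penalized least squares) regression on precomputed descriptor vectors. The helper names (`solve`, `fit_ridge`, `predict`) and the penalty `lam` are hypothetical, and the descriptor scaling and validation a real QSAR study needs are omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting.

    Assumes a is non-singular (true for the ridge system whenever lam > 0).
    """
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(descriptors: Sequence[Sequence[float]], activity: Sequence[float],
              lam: float = 1e-3) -> Vector:
    """Fit activity ~ w0 + w . descriptors; lam penalizes every weight except the intercept w0."""
    rows = [[1.0] + list(d) for d in descriptors]  # prepend an intercept column
    p = len(rows[0])
    # Normal equations: (A^T A + lam * I') w = A^T y, where I' skips the intercept.
    ata = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
            for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, activity)) for i in range(p)]
    return solve(ata, aty)


def predict(weights: Vector, descriptors: Sequence[float]) -> float:
    """Predicted activity for one molecule's descriptor vector."""
    return weights[0] + sum(w * d for w, d in zip(weights[1:], descriptors))
```

For example, on toy data generated exactly from activity = 1 + 2*d1 - 0.5*d2, calling `fit_ridge` with a very small `lam` recovers weights close to [1, 2, -0.5]; larger `lam` values shrink the descriptor weights toward zero, trading some fit for stability when descriptors are many and correlated.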

      Once the descriptors have been computed with the Transferable Atom Equivalent (TAE) method and a predictive model has been built, thousands of candidate molecules that are chemically similar to those in the benchmark data set are retrieved from large databases, and their properties are estimated with the predictive model. The aim is to single out a few novel molecules with potentially attractive pharmaceutical properties, which can then be tested further in the laboratory in the traditional way. Neural-network-based data mining techniques help extract the information used to select these molecules.
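      The screening step can be sketched as follows, under stated assumptions: fingerprints are represented as sets of on-bit indices, "chemically similar" is taken to mean a nearest-neighbour Tanimoto similarity to the benchmark set at or above a threshold, and the trained model is any callable that maps a descriptor vector to a predicted activity. The function names, the `min_similarity` threshold and `top_k` are hypothetical illustrations, not part of the original system.

```python
from typing import Callable, List, Sequence, Set, Tuple

# (identifier, fingerprint on-bits, descriptor vector) -- a hypothetical record layout
Candidate = Tuple[str, Set[int], List[float]]


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def screen(candidates: Sequence[Candidate],
           training_fps: Sequence[Set[int]],
           model: Callable[[List[float]], float],
           min_similarity: float = 0.5,
           top_k: int = 10) -> List[Tuple[str, float]]:
    """Keep candidates similar to the benchmark set, score them, return the top_k best."""
    scored = []
    for name, fp, desc in candidates:
        # Similarity to the closest benchmark molecule decides whether the model applies.
        nearest = max((tanimoto(fp, t) for t in training_fps), default=0.0)
        if nearest >= min_similarity:
            scored.append((name, model(desc)))
    scored.sort(key=lambda item: item[1], reverse=True)  # highest predicted activity first
    return scored[:top_k]
```

With a small `top_k`, only the few highest-scoring candidates that pass the similarity filter are returned, mirroring the goal of passing a handful of molecules on to laboratory testing. Candidates that resemble nothing in the benchmark set are dropped before scoring, since a model's predictions are least trustworthy far from its training data.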

      New QSAR models were developed using Dr. Embrechts' StripMiner meta-code for managing learning systems. The prototype StripMiner system used neural networks and supervised genetic clustering algorithms to construct QSAR models within an integrated system that supports bootstrapping for sensitivity analysis and feature selection. StripMiner was then enhanced with novel learning methodologies developed by Dr. Bennett, such as semi-supervised learning with capacity control. These algorithms predict the desired biological responses and build QSAR models from molecules whose biological responses are known (labeled) as well as molecules whose responses are unknown (unlabeled).
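      The idea of learning from both labeled and unlabeled molecules can be illustrated with graph-based label propagation, a standard semi-supervised technique. It is swapped in here purely for illustration; it is not Dr. Bennett's capacity-controlled method, whose details are not given in this description. The sketch assumes binary activity labels (1.0 active, 0.0 inactive, None unknown), Gaussian similarity weights with bandwidth `sigma` over descriptor vectors, and a fixed number of `iterations`; all names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def label_propagation(points: Sequence[Sequence[float]],
                      labels: Sequence[Optional[float]],
                      sigma: float = 1.0,
                      iterations: int = 200) -> List[float]:
    """Spread known labels (1.0 active, 0.0 inactive, None unknown) over a similarity graph."""
    n = len(points)

    def weight(i: int, j: int) -> float:
        # Gaussian edge weight between two molecules; no self-loops.
        if i == j:
            return 0.0
        sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        return math.exp(-sq_dist / (2 * sigma ** 2))

    w = [[weight(i, j) for j in range(n)] for i in range(n)]
    f = [lab if lab is not None else 0.5 for lab in labels]  # unknowns start undecided
    for _ in range(iterations):
        new_f = f[:]
        for i in range(n):
            if labels[i] is None:  # labeled molecules stay clamped to their known value
                total = sum(w[i])
                if total > 0:
                    new_f[i] = sum(w[i][j] * f[j] for j in range(n)) / total
        f = new_f
    return f
```

Each unknown molecule ends up with a score between 0 and 1 that approximates the similarity-weighted average of its neighbours' scores, so unlabeled molecules lying close to known actives inherit high scores. The dense pairwise weight matrix costs O(n²) memory and time per iteration, which limits this plain version to small data sets.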