Cheminformatics has become crucial in a wide variety of applications, from library design to drug synthesis and selection. Drug discovery would be revolutionized if a small preliminary assay could be used to “characterize” the behavior of a set of small molecules and, using their chemical structures, identify which substructures are predictive of biological performance.
A variety of computational methods can now derive a set of substructures that is predictive of performance in biological assays. Diego and his Broad colleagues were able to extract such a subset of substructures from a collection of compounds. The presence or absence of just 20 substructures was enough to predict a compound's behavior in a biological assay 67% of the time. To do this they combined several computational methods: ReliefF, k-nearest neighbors, cross-validation, and the Tanimoto coefficient. In the future, they aim to improve the resolution of their predictions, and could implement different algorithms to find the substructures.
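To make the similarity and classification steps concrete, here is a minimal sketch of how the Tanimoto coefficient, k-nearest neighbors, and leave-one-out cross-validation can fit together. It is not the Broad group's implementation: the function names, the toy fingerprints in TOY_DATA, the "active"/"inactive" labels, and the choice of k = 3 are all assumptions made for illustration, and the ReliefF feature-selection step is left out. Each compound is represented as the set of substructure IDs it contains.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto coefficient: shared substructures / substructures in either compound."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    return len(a & b) / union


def knn_predict(query, training, k=3):
    """Majority vote among the k training compounds most similar to the query.

    training is a list of (fingerprint_set, label) pairs.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Hold out each compound in turn and predict it from all the others."""
    correct = 0
    for i, (fingerprint, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(fingerprint, rest, k) == label:
            correct += 1
    return correct / len(data)


# Hypothetical toy data: integers stand in for substructure IDs.
TOY_DATA = [
    ({1, 2, 3}, "active"),
    ({1, 2, 4}, "active"),
    ({1, 3, 4}, "active"),
    ({7, 8, 9}, "inactive"),
    ({7, 8, 10}, "inactive"),
    ({8, 9, 10}, "inactive"),
]

if __name__ == "__main__":
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    print(knn_predict({1, 2, 5}, TOY_DATA))
    print(leave_one_out_accuracy(TOY_DATA))
```

On real data, the fingerprints would be restricted to the substructures kept by the feature-selection step, and the cross-validated accuracy would play the role of the 67% figure reported above.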
PROJECT: Predicting patterns of biological performance using chemical substructure features
My summer at the Broad gave me a little taste of what’s to come. If all goes well, this is what I plan to be doing in the future. Although it was a rough start, since I had to catch up on a lot of very technical methods, it quickly became a productive research experience that yielded results. I was able to leave the Broad knowing that I had produced something concrete, something that can and will be used in the future.