Cheminformatics Research

Cheminformatics aims to find quantitative and predictive relationships between the chemical structures of small molecules and their measured performance. Chemical structures are encoded as numeric descriptors that capture the molecular constitution, connectivity, size, shape, and reactivity of small molecules. We are interested in descriptors of the stereochemical, skeletal, and appendage diversity elements installed by build/couple/pair synthesis strategies, as well as in the decision-making process underlying these strategies.
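One way to make "numeric codes" concrete for a build/couple/pair library is to encode each compound by the diversity elements it was built from. The sketch below is a toy illustration under stated assumptions. The element lists, the one-hot scheme, and the helper names `encode` and `shared_elements` are hypothetical placeholders, not a specific published descriptor.

```python
from itertools import product

# Hypothetical diversity elements for a build/couple/pair-style library.
# Each compound is defined by exactly one choice per category; the labels
# below are placeholders, not real building blocks.
STEREO = ["RR", "RS", "SR", "SS"]   # configuration at two stereocenters
SKELETON = ["A", "B", "C"]          # skeletons from different "pair" reactions
APPENDAGE = ["H", "Me", "Ph"]       # appendage groups


def encode(stereo: str, skeleton: str, appendage: str) -> list[int]:
    """One-hot encode a compound's diversity elements as a numeric descriptor."""
    vec: list[int] = []
    for choice, options in ((stereo, STEREO), (skeleton, SKELETON), (appendage, APPENDAGE)):
        if choice not in options:
            raise ValueError(f"unknown diversity element: {choice!r}")
        vec.extend(1 if choice == opt else 0 for opt in options)
    return vec


def shared_elements(a: list[int], b: list[int]) -> int:
    """Count the diversity elements (stereo, skeleton, appendage) two compounds share."""
    return sum(x & y for x, y in zip(a, b))


# Enumerate the full virtual library: 4 x 3 x 3 = 36 compounds.
library = {combo: encode(*combo) for combo in product(STEREO, SKELETON, APPENDAGE)}
print(len(library))                                   # 36
print(encode("RS", "B", "Me"))                        # [0, 1, 0, 0, 0, 1, 0, 0, 1, 0]
print(shared_elements(encode("RS", "B", "Me"),
                      encode("RS", "A", "Me")))       # 2
```

A one-hot code keeps each synthetic decision visible as its own block of the vector, so descriptor differences can be traced back to a specific stereochemical, skeletal, or appendage choice.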

Existing cheminformatic approaches consider multiple structural features of small molecules in the context of a single performance measurement (e.g., enzyme inhibition). Increasingly, small-molecule profiling experiments make multiple measurements per compound, either in parallel or through multiplexed detection strategies. Such datasets offer a unique opportunity for cheminformatics research: the relevance of quantitative structure descriptions, and of their connections to synthetic planning, can be judged by the performance of small molecules across multiple biological contexts simultaneously. Multidimensional performance datasets provide a basis for finding structure descriptions whose similarities best agree with similarities in small-molecule performance.
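The idea of judging a structure description by how well its similarities agree with performance similarities can be sketched as a rank correlation between two sets of pairwise distances, similar in spirit to a Mantel test. Assumptions: Euclidean distance in both spaces, Spearman correlation as the agreement score, made-up toy data, and hypothetical helper names (`concordance`, `ranks`). This is an illustration, not our group's published algorithm.

```python
import math
from itertools import combinations


def euclidean(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant input: correlation undefined")
    return cov / (sx * sy)


def concordance(descriptors: list[list[float]], profiles: list[list[float]]) -> float:
    """Spearman correlation between pairwise structural and pairwise performance distances.

    descriptors[i] is compound i's structure descriptor; profiles[i] holds its
    measurements across several assays. Values near 1 mean compounds that look
    alike also behave alike.
    """
    if len(descriptors) != len(profiles):
        raise ValueError("need one profile per compound")
    pairs = list(combinations(range(len(descriptors)), 2))
    d_struct = [euclidean(descriptors[i], descriptors[j]) for i, j in pairs]
    d_perf = [euclidean(profiles[i], profiles[j]) for i, j in pairs]
    # Spearman = Pearson correlation of the ranks.
    return pearson(ranks(d_struct), ranks(d_perf))


# Toy data (made up): 4 compounds, 2 descriptor values, 3 assay readouts each.
descriptors = [[0, 0], [1, 0], [0, 3], [4, 4]]
profiles = [[0.1, 0.0, 0.2], [0.3, 0.1, 0.2], [0.9, 0.8, 0.7], [0.2, 0.9, 0.1]]
print(round(concordance(descriptors, profiles), 3))   # -0.086
```

Comparing candidate descriptors then amounts to computing this score for each and preferring the higher one. Because pairwise distances are not independent, a significance estimate would need a permutation test rather than the usual Spearman p-value.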

In our group, we develop new descriptors of small-molecule structure, new methods for computing small-molecule similarity and diversity based on structure and performance, and new algorithms that correlate performance with chemical structure and synthetic decisions. We work with synthetic chemists making libraries of novel compounds to quantitatively assess the consequences of synthesis decisions. We also work with the Broad Institute Compound Management team to help build optimal small-molecule screening collections, using both diversity-based selection approaches and comparative analyses of compounds from different natural and synthetic sources.
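Diversity-based selection can be illustrated with the greedy MaxMin heuristic applied to Tanimoto distances between fingerprint bit sets. MaxMin is one widely used heuristic, swapped in here as an assumption; it is not necessarily the method used to build the Broad collection. The fingerprints, compound IDs, and the names `tanimoto` and `maxmin_pick` are hypothetical.

```python
def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fps: dict[str, frozenset[int]], k: int, seed: str) -> list[str]:
    """Greedy MaxMin: repeatedly add the compound whose nearest already-picked
    neighbor is farthest away (distance = 1 - Tanimoto)."""
    if seed not in fps:
        raise KeyError(seed)
    picked = [seed]
    # For every unpicked compound, distance to its closest picked compound.
    nearest = {cid: 1.0 - tanimoto(fp, fps[seed]) for cid, fp in fps.items() if cid != seed}
    while nearest and len(picked) < k:
        best = max(nearest, key=nearest.get)   # ties: first in insertion order
        picked.append(best)
        del nearest[best]
        # Only the newly picked compound can shrink a nearest-neighbor distance.
        for cid in nearest:
            nearest[cid] = min(nearest[cid], 1.0 - tanimoto(fps[cid], fps[best]))
    return picked


# Hypothetical fingerprints: sets of bit indices that are switched on.
fps = {
    "cpd1": frozenset({1, 2, 3, 4}),
    "cpd2": frozenset({1, 2, 3, 5}),
    "cpd3": frozenset({10, 11, 12}),
    "cpd4": frozenset({1, 2, 10, 11}),
}
print(maxmin_pick(fps, k=3, seed="cpd1"))   # ['cpd1', 'cpd3', 'cpd4']
```

Each pick costs one pass over the remaining compounds, so selecting k compounds from n takes on the order of k·n similarity evaluations. The result depends on both the seed compound and the fingerprint type.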


  • Clemons, et al. (2011). Proc. Natl. Acad. Sci. USA, 108: 6817.
  • Hung, et al. (2011). Proc. Natl. Acad. Sci. USA, 108: 6799.
  • Muncipinto, et al. (2010). Org. Lett., 12: 5230.
  • Clemons, et al. (2010). Proc. Natl. Acad. Sci. USA, 107: 18787.
  • Chou, et al. (2010). ACS Chem. Biol., 5: 729.
  • Pizzirani, et al. (2010). Org. Lett., 12: 2822.
  • Swamidass, et al. (2010). J. Biomol. Screen., 15: 680.
  • Clemons and Wagner (2009). Curr. Opin. Chem. Biol., 13: 539.
  • Wilson, et al. (2009). J. Chem. Inf. Model., 49: 2231.
  • Tanikawa, et al. (2009). J. Am. Chem. Soc., 131: 5075.
  • Seiler, et al. (2008). Nucl. Acids Res., 36: D351-D359.
  • Bender, et al. (2007). Comb. Chem. High Throughput Screen., 10: 719.
  • Clemons (2007). Chemical Informatics. In: Chemical Biology, Volume 2; Wiley-VCH.
  • Forman, et al. (2005). BMC Bioinformatics, 6: 260.
  • Kim, et al. (2004). J. Am. Chem. Soc., 126: 14740.
  • Haggarty, et al. (2004). Comb. Chem. High Throughput Screen., 7: 669.