The sequencing of the human genome has led to an explosion of new insights into the genetic basis of disease. A challenge, however, is that while the identity of disease-associated genes may be known, in many cases their function remains obscure.
At the same time, approaches to chemical biology and drug discovery have dramatically expanded. New types of chemical libraries have been generated, powerful screening methods have been developed, and novel classes of therapeutics are entering the clinic. And yet, there has been no method to systematically determine the cellular effects of a given compound – unexpected off-target activities are often discovered only late in the drug development process, resulting in side effects that limit clinical use.
We hypothesized that a potential solution to these problems might be the creation of a comprehensive catalog of cellular signatures representing systematic perturbation with genetic (thus reflecting protein function) and pharmacologic (thus reflecting small-molecule function) perturbagens. Signatures with high similarity might represent useful and previously unrecognized connections (e.g., between two proteins operating in the same pathway, between a small molecule and its protein target, or between two small molecules of similar function but dissimilar structure). Such a catalog of connections could serve as a functional look-up table of the genome; we termed this concept the Connectivity Map (CMap).
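The idea of a look-up table built on signature similarity can be illustrated with a minimal sketch. It assumes each signature is a mapping from gene to a differential expression score and uses cosine similarity as a stand-in metric; the perturbagen names, gene names, and values are made up for illustration, and this is not the similarity measure CMap itself uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two signatures given as gene -> score dicts."""
    genes = sorted(set(a) & set(b))  # compare only genes present in both signatures
    dot = sum(a[g] * b[g] for g in genes)
    norm_a = math.sqrt(sum(a[g] ** 2 for g in genes))
    norm_b = math.sqrt(sum(b[g] ** 2 for g in genes))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero signature carries no direction to compare
    return dot / (norm_a * norm_b)

def rank_connections(query, catalog):
    """Return (perturbagen, similarity) pairs, most similar first."""
    scores = [(name, cosine_similarity(query, sig)) for name, sig in catalog.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy catalog: hypothetical perturbagens with invented scores.
catalog = {
    "compound_A": {"GENE1": 2.0, "GENE2": -1.5, "GENE3": 0.1},
    "knockdown_B": {"GENE1": -1.8, "GENE2": 1.2, "GENE3": 0.0},
    "compound_C": {"GENE1": 0.1, "GENE2": 0.2, "GENE3": 2.5},
}
query = {"GENE1": 1.9, "GENE2": -1.4, "GENE3": 0.3}

ranked = rank_connections(query, catalog)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy example the top of the ranking holds perturbagens whose signatures resemble the query, while the bottom holds those with an opposing response; both ends can suggest a connection worth following up.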
To date, CMap has generated a library containing over 1.5M gene expression profiles from ~5,000 small-molecule compounds and ~3,000 genetic reagents, tested in multiple cell types. To produce data at that scale, we developed L1000, a relatively inexpensive and rapid high-throughput gene expression profiling technology. Expression data are processed through a computational pipeline that converts raw fluorescence intensity into signatures, which can then be used to query the CMap database for perturbations that produce a related gene expression response.
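As a rough illustration of what converting intensities into a signature can involve, the sketch below log-transforms raw values and scores each gene in a treated sample against control samples using a robust z-score (median and median absolute deviation). The function name, the `min_scale` floor, and the toy values are assumptions made for this example; it is not a description of the actual L1000 processing pipeline.

```python
import math
import statistics

def robust_zscores(treated, controls, min_scale=0.1):
    """Score one treated sample against control samples, gene by gene.

    treated:   gene -> raw intensity for the perturbed sample
    controls:  list of gene -> raw intensity dicts for control samples
    min_scale: floor on the spread estimate (an assumed value) so that
               genes with identical control readings do not divide by zero
    """
    signature = {}
    for gene, intensity in treated.items():
        ctrl = [math.log2(c[gene]) for c in controls]
        median = statistics.median(ctrl)
        mad = statistics.median([abs(x - median) for x in ctrl])
        # 1.4826 scales the MAD to match a standard deviation for normal data.
        scale = max(1.4826 * mad, min_scale)
        signature[gene] = (math.log2(intensity) - median) / scale
    return signature

# Toy intensities: hypothetical genes and invented readings.
controls = [
    {"GENE1": 100.0, "GENE2": 100.0, "GENE3": 50.0},
    {"GENE1": 110.0, "GENE2": 120.0, "GENE3": 50.0},
    {"GENE1": 90.0, "GENE2": 80.0, "GENE3": 50.0},
]
treated = {"GENE1": 400.0, "GENE2": 25.0, "GENE3": 50.0}

signature = robust_zscores(treated, controls)
for gene, z in signature.items():
    print(f"{gene}: {z:+.2f}")
```

Genes that rise relative to controls receive positive scores, genes that fall receive negative scores, and unchanged genes score near zero; a vector of such scores is the kind of signature that can be compared against a catalog.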
To house and use these vast amounts of data, we have built a cloud-based compute infrastructure termed CLUE (CMap and LINCS Unified Environment), a suite of user-friendly web applications and software tools that enable researchers to access and manipulate CMap data and integrate it with their own datasets.
We invite biologists and computational scientists to use CMap to further their research.
Funding for our work comes from the NIH LINCS (Library of Integrated Network-Based Cellular Signatures) project, as well as from philanthropic grants, collaborative projects with industry, and Broad Institute funds.