Microarray technology is routinely used in laboratories around the world to obtain global gene expression profiles of many biological systems of interest. Newer methods for interpreting these experiments, such as the Broad Institute's Gene Set Enrichment Analysis (GSEA), analyze predefined sets of genes rather than single genes. This helps researchers interpret results at a higher level than gene-by-gene expression analysis and can facilitate the discovery of underlying biological mechanisms. Because GSEA is a new tool, Jean Junior and her Broad colleagues focused on increasing its utility.
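To make "based on gene sets rather than single genes" concrete, here is a minimal Python sketch of a weighted running-sum enrichment score in the style of GSEA. It assumes the genes have already been ranked by a per-gene score (for example, correlation with the phenotype). The function name, parameters, and toy data are hypothetical and are not the Broad's implementation. The sketch also omits the permutation-based significance testing and normalization that a full analysis needs.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str],
                     scores: Sequence[float],
                     gene_set: Set[str],
                     p: float = 1.0) -> float:
    """Running-sum enrichment score for one gene set (illustrative sketch).

    ranked_genes: genes ordered from most to least associated with the phenotype.
    scores: the ranking metric for each gene, in the same order.
    gene_set: the gene set being tested (hypothetical input).
    p: weight exponent; p = 0 weights every hit equally.
    """
    n = len(ranked_genes)
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap the ranked list without covering it")

    # Hits are weighted by |score|^p and normalized to sum to 1 over the set.
    n_r = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_r == 0:
        raise ValueError("all genes in the set have zero weight")
    # Each gene outside the set lowers the running sum by the same step.
    miss_step = 1.0 / (n - n_hit)

    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += (abs(s) ** p) / n_r if h else -miss_step
        # Keep the running-sum value farthest from zero.
        if abs(running) > abs(es):
            es = running
    return es
```

For example, `enrichment_score(["A", "B", "C", "D"], [4, 3, 2, 1], {"A", "B"})` gives a score close to 1.0 because both set members sit at the top of the ranking. Moving the set to `{"C", "D"}` gives a score close to -1.0.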
GSEA draws on a database of gene sets, the Molecular Signatures Database (MSigDB). It includes sets representing diverse biological processes defined by previous experiments, sets taken from pathway databases, manually curated gene lists, and sets from published results. Jean focused on improving C2, the functional sub-database of MSigDB, by expanding its collection with gene sets drawn from the published literature. She collected and documented 313 new gene sets, more than 70 of which are already incorporated into a new release (C2.1) of MSigDB. The value of these additional data was demonstrated by running GSEA with the enhanced C2.1 on two lung cancer data sets. The results reproduced previously obtained findings and also revealed a new set of E2F target genes associated with poor treatment outcome.
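Gene set collections such as MSigDB are commonly distributed as tab-delimited GMT files. Each line holds a set name, a description, and then the member genes. Below is a minimal Python sketch of reading such a file into a dictionary, which is roughly the shape a new curated set needs before an enrichment tool can use it. The function name and the example set names (`SET_A`, `SET_B`, `GENE1`...) are hypothetical placeholders, not real MSigDB entries.

```python
import io
from typing import Dict, List, TextIO


def read_gmt(handle: TextIO) -> Dict[str, List[str]]:
    """Parse GMT text: name<TAB>description<TAB>gene1<TAB>gene2..."""
    gene_sets: Dict[str, List[str]] = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and rows with no genes
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


example = "SET_A\tna\tGENE1\tGENE2\nSET_B\tna\tGENE3\n"
print(read_gmt(io.StringIO(example)))
```

This prints `{'SET_A': ['GENE1', 'GENE2'], 'SET_B': ['GENE3']}`. A dictionary keyed by set name keeps each curated set and its members together, so newly documented sets can be merged into an existing collection with a simple update.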
PROJECT: Improving the Utility of Gene Set Enrichment Analysis of Microarray Data
After running GSEA with the new C2.1, I found that a set of E2F target genes was enriched in association with poor treatment outcome in two separate lung cancer studies. This new result shows the discovery potential of improved gene set databases. It also shows that GSEA is only as good as its collection of gene sets in MSigDB.