The limitations of simple gene set enrichment analysis assuming gene independence.

Stat Methods Med Res
Abstract

Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently, Irizarry et al. 2009 proposed a simplified approach as a serious contender: it assesses enrichment with a one-sample t-test score and ignores gene-gene correlations. Their argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by carefully considering the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, because of the significant variance inflation they produce in the enrichment scores, and that they should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods.
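To illustrate the variance-inflation argument, here is a minimal simulation sketch. It is not the authors' code, and every function name in it is hypothetical. It assumes a simplified set score of the form sqrt(N) * (mean of the N per-gene two-sample t-statistics), which is approximately standard normal when genes are independent. If every pair of per-gene statistics instead shares a correlation rho, the variance of that score becomes roughly 1 + (N - 1) * rho rather than 1. The sketch also assumes null data (no true group difference), Gaussian expression values, and equicorrelated genes generated from one shared per-sample factor.

```python
import math
import random
import statistics


def simulate_gene_set(n_genes, n_per_group, rho, rng):
    """Null data (no group effect) with pairwise gene correlation rho.

    Assumption for illustration: each sample j has a shared factor f_j, and
    gene g in sample j is sqrt(rho) * f_j + sqrt(1 - rho) * e_gj.
    """
    n_samples = 2 * n_per_group
    shared = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    return [[a * shared[j] + b * rng.gauss(0.0, 1.0) for j in range(n_samples)]
            for _ in range(n_genes)]


def two_sample_t(values, n_per_group):
    """Pooled-variance two-sample t-statistic; first n_per_group values are group 1."""
    g1, g2 = values[:n_per_group], values[n_per_group:]
    pooled = (statistics.variance(g1) + statistics.variance(g2)) / 2.0  # equal group sizes
    se = math.sqrt(pooled * 2.0 / n_per_group)
    return (statistics.fmean(g1) - statistics.fmean(g2)) / se


def set_score(data, n_per_group):
    """Hypothetical simplified set score: sqrt(N) times the mean per-gene t-statistic."""
    ts = [two_sample_t(row, n_per_group) for row in data]
    return math.sqrt(len(ts)) * statistics.fmean(ts)


def null_score_variance(n_genes, n_per_group, rho, n_reps, seed=0):
    """Empirical variance of the set score across independent null replicates."""
    rng = random.Random(seed)
    scores = [set_score(simulate_gene_set(n_genes, n_per_group, rho, rng), n_per_group)
              for _ in range(n_reps)]
    return statistics.variance(scores)


if __name__ == "__main__":
    n_genes = 50
    for rho in (0.0, 0.1, 0.3):
        v = null_score_variance(n_genes=n_genes, n_per_group=10, rho=rho, n_reps=400)
        print(f"rho={rho:.1f}  empirical variance={v:.2f}  "
              f"1+(N-1)*rho={1 + (n_genes - 1) * rho:.2f}")
```

With rho = 0 the empirical variance should sit near 1 (slightly above it, because t-statistics with few degrees of freedom have variance above 1). Even a modest rho inflates it by an order of magnitude for a 50-gene set, so a standard normal reference distribution would then yield far too many small p-values. A permutation-based null, such as the empirical null distribution the abstract defends, preserves the gene-gene correlation and avoids this problem.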

Year of Publication
2016
Journal
Stat Methods Med Res
Volume
25
Issue
1
Pages
472-87
Date Published
2016 Feb
ISSN
1477-0334
DOI
10.1177/0962280212460441
PubMed ID
23070592
PubMed Central ID
PMC3758419
Grant list
R01 GM074024 / GM / NIGMS NIH HHS / United States
U24 CA194107 / CA / NCI NIH HHS / United States
R01 CA109467 / CA / NCI NIH HHS / United States
P30 CA023100 / CA / NCI NIH HHS / United States
U54 HD090255 / HD / NICHD NIH HHS / United States
R01 CA121941 / CA / NCI NIH HHS / United States