Scientific Publications

Detecting novel associations in large data sets.

Publication Type: Journal Article
Authors: Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, and Sabeti PC
Abstract: Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
Year of Publication: 2011
Journal: Science (New York, N.Y.)
Volume: 334
Issue: 6062
Pages: 1518-24
Date Published (YYYY/MM/DD): 2011/12/16
ISSN Number: 0036-8075
DOI: 10.1126/science.1205438
PubMed: http://www.ncbi.nlm.nih.gov/pubmed/22174245?dopt=Abstract