OASIS: An interpretable, finite-sample valid alternative to Pearson's for scientific discovery.
Authors | |
Keywords | |
Abstract | Contingency tables, data represented as counts matrices, are ubiquitous across quantitative research and data-science applications. Existing statistical tests are insufficient however, as none are simultaneously computationally efficient and statistically valid for a finite number of observations. In this work, motivated by a recent application in reference-free genomic inference [K. Chaung ., , 5440-5456 (2023)], we develop Optimized Adaptive Statistic for Inferring Structure (OASIS), a family of statistical tests for contingency tables. OASIS constructs a test statistic which is linear in the normalized data matrix, providing closed-form -value bounds through classical concentration inequalities. In the process, OASIS provides a decomposition of the table, lending interpretability to its rejection of the null. We derive the asymptotic distribution of the OASIS test statistic, showing that these finite-sample bounds correctly characterize the test statistic's -value up to a variance term. Experiments on genomic sequencing data highlight the power and interpretability of OASIS. Using OASIS, we develop a method that can detect SARS-CoV-2 and strains de novo, which existing approaches cannot achieve. We demonstrate in simulations that OASIS is robust to overdispersion, a common feature in genomic data like single-cell RNA sequencing, where under accepted noise models OASIS provides good control of the false discovery rate, while Pearson's [Formula: see text] consistently rejects the null. Additionally, we show in simulations that OASIS is more powerful than Pearson's [Formula: see text] in certain regimes, including for some important two group alternatives, which we corroborate with approximate power calculations. |
Year of Publication | 2024
|
Journal | Proceedings of the National Academy of Sciences of the United States of America
|
Volume | 121
|
Issue | 15
|
Pages | e2304671121
|
Date Published | 04/2024
|
ISSN | 1091-6490
|
DOI | 10.1073/pnas.2304671121
|
PubMed ID | 38564640
|
Links |