Quantifying Metagenomic Strain Associations from Microbiomes with Anpan.
Authors | |
Abstract | Genetic and genomic variation among microbial strains can dramatically influence their phenotypes and environmental impact, including on human health. However, inferential methods for quantifying these differences have been lacking. Strain-level metagenomic profiling data has several features that make traditional statistical methods challenging to use, including high dimensionality, extreme variation among samples, and complex phylogenetic relatedness. We present Anpan, a set of quantitative methods addressing three key challenges in microbiome strain epidemiology. First, adaptive filtering designed to interrogate microbial strain gene carriage is combined with linear models to identify strain-specific genetic elements associated with host health outcomes and other phenotypes. Second, phylogenetic generalized linear mixed models are used to characterize the association of sub-species lineages with such phenotypes. Finally, random effects models are used to identify pathways more likely to be retained or lost by outcome-associated strains. We validated our methods by simulation, showing that we achieve more accurate effect size estimation and a lower false positive rate compared to alternative methodologies. We then applied our methods to a dataset of 1,262 colorectal cancer patients, identifying functionally adaptive genes and strong phylogenetic effects associated with CRC status, sometimes complementing and sometimes extending known species-level microbiome CRC biomarkers. Anpan's methods have been implemented as a publicly available R library to support microbial community strain and genetic epidemiology in a variety of contexts, environments, and phenotypes. |
Year of Publication | 2025
|
Journal | bioRxiv : the preprint server for biology
|
Date Published | 01/2025
|
ISSN | 2692-8205
|
DOI | 10.1101/2025.01.06.631550
|
PubMed ID | 39829854
|
Links |