Five Questions with Chengwei Luo
Our bodies are full of bugs, and as we’re learning, this is great news. These trillions of microbes, collectively called the microbiome, outnumber our own cells and help keep us healthy and alive. Maintaining (and in some cases restoring) a healthy microbiome requires a solid understanding of what those bugs are and how they function. Complicating this is the fact that, just as there are genetically distinct families of humans, there are also many families, or strains, within a single bacterial, viral, or fungal species. To truly understand the microbiome, researchers need a good way to tease out the differences between strains, perhaps even more than the differences between species.
However, because strains differ from one another at such a fine level of resolution, these differences often escape current technologies. Researchers have used both computational and experimental methods to infer the different strains in a sample, but these methods aren’t straightforward and often miss important diversity within the microbial community.
In a recent paper in Nature Biotechnology, Broad researchers from the lab of institute member Ramnik Xavier present ConStrains, an algorithm designed to identify both the relative abundance of strains within a microbial species and the genotype each strain carries across the genes they all share (the “core genome” of the species). The name ConStrains stands for conspecific strains (strains within the same species), and the tool works with data from mainstream metagenomic studies, which sequence samples taken straight from the environment rather than microbes grown in a lab.
For this edition of “Five Questions,” we asked Chengwei Luo, a postdoctoral researcher in Xavier’s lab and first author on the paper, to elaborate on the new tool, which could have major implications for the growing community of researchers studying the microbiome:
Q1. Why is understanding bacterial strain specificity important — what new information is revealed at this level of resolution?
A great deal of diversity is harbored at the strain level. If we were to stay only at the species level, as most current metagenomic studies do, we would likely overlook many insights. For instance, at the species level, a relatively harmless strain of E. coli might look identical to an enterohaemorrhagic strain, but the latter carries virulence factor genes that can cause severe diarrhea.
Q2. How does ConStrains identify individual bacterial strains?
The core, or “non-dispensable,” genome is the set of genes shared by all strains of a species. ConStrains is based on the observation that two conspecific strains typically differ at a number of single nucleotide polymorphisms (SNPs) within these shared core genes. By analyzing the relative abundance of the different alleles at each SNP site, it is possible to infer the relative abundance of individual strains, as well as which version of each gene, or genotype, each strain carries in its core genome. The process resembles color mixing in a way: when you see a green hue, you can guess the ratio in which yellow and blue were mixed.
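To make the color-mixing idea concrete, here is a minimal Python sketch of the simplest possible case: exactly two strains of one species, observed at a handful of core-genome SNP sites where they differ. It assumes each site is biallelic, that per-allele read counts are already available, and that one strain is clearly more abundant than the other; the function `unmix_two_strains` and the read counts are hypothetical, made up for illustration. It shows the principle only and is not the published ConStrains algorithm, which is more general.

```python
from statistics import median


def unmix_two_strains(site_counts):
    """Estimate a two-strain mixture from allele read counts (toy sketch).

    site_counts: one dict per core-genome SNP site, mapping each of the
    two observed nucleotides to the number of reads supporting it.
    Returns (dominant_fraction, dominant_genotype, minor_genotype).
    """
    fractions, dominant, minor = [], [], []
    for counts in site_counts:
        # Assumes a biallelic site: rank the two alleles by read support.
        (major_allele, major_n), (minor_allele, minor_n) = sorted(
            counts.items(), key=lambda kv: kv[1], reverse=True
        )[:2]
        # The major allele's share of reads estimates the dominant strain's share.
        fractions.append(major_n / (major_n + minor_n))
        dominant.append(major_allele)
        minor.append(minor_allele)
    # Every site gives a noisy estimate of the same mixing ratio; the median
    # damps the effect of sites with unusual coverage or sequencing errors.
    return median(fractions), "".join(dominant), "".join(minor)


# Hypothetical read counts at three SNP sites.
sites = [
    {"A": 70, "G": 30},
    {"C": 68, "T": 32},
    {"G": 75, "A": 25},
]
fraction, strain_1, strain_2 = unmix_two_strains(sites)
print(f"dominant strain ~{fraction:.0%}: {strain_1}; minor strain: {strain_2}")
```

Assigning the major allele at every site to the same strain only works when the mix is far from 50:50; near that point the major and minor labels become ambiguous, and a real method needs additional evidence to decide which alleles belong together.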
Q3. How does ConStrains compare to other methods of its kind?
The fundamental difference is that other methods depend heavily on a comprehensive collection of reference strains; that is, they work best when many strains have already been sequenced to serve as references. They compare the samples with these known strains and infer strain diversity from there. ConStrains takes a different approach: instead of relying on information from reference strains, it leverages the part of the genome that is shared by all strains. This minimizes its dependence on reference strains, so it needs only one reference genome in the database.
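A single reference can be enough because SNP sites show up when reads mapped to the same reference position disagree with one another, which suggests that more than one strain is present. The Python sketch below illustrates this with a simplified, hypothetical pileup (a string of the read bases seen at each reference position) instead of real alignment output. The function name `find_snp_sites` and the `min_depth` and `min_minor_frac` thresholds are assumptions for illustration, not ConStrains parameters.

```python
from collections import Counter


def find_snp_sites(reference, pileup, min_depth=10, min_minor_frac=0.1):
    """Report reference positions where reads carry a second allele (toy sketch).

    reference: the single reference sequence for the species.
    pileup: pileup[i] is a string of the read bases covering reference position i.
    Returns a list of (position, reference_base, allele_counts) tuples.
    """
    sites = []
    for pos, bases in enumerate(pileup):
        if len(bases) < min_depth:
            continue  # too few reads to trust allele frequencies here
        ranked = Counter(bases).most_common()
        if len(ranked) < 2:
            continue  # every read agrees: no polymorphism at this position
        second_count = ranked[1][1]
        # Ignore rare alleles that are more likely sequencing errors than strains.
        if second_count / len(bases) >= min_minor_frac:
            sites.append((pos, reference[pos], dict(ranked)))
    return sites


reference = "ACGT"
pileup = [
    "A" * 12,           # all reads agree: not a SNP site
    "C" * 7 + "T" * 3,  # 30% minor allele: reported
    "G" * 19 + "A",     # 5% minor allele: treated as noise
    "TTTTT",            # only 5 reads: skipped
]
for site in find_snp_sites(reference, pileup):
    print(site)
```

The reference here only supplies coordinates and the reference base; the strain information comes entirely from how the reads disagree at each position.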
Q4. What new insights did you discover using ConStrains in this initial study?
We applied ConStrains to an established set of metagenomic samples collected over time from the infant gut and gained a number of interesting new insights. For instance, we saw that even though the total relative abundance of a species stayed roughly the same over time, the relative abundances of its strains changed drastically. The abundance of the dominant strains of one species, B. longum, changed significantly, with implications for human milk sugar utilization during weaning.
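A little arithmetic shows how a species can look stable while its strains turn over. In the Python sketch below, a strain's abundance in the whole community is the species' relative abundance multiplied by the strain's share within that species. All numbers are invented for illustration and are not results from the paper; total variation distance is used here as one simple way to quantify the change in strain composition, not as a metric the study necessarily used.

```python
def strain_abundances(species_abundance, strain_shares):
    """Community-level abundance of each strain: species abundance x its share within the species."""
    return {name: species_abundance * share for name, share in strain_shares.items()}


def total_variation(p, q):
    """Total variation distance between two compositions that each sum to 1 (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical within-species strain shares at two time points.
early_shares = {"strain_1": 0.9, "strain_2": 0.1}
late_shares = {"strain_1": 0.2, "strain_2": 0.8}
species_abundance = 0.40  # the species stays at 40% of the community throughout

for label, shares in (("early", early_shares), ("late", late_shares)):
    per_strain = strain_abundances(species_abundance, shares)
    total = sum(per_strain.values())
    print(label, f"species total={total:.2f}",
          {k: round(v, 2) for k, v in per_strain.items()})

print(f"strain-level shift: {total_variation(early_shares, late_shares):.2f}")
```

A species-level profile reports the same 40% at both time points, while the strain-level view shows the two strains largely swapping places.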
Q5. How will ConStrains inform future microbiome studies and improve on current work in this field?
Current microbiome studies usually probe diversity and function only at the species level, so a great deal of information is potentially overlooked. We envision that, with the aid of ConStrains, we will be able to link some previously unexplained observations to specific strains. It could also help scientists address some of the core questions in microbial studies. For instance, what is the impact of antibiotic use on our microbiota? Antibiotics may leave the species present largely unchanged while shifting which conspecific strains are present, and ConStrains could capture this by comparing strain-level diversity before and after treatment. This is similar to what we observed with B. longum strains in the paper.
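One way such a before-and-after comparison could be summarized is a diversity index computed over the strains of a species rather than over species. The Python sketch below applies the Shannon index to hypothetical within-species strain shares; the numbers and the choice of index are assumptions for illustration, not data or methods from the paper.

```python
import math


def shannon(shares):
    """Shannon diversity H = -sum(p * ln p) over the nonzero shares (which should sum to 1)."""
    return -sum(p * math.log(p) for p in shares if p > 0)


# Hypothetical strain shares within one species, before and after antibiotics.
# The species itself is assumed to be present at the same abundance both times.
before = [0.25, 0.25, 0.25, 0.25]  # four strains, evenly mixed
after = [0.97, 0.03, 0.0, 0.0]     # one strain now dominates; two are gone

print(f"strain diversity before: {shannon(before):.2f}")
print(f"strain diversity after:  {shannon(after):.2f}")
```

A species-level profile would show this species unchanged, while the strain-level index drops sharply, which is the kind of shift a species-only analysis cannot see.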
Paper Cited: Luo, C., et al. ConStrains identifies microbial strains in metagenomic datasets. Nature Biotechnology. DOI: 10.1038/nbt.3319