Rare genetic variants can reveal much about disease biology

[Image: orange DNA helix on a yellow background of DNA base-pair letters]
Credit: Susanna Hamilton, Broad Communications

Over the last two decades, scientists have primarily used two strategies to study the genetics of common diseases like diabetes and schizophrenia. One involves looking for links between disease and common genetic variants such as single nucleotide polymorphisms (SNPs) that are scattered throughout the genome. The other, more recent approach focuses on the protein-coding portion of the genome (the exome) to find ultra-rare mutations (ones that might appear in tenths or thousandths of a percent of the population) that are difficult to find in genetic studies, but which can dramatically increase disease risk.

Geneticists have debated what role rare mutations play in causing common diseases, and how useful they are for studying them. Natural selection tends to sift out such mutations, and as a result scientists have struggled both to find them and to measure their influence on disease risk compared to common SNPs.

Now, in a study published in Nature, researchers at the Broad Institute of MIT and Harvard have done just that for 22 complex traits and diseases. A team led by Daniel Weiner, Ajay Nadig, and Schmidt fellow Luke O'Connor, all in Broad's Program in Medical and Population Genetics, used whole-exome data from nearly 395,000 participants in UK Biobank and the Genebass browser to develop a new method that quantifies how much rare protein-coding mutations across the whole genome contribute to common disease.

The researchers found that rare mutations make a small but important contribution to the traits that they analyzed. They confirmed previous research suggesting that for any given disease or trait, rare mutations and common SNPs often converge on the same, potentially causal biological mechanisms. They also concluded that while looking for rare mutations linked to disorders like schizophrenia is challenging and requires large studies, it can reveal much about the biological mechanisms of disease and highlight tractable numbers of genes and promising drug targets.    

We spoke with Weiner, Nadig, and O'Connor about their method, called BHR (or burden heritability regression), what they've learned about the structure of common- and rare-variant genetics in disease, and why the hunt for rare mutations can advance our understanding of the biology of common diseases.


Why is it important to draw a clearer picture of rare mutations' contributions to disease?

AN: There's something in genetics called the "missing heritability" problem. Historically, people have had an easier time estimating common SNPs' impact on the heritability of — that is, how much genetics explains differences in — complex traits and common disease. But SNP-based heritability doesn't explain the total genetic component, and one hypothesis has been that rare coding mutations would make up the difference. The problem is, the tools and data haven't been there to explore that assumption.

LO: As you can imagine, it's a lot harder to study what something does when it's rare. But recently there's been a lot of progress in identifying significant individual genes by detecting rare mutations with large effects. And with large-scale biobank-based studies like UK Biobank, the data are now becoming available to study and evaluate such mutations systematically. 

Our study was an attempt to look across the genome as a whole, to look at the aggregated contributions of rare mutations, and to look at the architecture of those variants: the kinds of genes they implicate, how many genes they implicate, and how their distribution compares with that of common SNPs. 


As you ran this study, what findings emerged?

DW: We saw strong indications that SNPs and rare mutations often point to the same underlying biology. For a given trait, both kinds of variation often converge on the same tissue, the same cell types, and the same sets of genes.

LO: And if that's the case, then we can use rare mutations as signposts, because they point very clearly to biology that would be quite hard to disentangle just by analyzing SNPs, with their small effect sizes and widespread distribution across the genome.

Now we also see that rare coding mutations still don't fill the "missing heritability" gap that Ajay mentioned earlier. We estimate that they still only account for about 1.3 percent of the variability seen in the 22 traits and diseases we studied.

But because these mutations have such large effects and arise in a much smaller portion of the genome, they do a lot more to reveal the core biology behind a given trait. We found that something like seven genes often explain around a third of the rare variant heritability for a given trait. That's much easier to follow up on functionally than trying to sift through thousands of SNPs. 

DW: There's still value in studying common SNPs' associations with disease, though, because polygenic risk scores built on them are powerful tools for disease risk prediction.


Were there any key insights that allowed you to come to these findings?

AN: One thing that Luke noticed early on is how oftentimes different mutations can have the same effect, and that this phenomenon could be used to estimate heritability. So for example, let's say that I find five people in a population who each have rare mutations at different points within the same gene. Statistically you'd typically say they're all different mutations, but in the end, they all have the same effect: the gene stops working. 

LO: There's an approach called burden testing used to analyze rare variants, where you say, "What if we pretend that those five variants are all basically just one variant?" Here we applied that to calculate heritability estimates, which in the end gives us a better sense of how, and how much, a given gene influences a trait or disease.
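
The collapsing idea described above can be illustrated with a toy example. What follows is only a minimal sketch of generic burden collapsing, not the BHR method itself: the gene names, the genotype matrix, the phenotype values, and the function names `gene_burden` and `burden_effect` are all made up for illustration.

```python
import numpy as np


def gene_burden(genotypes, variant_gene, gene):
    """Collapse every rare variant in one gene into a single 0/1 carrier flag.

    genotypes    : (people x variants) array of rare-allele counts
    variant_gene : list giving the gene each variant column belongs to
    gene         : name of the gene to collapse
    """
    cols = [j for j, g in enumerate(variant_gene) if g == gene]
    # A person "carries the burden" if they have any rare allele in the gene.
    return (genotypes[:, cols].sum(axis=1) > 0).astype(int)


def burden_effect(burden, phenotype):
    """Difference in mean phenotype between carriers and non-carriers."""
    carriers = phenotype[burden == 1]
    non_carriers = phenotype[burden == 0]
    return carriers.mean() - non_carriers.mean()


# Hypothetical data: 6 people, 5 rare variants.
# The first three people each carry a different variant in GENE_A.
genotypes = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
])
variant_gene = ["GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B"]
phenotype = np.array([2.1, 1.9, 2.0, 0.1, -0.1, 0.0])

burden_a = gene_burden(genotypes, variant_gene, "GENE_A")
effect_a = burden_effect(burden_a, phenotype)

print("GENE_A carriers:", burden_a.tolist())
print("GENE_A burden effect:", round(float(effect_a), 3))
```

Taken one at a time, each GENE_A variant has a single carrier, which is too few to say anything. Treated as one combined variant, the gene has three carriers, and their shared shift in the trait becomes visible.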


Why do this study now?

DW: It's only been within the last five years or so that it's become feasible to run exome sequencing studies on large, well-characterized populations backed by extensive hospital registry and biobank data like UK Biobank. And now people are making the investment to sequence hundreds of thousands of people and create the resources needed to see these ultra-rare mutations.


What comes next?

LO: There are going to be lots and lots of new exome and whole genome sequencing studies coming out in the future, including for diseases where we don't currently have enough cases to make genetic discoveries. I hope others apply BHR to analyze the genetic architecture of those traits and diseases.

DW: While we applied BHR to UK Biobank data, anyone could use it to analyze the results of any large-scale exome sequencing study. For instance, we also used it with data from the SCHEMA schizophrenia and BipEx bipolar disorder studies, and found that the current list of significant genes doesn't explain all of the burden heritability for those conditions, which suggests that there are more genes to be found for both. We've also released open-source BHR software to allow any researcher to perform these types of analyses.

AN: Right now BHR only looks at loss-of-function mutations and loss-of-function-like missense variants, ones that actually prevent the production of functional proteins. We aim to expand it to look at other kinds of rare missense variation that alter proteins in different ways, and at the correlations and relationships between different traits at the genetic level. That's something that can be done with BHR in a quantitative way and couldn't be done before.


This study was supported by the National Institute of Mental Health, the National Library of Medicine, the National Institute of General Medical Sciences, the Simons Foundation Autism Research Initiative, and the Broad Institute.

Paper(s) cited

Weiner DJ, Nadig A, et al. Polygenic architecture of rare coding variation across 394,783 exomes. Nature. Online February 8, 2023. DOI: 10.1038/s41586-022-05684-z.