New online resource helps connect rare genetic variants to human health and disease
Genebass summarizes a genetic analysis of nearly 400,000 people in the UK Biobank and could help researchers identify new therapeutic targets.
By Allessandra DiCorato
Credit: Susanna Hamilton, Broad Communications
In 2006, researchers discovered that people with certain variants of the gene PCSK9 had dramatically lower cholesterol levels than the general population. Individuals with two broken copies of the gene — one from each parent — were up to 88 percent less likely to have coronary heart disease than people with functioning PCSK9. Within 12 years of these discoveries, scientists had developed drugs that mimicked the effects of turning off the gene and doctors began prescribing them to lower cholesterol in patients.
Now, researchers at the Broad Institute of MIT and Harvard have created a resource that could help reveal potential therapeutic target genes like PCSK9. The team scanned the exomes — protein-coding portions of the genome — from nearly 400,000 individuals in the UK Biobank, looking for associations between rare genetic variants and diseases or traits. They also built a publicly accessible browser, called Genebass, to summarize their analysis. Their findings appear today in Cell Genomics.
Konrad Karczewski, co-first author of the paper and a computational scientist at Broad, says that other researchers studying specific genes can use Genebass to learn more about traits or diseases connected with them, and vice versa.
“Building this data set and releasing it to the general public is exciting because while we haven’t discovered the next PCSK9 yet, someone else may, using this resource,” Karczewski said.
Matthew Solomonson, a co-first author on the study and associate director of genomic data visualization at the Broad, led the development of the Genebass browser. “We’ve had 18,000 users try the browser since its launch in June last year, and we’re still having people across the world accessing it to look up genes,” he said.
The study was also led by Katherine Chao, co-first author and a software product manager at Broad; Benjamin Neale, co-senior author and co-director of the Program in Medical and Population Genetics at the Broad; and researchers from AbbVie, Biogen, and Pfizer.
Systematic study
In genome-wide association studies, scientists comb through whole genomes in a population for genetic variation associated with disease. These analyses have yielded valuable insights into the genes that explain traits ranging from human height to schizophrenia, but mostly probe common variants. Typically, the more common a variant is, the less impact it has on a trait. Many genetic variants known to cause disease are rare, but systematically studying rare variants requires huge datasets and processing power.
Using a large-scale computational system built by the Hail team at the Broad, the researchers looked for associations between more than 4,500 traits and rare genetic variants in the UK Biobank. The system, which takes advantage of extensive processing power in the cloud, analyzed in the span of a few weeks what would have taken a single 128-core computer over 15 years to complete. Their analysis revealed that among rare variants, the most severe (such as loss-of-function variants that completely disable a gene) had more associations with diseases and traits than less severe variants did.
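For a sense of scale, the computing comparison above can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only and rests on assumptions the researchers did not state: that "a few weeks" means three weeks, that "over 15 years" is taken as exactly 15 years (so the result is a lower bound), and that the work splits perfectly across cores. The function names are hypothetical and are not part of Hail or Genebass.

```python
# Back-of-the-envelope estimate of the compute behind the analysis.
# Assumptions (not from the study): 3 weeks of wall-clock time,
# exactly 15 years on the single machine, perfect parallel scaling.

HOURS_PER_YEAR = 365.25 * 24  # average year length, including leap years
HOURS_PER_WEEK = 7 * 24


def core_hours(cores: int, years: float) -> float:
    """Total core-hours used by `cores` cores running for `years` years."""
    return cores * years * HOURS_PER_YEAR


def cores_needed(total_core_hours: float, weeks: float) -> float:
    """Cores required to finish the same work in `weeks` weeks."""
    return total_core_hours / (weeks * HOURS_PER_WEEK)


# Figures quoted in the article: one 128-core computer, 15 years.
total = core_hours(cores=128, years=15)

# Hypothetical reading of "a few weeks" as three weeks.
parallel_cores = cores_needed(total, weeks=3)

print(f"Total work: about {total / 1e6:.1f} million core-hours")
print(f"Cores needed to finish in 3 weeks: about {parallel_cores:,.0f}")
```

Under these assumptions, the job amounts to tens of thousands of cores running at once, which is why the team turned to cloud computing rather than a single machine. A shorter or longer reading of "a few weeks" would shift the estimate proportionally.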
The team also identified rare variants that had associations with biological processes. For example, they found a previously unreported association between loss-of-function mutations in the SCRIB gene and the integrity of white matter, or nerve fibers thought to support brain function, in a particular part of the inner brain. Karczewski says that Genebass could help other users identify similar biological associations that suggest new avenues for future disease research.
Next, Karczewski and Solomonson hope to expand their browser to progressively larger datasets from populations with more diverse ancestry, such as those from the All of Us research program at the National Institutes of Health. “Using these fuller, broader datasets may give us a lot of interesting information in terms of better representation and better power for gene discovery,” Karczewski said.