After a decade of genome-wide association studies, a new phase of discovery pushes on

By probing genetic variation, scientists continue to unearth the roots of disease and pave the way for precision medicine.

Credit: Lauren Solomon, Broad Communications

The pursuit of the genetic roots of common illnesses is, at its heart, a quest for better prevention and treatment of disease. Medicines used to treat common diseases today are mostly aimed at illnesses’ symptoms, not their underlying causes, which are frequently unknown. Diabetes, for example, is usually treated by lowering blood sugar, but this approach fails to address the underlying dysfunction that leads to damaged pancreatic beta cells or insulin resistance.

By shedding light on the biological mechanisms of disease, discovery of the genetic roots of common, heritable illnesses could pave the way to more accurate identification of people at risk for getting sick, better prevention strategies, and more effective therapies with fewer adverse effects.

With these goals in mind, scientists around the world have performed thousands of large-scale genetic studies over the last decade. Known as genome-wide association studies (GWAS), these studies aim to uncover which common DNA differences among people influence traits such as height or blood cholesterol level, or raise the risk of developing diseases such as cancer and metabolic, autoimmune, or psychiatric illnesses.


Despite criticism over the years that GWAS have offered little predictive value and fail to fully explain the heritability of diseases, the greatest strength of this approach, and its true aim, is unearthing the biological roots of complex diseases. [For more, listen to this Biologic podcast with Broad researcher Joel Hirschhorn.] GWAS are powerful in part because they survey risk factors across the whole genome in an unbiased way, without requiring researchers to predict in advance which genes matter most for a given trait or illness. This means that GWAS can uncover previously unsuspected, yet important, biological mechanisms and pathways that could one day be targeted with drugs.


Thanks to strong collaborations that reach around the globe, pioneering experimental methods, and powerful statistical tools developed over the last decade, these studies have produced an avalanche of data and insight. Since the first large-scale GWAS were published in 2007, several thousand of these experiments have been performed — including many led by or done in partnership with the Broad Institute of MIT and Harvard — linking thousands of genetic regions with hundreds of diseases and traits.

[Video: Accumulation of GWAS hits across the genome, 2005-2017. Images courtesy of the NHGRI-EBI GWAS Catalog; video produced by Broad Communications.]

Work by Broad scientists has uncovered hundreds of variants associated with type 2 diabetes, heart disease, inflammatory bowel disease, rheumatoid arthritis, multiple sclerosis, autism, schizophrenia, and bipolar disorder.

Of course, the work does not end there. GWAS results reveal genetic changes that are correlated with disease, but that doesn’t mean that the variants identified by the studies cause illness. GWAS are most accurately viewed as a “hypothesis-generating” endeavor; rather than proving or refuting the role of a genetic variant in disease, the experiments highlight regions of the genome that likely harbor DNA changes that lead to disease — but those hypotheses need to be tested.

Scientists must unravel the true causal mutations that underlie GWAS “hits,” and then study them in depth to discover how they alter the way genes, proteins, cells, or tissues function. That knowledge can inform the search for new medicines aimed at the molecular causes of illness, bringing us closer to the future envisioned when the human genome was first decoded: improved human health through genomic insight.

GWAS are not a shortcut to improving human health but the first leg of a long journey, along which genomic researchers have already made great progress. While scientists at the Broad and elsewhere continue to perform GWAS, they are also shifting gears to the next, and arguably more difficult, phases: uncovering the biological consequences of disease-associated mutations and turning that insight into new therapeutics.

“Now it’s relatively straightforward to discover genetic associations, but to then go from association to function is a much taller order,” said Jose Florez, chief of the diabetes unit at the Massachusetts General Hospital (MGH), an associate professor at Harvard Medical School (HMS), and an institute member at the Broad, where he co-directs the Metabolism Program and leads GWAS efforts in type 2 diabetes and related traits. “Uncovering causal mutations and then making the leap from mechanism to disease pathogenesis is a very laborious road to travel, but recent studies have made great headway along that crucial front.”

GWAS are not the only way to identify important biological pathways in common disease, but they are a powerful complement to other strategies, such as whole genome sequencing and analysis of rare variants. By compiling ever-larger study populations, refining analytical approaches, and pursuing the functional role of GWAS hits, genomics researchers continue to push the boundaries of what can be learned by digging into our DNA.

A revolution built on the shoulders of the Human Genome Project

The notion of probing genetic variation in the population to uncover the roots of common disease originated in the late 20th century, as genetics researchers caught the first glimpses of common DNA variation. All humans are 99.9% alike, genetically. As the Human Genome Project was nearing completion, researchers had a growing interest in defining the 0.1% of the genome that varies between people, largely in the form of single-letter DNA changes known as single-nucleotide polymorphisms, or SNPs (pronounced "snips"). Around ten million of the human genome’s three billion nucleotides, or letters of DNA, are polymorphic SNPs, meaning they occur in two or more common forms — for example, one person may have an adenine “A” at a particular spot in the genome, while in another person’s DNA, a thymine “T” takes up that spot.

Researchers have long known that most common diseases have a heritable component — they tend to run in families, which implies that DNA and its SNPs are likely involved. A picture had emerged of complex genetic and environmental causes for common disease in which a number of genetic risk factors each slightly raise a person’s risk for developing an illness. Unlike the mutations leading to “Mendelian” diseases, no single common disease SNP is sufficient to cause disease on its own; each contributes to disease risk along with dozens or even hundreds of other genetic risk factors and environmental contributors.
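The “many small effects” picture can be made concrete with a minimal sketch of an additive polygenic model. It assumes, for illustration only, that each risk allele multiplies a person’s odds of disease by a fixed amount and that effects add up on the log-odds scale. The function names, the 100 hypothetical variants, the 5% per-allele effect, and the 5% baseline risk are all made-up values, not estimates from any study.

```python
import math


def polygenic_log_odds(genotypes, log_odds_ratios, baseline_log_odds):
    """Additive model: each risk allele adds its log odds ratio to the baseline.

    genotypes: risk-allele counts (0, 1, or 2) per variant.
    log_odds_ratios: hypothetical per-allele effect of each variant.
    baseline_log_odds: log odds for someone carrying no risk alleles.
    """
    if len(genotypes) != len(log_odds_ratios):
        raise ValueError("need exactly one effect size per variant")
    return baseline_log_odds + sum(g * b for g, b in zip(genotypes, log_odds_ratios))


def probability(log_odds):
    """Convert log odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical setup: 100 variants, each raising the odds by 5% per risk allele.
N_VARIANTS = 100
effects = [math.log(1.05)] * N_VARIANTS
baseline = math.log(0.05 / 0.95)  # 5% risk with no risk alleles

one_allele = [1] + [0] * (N_VARIANTS - 1)
forty_alleles = [1] * 40 + [0] * (N_VARIANTS - 40)

for label, genotype in [("no risk alleles", [0] * N_VARIANTS),
                        ("one risk allele", one_allele),
                        ("forty risk alleles", forty_alleles)]:
    risk = probability(polygenic_log_odds(genotype, effects, baseline))
    print(f"{label}: {risk:.3f}")
```

With these invented numbers, a single risk allele nudges the modeled risk from 5% to about 5.2%, while forty of them together raise it to roughly 27%, echoing the point that no single common variant causes disease on its own.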

GWAS are based on the simple idea that if a genetic variant increases disease risk, it should be more frequent among people with the disease (cases) than among healthy controls. Before embarking on massive hunts for disease-causing SNPs, however, researchers first needed a map of human genetic variation.
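For a single SNP, the case-versus-control comparison can be sketched as a 2×2 table of allele counts. This minimal version uses a Pearson chi-square test on allele counts, which is only one of several tests used in practice (real analyses typically use regression with covariates such as ancestry), and the counts in the example are invented.

```python
import math


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Allele-count 2x2 test: is the alt allele more frequent in cases than controls?

    Each argument counts alleles (two per person); e.g. case_alt is the number
    of copies of the alternative allele observed across all cases.
    Returns (odds_ratio, chi_square, p_value) for a 1-degree-of-freedom Pearson
    chi-square test without continuity correction.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return odds_ratio, chi_square, p_value


# Hypothetical SNP: 2,000 cases (4,000 alleles) and 3,000 controls (6,000 alleles).
# The alt allele has frequency 30% in cases and 25% in controls.
odds_ratio, chi_square, p_value = allelic_association(1200, 2800, 1500, 4500)
print(f"OR = {odds_ratio:.2f}, chi-square = {chi_square:.1f}, p = {p_value:.1e}")
```

Note that this allele-based test treats the two alleles a person carries as independent observations, which is a simplifying assumption.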

Early efforts to analyze common variation in the genome were hampered by the lack of a reference genome and of cost-effective means of identifying, or genotyping, SNPs. In the early 2000s, cheaper genotyping technologies enabled researchers in the International SNP Consortium to survey SNPs across the genome. In 2005, the International HapMap Consortium mapped SNPs that are often inherited together in DNA blocks, called haplotypes. This so-called “HapMap” allowed a smaller number of SNPs to stand in for their neighboring variants, reducing the cost and resources required to test a DNA sample for the bulk of its common variation.
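How a few SNPs can “stand in” for their neighbors can be sketched with r², a standard measure of linkage disequilibrium, computed here from phased haplotypes, plus a simple greedy tag-SNP selection. This is a toy under stated assumptions (error-free haplotypes coded 0/1 and an arbitrary r² ≥ 0.8 cutoff); it is not the HapMap project’s actual tagging procedure, and the haplotypes are invented.

```python
def ld_r_squared(haplotypes, i, j):
    """r^2 between SNPs i and j, from phased 0/1 haplotypes (one per chromosome)."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom > 0 else 0.0


def greedy_tags(haplotypes, threshold=0.8):
    """Pick tag SNPs so that every SNP has r^2 >= threshold with some tag."""
    n_snps = len(haplotypes[0])

    def covered_by(tag, untagged):
        # A tag always covers itself, plus any SNP in strong LD with it.
        return {s for s in untagged
                if s == tag or ld_r_squared(haplotypes, tag, s) >= threshold}

    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Choose the SNP that covers the most still-untagged SNPs.
        best = max(sorted(untagged), key=lambda t: len(covered_by(t, untagged)))
        tags.append(best)
        untagged -= covered_by(best, untagged)
    return sorted(tags)


# Hypothetical haplotypes: SNPs 0, 1, and 2 travel together; SNP 3 does not.
haplotypes = [
    (0, 0, 1, 0),
    (0, 0, 1, 1),
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (0, 0, 1, 0),
    (1, 1, 0, 1),
]
print(greedy_tags(haplotypes))  # [0, 3]
```

Here genotyping just two SNPs captures the information in all four, which is the cost saving the HapMap made possible at genome scale.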

Companies soon began producing cost-effective genotyping arrays, also known as “SNP chips,” that could test a single sample for hundreds of thousands of SNPs at a time, making GWAS a real possibility. Anticipating the importance of this technology, the Broad Institute built a high-capacity genetic analysis facility soon after the institute’s inception, and in 2004 it became the first NIH-funded national genotyping center dedicated to large-scale SNP analysis.

“Back when we were building genetic maps in the 1980s and 90s, we didn’t have the toolkit to uncover what really causes disease. At that time, we didn’t even have the sequence of the human genome or even know what all the genes were or how many there were,” said Mark Daly, co-director of the Program in Medical and Population Genetics and an institute member at the Broad, who is also founding chief of the Analytic and Translational Genetics Unit at MGH. Daly has pioneered statistical methods essential to GWAS and has led many such studies of autism, Crohn’s disease, and other illnesses. “So we began a long process of making maps of genomes. As we neared the point of understanding the genome on at least a structural level, we renewed our interest in discovering the genes for more complex diseases.”

The pursuit of common risk factors picks up speed

Enabled by the HapMap, inexpensive genotyping technology, sophisticated analytical methods, and an emphasis on open sharing of data, the GWAS approach began to bear fruit in 2007 when the first large genome-wide association study was published in Nature. Conducted by the Wellcome Trust Case Control Consortium, the landmark study scanned the genomes of people with and without seven diseases — 2,000 cases each of bipolar disorder, coronary heart disease, high blood pressure, Crohn’s disease, rheumatoid arthritis, and type 1 and type 2 diabetes (T2D), along with 3,000 healthy controls — and revealed 24 significant disease-associated DNA variants. Among other GWAS efforts to bear fruit that year were several publications from deCODE Genetics, uncovering variants linked to atrial fibrillation, T2D, and myocardial infarction.
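The text does not define what made an association “significant.” A widely used convention in the field, assumed here rather than taken from the studies above, is p < 5 × 10^-8: a Bonferroni correction of a 0.05 error rate for roughly one million independent common-variant tests across the genome. A minimal sketch, with invented SNP labels and p-values:

```python
def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_independent_tests


# About 5e-8, under the assumption of ~1 million independent tests.
GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)


def genome_wide_hits(p_values, threshold=GENOME_WIDE):
    """Keep only SNPs whose association p-value falls below the threshold."""
    return {snp: p for snp, p in p_values.items() if p < threshold}


# Hypothetical p-values for three SNPs (labels are made up).
results = {"snp_A": 3e-9, "snp_B": 2e-6, "snp_C": 0.04}
print(genome_wide_hits(results))  # {'snp_A': 3e-09}
```

In this example, snp_B would look striking in a study of a single candidate gene but falls short of the genome-wide bar.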

Also that year, researchers in the Broad’s Diabetes Genetics Initiative, along with collaborators at Lund University and Novartis Institutes for BioMedical Research, published results of their first GWAS. Appearing in Science, the study scanned the genomes of 3,000 Scandinavian people for type 2 diabetes risk and triglyceride levels, identifying three new genetic regions, or loci, associated with T2D and one associated with triglycerides.

Since then, several thousand GWAS have produced tens of thousands of strong associations between genetic variants and one or more complex traits. International consortia pooled data to conduct ever-larger studies involving tens of thousands of cases and controls. The GWAS Catalog, maintained jointly by the National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EMBL-EBI), now includes dozens of massive GWAS that each involve more than 100,000 subjects. For example, the latest effort of the GIANT Consortium, aimed at uncovering genetic roots of body size and shape, scanned the genomes of more than 700,000 people and found 83 new DNA changes linked to height.

The increasingly large studies have been possible, in part, because of statistical methods to predict, or impute, SNPs that were not directly genotyped, based on their patterns of relatedness to other SNPs that were directly tested. Imputation allows scientists to integrate data from genotyping arrays that test for different sets of SNPs, and to combine distinct cohorts into massive datasets with the statistical power to reveal genetic markers with more subtle effects.
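The intuition behind imputation can be shown with a deliberately simplified sketch: compare a sample’s directly genotyped SNPs against a reference panel of fully characterized haplotypes, then read the untyped SNP off the best-matching ones. Production imputation tools use hidden Markov models over many reference haplotypes rather than this exact-match scoring; the function name and tiny reference panel here are hypothetical.

```python
def impute_untyped(reference_haplotypes, typed_positions, observed_alleles, target_position):
    """Toy imputation of one untyped SNP on a single chromosome.

    Scores each reference haplotype by how many typed positions agree with the
    observed alleles, keeps the best-scoring haplotypes, and returns the fraction
    of them that carry allele 1 at target_position (an expected allele dosage
    between 0 and 1).
    """
    scores = [
        sum(1 for pos, allele in zip(typed_positions, observed_alleles) if hap[pos] == allele)
        for hap in reference_haplotypes
    ]
    best = max(scores)
    matches = [hap for hap, score in zip(reference_haplotypes, scores) if score == best]
    return sum(hap[target_position] for hap in matches) / len(matches)


# Hypothetical reference panel: five fully characterized haplotypes, four SNPs each.
reference = [
    (0, 0, 0, 0),
    (0, 0, 0, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
    (1, 1, 0, 0),
]
# The array typed only SNPs 0 and 3; this chromosome carries alleles 1 and 0 there.
print(impute_untyped(reference, [0, 3], [1, 0], target_position=1))  # 1.0
print(impute_untyped(reference, [0, 3], [1, 0], target_position=2))  # 0.5
```

The 0.5 result shows how imputation can return an uncertain, fractional dosage rather than a hard genotype call; association tests can use such dosages directly.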

“For pretty much every polygenic trait and every disease that we looked at with GWAS, more samples gave us more and more discovered loci, in a surprisingly linear fashion,” said GIANT Consortium leader Joel Hirschhorn, an institute member and co-director of the Metabolism Program at the Broad. Hirschhorn, who is Concordia Professor of Pediatrics and a professor of genetics at Boston Children’s Hospital and Harvard Medical School, is a thought leader in setting GWAS standards for the field.
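A rough power calculation helps explain why adding samples keeps yielding new loci. The sketch below assumes a quantitative trait, a single SNP that explains a fraction q² of trait variance, a genome-wide threshold of p < 5 × 10^-8 (a common convention, assumed here), and the standard normal approximation in which the test statistic’s non-centrality grows as N·q²/(1 − q²). The effect size of 0.01% of variance is hypothetical.

```python
from statistics import NormalDist


def gwas_power(n_samples, variance_explained, alpha=5e-8):
    """Approximate power of a two-sided 1-df association test.

    Under the normal approximation, the z-score for a SNP explaining a fraction
    q2 of trait variance has mean sqrt(n * q2 / (1 - q2)) and unit variance.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (n_samples * variance_explained / (1 - variance_explained)) ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical SNP explaining 0.01% of the variance of a trait such as height.
for n in (10_000, 50_000, 250_000, 700_000):
    print(f"N = {n:>7,}: power = {gwas_power(n, 0.0001):.3f}")
```

Under these assumptions, power climbs from essentially zero at 10,000 samples to near certainty at 700,000, so each jump in sample size pulls a new tier of small-effect variants over the threshold.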

As GWAS hits accumulated, Hirschhorn and his colleagues were surprised by the number and diversity of DNA changes uncovered by the global GWAS effort, which led to unexpected insights about the nature of genetic variation and its role in disease. Scientists have learned that common diseases are indeed highly complex: some diseases or traits are associated with hundreds of genetic regions, and many genetic variants are associated with more than one trait or disease. GWAS have also shown that disease-associated variants largely fall outside the genes initially suspected to be important, with most associated loci lying in portions of the genome that do not encode proteins.

“Surprisingly, the hits we got from GWAS didn’t overlap well with the list of candidate disease genes that we had drawn up,” said Hirschhorn. “Our next step is even more challenging: going from those loci to uncovering the actual genes that are involved and discovering what that means for the underlying biology.”

The long road to interpret GWAS hits

Each SNP tested in a GWAS acts as a tag for a haplotype block, which often spans tens of thousands of DNA bases. So the first step is to pinpoint which genetic change within that block actually increases disease risk.


To uncover the causal mutations underlying GWAS “hits,” scientists can perform deep analysis of the surrounding DNA with a technique called “fine-mapping,” in which a denser set of SNPs is genotyped in the DNA region of interest and then analyzed statistically to narrow down the likely causal variant. If the causal mutation falls within DNA that encodes a protein, scientists can study it in test tubes, in cells, or in a model organism to understand its functional consequences or the biological pathways in which it plays a role.
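The statistical side of fine-mapping can be sketched with Wakefield’s approximate Bayes factors, one widely used technique and not necessarily the method used in the studies described here. Assuming exactly one causal variant in the region and equal prior odds for every SNP, each SNP’s Bayes factor converts directly into a posterior probability of being causal, and the smallest set of SNPs reaching 95% total probability forms a “credible set.” The prior standard deviation of 0.2 on the log-odds scale and all summary statistics are assumptions for illustration.

```python
import math


def log_approx_bayes_factor(beta, se, prior_sd=0.2):
    """Wakefield's approximate Bayes factor (log scale) for association at one SNP.

    beta, se: estimated log odds ratio and its standard error.
    prior_sd: assumed prior standard deviation of the true effect.
    """
    v = se ** 2
    w = prior_sd ** 2
    shrink = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1 - shrink) + 0.5 * z * z * shrink


def credible_set(snp_stats, coverage=0.95, prior_sd=0.2):
    """Posterior probabilities and a credible set, assuming one causal SNP."""
    names = list(snp_stats)
    log_bfs = [log_approx_bayes_factor(b, s, prior_sd) for b, s in snp_stats.values()]
    top = max(log_bfs)  # subtract the max before exponentiating to avoid overflow
    weights = [math.exp(x - top) for x in log_bfs]
    total = sum(weights)
    posteriors = {name: w / total for name, w in zip(names, weights)}

    chosen, cumulative = [], 0.0
    for name in sorted(names, key=posteriors.get, reverse=True):
        chosen.append(name)
        cumulative += posteriors[name]
        if cumulative >= coverage:
            break
    return posteriors, chosen


# Hypothetical summary statistics (log odds ratio, standard error) for one region.
region = {"snp1": (0.10, 0.01), "snp2": (0.08, 0.01), "snp3": (0.02, 0.01)}
print(credible_set(region)[1])  # ['snp1']

# Two SNPs in perfect LD give identical statistics, so both must stay in the set.
print(credible_set({"snpA": (0.10, 0.01), "snpB": (0.10, 0.01)})[1])  # ['snpA', 'snpB']
```

The second case shows why fine-mapping sometimes cannot single out one variant: when neighboring SNPs are inherited together, statistics alone cannot tell them apart, and functional experiments have to take over.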

Most of the mutations revealed through GWAS, however, fall within non-protein-coding regions that are thought to regulate the activity, or expression, of one or more genes nearby. To understand these mutations, researchers often rely on insights from large consortium projects aimed at unraveling genetic function, such as the ENCODE, Roadmap Epigenomics, and Genotype-Tissue Expression (GTEx) projects, all led in part by the Broad. Data from these efforts allow scientists to connect genetic variants with changes in the expression of specific genes.
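Connecting a variant to a gene’s expression is often framed as expression quantitative trait locus (eQTL) mapping: regress a gene’s expression level on the number of copies of an allele each person carries. Below is a minimal least-squares sketch with invented data; real eQTL analyses also normalize expression, adjust for covariates, and correct for testing many variant-gene pairs.

```python
import math


def eqtl_regression(dosages, expression):
    """Least-squares slope of expression on allele dosage, with its t-statistic.

    dosages: copies of the tested allele per person (0, 1, or 2).
    expression: the gene's expression level in the same people.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(dosages, expression))
    if rss == 0:
        return slope, math.inf  # perfect fit: no residual noise to estimate
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / slope_se


# Hypothetical data: expression rises by roughly 0.9 units per copy of the allele.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 3.0, 2.8]
slope, t_stat = eqtl_regression(dosages, expression)
print(f"slope = {slope:.2f}, t = {t_stat:.1f}")
```

A large positive slope relative to its standard error suggests the allele raises the gene’s expression. Nothing in the calculation requires the gene to sit next to the variant, which is how such data can reveal long-range regulatory effects.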

They can also employ experimental tools that probe DNA-associated chromatin proteins and epigenetic marks; study the mutations in model cells or organisms, or in reporter assays; integrate GWAS results with large-scale data on the epigenome, transcriptome, proteome, or metabolome; and use techniques that precisely introduce the associated mutation into cells, such as CRISPR-Cas9 genome editing.

Stories of success

While many GWAS hits await further study, real progress has been made in several areas, including Crohn’s disease, schizophrenia, type 2 diabetes, and cardiovascular traits and diseases, to name a few.

Inflammatory bowel disease


The inflammatory bowel diseases (IBD), Crohn’s disease (CD) and ulcerative colitis (UC), were among the first conditions researchers aggressively pursued using GWAS. With few genetic risk factors known before the advent of GWAS, early genome-wide studies uncovered dozens of genetic links to IBD, including the first clues that autophagy, a process in which cells break down and recycle their own components, is important in Crohn’s disease.

With a clear need for larger sample sizes, scientists around the globe collaborated to create the International IBD Genetics Consortium (IIBDGC) to combine datasets and efforts. By 2012, the IIBDGC had published three large meta-analyses resulting in 163 IBD-related loci, representing what was at the time the most for any complex disease. Researchers have now associated more than 200 genetic regions with IBD through GWAS, with some genetic markers raising risk for both UC and CD.

A recent study led in part by the Broad used new statistical methods to fine-map suspect regions from an analysis of 67,000 people, including healthy individuals and those with IBD. The researchers were able to pinpoint the causal mutations underlying 18 IBD-associated DNA regions, a significant step forward in the ability to use genetics to uncover disease biology.

A 2016 study led by Mark Daly gleaned new biological insights into IBD by following up on previous GWAS hits and uncovering a rare mutation that protects against ulcerative colitis by disrupting the function of the gene in which it lies. “These protective alleles are informative because they get us closer to understanding the biological role of the gene: what it does in health and disease,” said Daly. Because protective, “loss-of-function” mutations occur naturally in the population, new therapies that mimic their effects could be more likely to be safe and effective. “That’s why there is great enthusiasm for this approach of looking for mutations that break genes,” said Daly.

Schizophrenia

Over the past six decades, there has been little innovation in drug development for schizophrenia, in part because its molecular and cellular underpinnings have not been well understood. There are no good models of schizophrenia in cells, animals, or human tissues, so traditional scientific approaches can’t be used to illuminate the disorder. GWAS offers a non-invasive approach to studying psychiatric illness by observing the effects of naturally occurring genetic variation. [For more on psychiatric disease genetics research, see this Biologic Podcast with Mark Daly.]

Broad scientists have worked with researchers across the globe to collect genetic material from tens of thousands of patients with schizophrenia and healthy controls. A 2014 GWAS led by the Schizophrenia Working Group of the international Psychiatric Genomics Consortium (PGC) was the largest genomic study published on any psychiatric disorder at the time, resulting in 108 genomic loci associated with risk of developing schizophrenia, up from only a handful known a few years prior.


By following up on one of these genetic regions with clever analytical approaches, scientists in the Broad’s Stanley Center for Psychiatric Research, Harvard Medical School, and Boston Children’s Hospital in 2016 uncovered genetic evidence that schizophrenia may be caused in part by excessive synaptic pruning (or elimination of connections between neurons) in the brain during late adolescence, the typical period of onset for schizophrenia symptoms. Studies like this one can not only suggest new therapeutic avenues to be explored, but also help remove the stigma from mental illness by providing clear molecular underpinnings of psychiatric disease.

Type 2 diabetes

Back in 2007, a handful of GWAS on type 2 diabetes were published, including the work of the Diabetes Genetics Initiative (DGI), led in part by the Broad. Collectively, these studies identified 10 genetic risk factors and suggested that yet-undiscovered T2D-associated loci would have small effects on risk, necessitating even larger studies. The Diabetes Genetics Replication and Meta-Analysis (DIAGRAM) Consortium later combined samples from the DGI with those from the Wellcome Trust Case Control Consortium (WTCCC) and the Finland-United States Investigation of NIDDM Genetics (FUSION), increasing the number of study subjects to more than 10,000 and identifying six more genetic regions associated with T2D. This “meta-analysis” approach soon became a model for making headway in other complex traits. The second DIAGRAM study was even larger, with 45,000 samples, and identified 12 more T2D loci.
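The core of a meta-analysis can be sketched with the fixed-effect, inverse-variance-weighted method, a standard way to combine per-cohort estimates for the same SNP and allele (an assumption here; the consortia may have used other or additional methods). Each cohort’s effect estimate is weighted by its precision, so pooled evidence can reach significance even when no single cohort does. The cohort numbers are invented.

```python
import math


def fixed_effect_meta(studies):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    studies: list of (beta, se) pairs, e.g. per-allele log odds ratios and their
    standard errors for the same SNP and allele in each cohort.
    Returns (pooled_beta, pooled_se, z_score, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, studies)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled_beta, pooled_se, z, p


# Hypothetical cohorts reporting a similar modest effect for one SNP.
single = fixed_effect_meta([(0.10, 0.05)])
pooled = fixed_effect_meta([(0.10, 0.05), (0.12, 0.06), (0.08, 0.05)])
for label, (beta, se, z, p) in (("one cohort", single), ("three cohorts", pooled)):
    print(f"{label}: beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.1e}")
```

The pooled standard error shrinks as cohorts are added, which is why combining studies uncovered loci with effects too small for any one study to detect.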

Nearly 100 genetic regions have now been associated with type 2 diabetes through GWAS. Some of these studies have included non-European populations; for example, the Slim Initiative in Genomic Medicine for the Americas (SIGMA) T2D Consortium based at the Broad focuses on people of Latin American descent, a group whose risk of developing T2D is double that of people of European ancestry.


In 2013, SIGMA researchers conducted a GWAS on DNA samples from more than 9,000 Hispanic people from Mexico and the U.S., and one of the variants they identified was associated with a roughly 30 percent increase in diabetes risk. In 2017, SIGMA researchers reported tracing this association to a specific gene, SLC16A11, and using an array of molecular, biochemical, cellular, and physiological experiments to uncover two distinct mechanisms by which the risk variants disrupt the gene’s function in liver cells, possibly contributing to the pathogenesis of T2D. The findings not only offer insights into the biology underlying T2D and suggest new leads in the search for therapeutics, but also highlight the importance of including diverse populations in GWAS. The approach serves as a model for pursuing the thousands of GWAS hits that await further study.

Coronary artery disease

Since 2007, GWAS have identified nearly 100 genetic variants associated with coronary artery disease, some near genes with known roles in lipid metabolism and others related to blood pressure. Sequencing studies of rare variants have highlighted the biological pathways involved.

Harnessing the power of numbers, a recent GWAS analyzed data on coronary artery disease (CAD) from the UK Biobank, which contains hundreds of thousands of samples. The work identified 15 new loci, bringing the total associated with CAD to 95 and implicating insulin resistance pathways and transendothelial migration of leukocytes in CAD pathogenesis.

Another recent study led by Broad scientist Sekar Kathiresan probed a genetic region that he and colleagues originally spotted in a 2009 GWAS aimed at heart attack risk and that was later associated with four other vascular diseases (migraine headache, fibromuscular dysplasia, cervical artery dissection, and hypertension). Through deep analysis of the region, they uncovered a SNP within the PHACTR1 gene that appeared to be the causal variant. Using gene regulatory data and genome editing tools, the team showed that the SNP controls not PHACTR1 but rather EDN-1, a gene located far away from the SNP along chromosome 6.


“We’ve gained incredible insights from GWAS efforts, with many more to come,” said Kathiresan, director of the Center for Genomic Medicine at MGH, associate professor of medicine at Harvard Medical School, and an institute member of the Broad, where he directs the Cardiovascular Disease Initiative. Kathiresan’s work leading GWAS on cardiovascular diseases and traits has shed light on the biological mechanisms underlying heart attack, uncovered mutations that protect against heart attack risk, and led to a genetic test for personalized heart attack prevention.

“Elucidating this pleiotropic variant with effects so far away in the genome demonstrates the unexpected connections we can glean through unbiased study,” he continued. “Our approach provides a model for making sense of the thousands of genetic loci still to be annotated.”

A bright future for GWAS

In the near future, scientists will be able to mine massive population biobanks currently being curated, which will combine genome-wide data (from genotyping arrays and, increasingly, whole genome and exome sequencing) with detailed information on measured traits, lifestyle, diet, and environmental exposures. The increased sample sizes are expected to promote the discovery of disease genes with even smaller effects, in addition to shedding light on the interactions both among genes and between genetic and environmental factors.

Dropping costs of exome-wide genotyping and next-generation DNA sequencing should help to highlight less common or rare variants in future GWAS, which can take advantage of recent surveys of human genetic variation, such as the 1000 Genomes Project and the Exome Aggregation Consortium. While GWAS have so far focused primarily on European populations, researchers are putting more emphasis on including diverse populations in their studies, such as African and Latin American people, as well as isolated populations.

“GWAS is going to be the first line of attack in understanding the genetics of these diverse populations, and large-scale SNP analysis is still integral to the search for common risk factors today,” said Stacey Gabriel, who, in her roles as senior director of the Broad Genomics Platform and an institute scientist at the Broad, helped build the platform into one of the largest genetic sequencing and analysis centers in the world. “This year, our facility will analyze tens of thousands of samples using genotyping arrays, which remain powerful tools to elucidate disease biology, especially when combined with whole exome or genome sequencing. We’re not slowing down our GWAS efforts anytime soon.”