Ten years of perspective on GWAS, with Joel Hirschhorn

Joel Hirschhorn, institute member and co-director of the Metabolism Program at the Broad Institute and professor of pediatrics and genetics at Harvard Medical School and Boston Children’s Hospital, sits down with BioLogic to talk about how the research community has approached genome-wide association studies (GWAS) from concept to execution through the last decade.

Credit: Susanna M. Hamilton, Broad Communications


Hi. I’m Karen Zusi, science writer at the Broad Institute, and you’re listening to BioLogic, the logic behind the science: conversations with Broad researchers exploring what they do and why they do it. 

In this episode, we’re taking a look back at some of the history behind genome-wide association studies, and a look forward to see what researchers are still learning from them.

Genome-wide association studies, or GWAS, came onto the research scene roughly a decade ago. In these studies, scientists examine the genomes of people with and without a specific trait or disease, looking for small variations in their DNA that might correlate. 

Then, researchers can use those variations like guideposts to find the genes that might be involved — and investigate the biology behind a disease. This could open the door to better treatments, ones based on a disease’s molecular causes rather than its symptoms.
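As a rough illustration of the comparison described above, the sketch below tests a single DNA variant for association with a disease by comparing allele counts in people with the condition (cases) and without it (controls). It is a minimal example under stated assumptions, not the method used in any particular study: the function name `allelic_chi_square`, the variant names, and all of the counts are hypothetical, and real analyses also handle quality control, population structure, and covariates. The threshold of 5e-8 is the conventional genome-wide significance cutoff.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Assumes every row and column total is non-zero.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_totals = [case_alt + case_ref, control_alt + control_ref]
    col_totals = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele and disease status were unrelated
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: (case_alt, case_ref, control_alt, control_ref)
variants = {
    "variant_A": (600, 1400, 500, 1500),
    "variant_B": (510, 1490, 500, 1500),
}

GENOME_WIDE_THRESHOLD = 5e-8  # conventional cutoff, reflecting the many variants tested

for name, counts in variants.items():
    chi2, p = allelic_chi_square(*counts)
    flag = "significant" if p < GENOME_WIDE_THRESHOLD else "not significant"
    print(f"{name}: chi2={chi2:.2f}, p={p:.3g} ({flag} at genome-wide level)")
```

A genome-wide study repeats a test of this kind across hundreds of thousands of variants or more, which is why the cutoff is so strict and why small early studies could detect only a handful of associations.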

For the rundown on GWAS, BioLogic turned to Joel Hirschhorn.

Joel Hirschhorn

Joel is an institute member and co-director of the Metabolism Program at the Broad Institute, and professor of pediatrics and genetics at Harvard Medical School and Boston Children’s Hospital. He’s pursued genome-wide association studies since they first became possible, particularly for type 2 diabetes, obesity, and height. 

Over the last ten years, thousands of GWAS have resulted in genetic regions being linked to traits and diseases, but the method hasn’t always enjoyed unwavering support. Joel sat down with BioLogic to talk about how the research community has approached GWAS through the last decade.


Really the first kind of decently sized typical genome-wide association studies were in 2007. And all of a sudden it became actually possible to do genome-wide genotyping. And that was this amazing thing then because it really changed the game of what you could do.


Before large genome-wide association studies were possible, researchers often made educated guesses about what genes might be connected to a disease, and then studied just those bits of DNA. 

Compared to these “candidate gene” studies, GWAS represented a leap forward in the search for disease-associated DNA.


We knew already that candidate gene association studies were largely not successful. We did these exercises — I remember we were working on type 2 diabetes and obesity, and we actually did the exercise of making these long lists of candidate genes, saying “These are the genes we think we should test, because they are the genes that have been suggested by previous biology that they are going to be important for diabetes and obesity.”

When we then did the genome-wide association studies, and the first signals start to come out, essentially almost none of the genes that came out of the genome-wide association study had been on our list, because we were just so bad at guessing what the biology was. 

I think we were all surprised at how bad our candidate gene lists were. And it’s not so much that the biology was entirely wrong. It’s really, I think we didn’t grasp, or at least I didn’t grasp, how much of the genome we knew absolutely nothing about.

So if you’re gonna say “Well, this is all gonna be growth plate biology,” to pick height for example, even — height’s probably one of the cases where the previous biology is one of the best-defined and actually lined up almost the best with what’s come out of GWAS. And even there, most of the genes that we identify are not genes that people have said were important for the growth plate, even though when you look at them they’re actually expressed in the growth plate and even expressed in really interesting ways. 

And they’re almost — in fact, some of them then turn out to be genes for skeletal growth disorders. But nobody had pointed to them in any real way saying, “Oh, this is going to be an important gene.” And I think that’s been true even more so for the diseases where it turns out we’ve known less about the biology going in than we thought we did.

So it really was absolutely essential to be able to take a comprehensive genome-wide approach to get at these variants.


Today, unbiased genome-wide association studies can include tens, or even hundreds, of thousands of people.

But ten years ago, the first genome-wide association studies were much more limited. And research groups around the world had very different perspectives on these early studies.


So we felt that we were lucky to have found anything at all, given the sample sizes that we were limited to at the time.

And we were really excited that — even though, for example, for diabetes, — that when we put everything together, there actually were several clear associations that emerged. And for us, that meant “Okay, this is going to work. We just need to do this on a larger scale, and we’re going to continue to make discoveries, and we’re going to learn more and more about the biology.”

There was a different camp that said, “See? You haven’t discovered anything. You got only, you know, a couple of things out of this, and they explain very little of the heritability, and so this is a complete failure. You haven’t discovered anything, you haven’t accounted for any heritability, there are so few loci, this is just a waste of time.”


But steadily, as technologies and methods improved, the amount of data collected went up.


And, in fact, it turned out that for pretty much every polygenic trait and every disease that was looked at, as you got more samples, and, surprisingly, in an almost linear way, you got more and more loci that you discovered.


Now, finding more regions in the genome that associated with disease didn’t always immediately translate into predicting heritability. Each hit in a GWAS screen usually only accounts for a tiny, tiny fraction of risk for the conditions that are being studied. 

But for Joel and his colleagues, GWAS was never about fully explaining heritability or predicting someone’s risk for developing disease. It was about finding new, unexpected clues to the biological pathways that determine a trait like height, or what goes awry in a cell to cause disease. The goal was a more complete understanding of the true biological roots.

As genome-wide association studies resulted in more and more data, the success brought a new round of critiques.


The opposite criticism was, “You’re gonna discover so much that you won’t be able to make any sense out of it. So if every gene in the genome is associated, how can that possibly teach us anything about biology?”

And that argument is used to say, “Well, why keep going, because all you’re gonna do is just sort of keep discovering more of these random noise genes and it’s not gonna clarify the picture any more?”


So, one of the main challenges for researchers, and a key next step that Joel noted, is to look at these regions of DNA highlighted by GWAS — which could include multiple genes, as well as noncoding sections of DNA — and really ask “Which regions might include the relevant DNA variation? Which variants are just along for the ride? What are the actual genes involved here? And how might they lead to dysfunctional proteins, cells, or tissues?”


And sometimes you get lucky — you stumble across something that’s instantly recognizable biology that you didn’t know about ahead of time and suggests immediate routes forward for potential therapeutic hypotheses.

But the more common outcome has been that you get a fair number of loci and you can clearly detect that these are not random darts being thrown at the genome. They’re enriched for particular cell types, for tissues, they’re enriched for biological pathways, and that all points us to what the biology is.

And more loci will help with that, because you get sort of more “votes” as to what the right biology is. And this idea of “there are more votes” really is an important one, and more loci actually do help you delineate the biology better.

So even though we’ve actually been doing GWAS now for almost a decade, and the sample sizes have gotten to be a hundred times as large as they started out in some cases, we’re still on an upward slope in terms of what we’re learning as we keep adding samples.

That will eventually plateau, I think, because eventually we will start hitting the same genes and the same pathways over and over again, and then it’ll be clear, “Yeah, yeah, we already know about that pathway, we already knew about that pathway.” 

There may be computational methods that — or functional tests or things like that — which help for the loci we already have, but we haven’t yet, I think, reached that point of diminishing returns.

And the point that I’m describing is very different than accounting for all of the heritability. So people make this mistake all the time, where they say, “You’re not done” or, you know, “You’re not even close to succeeding, because you haven’t accounted for all the heritability.” 

And if our only goal was to try to predict exactly how obese somebody was going to be or how tall they were going to be or what their likelihood of getting diabetes or schizophrenia was gonna be, yeah, then accounting for the heritability is the key metric. And it’s an important one because it gives you an idea of how far along you are, and it does tell you a little bit about prediction.

But really, if you’re in it for the biology, you’re gonna finish learning what you can learn about the biology long before you finish accounting for all the heritability.


I asked Joel if there were any research areas where scientists might be coming close to this “point of diminishing returns” from GWAS.


I don’t know that there’s any one where, you know, we can say, “We’re so happy with the crisp and complete picture of the biology of the disease that we can just stop.” 

We’re trying to drive a couple of phenotypes in the quantitative trait realm to completion, or to biological completion. So we’re trying to do that for height, measures of obesity. Probably lipids is also, you know, pretty far down that road. There’s a huge number of loci, they cluster a lot into biological pathways. But even there, out of some of the loci that have emerged, there are clearly new therapeutic opportunities.

So I don’t think we’re there yet, but, you know, we’re hoping to get closer to that point. 


But even when researchers think they’ve found relevant genes in GWAS results, that alone isn’t always enough to gain insight into the biology. Scientists are still figuring out what each gene in the human genome is responsible for.


It’s really hard, when you come across a gene that nobody’s worked on, to actually gain traction and say, “Now we understand what this gene does.” So I think that’s been the surprise and the challenge. I mean, I guess the good news is that’ll keep people busy figuring out this biology for a while.

In an ideal world, it would be a multidisciplinary team that would kind of be there from the beginning. Where there’s the people who are doing the genetics and really understand the nitty-gritty of the genetics to say, “How certain are you about this gene, or that gene, or this locus, or that locus?” and then people who could then — who understood the right system in which you could study those genes, or developing the right system. And then people who could take the insights from that system and develop them into lead compounds and that sort of thing.

I’d say the difficulty is that, in many cases, the people who are doing the genetics are really excited about the genetics and are quite confident that these are really important insights. But the people who are set up to study particular genes, if they’ve never heard of a gene other than from the genetics, they may not be so enthusiastic to take that up and put that into their system.

So, oftentimes what happens is the people who are really excited and passionate about it are the people who end up having to drive it, and so they end up having to acquire a whole new set of skillsets — or at least really convince a collaborator who has those skillsets. So that linkage, I think, is also part of the challenge of taking the next step towards the biology. 


And it can be difficult to secure funding. But as more actionable results surface from GWAS, it gets easier.


Pharmaceutical companies are having some interest in genetic-based therapies. So, there are a couple of papers that came out that showed that drugs are much more successful when they target something that has support from a genetics study than when they don’t. So that’s gotten pharma’s attention. So there’s, I think, some support from there. 

But I think that, in part because there was this kind of pushback throughout against GWAS, it’s not quite as straightforward to get the funding to kind of pursue the leads from GWAS. Especially when the — you know, you have a gene where you don’t know anything about it and pretty much you say, “Well, we think from the genetics, this is an important gene,” to then convince a study section to say “Well, okay, we should go after this,” sometimes can be challenging.

So, I think just the more successful examples of that there are, the easier that will become.

There’s still plenty of discovery to do, but I think that we’re sort of taking a little bit of a rounding-the-corner of trying to now understand the biology in ways that are rich enough to really let us get some traction on therapeutic hypothesis. And that’s really the challenge of the next ten years.


Genome-wide association studies have already approached actionable biology for some autoimmune diseases, diabetes, schizophrenia, and others. 

Further initiatives currently underway at the Broad Institute will also help make sense of GWAS hits. Researchers are working to demystify cell regulatory networks, map gene expression patterns in different body tissues, and more. Genome-wide association studies are just one tool to uncover the biology that underlies disease and to shed light on new diagnostic and therapeutic opportunities.

You can read more about Joel’s work in particular, and explore further content related to the past decade of genome-wide association studies, on broadinstitute.org.

You can also find more episodes of BioLogic, with transcripts, on broadinstitute.org, and through SoundCloud, iTunes, Pocket Casts, and other podcast distributors. 

For the Broad, this is Karen Zusi. Thanks for listening!