How new data from the All of Us Research Program will help address the diversity problem in genomics

Broad scientists discuss how the newly released data, which includes nearly 100,000 whole genome sequences, will drive more equitable research and healthcare.

Alicia Martin, Alham Saadat, and Konrad Karczewski

The field of genomics has had a representation problem. More than 80 percent of people who have participated in large genomics studies are of European descent. The All of Us Research Program, led by the National Institutes of Health, aims to help solve that problem by partnering with one million participants or more from communities across the United States that have not historically been represented in genomics research.

The program began enrolling participants in 2018 and last week released its first genomic dataset: whole genome sequences from nearly 100,000 participants along with medical record, survey, and other types of data. About 50 percent of the participants are from racial and ethnic minorities, and 80 percent are from communities that are underrepresented in research. The program’s ultimate goal is to generate a large, diverse dataset that will help accelerate research on health and disease, reduce health disparities, and drive breakthroughs in precision medicine.

The Broad Institute of MIT and Harvard is part of All of Us’s consortium of partners. The Broad’s Genomics Platform began sequencing human genomes from the project in July 2020 and has generated more than half of the genomic data. A team led by Lee Lichtenstein, associate director of computational methods in the Broad’s Data Sciences Platform, built data-analysis tools and helped to process and ensure quality of the data. In addition, Niall Lennon, translational genomics senior director and an institute scientist, co-led a group of researchers from the Genomics and Data Sciences Platforms and other All of Us consortium partners who spent 18 months establishing the safety and validity of the genomic testing process, to the satisfaction of the FDA, so that participants of the project will soon be able to learn of genetic findings that may be important to their health and risk of developing certain diseases.

We spoke with Broad scientists Alicia Martin, an associate member and assistant professor at Massachusetts General Hospital; Konrad Karczewski, a computational biologist and group leader; and Alham Saadat, associate director of scientific equity, on how they are planning to use the dataset, what makes it so valuable, and what they hope to see others do with this massive resource.

Why is it important to have diversity in biobanks like the All of Us resource?

Alicia Martin.

Alicia Martin: My research revolves around the idea of increasing diversity in genomics and increasing equity in scientific research involving genomics. Thus far, that has been a tricky task because our datasets are so non-representative.

All of Us represents a big shift in what will be available from under-represented populations. My lab develops tools for diverse populations, for example, polygenic scores and preventative medicine models. We also study how genomics fits into the clinical models currently in use and how we can incorporate environmental data into these models, which could help better predict disease risk and reveal what's driving health disparities. Genetics won't give us the complete story, so combining it with lifestyle, medical, and demographic information is crucial to building better disease risk prediction models and to understanding what's driving differences in prevalence across populations.

Other projects such as the NeuroGAP project have put resources and effort into diversifying genomics, but these are often focused on a subset of human disease. All of Us is instead a massive, general effort aimed at diversity across the whole breadth of human populations. It's a phenomenal resource.

Alham Saadat.

Alham Saadat: All of Us includes participants from a range of demographic backgrounds (ancestry, ethnicity, gender identity, socioeconomic status, and country of birth), with a targeted effort to capture populations that have been under-represented in large biobanks. Because the surveys collect a wealth of data on social and environmental factors, All of Us will be a rich resource that allows the scientific community to incorporate non-genomic data into our work to better understand human health. In addition, the program is a huge step forward in ensuring equitable, secure, and transparent data use, allowing scientists from all backgrounds to use it to make new discoveries and the general public to see which studies the data are used in.

What kinds of studies are possible with the All of Us dataset?

Konrad Karczewski.

Konrad Karczewski: Rare genetic variants are more likely to be damaging to human health, and large datasets give us more opportunities to uncover these variants in the population. Our group has built tools to uncover rare variants that influence human disease in large datasets like the UK Biobank, but more diverse datasets would provide greater opportunities. All of Us is incredibly enabling because of its size (it is at least twice as large as the UK Biobank) and the breadth of populations it includes, which give us more chances to uncover rarer DNA changes. In addition, it includes whole genome data, not just exomes (the protein-coding portions). Changes to the protein-coding portions of the genome are more likely to have an impact on health, but the whole genome data in All of Us will allow us to search the non-coding regions for regulatory changes that alter the activity of other genes, and will also give us a less biased look across the exome itself.

Saadat: We know that we can use genetic data to estimate disease risk and outcomes, but we also know that there are many other causal drivers that we are missing. Incorporating data from under-represented groups along with social and environmental data, such as socioeconomic status, educational attainment, healthcare access, and geographical location, will help us better understand gene-environment interactions, which will be critical for a more complete understanding of human health. All of Us will not only help expand our understanding of basic human biology and health, but will also allow us to better understand health disparities, which we know are driven primarily by social and environmental factors.

What’s your vision for how scientists might use the All of Us resource?

Martin: Currently, some clinical models used in doctors' offices require doctors to make rough corrections to, say, a blood test result based on a patient's racial or ethnic background, because the model isn't accurate for all populations. Instead, we want clinical tools that work for everyone. So we want to understand why health and outcomes differ, rather than just roughly correcting for those differences. Alham and I are working to identify resources that can get researchers invested in this new approach.

Saadat: Since joining the Broad's Office of Inclusion, Diversity, Equity, and Allyship in December 2021, I've been listening to groups across the Broad to learn what they're doing to make biomedical science more equitable, what they hope to be doing in this space, and what would help them move the work forward. Part of these conversations has been learning what barriers researchers face in using social and environmental data in their work, and thinking about support and services that would allow them to expand in this space. We aim to build a community that's able to work together and iterate on best practices across many disease types and approaches.

Karczewski: We are getting better at finding high-impact coding variants, but we still want to know what their effects are on the cells and tissues. Identifying associations for rare variants in a project like All of Us gets us closer to identifying the actual genetic changes that cause disease. I’d like to share this work with the scientific community and enable others to further study these variants and explore potential therapeutic avenues targeting the genes and pathways we highlight.

How far does All of Us move us toward solving the genomics field’s diversity problem?


Martin: The makeup of genomics studies isn't reflective of the diversity we see in human populations around the globe. Over the past decade or so, diversity in genomics has not improved much, even after institutions like the National Institutes of Health have called for and pushed for more diversity in genomic research. Even as genome-wide association studies have grown in size, the number of participants from underrepresented ancestry groups has remained flat. The All of Us resource is going to be powerful in bringing an influx of participants from some of these ancestry groups and underrepresented populations. I'm optimistic that it will be a transformative resource for the field. It's going to democratize genomics for underrepresented populations in ways that other resources haven't been able to.

Saadat: As a US-based project, All of Us may still miss some populations that aren’t prevalent in this country, such as East African populations and many others. So there’s still a gap that we as a field need to address, but this project is a critical step in the right direction.