A growing network of 24 biobanks around the world is enabling better-powered studies of disease in more diverse populations.
Global initiative shows how bringing biobanks together can yield new genetic insights
In 2019, scientists at the Broad Institute of MIT and Harvard, together with colleagues at the University of Michigan and researchers around the world, founded the Global Biobank Meta-analysis Initiative (GBMI) to connect international biobanks in order to enable larger, more powerful studies of genetic data with more diversity in genetic ancestry, and thereby increase the chances of finding new genetic variants linked to disease. The initiative has since grown to include 24 biobanks across five continents and contains data from more than 2.2 million people.
Now the collaboration has published its first set of papers. In Cell Genomics, the consortium shared pilot analyses of summary statistics from multiple genome-wide association studies (GWAS) by member biobanks. These studies analyzed genetic, clinical and other data to look for genetic variants associated with a particular trait or disease. The GBMI team found that GWAS using data from many biobanks can be integrated despite differences in data collection and categorization. The scale and diversity enabled identification of several genetic variants that had a stronger association with disease in individuals grouped by sex or ancestry.
The team says GBMI lays a foundation for more high-powered collaborative studies of disease, new investigations into understudied diseases, and improved representation of undersampled ancestries in GWAS analyses. It could also help biobanks with fewer resources gain access to more data and analytical expertise.
“We believe that GBMI really shows how much biobanks can gain by collaborating with each other to power the genetic association findings for human diseases — including those that have been understudied by previous GWAS,” said Wei Zhou, first author of GBMI’s flagship paper published today and an associated scientist in the Stanley Center for Psychiatric Research at the Broad. “GBMI creates opportunities for biobanks to cross-validate new findings and facilitate follow-up studies such as predicting disease risk and identifying potential drug targets.”
“The construction of this GBMI resource represents a great starting point for how, as a worldwide community, we can together leverage genetic data to better understand disease,” said Mark Daly, co-corresponding author on the study and an institute member at the Broad.
In addition to describing their analysis pipeline and principles of collaboration, the team reported initial findings from a meta-analysis of 13 diseases and one medical procedure — ranging from common conditions such as asthma to less prevalent diseases such as cardiomyopathy. These results will be detailed in a suite of upcoming publications.
Additional GBMI papers published this week include studies on genomics-driven drug discovery, multi-ancestry transcriptome-wide and proteome-wide association studies, a genome-wide association study of asthma, and cohort descriptions for several member biobanks, including Taiwan Biobank, the National Biobank of Korea, and the HUNT biobank in Norway.
“People in the genetics community have become very excited by what GBMI has done together, and more and more researchers want to get involved,” said Sinéad Chapman, director of global genetics project management at the Broad and an operations lead in GBMI. Chapman says that GBMI has been a good model for other initiatives to follow. The COVID-19 Host Genetics Initiative, for example, adapted the same principles of collaboration and used a similar analytical pipeline to discover genetic factors underlying COVID-19 infection and severe disease.
“This initiative really developed out of people from different biobank subgroups coming together and seeing the importance of team science and the power in numbers,” Chapman said.
Power in numbers
Zhou says the team was inspired to found GBMI after seeing researchers start using biobanks, such as the UK Biobank, one of the largest biobanks in the world, for large-scale analyses. “People started facing challenges associated with running GWAS in these large biobanks for hundreds and thousands of complex diseases that can be curated from electronic health records,” she said.
To address these challenges, Zhou, Cristen Willer, and colleagues at the University of Michigan developed a statistical method to help analyze these datasets more efficiently and with fewer errors, and the GBMI team began helping biobanks connect with each other to run even larger analyses.
“Having biobank teams work together results in more equitable knowledge sharing, vastly improved scientific reliability, and focuses scientific brains with diverse training and new ideas towards working on the most difficult, impactful problems together,” said Cristen Willer, a co-corresponding author on the study, currently at Regeneron, and a professor at the University of Michigan when the study began. “There’s no question this global team approach results in better science and helps expedite bringing genetics findings into clinics.”
“Many biobanks keep recruiting samples and growing, but they may have challenges running GWAS by themselves,” Zhou said. “We thought that if we could help biobanks make an analysis plan, guide them through quality control and how to use these methods, it would really help them work together.”
The team harmonized data quality and definitions of traits and diseases including asthma, stroke, gout, appendicitis, and uterine cancer. They then pooled GWAS results across the biobanks in meta-analyses. The meta-analyses confirmed more than 300 loci, or locations in the genome, previously shown to be important to disease, and identified nearly 200 new loci linked to disease. The team found that most genetic associations were consistent across biobanks, reinforcing the reliability of the approach.
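For readers curious how results from separate biobanks can be combined without sharing individual-level data, one common approach is a fixed-effect, inverse-variance-weighted meta-analysis of each biobank's summary statistics. The article does not specify which model GBMI used, so the sketch below is only an illustration under that assumption. The function name `ivw_meta` and the effect sizes in the usage note are hypothetical and are not GBMI results.

```python
import math

def ivw_meta(estimates):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    `estimates` is a list of (beta, se) pairs, one per biobank, where beta is
    the per-biobank effect estimate and se its standard error.
    Returns the pooled beta, pooled se, z-score, and two-sided p-value.
    """
    # Each biobank is weighted by the inverse of its variance, so more
    # precise estimates (usually from larger cohorts) count for more.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)

    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p
```

As a usage example with made-up numbers, `ivw_meta([(0.10, 0.05), (0.20, 0.05)])` pools two equally precise estimates into a beta of 0.15 with a smaller standard error than either input, which is the sense in which combining biobanks increases statistical power.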
The researchers also ran GWAS to look for sex- or ancestry-specific associations, which are often difficult to conduct within a single biobank that has a limited number of samples. The meta-analyses revealed eight loci that had stronger associations in males or females for diseases including asthma and gout. The team also found 16 loci that had stronger associations in people of certain ancestries.
Armed with this information, researchers can now prioritize the study of certain genetic variants to better understand the biological mechanisms underlying disease in specific populations. The GBMI team hopes that at the same time, the initiative will continue to grow and motivate new biobank collaborations.
“Looking forward, biobanks will continue to grow and develop across the world, underscoring the importance of efforts like GBMI to maximize the benefit of these cohorts,” said Benjamin Neale, co-corresponding author of the effort and co-director of the Program in Medical and Population Genetics at the Broad. “By working together, we can advance our understanding of the genetic contributions to common human disease risk and build a more inclusive human genetics community.”
“We’re really hopeful that the platform we’ve set up will encourage more people to share data and collaborate amongst themselves and see how this style of research can be done,” Chapman said. “We’ve seen a lot of junior researchers step forward and lead their own analyses, and that’s really encouraging.”