Broad Institute and Microsoft collaborate to help accelerate disease research with scalable analytical tools

The collaboration will use the latest advances in machine learning to create automated methods and tools for life science researchers.

The Broad Institute of MIT and Harvard and Microsoft have launched a new collaboration that aims to equip life science researchers and clinicians with the next generation of robust analytical methods and tools: ones that are highly scalable and easily automated, and that leverage cutting-edge machine learning techniques.

The new collaboration is designed to meet growing challenges and opportunities in biology and medicine. Life scientists are generating more biomedical data than ever and yet are struggling to scale existing methods and tools to analyze such large amounts of data. Most existing biomedical research tools were designed for small-scale work.

However, rapid developments in technology over the last decade are now enabling researchers to deeply investigate the molecular mechanisms of life from fresh angles. New advances in machine learning and data science offer exciting opportunities for researchers to gain more insight from a greater variety of large datasets. Together, these developments will improve our understanding of the origins of disease and lead to the development of better diagnostics and treatments.

"We’re excited to join forces with our colleagues at Microsoft who have extensive experience in data science, computer science and engineering," said Clare Bernard, head of the Broad Institute’s Data Sciences Platform (DSP). "DSP's mission is to help empower biomedical researchers to deliver on the promise of precision medicine, and this new collaboration will be critical in helping us drive that forward."

Developing tools and pipelines with a rare disease genomics focus

One of the first projects to kick off the collaboration will focus on developing tools and pipelines to improve the identification and interpretation of potentially disease-causing genetic variants in the genomes of families affected by rare genetic disorders.

Heidi Rehm, co-director of the Program in Medical and Population Genetics at the Broad Institute, and chief genomics officer in the Department of Medicine at Massachusetts General Hospital, said that the current diagnostic tools used in rare disease genomics are not able to meet the needs of many patients and their families.

"Variant interpretation in the genomic diagnosis of rare disease is currently a highly manual process, often requiring hours of labor by skilled curators," said Rehm. "Although clinical laboratories and researchers have been willing to invest time in their primary analyses of genomic data, there are insufficient resources for reanalysis, even though knowledge bases in genomics change rapidly. The result is that families that cannot be diagnosed during their first clinical analysis may miss out on a diagnosis entirely."

To address these shortcomings, Rehm is teaming up with Michael Talkowski, a Broad institute member and faculty member of the Center for Genomic Medicine at Massachusetts General Hospital, and Daniel MacArthur, a former Broad institute member and now director of the Center for Population Genomics at the Garvan Institute of Medical Research and the Murdoch Children’s Research Institute in Australia.

Together with teams from the Broad Institute's Data Sciences Platform and Microsoft, Rehm, Talkowski, and MacArthur will leverage scalable pipelines and machine learning approaches implemented on the Azure cloud. In the initial phase of the project, the life science researchers will develop novel methods and tools to discover genomic variation, characterize and annotate its functional consequences, and rapidly interpret it in rare disease cases from the US, Australia, and around the world. Their aim is to improve the efficiency of routine rare disease diagnosis in families, ready the new tools and pipelines for deployment in clinical diagnostic labs in the US and Australia, and make these approaches accessible to the global community.

"Our goal is to dramatically speed up the analysis process so that the global community can provide cost-effective genomic diagnoses for many more rare disease families," said Rehm.

Maximizing impact

Other projects in the collaboration between the Broad Institute and Microsoft will focus on developing the capability for streamlined, secure population health research, and enabling biobanks by advancing new sequencing technology and workflows. To maximize the impact on human health of the new tools developed through the collaboration, Broad and Microsoft will make them available to the life sciences community through platforms such as Terra, a cloud-based biomedical research platform co-developed by the Broad Institute, Microsoft, and Verily.

"We are excited to bring the power of our Azure AI platform into this research collaboration with the Broad Institute. At the frontier of human health, biology problems are transitioning to computing problems," said Scott Saponas, Senior Director of Microsoft Health Futures. "Developing tools to address the unmet computational needs arising from this transition will enable us to make significant advances in science and medicine which will deliver impact in clinical settings; starting with rare disease diagnostics."