Broad expands COVID-19 viral sequencing efforts for variant surveillance in the Northeast

In partnership with the CDC, Broad scientists are analyzing SARS-CoV-2 genomes to look for known variants of concern, detect emerging ones, and support public health needs

Melissa Rudy in the Broad's Viral Genomics Group processes samples for viral sequencing
Credit: Scott Sassone, Broad Communications

Over the past year, the Broad Institute of MIT and Harvard has processed more than 15 million COVID-19 tests from nursing homes, colleges, health care facilities, homeless shelters, K-12 schools, and other organizations. Meanwhile, scientists in the Broad’s Viral Genomics group have worked with local clinical and public health partners to analyze the viral genomes in dozens of COVID-19 positive samples each week, tracking the virus’s spread and helping their collaborators to detect and manage outbreaks.

Now, in partnership with the Centers for Disease Control and Prevention, the Broad is scaling up its COVID-19 viral sequencing efforts to enable genomic surveillance of the virus in our communities. The Broad’s Genomics Platform and the Viral Genomics Group are together sequencing and analyzing viral material in COVID-19 positive patient samples to monitor for known, concerning variants of the virus in the Commonwealth of Massachusetts and New England, such as ones that were first detected in the UK, Brazil, and South Africa. The scientists are also looking for emerging variants that warrant further investigation.

The Broad is currently sequencing several hundred positive samples per week from the institute’s COVID-19 diagnostic facility and external partner sources, and will ramp up to 5,000 samples per week by later this spring. (December 3, 2021 update: The facility is now processing up to 10,000 samples each week. See Broad's COVID-19 Genomic Surveillance dashboard, which is updated weekly.)

The Broad’s Data Sciences Platform is supporting this effort with bioinformatics tools that enable rapid data analysis and sharing with the CDC and other partners. The Broad will continue to share SARS-CoV-2 sequence data with state public health departments, the CDC, and the greater scientific community to assist with pandemic response.

As the virus spreads, it naturally acquires new mutations; some of these could make it more infectious, more dangerous to the patient, or less preventable through vaccination. By monitoring differences in the virus’s genome sequence between samples and over time, scientists can track how it is changing and identify new opportunities to control the virus’s spread, for example, by modifying a vaccine so that it protects against new, emerging variants.

“The data we are generating will give us a much better picture of how this virus is advancing through our community, and will provide insights to detect and slow the spread of concerning variants,” said Bronwyn MacInnis, director of pathogen genomic surveillance in the Broad Institute’s Infectious Disease and Microbiome Program and a co-leader in the Viral Genomics Group.

“It’s clear that more genomic sequencing of the virus is needed to monitor COVID-19 in the community,” said Stacey Gabriel, director of the Genomics Platform. “We’ve been fortunate to be able to bring our state-of-the-art genomics facility and expertise in high-throughput sequencing and automation to bear on this urgent public health need.”


Since the beginning of the pandemic, the Viral Genomics Group, led by institute member and Harvard professor Pardis Sabeti, has been sequencing hundreds of COVID-19 positive samples from Massachusetts General Hospital, University of Massachusetts Medical School, the Massachusetts and Rhode Island public health departments, and other partners to learn how the virus has spread through our communities and how that spread was accelerated by superspreading events. “This genomic data is critical for our state and national public health partners, whether it’s for assisting with outbreak investigations or monitoring variants of concern,” said Danny Park, the computational lead in the Viral Genomics Group. The group designed protocols for sequencing the virus and built automated computational pipelines for assembling the sequenced bits into full genomes.

Samantha McGovern in the Viral Genomics Group processes COVID-19 positive diagnostic samples from local partners for sequencing, to support public health needs. Credit: Scott Sassone, Broad Communications.


The team also worked with the Data Sciences Platform to apply workflows in Terra — a secure, scalable, open-source cloud computing platform developed by the Broad, Verily, and Microsoft that allows biomedical researchers to access, analyze, and share data — to study viral mutations, evolution, and transmission. Using their approach last year, the researchers uncovered how the virus was first introduced to greater Boston and how it can spread in a hard-hit urban area.

"Because of our lab's existing expertise and workflows in sequencing the genomes of pathogens, we were able to use the early data on SARS-CoV-2 to quickly establish a sequencing process in our lab and make discoveries about the virus's spread in our region," said Gordon Adams, a senior research associate in the Infectious Disease and Microbiome Program.

In late 2020, as scientists were identifying new SARS-CoV-2 variants of concern globally and in New England, MacInnis and others at the Broad recognized the need to apply the institute’s massive high-throughput sequencing capabilities for broader community surveillance. “Because of the sheer number of COVID-19 infections, the virus has had lots of opportunities to acquire new mutations that may allow it to transmit better or to evade our immune response, which we can study with viral genomics,” said Katie Siddle, a lead scientist in the Viral Genomics Group.

With support from institute leadership, the scientists set a goal not only to scale up the existing sequencing capacity in the Viral Genomics Group by 5- to 10-fold, but also to establish the Broad’s first large-scale viral sequencing program in the Genomics Platform, using some of the groundwork already laid by the Viral Genomics Group. With support from the CDC, the program launched in late March 2021.

Andrea Borges processes patient samples for COVID-19 testing in the Broad’s diagnostics facility in the Genomics Platform. A subset of COVID-19-positive samples is “cherry-picked” for viral sequencing. Credit: Scott Sassone, Broad Communications.

To scale up viral sequencing, the Genomics Platform’s laboratory automation team, led by Scott Anderson, relied upon some of the innovations they’d implemented with the platform’s development team to build the large-scale diagnostic lab earlier in the pandemic. “Expanding upon the knowledge we gained in scaling diagnostic testing, we were able to quickly develop the new COVID-19 sequencing process in a flexible, high-throughput manner,” said Anderson.

Senior bioautomation engineer Matthew Lee designed a new automated procedure to “cherry-pick” a subset of positive samples from the Broad’s testing facility for sequencing. Because only 1 to 2 percent of samples are currently testing positive, roughly one thousand 96-well plates of samples must be processed and cherry-picked in order to reach the targeted 1,000 genomes per day. “This is no small task and automation is key to running it smoothly and efficiently,” said Anderson.
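
The arithmetic behind that plate count can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration, not the Genomics Platform's planning tool: it assumes every well on a 96-well plate holds one patient sample (ignoring control wells) and that every positive sample goes on to sequencing. The function names are hypothetical.

```python
import math

WELLS_PER_PLATE = 96  # assumption: every well holds one patient sample, no controls


def expected_positives(plates: int, positivity: float) -> float:
    """Expected number of positive samples across a given number of plates."""
    return plates * WELLS_PER_PLATE * positivity


def plates_needed(target_positives: int, positivity: float) -> int:
    """Smallest whole number of plates expected to yield the target positives."""
    return math.ceil(target_positives / (WELLS_PER_PLATE * positivity))


if __name__ == "__main__":
    for rate in (0.01, 0.02):
        print(
            f"positivity {rate:.0%}: "
            f"1,000 plates give about {expected_positives(1000, rate):.0f} positives; "
            f"1,000 positives need about {plates_needed(1000, rate)} plates"
        )
```

Under these assumptions, a 1 percent positivity rate puts the requirement at roughly a thousand plates per day, while a 2 percent rate roughly halves it.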

Automated liquid handlers in the Genomics Platform “cherry-pick” COVID-19-positive diagnostic samples for sequencing. Credit: Scott Sassone, Broad Communications.

Genomics Platform researchers led by Brendan Blumenstiel and Matt DeFelice on the platform’s development team adapted existing sample prep and analysis protocols for SARS-CoV-2 to run on the sequencing center’s automated, high-throughput machinery.

“Groups around the globe, including the Sabeti lab, have built and openly shared methods for targeting and sequencing the SARS-CoV-2 genome. So when Broad decided to launch a large-scale sequencing effort in the Genomics Platform, we could quickly settle on and validate a scalable method,” said Blumenstiel. “Because the folks involved have been working together on similar challenges for more than a decade, we were able to get this impactful program off the ground swiftly.”


Viral sequence data coming off the Genomics Platform’s high-throughput process are sent to the Terra data repository, to be rapidly processed, analyzed, and packaged for sharing with CDC, state and local departments of health, and public scientific databases including the National Center for Biotechnology Information’s GenBank and Sequence Read Archive, along with GISAID.

To create this system, the Data Sciences Platform and Viral Genomics Group worked together to scale the Terra workflows for data processing, analysis, and quality control that were developed over the past year, with the goal of automating as much of the workflow as possible in the coming weeks. “Our job is to ensure that the data is handled consistently so it can be made available without delay to be most helpful for public health in real time,” said Christine Loreth, senior alliance manager for the Data Sciences Platform who worked with the Viral Genomics Group to integrate processing and analysis workflows into the Terra platform. Loreth is also leading efforts within the Data Sciences Platform, and in collaboration with the Viral Genomics Group, CDC, and others, to support adoption of Terra as a general purpose platform for pathogen genomics computation for U.S. public health labs that are performing SARS-CoV-2 sequencing.

Analytical tools in Terra can identify SARS-CoV-2 mutations and variation in large amounts of sequence data, and also indicate whether a viral sample is a known variant of concern. Once shared with clinical and public health partners, that data can help guide public health decision-making and inform efforts to develop vaccines and therapeutics. The Viral Genomics Group and their collaborators are also studying these data to identify emerging variants and examine the genetic epidemiology of the virus.
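
As a rough illustration of the idea, and not the actual Terra workflows, the Python sketch below compares a sample sequence to a reference, lists the positions where they differ, and checks whether those differences include every mutation in a known signature. The sequences, the signature names, and the function names are all made up for this example. It also assumes the sample has already been aligned to the reference, a step that real pipelines handle separately.

```python
def call_substitutions(reference: str, sample: str) -> set[str]:
    """List single-base differences as strings like 'T4A'.

    The format is reference base, 1-based position, sample base.
    Assumes the two sequences are already aligned and equal in length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to equal length")
    substitutions = set()
    for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1):
        if sample_base in "N-":  # skip ambiguous bases and gaps
            continue
        if ref_base != sample_base:
            substitutions.add(f"{ref_base}{position}{sample_base}")
    return substitutions


def matching_signatures(substitutions: set[str], signatures: dict[str, set[str]]) -> list[str]:
    """Return names of signatures whose mutations are all present in the sample."""
    return sorted(
        name for name, required in signatures.items() if required <= substitutions
    )


if __name__ == "__main__":
    # Toy data only: these are not real SARS-CoV-2 sequences or variant definitions.
    reference = "ACGTACGTAC"
    sample = "ACGAACGTTC"
    toy_signatures = {
        "toy-variant-1": {"T4A", "A9T"},
        "toy-variant-2": {"C2G"},
    }
    found = call_substitutions(reference, sample)
    print(sorted(found))
    print(matching_signatures(found, toy_signatures))
```

Requiring every mutation in a signature to be present is a deliberately strict rule chosen to keep the sketch short. Production tools typically place a genome on a phylogenetic tree or use more tolerant classification, since real samples often have gaps in coverage.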

Any user can explore these data using Broad’s web-based SARS-CoV-2 visualization tool, Auspice.

As the Genomics Platform increases its viral sequencing capacity for large scale surveillance, the Viral Genomics Group will continue to sequence more targeted sample sets with particular features of interest, like possible vaccine escapes (when fully vaccinated individuals test positive for COVID-19) and long-term infections, or cluster investigations in collaboration with their partners. In addition, they’ll continue conducting their own deep explorations of the virus’s evolution and spread in the community.

Broad scientists are also working with the CDC to help coalesce a national network of sequencing centers that can offer genomic surveillance for communities across the U.S. and provide a broader view of the evolution and spread of SARS-CoV-2.

“The U.K. did a great job staying on top of this with widespread genomic surveillance,” said MacInnis. “It’s so important that we in the U.S. do the same and move towards a national picture of how these variants are evolving.”