How SARS-CoV-2 spread across greater Boston during pandemic’s first wave

New findings demonstrate the power of genomic data to help understand and track a deadly virus’s evolution and spread

New data from the Broad Institute of MIT and Harvard, Massachusetts General Hospital, the Massachusetts Department of Public Health, and the Boston Healthcare for the Homeless Program describe how the SARS-CoV-2 virus entered the Boston area, and how certain events shaped the trajectory of the epidemic in the region. These data are described in a manuscript on medRxiv, and in a visual narrative overview on

Today we are releasing 441 high-quality SARS-CoV-2 genomes which, added to our initial release in June, bring the total from the effort to 772 from the Boston area so far, spanning the period from late January to early May. Our dataset includes nearly all cases from early in the epidemic and dense sampling across the first wave in Massachusetts, providing a deep view of the emergence and expansion of SARS-CoV-2 in one of the hardest-hit regions in the US. 

Two features of the data are particularly striking. The first is the sheer number of times the virus entered Massachusetts over a period of some three months, from the beginning of February 2020 to early May. We estimate 80 separate introductions, mostly from elsewhere in the US and from Europe, but including sources on four continents. The exact number and sources of the introductions are uncertain (we lack good genetic data about exactly what viral variants were present in many parts of the world), but the introductions were clearly numerous. The rate of new introductions declined during March as travel decreased and various control measures took effect.

The second striking feature is how variable the effect of a single infection can be on the trajectory of an epidemic. The importance of ‘superspreading events’ in COVID-19, in which a single person infects a disproportionately large number of others, has been widely reported. This study, which details two superspreading events in Massachusetts, underscores that importance but also shows how much the consequences of such an event can vary.

One of these superspreading events took place within a skilled nursing facility. All of the residents and most of the staff were tested as a precaution prior to a planned relocation. Ultimately 85 percent of the residents and 37 percent of the staff tested positive. Genomic analysis of these infections showed that, even though COVID-19 had not been suspected, the virus had entered this community on three separate occasions, yet a single one of those introductions was responsible for more than 90 percent of the infections. The limited genetic diversity among these cases indicated a very rapid spread in the facility.
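To make the reasoning about genetic diversity concrete: when a virus spreads quickly from a single introduction, it has little time to accumulate mutations, so genomes sampled from the cluster differ from one another at very few positions. The sketch below is only an illustration of that idea, not the analysis used in the study. It computes the mean pairwise number of differing sites among aligned sequences. The sequences are short, made-up strings, and the names `snp_distance`, `mean_pairwise_distance`, `facility_cluster`, and `community_sample` are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count sites where two aligned sequences differ.

    Sites where either sequence has an ambiguous base (e.g. N) or a gap
    are ignored, so missing data does not count as a difference.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def mean_pairwise_distance(seqs: list[str]) -> float:
    """Average snp_distance over every unordered pair of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(snp_distance(a, b) for a, b in pairs) / len(pairs)


# Toy, invented data (real SARS-CoV-2 genomes are roughly 30,000 bases long).
facility_cluster = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]
community_sample = ["ACGTACGTAC", "ACTTACGTAT", "ACGTTCGAAC", "GCGTACGTAC"]

print(mean_pairwise_distance(facility_cluster))  # 0.5
print(mean_pairwise_distance(community_sample))  # 2.5
```

In this toy example the tightly linked cluster averages half a difference per pair, while the more diverse sample averages 2.5. A real analysis would work from full-genome alignments and phylogenetic trees rather than raw pairwise counts.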

The other superspreading event was associated with an international business conference in February. In that case, there is genetic evidence that a single person brought the virus (which had probably recently arrived from Europe) to the event. 

In both of these superspreading events, a single person infected dozens of others, probably over a few days.

But the fallout of these two events was strikingly different. 

The outbreak in the nursing facility was devastating for those involved, but it occurred in a fairly isolated population around the beginning of April, after awareness of COVID-19 was high and precautions were in place. As a result, it caused little transmission outside the facility. The outbreak at the conference, by contrast, occurred in a highly mobile population in late February and spilled out into the larger community, accounting for at least 20 percent and as much as 40 percent of the later cases in our dataset. Using genetic data from other researchers, we can also track the subsequent spread of the virus to several other US states and to places as far afield as Slovakia, Sweden, Singapore, and Australia.

Elsewhere in the study, we found that SARS-CoV-2 — including viruses descended from the conference outbreak — entered Boston-area homeless populations multiple times, with rapid and extensive transmission within shelters. We also investigated two putative hospital clusters and demonstrated how near-real time sequencing can inform infection control practice. 

Our findings provide concrete examples of how transmission can expand beyond the initially impacted individuals and reach apparently unconnected locations, populations, and events. In addition to providing insight into the ongoing emergency in the Boston area, they demonstrate the power of genomic data to help understand and track the evolution and spread of SARS-CoV-2, which is particularly relevant as we move toward reopening schools and communities. The Broad team has been sharing the sequence data and insights with our clinical and public health partners in real time.

We have also made the data and analysis workflows publicly available on the Broad Institute’s Terra platform, a secure, open-source cloud environment for storing, analyzing, and sharing (with tight permission control) genomic and other biomedical data. We are working with Broad’s Data Sciences Platform to adapt Terra for viral genomic data and genomic surveillance applications, to support and accelerate the use of this approach by public health researchers and practitioners around the world and help them respond to COVID-19 and future public health emergencies. The dataset and analysis workflows used here can be found here. The genomic data are also being shared openly on NCBI GenBank.