Q&A: How Terra became a backbone of public health pathogen surveillance

Bronwyn MacInnis and Daniel Park reflect on how Terra, a scalable and secure open-source platform for biomedical data management and analysis, is making genomics accessible to public health labs.

When the COVID-19 pandemic broke out, genomic sequencing of pathogens was only just being adopted by many state and local public health labs across the US. But to detect new variants and other important changes in the SARS-CoV-2 genome, labs needed tools for secure, scalable, robust genome data management and analysis right away.

As it turned out, software engineering teams at the Broad Institute had already built a solution. Terra, developed jointly by Broad, Verily, and Microsoft, is an open, secure, and collaborative environment where users can analyze and share biomedical and public health data.

Today, about a third of Broad’s Terra users come from the public health workforce; in the US, 76 state and local public health labs use the platform to prevent and monitor the spread of infectious disease in their communities. Terra’s impact now reaches far beyond COVID-19: the platform is routinely used for outbreak response and for public health genomic surveillance of pathogens ranging from mpox viruses to malaria parasites.

For Bronwyn MacInnis, director of pathogen genomic surveillance in the Broad’s Infectious Disease and Microbiome Program, and Daniel Park, senior group leader for viral computational genomics at the Broad, Terra has been integral to securely handling large amounts of data, both in their own labs and with public health partners in the US and abroad. In this Q&A, we spoke to MacInnis and Park about the features that make Terra useful in public health contexts, how it became indispensable during COVID-19, and how it continues to shape research today.

What makes Terra so effective for the public health workforce?

BM: Even before the pandemic, we could see Terra’s potential for building capacity for pathogen genomic data analysis in public health, a field that lacked the highly skilled workforce this kind of analysis requires. This was true in the US and even more so in the Global South, where the burden of infectious disease is highest. Several features of Terra made it ideal for this type of work.

One is the point-and-click interface of Terra workflows. It meant that people who aren't coders (local laboratorians and epidemiologists, the people who understand infectious diseases best) could analyze their own data. Historically, people without those technical skills have been boxed out of that type of work. Putting a tool in their hands that let them analyze their own data was our vision from early on, and Terra offered this. Samples and data are organized in Excel-like spreadsheets, which makes the platform intuitive to interact with.

Also, when you use Terra, everything is documented and tracked: the provenance of data and the history of what happens to it are captured. That really matters in public health, where you need to do routine work and get a reliable, reproducible answer every time, and where you need to be able to share data in a way that lets you track how it’s used.

Finally, Terra allowed different users and organizations to share data in a way that they controlled. They could share exactly the data they wanted, when they wanted, with whom they wanted. For example, local departments of public health could easily interact with state departments, and public health communities in different countries could interact just as easily.

All of these features spoke to the needs of this community. And they still do.

How did public health labs in the US first start using Terra?

DP: Before Terra, only a handful of state public health labs were analyzing their own genomic data: the well-resourced labs that could hire the right people, build out the right infrastructure, and knew how to use it well. Most state and local public health labs in the US had no good way to do large-scale genomic analysis. Many relied on the CDC, a neighboring state lab, or an academic partner to assist with or perform analyses for them.

A real inflection point came with COVID, when there was a sudden push for every lab to run analyses on a whole new scale, analyses they didn't have the infrastructure for. Theiagen, a public health consulting group, played an important part in onboarding this “long tail” of smaller public health labs. They had independently evaluated many different options and ultimately landed on Terra as the only platform that offered a specimen-centric way of tracking data and results. They were able to get labs up and running on Terra within an hour, so these smaller labs could spend their limited staff time on essential analytics rather than informatics grunt work, during a pandemic in which time was their scarcest resource.

BM: During COVID, there was an exponential increase in the need to analyze pathogen genomic data in public health settings quickly, locally, and often without deep bioinformatics capacity. Terra was one of the few systems that was available and ready to meet the moment. The fact that it is open access, easy to get up and running on, and free to use certainly helped a lot. After using it for COVID, most of these labs used Terra for the next bug — for RSV, the flu, mpox — because they realized that they can do not only sequencing but also analysis on their own, rather than having to ship off data and then wait for the results and interpretation to come back.

Is data security in Terra an important feature for public health users?

DP: These labs are all part of state or local governments, so security and privacy are a non-negotiable prerequisite for this work. Terra has earned FedRAMP and FISMA certifications, which allow federal agencies and federally funded projects to use a cloud computing system for handling sensitive data. That generally satisfies these concerns and helps labs gain approval from their state or locality to use Terra.

Do you have examples of how Terra is being used for public health internationally?

BM: The Africa CDC, a policy and public health priority-setting body for the entire continent, was struggling to analyze Vibrio cholerae data from surveillance efforts in seven countries to find and track drug-resistant cholera. 

Last October, they came to us at a conference in Senegal and asked, “Any chance you can analyze cholera data in Terra?” We talked about their data and found publicly available tools that could be run in Terra to address their needs. And on the dodgy Wi-Fi in the conference hotel lobby, we got their project lead a Terra account, their cholera data securely into Terra, and the analysis running within the hour. Needless to say, they were impressed, and we’re now actively working with them to bring Terra to other pathogens and use cases across the continent.

How are you using Terra at the Broad for public health?

DP: Data from some of our projects in the Sabeti lab make up an important part of the surveillance data that public health agencies use. For example, one of our current projects involves surveillance of viral respiratory diseases in hospitals. It's a multi-year project, and our goal is to have data flow more seamlessly through the Massachusetts Department of Public Health, the CDC, and national databases, so that these groups know in real time what is causing disease.

BM: Terra is also our data processing workhorse for our research efforts to understand the relationship between parasite genetic diversity and malaria transmission intensity in Senegal. We're also building workflows and workspaces to support the global malaria genomic surveillance community with open, standardized, user-friendly data analysis tools, enabling malaria scientists and national control programs to analyze their own data. That’s what’s so incredible about this platform: it’s helping labs all around the world build important capacities that they didn’t have before. It’s really changing how pathogen surveillance gets done.

This conversation was edited for length and clarity.