Broad Institute sequences its 100,000th whole human genome on National DNA Day

Milestone crossed on the 15th anniversary of the completion of the Human Genome Project, as the worldwide estimate for whole human genomes sequenced approaches one million

Credit: Lauren Solomon, Broad Communications

In a dramatic sign of the surge of genomic information available for research around the world, on National DNA Day the Broad Institute of MIT and Harvard sequenced its 100,000th whole human genome, adding to a global total that is approaching one million.

National DNA Day, observed this year on Wednesday, April 25, commemorates the discovery of DNA's double helix in 1953 and the successful completion of the Human Genome Project in 2003. This year marks the 15th anniversary of that completion.

The 100,000th whole human genome sequenced at the Broad Institute comes from a program focused on Asian and African patients affected by birth defects, part of the Gabriella Miller Kids First Pediatric Research Program (Kids First). Kids First is a federally funded initiative, supported by the NIH Common Fund, that focuses on discovering genetic causes of childhood cancer and structural birth defects. The program will create the Kids First Data Resource, including a rich database of clinical and genetic sequence data from thousands of patients around the world affected by these conditions, along with their parents.

Scientists use genome sequencing data, which is drawn from samples donated by patients with their consent and stripped of identifying information, to research the underlying causes of devastating diseases, as well as to identify pathways for potential treatments.

“The Human Genome Project kicked off an extraordinary transformation in biology and in our understanding of disease,” said Eric S. Lander, president and founding director of the Broad Institute and a principal leader of the Human Genome Project. “Today, information is driving medicine, and we are beginning to understand human medicine as an information science. It is bringing together extraordinary experimental and computational scientists to answer questions that we could not conceive only a few years ago. This revolution will lead to many more new treatments and cures to help patients.”

“What amazing timing to have these two milestones converge on such an important day for genomic science,” said NIH Director Francis S. Collins. “When we announced completion of that first human genome sequence in April 2003, it would have been almost impossible to imagine that just 15 years later, a single genome center could produce 100,000 whole genomes. It is truly gratifying to see how genome sequencing is being used in a wide variety of applications — to understand the pathway of disease, and inform treatment strategies that enhance effectiveness and minimize risk.”

The amount of genomic data available for research doubles about every eight months. Since 2009, when it had sequenced just 12 whole human genomes, the Broad Institute has generated more than 70 petabytes of genetic sequence and analysis data, the equivalent of more than 1.2 billion hours of streaming music files. This includes data from whole genomes (which represent the complete DNA sequence), from approximately 360,000 exomes (covering only the protein-coding genes), and beyond.
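
To put the doubling rate and the storage comparison in perspective, the figures above can be checked with some rough arithmetic. The sketch below is only illustrative: it assumes steady exponential growth and decimal petabytes (10^15 bytes), neither of which is stated in the article, and the function and variable names are hypothetical.

```python
# Rough arithmetic on the figures quoted above. Assumptions (not from the
# article): growth is a steady exponential, and 1 petabyte = 10**15 bytes.
DOUBLING_MONTHS = 8  # "doubles about every eight months"


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative growth after `months`, given a fixed doubling time."""
    return 2 ** (months / doubling_months)


PETABYTE = 10**15                # decimal petabyte (assumption)
total_bytes = 70 * PETABYTE      # "more than 70 petabytes"
music_hours = 1.2e9              # "more than 1.2 billion hours"

# Implied size of one hour of music, and the matching bitrate.
mb_per_hour = total_bytes / music_hours / 1e6
kbps = total_bytes * 8 / (music_hours * 3600) / 1000

print(round(growth_factor(12), 2))  # 2.83 (growth over one year)
print(growth_factor(24))            # 8.0  (growth over two years)
print(round(mb_per_hour, 1))        # 58.3 (megabytes per hour of music)
print(round(kbps))                  # 130  (kilobits per second)
```

Under these assumptions, an eight-month doubling time means the data roughly triples each year, and the music comparison implies a bitrate of about 130 kilobits per second, which is in the range of ordinary compressed audio.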

Fifteen years after the completion of the first high-quality sequence of the human genome, genomics has already transformed medicine and health research — from predictive genetic testing for people with strong family histories of certain types of cancer, to large-scale studies designed to understand how genetics, environmental factors, and lifestyle affect the development of disease.

A night-and-day effort to improve human health

The Broad Genomics group runs a fleet of 60 sequencing instruments 24 hours a day, generating data at a rate equivalent to a 30x whole human genome every 12 minutes. Sequencing the first human genome took 13 years and cost nearly $3 billion. Today, an individual genome is completed in a matter of days at a cost of around $1,000.
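
The throughput and cost figures above translate into daily and yearly terms with simple arithmetic. The following is a minimal sketch, assuming the fleet runs without interruption all year (an idealization not claimed in the article). Note that the 12-minute figure is a fleet-wide data rate, not the turnaround time for any single genome.

```python
# Back-of-the-envelope check of the throughput and cost figures quoted above.
MINUTES_PER_DAY = 24 * 60

minutes_per_genome = 12   # one 30x genome-equivalent every 12 minutes (fleet-wide)
genomes_per_day = MINUTES_PER_DAY / minutes_per_genome
genomes_per_year = genomes_per_day * 365  # assumes no downtime (idealization)

first_genome_cost = 3_000_000_000  # "nearly three billion dollars"
current_cost = 1_000               # "around $1,000"
cost_ratio = first_genome_cost / current_cost

print(genomes_per_day)   # 120.0
print(genomes_per_year)  # 43800.0
print(cost_ratio)        # 3000000.0
```

At that rate, the fleet produces the equivalent of about 120 genomes a day, and the per-genome cost has fallen roughly three-million-fold since the first human genome.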

“Our genomics organization has always been designed to operate at scale. Along the way we have pioneered methods for sample preparation, sequencing, and analysis,” said Stacey Gabriel, director of the Broad Genomics Platform. “The impressive performance of the latest sequencing instruments, combined with our fantastic staff, our process engineering, and our commitment to continuous improvement, allow us to produce the highest-quality data for the global community.”

Researchers at the Broad, as well as in other academic, biotechnology, and pharmaceutical organizations, use the information to fuel new biological and therapeutic insights. Now, as this utility becomes ever clearer and more widely applicable, the Broad Genomics group has started producing whole genome data through its clinical laboratory (CAP-accredited and CLIA-licensed), allowing patients and physicians, in addition to researchers, to access the data.

Engineers and computational biologists in the Broad’s Data Sciences Platform continue to design and share new tools and methods for data management and analysis — such as the Genome Analysis Toolkit (GATK), an open-source software package for processing and analysis of sequencing data now used by more than 50,000 researchers worldwide.

“The huge datasets generated from the genomics revolution have created a need for a companion revolution in data processing and analysis,” said Eric Banks, senior director of the Broad Institute Data Sciences Platform. “Our mission is to maximize the impact of data sciences in this area, through new computational tools to process and analyze the information. I’m hugely proud of our team for supporting this effort all the way to a hundred thousand whole genomes, with hundreds of thousands more to come.”