Erick Salvador Rocha
Erick Salvador Rocha, a senior at Cornell University, developed a pipeline to design primer pairs for an amplicon sequencing protocol that could be used for whole-genome sequencing of Lassa virus.
"The Broad offers unparalleled opportunities to grow as a scientist and as a person. This experience has been intellectually rewarding, and it has allowed me to meet amazing scientists, colleagues and friends."
Lassa virus (LASV) is a single-stranded RNA virus that causes Lassa fever in Western Africa. The LASV genome is highly diverse and is composed of two linear segments (L and S). Currently, unbiased deep sequencing is the main technique used to sequence entire LASV genomes from patient samples. One major drawback of this approach is that it generates mostly non-viral sequence data, making viral sequencing inefficient. Therefore, we aim to develop an amplicon sequencing technique (AmpSeq), in which a mix of primers is designed to deep-sequence the LASV genome. The main challenge of developing an AmpSeq protocol for LASV is the difficulty of creating primers that can capture the diversity of the LASV genome.
Here we report our findings in developing primers to implement an AmpSeq protocol for lineage II of LASV (found in Nigeria). First, we used the software PrimalScheme, which was originally designed to generate primers for AmpSeq of the Zika virus (genetically more conserved than LASV), to create a primer set for AmpSeq. Then, we developed a new pipeline that adapts CATCH (software for probe design) to create 20 nt oligos. These oligos are filtered using Primer3 to keep only potential primers. Our pipeline then finds the smallest set of oligos that can capture most of the genomic diversity, and a final set of primers is designed for amplicons of length 400 nt with ~50 nt overlap. The set of primers obtained from our pipeline was computationally compared against the set made by PrimalScheme for overall genome amplification across all available LASV lineage II genome sequences.
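
The oligo-selection step described above, finding a small set of oligos that captures most of the genomic diversity, resembles the classic set-cover problem. The following is a minimal sketch under stated assumptions, not the pipeline's actual implementation: it assumes each candidate oligo has already been annotated with the set of genomes it matches, and it uses a greedy approximation rather than an exact minimum. The function name `greedy_oligo_cover` and the parameter `target_fraction` are hypothetical.

```python
import math


def greedy_oligo_cover(oligo_hits, n_genomes, target_fraction=0.9):
    """Greedily pick oligos until `target_fraction` of genomes are matched.

    oligo_hits: dict mapping an oligo sequence (or ID) to the set of genome
                indices it matches (assumed to be precomputed elsewhere).
    Returns (chosen_oligos, covered_genomes).
    """
    needed = math.ceil(target_fraction * n_genomes)
    covered = set()
    chosen = []
    remaining = dict(oligo_hits)

    while len(covered) < needed and remaining:
        # Pick the oligo that matches the most not-yet-covered genomes.
        best = max(remaining, key=lambda o: len(remaining[o] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining oligo adds coverage
        chosen.append(best)
        covered |= gain
        del remaining[best]

    return chosen, covered


# Toy example: four genomes (0-3) and four hypothetical candidate oligos.
hits = {"A": {0, 1, 2}, "B": {2, 3}, "C": {3}, "D": {0}}
chosen, covered = greedy_oligo_cover(hits, n_genomes=4, target_fraction=1.0)
```

A greedy choice does not guarantee the smallest possible set, but it is a common and simple approximation when the number of candidate oligos is large.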
The primers designed by PrimalScheme amplify more than 60% of the genome for conserved genomic sequences in LASV lineage II (genomes with >90% identity), but they amplify less than 40% of the genome for diverse genomic sequences of lineage II (genomes with <90% identity). In comparison, the primers produced by our pipeline amplify more than 80% of the genome for conserved genomic sequences and more than 40% of the genome for diverse genomic sequences. Our findings indicate that we have developed a pipeline that can design primers for AmpSeq, and that the primers designed with our pipeline cope better with the high genomic diversity of LASV. Our results advance our goal of developing an AmpSeq protocol for LASV. Such a protocol would increase the sequencing capabilities of countries vulnerable to LASV outbreaks, such as Nigeria.
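
As an illustration of how a "percent of genome amplified" figure like those above could be computed, the sketch below merges the amplicons predicted to amplify in one genome and reports the fraction of positions they cover. It assumes amplicons are given as half-open `(start, end)` coordinates; how a primer pair is judged to amplify is outside its scope. The function name `amplified_fraction` is hypothetical and this is not necessarily how the comparison was carried out.

```python
def amplified_fraction(genome_length, amplicons):
    """Fraction of genome positions covered by at least one amplicon.

    amplicons: iterable of (start, end) half-open intervals, 0-based.
    Overlapping amplicons (e.g. ~50 nt tiling overlaps) are counted once.
    """
    covered = 0
    current_end = 0  # rightmost position already counted
    for start, end in sorted(amplicons):
        start = max(start, current_end, 0)  # skip positions already counted
        end = min(end, genome_length)       # clip to the genome
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


# Toy example: three 400 nt amplicons on a 2000 nt segment, two overlapping by 50 nt.
fraction = amplified_fraction(2000, [(0, 400), (350, 750), (1000, 1400)])
```

Averaging this fraction over a group of genomes (for example, those with >90% or <90% identity) would give one per-group summary value.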
Project: Developing an amplicon sequencing protocol for Lassa virus
Mentor: Katherine Siddle, Infectious Disease and Microbiome Program