
Sequencing & Assembly
Sequence Data Generation Overview
Viral RNA is isolated from host serum, plasma, or cell culture supernatant using standard techniques. The RNA genome is reverse transcribed to produce a cDNA copy, which is then amplified with a pool of specific primers to produce 12 overlapping amplicons of 1.5 to 2 kb each. These PCR amplicons are designed to give 2X physical coverage of the viral genome. The amplicons are pooled and undergo quality control testing.
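The relationship between amplicon count, amplicon size, and physical coverage can be illustrated with a short calculation. The sketch below is not the Broad's primer design procedure; it assumes a hypothetical 10,500 bp genome, a uniform amplicon length of 1,750 bp (the midpoint of the stated range), and evenly spaced amplicons. The function names are illustrative only.

```python
def tile_amplicons(genome_len, n_amplicons, amplicon_len):
    """Place n_amplicons of equal length at evenly spaced starts so that
    the first begins at position 0 and the last ends at genome_len."""
    if n_amplicons < 2:
        raise ValueError("need at least two amplicons to tile")
    step = (genome_len - amplicon_len) / (n_amplicons - 1)
    starts = [round(i * step) for i in range(n_amplicons)]
    return [(start, start + amplicon_len) for start in starts]


def mean_physical_coverage(genome_len, intervals):
    """Total amplicon bases divided by genome length."""
    return sum(end - start for start, end in intervals) / genome_len


def min_depth(genome_len, intervals):
    """Smallest number of amplicons covering any single position."""
    depth = [0] * genome_len
    for start, end in intervals:
        for pos in range(start, end):
            depth[pos] += 1
    return min(depth)


# Hypothetical values: genome length and amplicon length are assumptions.
GENOME_LEN = 10_500
amplicons = tile_amplicons(GENOME_LEN, n_amplicons=12, amplicon_len=1_750)

print(mean_physical_coverage(GENOME_LEN, amplicons))  # average coverage
print(min_depth(GENOME_LEN, amplicons))               # worst-case depth
```

Under these assumptions the average works out to 2X, although in a simple linear tiling like this one the extreme ends of the genome are covered by only a single amplicon.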
Normalized pooled amplicons are then prepared for sequencing using primer panels of 96 specific primer pairs that produce 500-700 bp amplicons from the target viral genome. Each primer is tailed with a binding sequence for the M13 sequencing primer. After PCR, the 96 amplicons from each genome are sequenced using the Broad Institute's standard Sanger sequencing production process: each amplicon is sequenced using M13 primers, and the sequence is read out on ABI 3730 sequencing instruments. The 96 overlapping amplicons give an average sequence coverage of 8X along the coding region of the target viral genome.
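Primer tailing amounts to prepending a universal sequence to the 5' end of each target-specific primer. The sketch below is a minimal illustration of that idea. The tail sequences are the commonly cited M13 (-20) forward and M13 reverse universal primers, which is an assumption: the text does not state which M13 sequences are used. The target-specific primers in the usage lines are made up.

```python
# Assumed tails: widely used M13 universal primer sequences (5'->3').
# The actual tails used in the production process are not given here.
M13_FORWARD_TAIL = "GTAAAACGACGGCCAGT"
M13_REVERSE_TAIL = "CAGGAAACAGCTATGAC"


def tail_primer_pair(forward_specific, reverse_specific,
                     forward_tail=M13_FORWARD_TAIL,
                     reverse_tail=M13_REVERSE_TAIL):
    """Return (forward, reverse) primers with the universal tails added.

    All sequences are written 5'->3', so the tail is prepended: the
    target-specific part stays at the 3' end, where extension begins.
    """
    forward = forward_tail + forward_specific.upper()
    reverse = reverse_tail + reverse_specific.upper()
    return forward, reverse


# Hypothetical target-specific primers, for illustration only.
fwd, rev = tail_primer_pair("acgtacgtacgtacgtacgt", "TTGACCATGGCATGCAAGCT")
print(fwd)
print(rev)
```

Because every amplicon then carries the same tails at its ends, one pair of sequencing primers can be used for all 96 amplicons.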
Assembly/Annotation Overview
- Before assembly, sequence reads are trimmed to remove primer sequence introduced by the directed amplification and sequencing reactions.
- The assembly process uses paired reads to identify contiguous stretches of sequence (contigs).
- Contigs are ordered and linked together into larger supercontigs by using paired reads lying in different contigs.
- The final viral assembly is guided by a finished reference assembly for the target viral genome. Initial contigs are ordered by alignment to the reference, and reads left unused in the initial assembly are then added.
- The final consensus assembly is automatically checked for completeness of the viral coding region, or of the coding plus UTR regions, depending on the project specifications.
- Assemblies containing frameshift InDels are detected automatically and then validated manually to ensure that the InDels are supported by the underlying read data.
- Genomes are annotated by an automated transfer of the NCBI RefSeq annotation to the Broad viral assembly. Annotations are automatically checked for accuracy, and those failing the quality control checks undergo manual annotation.
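The automated frameshift check described in the list above can be sketched as follows. This is not the Broad pipeline's implementation. It assumes the consensus has already been aligned to the reference coding sequence as two equal-length gapped strings with "-" marking gaps, and it flags any insertion or deletion whose length is not a multiple of three. The function names are hypothetical.

```python
def find_indels(ref_aln, asm_aln):
    """List (kind, ref_pos, length) for each gap run in a pairwise alignment.

    ref_pos is the 0-based count of reference bases preceding the indel.
    """
    if len(ref_aln) != len(asm_aln):
        raise ValueError("aligned sequences must have equal length")
    indels = []
    ref_pos = 0
    i, n = 0, len(ref_aln)
    while i < n:
        if ref_aln[i] == "-" and asm_aln[i] != "-":
            # Gap in the reference: the assembly carries extra bases.
            j = i
            while j < n and ref_aln[j] == "-" and asm_aln[j] != "-":
                j += 1
            indels.append(("insertion", ref_pos, j - i))
            i = j
        elif asm_aln[i] == "-" and ref_aln[i] != "-":
            # Gap in the assembly: reference bases are missing.
            j = i
            while j < n and asm_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            indels.append(("deletion", ref_pos, j - i))
            ref_pos += j - i
            i = j
        else:
            if ref_aln[i] != "-":
                ref_pos += 1
            i += 1
    return indels


def frameshift_indels(ref_aln, asm_aln):
    """Keep only indels whose length would shift the reading frame."""
    return [indel for indel in find_indels(ref_aln, asm_aln)
            if indel[2] % 3 != 0]


# Toy alignment: a 1-base insertion and a 3-base (in-frame) deletion.
ref = "ATGGCC-AAATTTGGG"
asm = "ATGGCCTAAA---GGG"
print(frameshift_indels(ref, asm))  # [('insertion', 6, 1)]
```

This simplified check treats each indel independently and assumes the whole alignment is coding sequence. Candidates flagged this way would still need the manual comparison against read data described above.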
