VariantBam: filtering and profiling of next-generational sequencing data using region-specific rules.

Bioinformatics
Abstract

We developed VariantBam, a C++ read filtering and profiling tool for use with BAM, CRAM and SAM sequencing files. VariantBam provides a flexible framework for extracting sequencing reads or read-pairs that satisfy combinations of rules, defined by any number of genomic intervals or variant sites. We have implemented filters based on alignment data, sequence motifs, regional coverage and base quality. For example, VariantBam achieved a median size reduction ratio of 3.1:1 when applied to 10 lung cancer whole genome BAMs by removing large tags and selecting for only high-quality variant-supporting reads and reads matching a large dictionary of sequence motifs. Thus VariantBam enables efficient storage of sequencing data while preserving the most relevant information for downstream analysis.
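
To illustrate the idea of region-specific rules described in the abstract, the following is a minimal Python sketch, not VariantBam's actual implementation or rule syntax. It assumes a simplified in-memory read record and keeps a read when it overlaps at least one region and passes every rule attached to that region. All names here (`Read`, `Region`, `min_mapq`, `has_motif`, `filter_reads`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Read:
    """Simplified stand-in for an aligned read (hypothetical, not a BAM record)."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    mapq: int
    seq: str


Rule = Callable[[Read], bool]


@dataclass(frozen=True)
class Region:
    """A genomic interval with its own set of rules; all rules must pass."""
    chrom: str
    start: int
    end: int
    rules: Tuple[Rule, ...]


def min_mapq(threshold: int) -> Rule:
    """Rule: mapping quality is at least `threshold`."""
    return lambda read: read.mapq >= threshold


def has_motif(motif: str) -> Rule:
    """Rule: read sequence contains `motif` as a substring."""
    return lambda read: motif in read.seq


def overlaps(read: Read, region: Region) -> bool:
    """True if the half-open intervals share at least one base."""
    return (read.chrom == region.chrom
            and read.start < region.end
            and region.start < read.end)


def keep_read(read: Read, regions: List[Region]) -> bool:
    """Keep a read if some overlapping region has all of its rules satisfied."""
    return any(
        overlaps(read, region) and all(rule(read) for rule in region.rules)
        for region in regions
    )


def filter_reads(reads: List[Read], regions: List[Region]) -> List[Read]:
    """Return the reads that pass the region-specific rules, in input order."""
    return [read for read in reads if keep_read(read, regions)]


# Two regions with different rule sets (assumed values, for illustration only).
regions = [
    Region("chr1", 100, 200, (min_mapq(30),)),
    Region("chr2", 0, 1000, (min_mapq(10), has_motif("TTAGGG"))),
]

reads = [
    Read("r1", "chr1", 150, 250, 60, "ACGT"),        # overlaps chr1 region, MAPQ passes
    Read("r2", "chr1", 150, 250, 5, "ACGT"),         # overlaps chr1 region, MAPQ fails
    Read("r3", "chr2", 500, 600, 20, "AATTAGGGCC"),  # overlaps chr2 region, both rules pass
    Read("r4", "chr2", 500, 600, 20, "ACGTACGT"),    # overlaps chr2 region, motif missing
    Read("r5", "chr3", 0, 100, 60, "TTAGGG"),        # overlaps no region
]

kept_names = [read.name for read in filter_reads(reads, regions)]
print(kept_names)
```

In this sketch each region carries its own rules, so a strict quality threshold can apply in one interval while a motif requirement applies in another. A real tool would stream records from a BAM, CRAM or SAM file rather than hold them in a list.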

AVAILABILITY AND IMPLEMENTATION: VariantBam and full documentation are available at github.com/jwalabroad/VariantBam

CONTACT: rameen@broadinstitute.org

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year of Publication
2016
Journal
Bioinformatics
Volume
32
Issue
13
Pages
2029-31
Date Published
2016 Jul 01
ISSN
1367-4811
DOI
10.1093/bioinformatics/btw111
PubMed ID
27153727
PubMed Central ID
PMC4920121
Grant list
T32 HG002295 / HG / NHGRI NIH HHS / United States