CompMap: a reference-based compression program to speed up read mapping to related reference sequences.

Bioinformatics
Abstract

SUMMARY: Exhaustive mapping of next-generation sequencing data to a set of relevant reference sequences has become an important task in pathogen discovery and metagenomic classification. However, runtime and memory usage increase with the number of reference sequences and with the repeat content among them. In many applications, read mapping dominates the total runtime. We developed CompMap, a reference-based compression program, to speed up this process. CompMap generates a non-redundant representative sequence for the input sequences. We demonstrate that reads can be mapped to this representative sequence with much reduced time and memory usage, and that the mapping to the original reference sequences can be recovered with high accuracy.

AVAILABILITY AND IMPLEMENTATION: CompMap is implemented in C and freely available at http://csse.szu.edu.cn/staff/zhuzx/CompMap/.

CONTACT: xiaoyang@broadinstitute.org

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year of Publication
2015
Journal
Bioinformatics
Volume
31
Issue
3
Pages
426-428
Date Published
2015 Feb 01
ISSN
1367-4811
DOI
10.1093/bioinformatics/btu656
PubMed ID
25282641