
PLoS One DOI:10.1371/journal.pone.0090581

MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping.

Publication Type: Journal Article
Year of Publication: 2014
Authors: Lee, W-P, Stromberg, MP, Ward, A, Stewart, C, Garrison, EP, Marth, GT
Journal: PLoS One
Date Published: 2014
Keywords: Algorithms; Chromosome Mapping; Escherichia coli; High-Throughput Nucleotide Sequencing; Humans; INDEL Mutation; Interspersed Repetitive Sequences; Neural Networks (Computer); Polymorphism, Single Nucleotide; ROC Curve; Sequence Analysis, DNA; Software

MOSAIK is a stable, sensitive and open-source program for mapping second- and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific Biosciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well suited to capturing mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs), and generates outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network-based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient greater than 0.98 between MOSAIK-assigned and actual mapping qualities. To support studies of any genome, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (
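
The abstract names two ingredients, hash-based seeding with clustering and Smith-Waterman local alignment. The sketch below illustrates the general idea only and is not MOSAIK's implementation: the function names, the clustering of seeds by diagonal (reference position minus read position), the scoring values (match 2, mismatch -1, gap -2) and the window padding are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Hash every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def cluster_seeds(read, index, k):
    """Count seed hits per diagonal (reference position minus read position).

    Hits sharing a diagonal are consistent with one ungapped placement,
    so the most populated diagonal is a simple stand-in for a hash cluster.
    """
    diagonals = Counter()
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):
            diagonals[i - j] += 1
    return diagonals


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (assumed values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
        best = max(best, max(curr))
        prev = curr
    return best


def map_read(read, reference, index, k, pad=5):
    """Align the read against a padded window around the best seed cluster.

    Returns (window_start, score), or None when no k-mer of the read hits.
    """
    diagonals = cluster_seeds(read, index, k)
    if not diagonals:
        return None
    offset, _ = diagonals.most_common(1)[0]
    start = max(0, offset - pad)
    end = min(len(reference), offset + len(read) + pad)
    return start, smith_waterman_score(read, reference[start:end])
```

Restricting the quadratic-time Smith-Waterman step to a small window chosen by the hash lookup is what keeps this kind of scheme tractable on a large reference, while the local alignment step tolerates mismatches and short insertions or deletions inside the window.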


Alternate Journal: PLoS ONE
PubMed ID: 24599324
PubMed Central ID: PMC3944147
Grant List: U01 HG006513 / HG / NHGRI NIH HHS / United States
3U01HG006513-02S1 / HG / NHGRI NIH HHS / United States
5R01HG004719-04 / HG / NHGRI NIH HHS / United States