SMaSH: a benchmarking toolkit for human genome variant calling.

Bioinformatics

Authors	Ameet Talwalkar, Jesse Liptrap, Julie Newcomb, Christopher Hartl, Jonathan Terhorst, Kristal Curtis, Ma'ayan Bresler, Yun Song, Michael Jordan, David Patterson
Keywords	Humans; Computational Biology; Genomics; Algorithms; Software; Polymorphism, Single Nucleotide; High-Throughput Nucleotide Sequencing; Genome, Human; Databases, Genetic; Data Interpretation, Statistical; INDEL Mutation
Abstract	MOTIVATION: Computational methods are essential to extract actionable information from raw sequencing data, and thus to fulfill the promise of next-generation sequencing technology. Unfortunately, computational tools developed to call variants from human sequencing data disagree on many of their predictions, and current methods to evaluate accuracy and computational performance are ad hoc and incomplete. Agreement on benchmarking variant calling methods would stimulate development of genomic processing tools and facilitate communication among researchers. RESULTS: We propose SMaSH, a benchmarking methodology for evaluating germline variant calling algorithms. We generate synthetic datasets, organize and interpret a wide range of existing benchmarking data for real genomes, and propose a set of accuracy and computational performance metrics for evaluating variant calling methods on these benchmarking data. Moreover, we illustrate the utility of SMaSH to evaluate the performance of some leading single-nucleotide polymorphism, indel and structural variant calling algorithms. AVAILABILITY AND IMPLEMENTATION: We provide free and open access online to the SMaSH toolkit, along with detailed documentation, at smash.cs.berkeley.edu.
Year of Publication	2014
Journal	Bioinformatics
Volume	30
Issue	19
Pages	2787-95
Date Published	2014 Oct
ISSN	1367-4811
URL	http://bioinformatics.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=24894505
DOI	10.1093/bioinformatics/btu345
PubMed ID	24894505
PubMed Central ID	PMC4173010
Grant list	T32 HG000047 / HG / NHGRI NIH HHS / United States; T32-HG00047 / HG / NHGRI NIH HHS / United States
