Data Sciences Platform, Broad Institute

Convolutional neural networks (CNNs) process the reference genome and the aligned reads covering sites of genetic variation, both encoded as numeric tensors. Convolutions over these tensors learn to detect motifs useful for variant filtering and variant calling. Variant filtering models learn to classify each variant as artifact or real. Variant calling models learn to segment genomic positions into diploid genotypes. We will demonstrate how these models can integrate summary statistics for faster training, and discuss potential applications in unsupervised learning. We will also explore several hyperparameter optimization strategies for architecture selection. Finally, we will present improvements in both sensitivity and precision over current state-of-the-art filtration methods such as Gaussian mixture models, random forests, and DeepVariant.
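
To make the tensor encoding concrete, here is a minimal sketch of one way a reference window and its overlapping reads could be turned into a numeric tensor. It is not the encoding used in the talk: the function names `one_hot` and `encode_pileup`, the `(offset, sequence)` read representation, the four-channel base encoding, and the `max_reads` cap are all assumptions made for illustration.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order; any other character encodes as all zeros


def one_hot(seq):
    """Return a (len(seq), 4) one-hot array for a DNA string."""
    out = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out


def encode_pileup(reference, reads, max_reads=4):
    """Stack a reference window and its reads into one tensor.

    reference: reference bases for the window around a candidate site.
    reads: list of (offset, sequence) pairs, where offset is the read's
           start position relative to the window (hypothetical format).
    Returns an array of shape (max_reads + 1, len(reference), 4):
    row 0 is the reference, rows 1.. are reads, unused rows stay zero.
    """
    window = len(reference)
    tensor = np.zeros((max_reads + 1, window, len(BASES)), dtype=np.float32)
    tensor[0] = one_hot(reference)
    for row, (offset, seq) in enumerate(reads[:max_reads], start=1):
        # Clip the read to the part that overlaps the window.
        start = max(offset, 0)
        end = min(offset + len(seq), window)
        if start < end:
            tensor[row, start:end] = one_hot(seq[start - offset:end - offset])
    return tensor


tensor = encode_pileup("ACGTAC", [(0, "ACGT"), (2, "GAAC")])
```

A 2D convolution can then slide over the read and position axes of such a tensor, which is what lets the filters pick up local motifs, for example a column where several reads disagree with the reference. A realistic encoding would likely add further channels, such as base quality, mapping quality, or strand, which this sketch leaves out.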
