Popic Lab, Broad Institute
Structural variants (SVs) are the greatest source of genetic diversity in the human genome and play a pivotal role in diseases such as Alzheimer’s, autism, autoimmune and cardiovascular disorders, and cancer. Breakthroughs in whole-genome sequencing, especially the advent of long-read technologies, have enabled significant progress in methods for SV detection. Current state-of-the-art approaches extract hand-crafted features from the data and rely on expert-driven statistical models or heuristics to predict different SV classes. However, manually engineering SV-informative features and models is challenging given the multi-dimensionality of sequencing data and the diversity of SV types, sizes, and sequencing platforms. As a result, general SV discovery remains an open problem. In this primer talk, we will describe the problem of SV detection and its current challenges, motivate the need for extensible and generalizable methods to improve SV calling and genotyping, and introduce our formulation of SV detection as a task that can be effectively solved with deep learning. In particular, we will show how SV detection can be reduced to a keypoint localization task in images constructed from sequence alignments, and we will review model architectures suited to this task.