Mapping and characterization of structural variation in 17,795 human genomes.

Nature
Authors
Abstract

A key goal of whole-genome sequencing (WGS) for human genetics studies is to interrogate all forms of variation, including single-nucleotide variants (SNVs), small insertion/deletion (indel) variants and structural variants (SVs). However, tools and resources for the study of SVs have lagged behind those for smaller variants. Here, we used a scalable pipeline to map and characterize SVs in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest WGS-based SV resource to date. On average, individuals carry 2.9 rare SVs that alter coding regions, affecting the dosage or structure of 4.2 genes and accounting for 4.0-11.2% of rare high-impact coding alleles. Based on a computational model, we estimate that SVs account for 17.2% of rare alleles genome-wide with predicted deleterious effects equivalent to loss-of-function coding alleles; approximately 90% of such SVs are non-coding deletions (mean 19.1 per genome). We report 158,991 ultra-rare SVs and show that around 2% of individuals carry ultra-rare megabase-scale SVs, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and non-coding elements, revealing trends related to element class and conservation. This work will help guide SV analysis and interpretation in the era of WGS.

Year of Publication
2020
Journal
Nature
Date Published
2020 May 27
ISSN
1476-4687
DOI
10.1038/s41586-020-2371-0
PubMed ID
32460305