|Publication Type||Journal Article|
|Year of Publication||2020|
|Authors||Abel, HJ, Larson, DE, Regier, AA, Chiang, C, Das, I, Kanchi, KL, Layer, RM, Neale, BM, Salerno, WJ, Reeves, C, Buyske, S, Matise, TC, Muzny, DM, Zody, MC, Lander, ES, Dutcher, SK, Stitziel, NO, Hall, IM|
|Corporate Authors||NHGRI Centers for Common Disease Genomics|
|Date Published||2020 May 27|
A key goal of whole-genome sequencing (WGS) for human genetics studies is to interrogate all forms of variation, including single-nucleotide variants (SNVs), small insertion/deletion (indel) variants and structural variants (SVs). However, tools and resources for the study of SVs have lagged behind those for smaller variants. Here, we used a scalable pipeline to map and characterize SVs in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest WGS-based SV resource to date. On average, individuals carry 2.9 rare SVs that alter coding regions, affecting the dosage or structure of 4.2 genes and accounting for 4.0-11.2% of rare high-impact coding alleles. Based on a computational model, we estimate that SVs account for 17.2% of rare alleles genome-wide with predicted deleterious effects equivalent to loss-of-function coding alleles; approximately 90% of such SVs are non-coding deletions (mean 19.1 per genome). We report 158,991 ultra-rare SVs and show that around 2% of individuals carry ultra-rare megabase-scale SVs, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and non-coding elements, revealing trends related to element class and conservation. This work will help guide SV analysis and interpretation in the era of WGS.