Proc Natl Acad Sci U S A DOI:10.1073/pnas.1322563111

Searching for missing heritability: designing rare variant association studies.

Publication Type: Journal Article
Year of Publication: 2014
Authors: Zuk, O, Schaffner, SF, Samocha, K, Do, R, Hechter, E, Kathiresan, S, Daly, MJ, Neale, BM, Sunyaev, SR, Lander, ES
Journal: Proc Natl Acad Sci U S A
Date Published: 2014 Jan 28
Keywords: Gene Frequency, Genetic Predisposition to Disease, Genetic Variation, Genome-Wide Association Study, Humans, Mutation

Genetic studies have revealed thousands of loci predisposing to hundreds of human diseases and traits, revealing important biological pathways and defining novel therapeutic hypotheses. However, the genes discovered to date typically explain less than half of the apparent heritability. Because efforts have largely focused on common genetic variants, one hypothesis is that much of the missing heritability is due to rare genetic variants. Studies of common variants are typically referred to as genomewide association studies, whereas studies of rare variants are often simply called sequencing studies. Because they are actually closely related, we use the terms common variant association study (CVAS) and rare variant association study (RVAS). In this paper, we outline the similarities and differences between RVAS and CVAS and describe a conceptual framework for the design of RVAS. We apply the framework to address key questions about the sample sizes needed to detect association, the relative merits of testing disruptive alleles vs. missense alleles, frequency thresholds for filtering alleles, the value of predictors of the functional impact of missense alleles, the potential utility of isolated populations, the value of gene-set analysis, and the utility of de novo mutations. The optimal design depends critically on the selection coefficient against deleterious alleles and thus varies across genes. The analysis shows that common variant and rare variant studies require similarly large sample collections. In particular, a well-powered RVAS should involve discovery sets with at least 25,000 cases, together with a substantial replication set.


Alternate Journal: Proc. Natl. Acad. Sci. U.S.A.
PubMed ID: 24443550
PubMed Central ID: PMC3910587