Phasing and imputing repeat variants across the genome / Hidden Markov models in phasing and imputation

Ronen Mukamel
Rice Math, HMS, BWH, Broad
Phasing and imputing repeat variants across the genome

Abstract: A fundamental mystery of the genome-wide association study (GWAS) era is the gap between the heritability of phenotypes observed in family studies and the heritability successfully explained by association studies. One often-cited source of this "missing heritability" is structural variants, which account for a majority of the base pairs varying among genomes but are usually omitted from GWAS because they are difficult to genotype. In this talk, I will present new methods that extend GWAS analyses to one class of structural variants, variable number tandem repeats (VNTRs). The methods allow phasing of diploid repeat length estimates in whole-genome sequence data and imputation of repeat variants into much larger genotyped cohorts. I will discuss ongoing efforts to apply this approach genome-wide in UK Biobank (N~500K) and to incorporate these variants into association studies.

 

Po-Ru Loh
HMS, BWH, Broad
Primer: Hidden Markov models in phasing and imputation

Abstract: Haplotype phasing and imputation have become essential components of genome-wide association analysis pipelines, as these methods allow imputation of genetic variation from smaller whole-genome sequenced reference panels into larger cohorts (genotyped much more sparsely at low cost). Over the past two decades, phasing and imputation methods have undergone several generations of development as sample sizes and variant counts in typical analyses have each increased by five orders of magnitude. I will give an overview of the algorithmic themes that have emerged from these approaches, many of them based on the Li-Stephens hidden Markov model, and discuss the computational considerations that are now informing future directions in this field.