Reference-based phasing using the Haplotype Reference Consortium panel.
Authors | |
Abstract | Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing in a genotyped cohort, an approach that can yield high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ∼20× speedup and ∼10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2× the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server. |
Year of Publication | 2016 |
Journal | Nat Genet |
Volume | 48 |
Issue | 11 |
Pages | 1443-1448 |
Date Published | 2016 Nov |
ISSN | 1546-1718 |
DOI | 10.1038/ng.3679 |
PubMed ID | 27694958 |
PubMed Central ID | PMC5096458 |
Grant list | S10 RR028832 / RR / NCRR NIH HHS / United States
R01 HG006399 / HG / NHGRI NIH HHS / United States
F32 HG007805 / HG / NHGRI NIH HHS / United States
R01 HG007022 / HG / NHGRI NIH HHS / United States
R01 MH101244 / MH / NIMH NIH HHS / United States
R01 HL117626 / HL / NHLBI NIH HHS / United States
R01 EY022005 / EY / NEI NIH HHS / United States |