Population sequencing data reveal a compendium of mutational processes in human germline

September 18, 2019
Kharchenko Lab, Dept. of Biomedical Informatics, Harvard Medical School

The mechanistic processes underlying human germline mutations remain largely unknown. Variation in mutation rate and spectra along the genome is informative about the underlying biological mechanisms. We statistically decompose this variation into separate processes using independent component analysis of mutational spectra. Analysis of a large-scale whole-genome sequencing dataset (TOPMed) reveals nine processes that explain the variation in mutation properties between loci. Seven of these processes lend themselves to a biological interpretation. One process is driven by bulky DNA lesions that resolve asymmetrically with respect to transcription and replication. Two processes independently track the direction of the replication fork and replication timing. We identify a mutagenic effect of active demethylation that acts primarily in regulatory regions. We also demonstrate that a recently discovered mutagenic process specific to oocytes can be localized solely from population sequencing data. This process is spread across all chromosomes and is highly asymmetric with respect to the direction of transcription, suggesting a major role of DNA damage.
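
To make the decomposition step concrete, the following is a minimal sketch of independent component analysis applied to a loci-by-mutation-type matrix. It is not the authors' pipeline: the data are synthetic, the six substitution classes, the two toy spectra, the Laplace-distributed intensities, and all variable names are assumptions chosen for illustration, and scikit-learn's FastICA stands in for whichever ICA implementation the study used.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

n_loci = 2000
# Hypothetical toy setup: six substitution classes and two latent processes.
mutation_types = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
true_spectra = np.array([
    [1.0, 0.2, 0.1, 0.8, 0.1, 0.3],  # assumed spectrum of process 1
    [0.1, 0.9, 0.7, 0.1, 0.5, 0.2],  # assumed spectrum of process 2
]).T  # shape (n_types, n_processes)

# Per-locus process intensities, treated as centered deviations.
# ICA needs non-Gaussian sources, so a Laplace distribution is used here.
true_intensity = rng.laplace(size=(n_loci, 2))

# Observed matrix: each locus is a mixture of the two spectra plus noise.
noise = 0.05 * rng.normal(size=(n_loci, len(mutation_types)))
X = true_intensity @ true_spectra.T + noise

# Decompose the variation between loci into independent components.
ica = FastICA(n_components=2, random_state=0, max_iter=1000)
est_intensity = ica.fit_transform(X)  # (n_loci, 2): component activity per locus
est_spectra = ica.mixing_             # (n_types, 2): spectrum of each component

# ICA recovers sources only up to order, sign, and scale, so compare
# estimated and true intensities by absolute correlation.
corr = np.corrcoef(true_intensity.T, est_intensity.T)[:2, 2:]
match = np.abs(corr).max(axis=1)
print(np.round(match, 2))
```

In this sketch, each column of `est_spectra` plays the role of one inferred process's mutational signature, and the matching column of `est_intensity` shows where along the genome that process is active. Because ICA leaves the sign and scale of components undetermined, any biological interpretation would have to rest on comparing components with external genomic features, as the abstract describes for transcription and replication.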