What is D-ToxoG?
D-ToxoG is a tool for removing the OxoG artifact from a set of SNV calls.
How does D-ToxoG work?
The goal of D-ToxoG is to limit artifacts to less than 1% of the output mutation calls.
Costello, M., Pugh, T. J., Fennell, T. J., Stewart, C., Lichtenstein, L., Meldrim, J. C., et al. (2013). Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Research, 41(6), e67. http://doi.org/10.1093/nar/gks1443
How do I get D-ToxoG?
The D-ToxoG Matlab scripts are available here (ZIP 50kb).
The input is an SNV MAF file with the below columns. Column names are case sensitive.
The filter generates many intermediate files and figures; see the output directory specified when running the filter.
The main outputs are two maf files: one containing all input calls, and another with all artifact calls removed (i.e. no lines where oxoGCut = 1).
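The relationship between the two output files can be illustrated with a small sketch. It is written in Python rather than Matlab and is not part of D-ToxoG; it only assumes that the annotated maf is tab-separated and carries the oxoGCut column described here. The function name `drop_oxog_artifacts` and the example rows are hypothetical.

```python
import csv
import io

def drop_oxog_artifacts(maf_lines):
    """Return (header, kept_rows), dropping rows where oxoGCut equals 1.

    Assumption: the maf is tab-separated, '#' lines are comments, and
    oxoGCut holds a numeric flag (1 = called as artifact).
    """
    data = [line for line in maf_lines if not line.startswith("#")]
    reader = csv.DictReader(data, delimiter="\t")
    kept = [row for row in reader if float(row["oxoGCut"]) != 1]
    return reader.fieldnames, kept

# Hypothetical two-call maf: one flagged artifact, one passing call.
example = io.StringIO(
    "Hugo_Symbol\tReference_Allele\tTumor_Seq_Allele2\toxoGCut\n"
    "TP53\tG\tT\t1\n"
    "KRAS\tC\tT\t0\n"
)
header, kept = drop_oxog_artifacts(example)
print([row["Hugo_Symbol"] for row in kept])
```

Applied to the maf that contains all input calls, a filter like this should reproduce the rows of the artifact-free maf, which can serve as a quick sanity check on the two outputs.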
New columns are added to the maf files:
To execute D-ToxoG, run startFilterMAFFile.
The latest usage instructions can be found by running help startFilterMAFFile.
% Run an input maf and write the output to a pass.maf.annotated file.
% Use a mat file, if available, to speed loading of the maf file.
% Generate plots and use the standard PoxoG of 0.96.
% Take the defaults for the rest of the parameters.
% Put all outputs into the results directory.
startFilterMAFFile('C:\Lee\work\oxoGv3Results\PR_TCGA_HNSC_Capture.maf.annotated', 'PR_TCGA_HNSC_Capture.pass.maf.annotated', 'results/', 1, 1, '0.96')
Experiments carried out by the sequencing platform have established that the OxoG artifact arises from an oxidation process during library construction. Prior to the PCR step, conversion of guanine to 8-oxoguanine (8-oxoG) tends to occur in CGG sequence contexts (where the 8-oxoG is the middle G). Unlike regular guanine, 8-oxoG has a higher chemical affinity for adenine than for cytosine. The presence of 8-oxoG during subsequent PCR amplification cycles therefore introduces adenine bases into DNA molecules at sites where cytosine should have been, systematically producing CAG sequences where the original sequence was CCG. Unlike natural mutations, non-reference artifact bases are locked to specific strand orientations by the forked adapter ligation step before PCR, resulting in G>T artifacts on the F1R2 orientation and C>A artifacts on the F2R1 orientation. An IGV screenshot of an artifact mutation from sample BLCA_A2C5 is shown below.
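The CGG and CCG contexts above are the same site read from opposite strands, so a G>T change in a CGG context and a C>A change in a CCG context describe the same artifact. A minimal Python sketch of that relationship follows; the helper `reverse_complement` is written here for illustration only and is not part of D-ToxoG.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

original_context = "CGG"  # middle G is the base that becomes 8-oxoG
artifact_context = "CTG"  # the same site after a G>T artifact

# Opposite-strand view: CCG is the original, CAG is the artifact (C>A).
print(reverse_complement(original_context))
print(reverse_complement(artifact_context))
```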
Above is the distribution of orientation biases from a test set of 8 tumor types (40 samples with varying degrees of artifact). The red spike close to FoxoG=1 is the artifact component.
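FoxoG is not defined in this text. The Python sketch below assumes it is the fraction of alternate-allele reads found in the artifact orientation, which for the two cases described above means F1R2 for G>T and F2R1 for C>A. The function name `foxog` and its parameters are hypothetical, and the formula should be checked against the D-ToxoG scripts before it is relied on.

```python
def foxog(ref_allele, alt_f1r2, alt_f2r1):
    """Fraction of alt reads in the orientation expected for an OxoG artifact.

    Assumption (not confirmed by the D-ToxoG documentation above):
      ref G (G>T artifacts): alt_f1r2 / (alt_f1r2 + alt_f2r1)
      ref C (C>A artifacts): alt_f2r1 / (alt_f1r2 + alt_f2r1)
    """
    total = alt_f1r2 + alt_f2r1
    if total == 0:
        return None  # no alt reads, orientation bias is undefined
    if ref_allele == "G":
        return alt_f1r2 / total
    if ref_allele == "C":
        return alt_f2r1 / total
    raise ValueError("this sketch only covers reference alleles G and C")

# Hypothetical read counts: a strongly biased call and a balanced call.
print(foxog("G", 19, 1))
print(foxog("C", 5, 5))
```

Under this assumption, calls whose alt reads sit almost entirely in one orientation land near FoxoG = 1, where the artifact spike appears, while calls with balanced orientations land near 0.5.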