Exon output Depth of Coverage
Posted in Ask the team | Last updated on


Comments (3)

Hello,

I'm runng DOC on my exome data and the output is hard to connect. This is the command i'm using: java -Xmx2g -jar ~/programs/GenomeAnalysisTK-2.2-2-gf44cc4e/GenomeAnalysisTK.jar -T DepthOfCoverage -R ~/dependencies/reference/hg19.fasta -o doc -I doctest.bam -geneList:REFSEQ ../refSeqGenes.txt -L ~/dependencies/bedfiles/coverage/500genes.exons.refGene.sort.bed -ct 1 -ct 5 -ct 10 -ct 20 -ct 50

The refseqfile looks like this: 75 NM_003036 chr1 + 2160133 2241652 2160205 2238204 7 2160133,2234416,2234723,2235278,2235731,2237458,2238015, 2161174,2234542,2234839,2235541,2236024,2237689,2241652, 0SKI cmpl cmpl 0,0,0,2,1,0,0, 637 NM_001195563 chr1 + 6845383 6932107 6845590 6931825 4 6845383,6880240,6885151,6931816, 6845635,6880310,6885270,6932107, 0 CAMTA1 cmpl cmpl 0,0,1,0, 676 NM_000302 chr1 + 11994723 12035599 11994836 12034865 19 11994723,12008032,12009829,12010413,12012679,12014886,12016973,12017898,12018572,12020702,12023588,12024231,12024700,12025536,12026307,12027043,12030726,12032928,12034709, 11994912,12008124,12009963,12010577,12012792,12014950,12017071,12018000,12018704,12020824,12023693,12024357,12024842,12025650,12026373,12027148,12030873,12033054,12035599, 0 PLOD1 cmpl cmpl 0,1,0,2,1,0,1,0,0,0,2,2,2,0,0,0,0,0,0,

The exon bed file looks like this: chr1 2160205 2161174 NM_003036_cds_0_0_chr1_2160206_f 0 + chr1 2234416 2234542 NM_003036_cds_1_0_chr1_2234417_f 0 + chr1 2234723 2234839 NM_003036_cds_2_0_chr1_2234724_f 0 + chr1 2235278 2235541 NM_003036_cds_3_0_chr1_2235279_f 0 + chr1 2235731 2236024 NM_003036_cds_4_0_chr1_2235732_f 0 + chr1 2237458 2237689 NM_003036_cds_5_0_chr1_2237459_f 0 + chr1 2238015 2238204 NM_003036_cds_6_0_chr1_2238016_f 0 + chr1 6845590 6845635 NM_001195563_cds_0_0_chr1_6845591_f 0 + chr1 6845590 6845635 NM_015215_cds_0_0_chr1_6845591_f 0 + chr1 6880240 6880310 NM_001195563_cds_1_0_chr1_6880241_f 0 + chr1 6880240 6880310 NM_015215_cds_1_0_chr1_6880241_f 0 + chr1 6885151 6885270 NM_001195563_cds_2_0_chr1_6885152_f 0 + chr1 6885151 6885270 NM_015215_cds_2_0_chr1_6885152_f 0 + chr1 6931816 6931825 NM_001195563_cds_3_0_chr1_6931817_f 0 + chr1 7151363 7151431 NM_015215_cds_3_0_chr1_7151364_f 0 +

The gene_summary file shows coverage on all genes: SKI 73147 33.45 73147 33.45 21 32 45 100.0 99.8 93.4 75.8 16.6 CAMTA1 8158 33.57 8158 33.57 25 33 47 100.0 100.0 100.0 81.5 17.3 UNKNOWN 5312955 54.97 5312955 54.97 32 49 74 99.7 98.2 96.2 87.9 48.3 PLOD1 85711 39.24 85711 39.24 26 34 48 100.0 99.5 97.3 84.8 22.8 HSPG2 407112 30.90 407112 30.90 19 29 41 99.5 98.0 91.8 72.3 13.6

and the interval_summery file shows all exons: chr1:2160206-2161174 31939 32.96 31939 32.96 18 32 44 100.0 100.0 93.3 70.4 16.4 chr1:2234417-2234542 7165 56.87 7165 56.87 41 60 71 100.0 100.0 100.0 100.0 62.7 chr1:2234724-2234839 4946 42.64 4946 42.64 42 43 45 100.0 100.0 100.0 100.0 0.0 chr1:2235279-2235541 9542 36.28 9542 36.28 30 38 46 100.0 100.0 100.0 100.0 10.3 chr1:2235732-2236024 7763 26.49 7763 26.49 24 28 32 100.0 100.0 100.0 90.8 0.0 chr1:2237459-2237689 9711 42.04 9711 42.04 34 49 53 100.0 100.0 100.0 88.3 42.9 chr1:2238016-2238204 2081 11.01 2081 11.01 9 13 15 100.0 97.9 57.7 0.0 0.0 chr1:6845591-6845635 655 14.56 655 14.56 14 16 17 100.0 100.0 100.0 0.0 0.0

This information is used to verify that a certain variant was covered good, not only on the position of the variant but also over the rest of the gene it is in. This is a bit hard to see in the interval_summery file because I can't see the exon or genenames in there and I can't see this in the gene_summery file because i mis individual exon information. This means crossrefferencing the bedfile and interval_summery with ugly scripting or creating a spoofed refseq file per exon. Is there an option to format the output where the coverage of each exon is shown with the exon and/or gene name (from the info in the bed or refseq file)?

e.g. something like this: chr1:2160206-2161174 NM_003036_cds_0_0_chr1_2160206_f SKI 31939 32.96 31939 32.96 18 32 44 100.0 100.0 93.3 70.4 16.4

Regards,

Martin


Return to top Comment on this article in the forum