Created 2015-04-24
Hello, I've been trying, unsuccessfully, to write a script for calculating coverage per gene, and I have now found that GATK does this nicely! I would very much like to use DepthOfCoverage to calculate this for each gene, but I cannot find the geneList in the format explained here. I have a RefSeq gene list downloaded from the UCSC Table Browser, which contains the RefSeq name, cds_start and end, and "chr" information. Is this acceptable? I want to do it for the exons falling inside the genes, which I downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/exome_pull_down_targets/ (phase3), so this would be my intervals list. It contains only chr, start, and end. I have also run bedtools genomecov with the "bedgraph" option on my BAM files, and I wrote a script to calculate the mean coverage of each exon that reads fall onto. Does DepthOfCoverage calculate coverage on the same principle? Moreover, I cannot find a table in UCSC that combines the RefSeq name with the commonly used gene name. Are they combined in the geneList you provide with GATK? Could you tell me where I can find the exact URL for /humgen/.../geneList.txt, or whether mine could work, and whether the exons table is OK with only these 3 columns? I'm a registered member, as is needed for writing in the forum. Is there any extra procedure needed to access your database? Thanks in advance!
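
For reference, the per-exon mean described above could be sketched roughly as follows. This is only a minimal Python sketch of a length-weighted mean over bedGraph records, not GATK's implementation; the function names `parse_bedgraph` and `mean_coverage` are hypothetical, and it assumes 0-based, half-open coordinates with any base missing from the bedGraph counted as depth 0. DepthOfCoverage may apply its own read filters, so its numbers need not match exactly.

```python
from collections import defaultdict


def parse_bedgraph(lines):
    """Group bedGraph records (chrom, start, end, depth) by chromosome."""
    by_chrom = defaultdict(list)
    for line in lines:
        # Skip blank lines and bedGraph header/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, depth = line.split()[:4]
        by_chrom[chrom].append((int(start), int(end), float(depth)))
    return by_chrom


def mean_coverage(by_chrom, chrom, start, end):
    """Length-weighted mean depth over the half-open interval [start, end).

    Assumption: bases not covered by any bedGraph record have depth 0.
    """
    if end <= start:
        raise ValueError("interval must have positive length")
    total = 0.0
    for s, e, depth in by_chrom.get(chrom, []):
        overlap = min(e, end) - max(s, start)
        if overlap > 0:
            total += overlap * depth
    return total / (end - start)


# Toy data, made up for illustration only.
bedgraph = parse_bedgraph(["chr1\t0\t10\t2", "chr1\t10\t20\t4"])
print(mean_coverage(bedgraph, "chr1", 5, 15))  # 5 bases at 2x plus 5 bases at 4x
```

The linear scan over all records is slow for whole-genome data; sorting the records and using a binary search or an interval index would be the obvious improvement.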

I am using GATK through the Galaxy main server to analyze variation from whole-genome re-sequencing of various samples of non-model species (nematode worms). I would like to know whether Galaxy's GATK tools can produce a kind of pileup of the genome (base by base, or by interval, like a .bed file) indicating specifically which bases were callable or not by Unified Genotyper (UG), as "CallableLoci" does. The log and metrics files generated by UG in Galaxy give general statistics on callable loci, but there is no file giving detailed information on the eligibility of each base.

Along the same lines, I would like to get a per-locus depth of coverage (which can partially help answer my previous question, although it does not take into account all the filters used by UG, such as base quality, mapping quality, etc.). This tool is available on Galaxy. However, I am performing 3 rounds of BQSR to get my final VCF file. Should I calculate the depth of coverage using the first BAM file, before BQSR, or the last recalibrated BAM file obtained in the 3rd round of BQSR? I don't think BQSR alters the coverage, so I would say this shouldn't matter. Am I right?

Thanks in advance for your help and advice, Fabrice

I have a set of CNV regions, and I would like to see how much my samples overlap those regions. So I used DepthOfCoverage from GATK and filled in the intervals parameter with my CNVs, but I got the following error:

ERROR MESSAGE: Badly formed genome loc: Contig 3 given as location, but this contig isn't present in the Fasta sequence dictionary

I am sure the reference file is correct. I think it could be because these CNVs are large, so some parts of them could fall outside the reference contig. Is there any way to turn my CNVs into a proper, hg19-compatible BED file, or is there any other tool that can help me with that?
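
One thing that might be worth checking first: the message names the contig as "3", so the cause may be a naming mismatch between the intervals and the reference (for example "3" versus "chr3") rather than intervals running past the end of a contig. The following is a minimal Python sketch that handles both cases under stated assumptions. The function `harmonize_bed` and the `contig_lengths` mapping are hypothetical; in practice the lengths would have to come from the reference's own index or sequence dictionary, and the toy values below are made up.

```python
def harmonize_bed(bed_lines, contig_lengths):
    """Rename contigs to match the reference and clip intervals to contig bounds.

    contig_lengths: dict of reference contig name -> length (hypothetical input).
    Returns (fixed, skipped): fixed is a list of (contig, start, end) tuples,
    skipped is a list of input lines that could not be placed on the reference.
    """
    fixed, skipped = [], []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        if name not in contig_lengths:
            # Try the other common naming style: "3" <-> "chr3".
            alt = name[3:] if name.startswith("chr") else "chr" + name
            if alt not in contig_lengths:
                skipped.append(line)
                continue
            name = alt
        # Clip to the contig so no interval extends beyond the reference.
        start = max(0, start)
        end = min(end, contig_lengths[name])
        if start < end:
            fixed.append((name, start, end))
        else:
            skipped.append(line)
    return fixed, skipped


# Toy example with a made-up contig length.
fixed, skipped = harmonize_bed(["3\t100\t2000", "X\t1\t5"], {"chr3": 1000})
for name, start, end in fixed:
    print(f"{name}\t{start}\t{end}")
```

Intervals that end up in `skipped` are worth inspecting by hand, since they may point to a different genome build rather than a simple naming difference.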