Hi, I'm calling variants with HaplotypeCaller in a population of 2 parents and 7 F1 individuals. After read-backed phasing, I combine the VCF files of my genotypes with CombineVariants. In the output file I very often find "./.". I thought this meant there is no coverage at that position, but at many of these positions I do have good coverage. Why do I get "./." there? Moreover, I used FastaAlternateReferenceMaker to create a new reference sequence that includes the variants from the parents. In that case, after running HC, phasing, and CombineVariants, I only get "./." at positions where there really is no coverage (as I can see in my mappings). Nadia
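To put a number on how often "./." turns up for each sample, a short script over the combined VCF may help. This is only a minimal sketch: the function name `count_no_calls` and the sample names in the toy input are made up for illustration, and it assumes a plain-text, tab-separated VCF with the genotype as the first FORMAT entry.

```python
from collections import Counter


def count_no_calls(vcf_lines):
    """Count no-call genotypes ('./.', '.|.' or '.') per sample.

    vcf_lines is any iterable of VCF text lines (for example an open file).
    """
    samples = []
    no_calls = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        for name, entry in zip(samples, cols[9:]):
            genotype = entry.split(":")[0]
            if genotype in ("./.", ".|.", "."):
                no_calls[name] += 1
    return dict(no_calls)


# Toy input with hypothetical sample names P1 and F1_1.
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP1\tF1_1",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT\t0/1\t./.",
    "chr1\t200\t.\tC\tT\t50\t.\t.\tGT\t./.\t./.",
]
print(count_no_calls(toy_vcf))
```

Comparing these per-sample counts against the depth at the same positions would show whether the no-calls really coincide with missing coverage.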
I have several samples, some with coverage of around 14 and some with coverage of around 6. I want to use UnifiedGenotyper for SNP calling, but I have no clue how to set stand_call_conf (and stand_emit_conf): the suggestion is to set stand_call_conf to 30 for samples with coverage >10 and to Q4 for samples with coverage <10. So how should I proceed?
I am calling SNPs from a single bam file with the command:
java -Xmx30g -jar GenomeAnalysisTK.jar \
  -T UnifiedGenotyper \
  -R ref.fasta \
  -I input.bam \
  -o output.vcf
When looking at the output file, most DP values were equal to the AD values, and in a few cases the AD value was higher. I thought that the AD values are the unfiltered counts of all reads and that the DP field describes the total depth of reads that passed the UnifiedGenotyper's internal quality control. Is it normal for the AD values to be higher than the DP value?
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT eo227
gi|218430358|emb|CU928163.2| 2317180 . T G 76.55 . AC=1;AF=0.50;AN=2;BaseQRankSum=1.568;DP=10;Dels=0.00;FS=11.181;HRun=0;HaplotypeScore=15.8585;MQ=28.63;MQ0=0;MQRankSum=1.036;QD=7.66;ReadPosRankSum=-0.633;SB=-0.01 GT:AD:DP:GQ:PL 0/1:6,4:10:99:107,0,154
gi|218430358|emb|CU928163.2| 2317181 . T G 71.96 . AC=1;AF=0.50;AN=2;BaseQRankSum=0.550;DP=10;Dels=0.00;FS=0.000;HRun=1;HaplotypeScore=19.8574;MQ=28.63;MQ0=0;MQRankSum=-1.754;QD=7.20;ReadPosRankSum=-1.754;SB=-0.01 GT:AD:DP:GQ:PL 0/1:3,4:10:87.90:102,0,88
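The AD/DP comparison for the two records quoted above can be checked mechanically. The following is a minimal sketch rather than anything GATK provides: the helper name `ad_vs_dp` is made up, and it assumes the sample column carries both AD and DP keys as named in the FORMAT column.

```python
def ad_vs_dp(format_field, sample_field):
    """Return (sum of AD, DP) for one sample column of a VCF record."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    fields = dict(zip(keys, values))
    ad_total = sum(int(x) for x in fields["AD"].split(","))
    dp = int(fields["DP"])
    return ad_total, dp


# FORMAT and sample columns copied from the two records above.
records = [
    ("GT:AD:DP:GQ:PL", "0/1:6,4:10:99:107,0,154"),
    ("GT:AD:DP:GQ:PL", "0/1:3,4:10:87.90:102,0,88"),
]
for fmt, sample in records:
    ad_total, dp = ad_vs_dp(fmt, sample)
    relation = "AD sum > DP" if ad_total > dp else "AD sum <= DP"
    print(ad_total, dp, relation)
```

For these two records the AD sums are 10 and 7 against a DP of 10 in both, so neither is a case where AD exceeds DP; running the same check over the whole file would pick out the sites where it does.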
I am also trying to check the coverage at each position of my reference using the CoverageBySample tool (with and without the -L argument):
java -Xmx30g -jar GenomeAnalysisTK.jar \
  -T CoverageBySample \
  -R ref.fasta \
  -I input.bam \
  -o output.cov
The output (below) gives the right coverage but without the positions on the reference, and it also skips all positions with no coverage. Is there any way to get these positions in the output file?
eo78 10
eo78 10
eo78 10
eo78 10
eo78 10
eo78 11
eo78 12
eo78 12
eo78 12
Dear GATK team,
I have a question about adding functionality to CallableLoci to allow multiple coverage cutoffs for LOW_COVERAGE (similar to the -ct option in DepthOfCoverage), for example COVERAGE_BELOW_10X, COVERAGE_BELOW_20X, etc.
These multiple statistics, not just a single LOW_COVERAGE value, are important for WGS interpretation. At present a separate DepthOfCoverage instance has to be run, which does the same job twice and takes considerable additional time compared with a single pass of CallableLoci.
A simple patch to the CallableLoci code does the job, but it would be great if this could be implemented in the build as a simple command-line option.
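To illustrate the kind of single-pass tally being requested, here is a minimal sketch. It is not the CallableLoci code or the patch mentioned above: the function name `count_below_cutoffs`, the input depths, and the way the labels are built are all assumptions made for illustration.

```python
def count_below_cutoffs(depths, cutoffs):
    """Count, in a single pass, how many loci fall below each coverage cutoff."""
    counts = {cutoff: 0 for cutoff in cutoffs}
    for depth in depths:
        for cutoff in cutoffs:
            if depth < cutoff:
                counts[cutoff] += 1
    # Label each tally in the style suggested in the request.
    return {f"COVERAGE_BELOW_{cutoff}X": n for cutoff, n in counts.items()}


# Made-up per-locus depths, checked against two cutoffs at once.
depths = [0, 4, 9, 10, 15, 19, 20, 35]
print(count_below_cutoffs(depths, [10, 20]))
```

Each locus is visited once and compared against every cutoff, which is the saving over running a second tool over the same data.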
Hi GATK Team
You are doing an amazing job, keep it up!
I apologise in advance if this question has come up and I've not found it within the forum, but I am quite new to all of this and would like to ask you a few questions regarding identifying structural variation from exome resequencing data:
I am trying to assess the best method to identify potential structural variants from a single bam file. One way proposed to me was to look at DP values (from UnifiedGenotyper) that are less than 5, though understandably there are inherent confounders in doing so. So I ran the same bam file through the DepthOfCoverage tool to focus on regions of interest that have zero coverage. However, when I overlaid the data from both and mapped their coordinates to the human genome, I found that the overlap between the low-DP sites and the DoC regions was extremely small (<5%). Why could this be? Surely there should be more overlap? Are they measuring different things? Have I done something wrong without realising it? I have tried to access the documentation for DepthOfCoverage to make sense of this, but it seems to be unavailable on the website (http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_coverage_DepthOfCoverage.html). Please could you advise?
Below are the command lines I've been using:
java -jar GenomeAnalysisTK.jar \
  -T DepthOfCoverage \
  -omitBaseOutput \
  -omitLocusTable \
  -R referencefilename.fa \
  -I samplefilename.bam \
  -L regionsofinterest.txt \
  -o outputfile.coverage

java -jar GenomeAnalysisTK.jar \
  -R referencefilename.fa \
  -T UnifiedGenotyper \
  -I samplefilename.bam \
  --dbsnp dbsnpreferencefile.vcf \
  --genotype_likelihoods_model SNP \
  -o outputfilename.vcf \
  --output_mode EMIT_ALL_SITES \
  -stand_call_conf 50.0 \
  -stand_emit_conf 0.0 \
  -dcov 200 \
  -L regionsofinterest.bed
Thank you in advance for your help; it is much appreciated.
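For reference, the overlap figure described in this post could be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `fraction_in_intervals` and the toy coordinates are made up, and intervals are treated as 1-based and inclusive at both ends. Coordinates taken from a BED file are 0-based and half-open, so they would need converting first.

```python
def fraction_in_intervals(positions, intervals):
    """Fraction of (chrom, pos) sites that fall inside any interval.

    intervals holds (chrom, start, end) tuples, assumed 1-based and
    inclusive at both ends.
    """
    if not positions:
        return 0.0
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, pos in positions:
        if any(start <= pos <= end for start, end in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(positions)


# Made-up low-DP sites and zero-coverage regions.
low_dp_sites = [("chr1", 100), ("chr1", 250), ("chr2", 50), ("chr2", 500)]
zero_cov_regions = [("chr1", 90, 110), ("chr2", 400, 450)]
print(fraction_in_intervals(low_dp_sites, zero_cov_regions))
```

A mismatch in coordinate convention between the two inputs, or a difference in which reads each tool counts, would both lower the figure this returns.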