Tagged with #info
0 documentation articles | 0 announcements | 3 forum discussions

No articles to display.

Created 2014-09-25 09:47:26 | Updated | Tags: info r leftalignandtrimvariants vcf4-2 number

Comments (11)

I have attached two VCF files generated with samtools (pass.vcf and fail.vcf). One of them (fail.vcf) contains this line:

##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for calling">

When I run LeftAlignAndTrimVariants (GATK 3.2) on the v4.2 VCF file containing the INFO line above, I get this error:

##### ERROR MESSAGE: For input string: "R"

The line is perfectly valid according to the VCF4.2 (and 4.3) specifications:

"The Number entry is an Integer that describes the number of values that can be included with the INFO field." "If the field has one value for each possible allele (including the reference), then this value should be ‘R’."

It's an easy issue to work around, but it would be great if you could eventually fix this low-priority bug. Thanks!
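To make the point about the spec concrete, here is a minimal Python sketch of how a header parser could accept both integer and symbolic Number entries. It is not GATK's code; the function names `parse_info_number` and `expected_value_count` are made up for illustration, and the set of symbolic codes (A, R, G, .) is taken from the VCF 4.2 specification.

```python
import re

# Symbolic Number codes listed in the VCF 4.2 specification.
SPECIAL_NUMBER_CODES = {"A", "R", "G", "."}


def parse_info_number(header_line):
    """Return the Number entry of an ##INFO line as an int or a symbolic code.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"Number=([^,>]+)", header_line)
    if match is None:
        raise ValueError("no Number entry found in header line")
    value = match.group(1)
    if value in SPECIAL_NUMBER_CODES:
        return value  # keep the symbolic code instead of forcing int()
    return int(value)


def expected_value_count(number, n_alt_alleles):
    """Number of values expected for a site, or None if not fixed here."""
    if isinstance(number, int):
        return number
    if number == "A":  # one value per alternate allele
        return n_alt_alleles
    if number == "R":  # one value per allele, including the reference
        return n_alt_alleles + 1
    return None  # "G" depends on ploidy, "." means unknown or unbounded


line = '##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for calling">'
number = parse_info_number(line)
print(number, expected_value_count(number, n_alt_alleles=2))
```

A parser that calls an integer conversion on the Number entry unconditionally would fail on "R", which is consistent with the error message above.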

I couldn't attach the two small VCF files directly ("Uploaded file type is not allowed."), but zip files are allowed, so the files are attached as a zip.

Created 2013-07-03 19:08:39 | Updated 2013-07-03 19:13:43 | Tags: duplicatereadfilter flagstat info

Comments (8)

I just noticed something odd about GATK read counts. Using a tiny test data set, I generated a BAM file with marked duplicates.

This is the output for samtools flagstat:

40000 + 0 in total (QC-passed reads + QC-failed reads)
63 + 0 duplicates
38615 + 0 mapped (96.54%:-nan%)
40000 + 0 paired in sequencing
20000 + 0 read1
20000 + 0 read2
37764 + 0 properly paired (94.41%:-nan%)
38284 + 0 with itself and mate mapped
331 + 0 singletons (0.83%:-nan%)
76 + 0 with mate mapped to a different chr
54 + 0 with mate mapped to a different chr (mapQ>=5)

This is what I get as part of GATK info stats when running RealignerTargetCreator:

INFO  14:42:05,815 MicroScheduler - 5175 reads were filtered out during traversal out of 276045 total (1.87%) 
INFO  14:42:05,816 MicroScheduler -   -> 84 reads (0.03% of total) failing BadMateFilter 
INFO  14:42:05,816 MicroScheduler -   -> 1014 reads (0.37% of total) failing DuplicateReadFilter 
INFO  14:42:05,816 MicroScheduler -   -> 4077 reads (1.48% of total) failing MappingQualityZeroFilter 

This is what I get as part of GATK info stats when running DepthOfCoverage (on the original BAM, not after realignment):

INFO  15:03:17,818 MicroScheduler - 2820 reads were filtered out during traversal out of 309863 total (0.91%) 
INFO  15:03:17,818 MicroScheduler -   -> 1205 reads (0.39% of total) failing DuplicateReadFilter 
INFO  15:03:17,818 MicroScheduler -   -> 1615 reads (0.52% of total) failing UnmappedReadFilter 

Why are all of these so different? Why do the GATK stats report so many more total reads and duplicate reads than samtools flagstat?
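For anyone who wants to compare these numbers side by side, here is a small Python sketch that pulls the counts out of the MicroScheduler lines quoted above and checks that the per-filter counts add up to the reported total. The function name `summarize` and the regular expressions are my own assumptions about the log layout, based only on the lines in this post. It does not explain the discrepancy; it only makes the numbers easier to tabulate.

```python
import re

# RealignerTargetCreator log lines as quoted in the post.
LOG = """\
INFO  14:42:05,815 MicroScheduler - 5175 reads were filtered out during traversal out of 276045 total (1.87%)
INFO  14:42:05,816 MicroScheduler -   -> 84 reads (0.03% of total) failing BadMateFilter
INFO  14:42:05,816 MicroScheduler -   -> 1014 reads (0.37% of total) failing DuplicateReadFilter
INFO  14:42:05,816 MicroScheduler -   -> 4077 reads (1.48% of total) failing MappingQualityZeroFilter
"""


def summarize(log_text):
    """Return (filtered, total, per_filter_counts) parsed from the log text."""
    summary = re.search(
        r"(\d+) reads were filtered out during traversal out of (\d+) total",
        log_text,
    )
    if summary is None:
        raise ValueError("no MicroScheduler summary line found")
    filtered, total = int(summary.group(1)), int(summary.group(2))
    per_filter = {
        name: int(count)
        for count, name in re.findall(
            r"-> (\d+) reads \([^)]*\) failing (\w+)", log_text
        )
    }
    return filtered, total, per_filter


filtered, total, per_filter = summarize(LOG)
print("filtered:", filtered, "of", total)
print("sum of per-filter counts:", sum(per_filter.values()))
print("duplicates per GATK:", per_filter["DuplicateReadFilter"], "vs flagstat: 63")
```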

Created 2012-12-15 15:03:06 | Updated | Tags: gatk info

Comments (1)


I have used GATK for multi-sample SNP calling. For a few of the variants, the total depth (DP) at the variant site is not the same as the sum of the individual sample depths. For example, for the variant site shown below, DP=24 in the INFO column, but the depths from the FORMAT column sum to 3+3+8+2=16.

chr1 3015369 . C A 39.22 AC=4;AF=1.00;AN=4;DP=24;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=4;MLEAF=1.00;MQ=6.96;MQ0=21;QD=2.45;SB=-2.321e+01 GT:AD:DP:GQ:PL ./. 1/1:3,3:6:6:48,6,0 1/1:8,2:10:3:25,3,0

Could someone give an explanation for differences in depths?
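To reproduce the comparison without counting by hand, here is a short Python sketch that reads the INFO DP and the per-sample FORMAT DP values from the record quoted in this post. The helper names `info_depth` and `sample_depths` are hypothetical, and treating a no-call sample (./.) as depth 0 is an assumption made for the sake of the sum.

```python
# Fields copied from the record in the post.
INFO = (
    "AC=4;AF=1.00;AN=4;DP=24;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;"
    "MLEAC=4;MLEAF=1.00;MQ=6.96;MQ0=21;QD=2.45;SB=-2.321e+01"
)
FORMAT = "GT:AD:DP:GQ:PL"
SAMPLES = ["./.", "1/1:3,3:6:6:48,6,0", "1/1:8,2:10:3:25,3,0"]


def info_depth(info):
    """Return the site-level DP from a semicolon-separated INFO string."""
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    return int(fields["DP"])


def sample_depths(fmt, samples):
    """Return per-sample DP values; missing DP is counted as 0 (assumption)."""
    keys = fmt.split(":")
    depths = []
    for sample in samples:
        values = dict(zip(keys, sample.split(":")))
        dp = values.get("DP", ".")
        depths.append(int(dp) if dp != "." else 0)
    return depths


site_dp = info_depth(INFO)
per_sample = sample_depths(FORMAT, SAMPLES)
print("INFO DP:", site_dp)
print("per-sample DP:", per_sample, "sum:", sum(per_sample))
print("difference:", site_dp - sum(per_sample))
```

For this record the per-sample DP values (6 and 10) give the same total of 16 as the AD sum in the question, so the gap to the INFO DP of 24 is 8 reads either way.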