Tagged with #homopolymerrun
0 documentation articles | 0 announcements | 2 forum discussions


No posts found with the requested search criteria.

Created 2016-01-19 16:27:08 | Updated | Tags: gccontent haplotypecaller homopolymerrun tandemrepeatannotator hrun
Comments (9)

I am using GENOTYPE_GIVEN_ALLELES mode with HaplotypeCaller. I am trying to explicitly request all annotations that the documentation says are compatible with HaplotypeCaller (and that make sense for a single sample, e.g. no Hardy-Weinberg).

The following annotations all come back as "NA": GCContent (GC), HomopolymerRun (HRun), and TandemRepeatAnnotator (STR, RU, RPA). They appear to be valid requests, because I get no errors from GATK.
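One way to confirm which of the requested annotations are really absent is to tally the INFO keys in the output VCF. The following is a minimal sketch, not part of GATK: the function names are hypothetical, it assumes tab-delimited VCF records with INFO in the eighth column, and the keys to check (GC, HRun, STR, RU, RPA) are simply the ones named in the post.

```python
from collections import Counter


def count_info_keys(vcf_lines):
    """Return (number of records, Counter of records carrying each INFO key)."""
    counts = Counter()
    n_records = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        n_records += 1
        info = fields[7]
        if info == ".":
            continue  # record has no INFO entries at all
        # Use a set so a key repeated within one record is counted once.
        for key in {entry.split("=", 1)[0] for entry in info.split(";")}:
            counts[key] += 1
    return n_records, counts


def missing_keys(vcf_lines, wanted):
    """For each wanted INFO key, count the records that do not carry it."""
    n_records, counts = count_info_keys(vcf_lines)
    return {key: n_records - counts[key] for key in wanted}
```

Calling `missing_keys(open(path), ["GC", "HRun", "STR", "RU", "RPA"])` on the output file (where `path` is a placeholder for its location) would show, per key, how many records lack the annotation.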

This is the command I ran (all on one line)

java -Xmx40g -jar /data5/bsi/bictools/alignment/gatk/3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller --input_file /data2/external_data/Weinshilboum_Richard_weinsh/s115343.beauty/Paired_analysis/secondary/Paired_10192014/IGV_BAM/pair_EX167687/s_EX167687_DNA_Blood.igv-sorted.bam --alleles:vcf /data2/external_data/Kocher_Jean-Pierre_m026645/s109575.ez/Sequencing_2016/OMNI.vcf --phone_home NO_ET --gatk_key /projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/3.1-1/Hossain.Asif_mayo.edu.key --reference_sequence /data2/bsi/reference/sequence/human/ncbi/hg19/allchr.fa --minReadsPerAlignmentStart 1 --disableOptimizations --dontTrimActiveRegions --forceActive --out /data2/external_data/Kocher_Jean-Pierre_m026645/s109575.ez/Sequencing_2016/EX167687.0.0375.chr22.vcf --logging_level INFO -L chr22 --downsample_to_fraction 0.0375 --downsampling_type BY_SAMPLE --genotyping_mode GENOTYPE_GIVEN_ALLELES --standard_min_confidence_threshold_for_calling 20.0 --standard_min_confidence_threshold_for_emitting 0.0 --annotateNDA --annotation BaseQualityRankSumTest --annotation ClippingRankSumTest --annotation Coverage --annotation FisherStrand --annotation GCContent --annotation HomopolymerRun --annotation LikelihoodRankSumTest --annotation MappingQualityRankSumTest --annotation NBaseCount --annotation QualByDepth --annotation RMSMappingQuality --annotation ReadPosRankSumTest --annotation StrandOddsRatio --annotation TandemRepeatAnnotator --annotation DepthPerAlleleBySample --annotation DepthPerSampleHC --annotation StrandAlleleCountsBySample --annotation StrandBiasBySample --excludeAnnotation HaplotypeScore --excludeAnnotation InbreedingCoeff

The log file is below. Notice the odd WARN messages saying "StrandBiasBySample annotation exists in input VCF header", which make no sense because the header is empty apart from the bare-bones fields.

This is the head of the bare-bones VCF, /data2/external_data/Kocher_Jean-Pierre_m026645/s109575.ez/Sequencing_2016/OMNI.vcf:

##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 723918 rs144434834 G A 30 PASS .
chr1 729632 rs116720794 C T 30 PASS .
chr1 752566 rs3094315 G A 30 PASS .
chr1 752721 rs3131972 A G 30 PASS .
chr1 754063 rs12184312 G T 30 PASS .
chr1 757691 rs74045212 T C 30 PASS .
chr1 759036 rs114525117 G A 30 PASS .
chr1 761764 rs144708130 G A 30 PASS .

This is the output

INFO 10:03:06,080 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,082 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 10:03:06,083 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:03:06,083 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 10:03:06,086 HelpFormatter - Program Args: -T HaplotypeCaller --input_file /data2/external_data/Weinshilboum_Richard_weinsh/s115343.beauty/Paired_analysis/secondary/Paired_10192014/IGV_BAM/pair_EX167687/s_EX167687_DNA_Blood.igv-sorted.bam --alleles:vcf /data2/external_data/Kocher_Jean-Pierre_m026645/s109575.ez/Sequencing_2016/OMNI.vcf --phone_home NO_ET --gatk_key /projects/bsi/bictools/apps/alignment/GenomeAnalysisTK/3.1-1/Hossain.Asif_mayo.edu.key --reference_sequence /data2/bsi/reference/sequence/human/ncbi/hg19/allchr.fa --minReadsPerAlignmentStart 1 --disableOptimizations --dontTrimActiveRegions --forceActive --out /data2/external_data/Kocher_Jean-Pierre_m026645/s109575.ez/Sequencing_2016/EX167687.0.0375.chr22.vcf --logging_level INFO -L chr22 --downsample_to_fraction 0.0375 --downsampling_type BY_SAMPLE --genotyping_mode GENOTYPE_GIVEN_ALLELES --standard_min_confidence_threshold_for_calling 20.0 --standard_min_confidence_threshold_for_emitting 0.0 --annotateNDA --annotation BaseQualityRankSumTest --annotation ClippingRankSumTest --annotation Coverage --annotation FisherStrand --annotation GCContent --annotation HomopolymerRun --annotation LikelihoodRankSumTest --annotation MappingQualityRankSumTest --annotation NBaseCount --annotation QualByDepth --annotation RMSMappingQuality --annotation ReadPosRankSumTest --annotation StrandOddsRatio --annotation TandemRepeatAnnotator --annotation DepthPerAlleleBySample --annotation DepthPerSampleHC --annotation StrandAlleleCountsBySample --annotation StrandBiasBySample --excludeAnnotation HaplotypeScore --excludeAnnotation InbreedingCoeff
INFO 10:03:06,093 HelpFormatter - Executing as m037385@franklin04-213 on Linux 2.6.32-573.8.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26.
INFO 10:03:06,094 HelpFormatter - Date/Time: 2016/01/19 10:03:06
INFO 10:03:06,094 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,094 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:03:06,545 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:03:06,657 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Fraction: 0.04
INFO 10:03:06,666 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 10:03:07,012 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.35
INFO 10:03:07,031 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 10:03:07,170 IntervalUtils - Processing 51304566 bp from intervals
INFO 10:03:07,256 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 10:03:07,595 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:03:07,595 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:03:07,595 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:03:07,596 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 10:03:07,596 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
WARN 10:03:07,709 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
WARN 10:03:07,709 StrandBiasTest - StrandBiasBySample annotation exists in input VCF header. Attempting to use StrandBiasBySample values to calculate strand bias annotation values. If no sample has the SB genotype annotation, annotation may still fail.
INFO 10:03:07,719 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 10:03:37,599 ProgressMeter - chr22:5344011 0.0 30.0 s 49.6 w 10.4% 4.8 m 4.3 m
INFO 10:04:07,600 ProgressMeter - chr22:11875880 0.0 60.0 s 99.2 w 23.1% 4.3 m 3.3 m
Using AVX accelerated implementation of PairHMM
INFO 10:04:29,924 VectorLoglessPairHMM - libVectorLoglessPairHMM unpacked successfully from GATK jar file
INFO 10:04:29,925 VectorLoglessPairHMM - Using vectorized implementation of PairHMM
WARN 10:04:29,938 AnnotationUtils - Annotation will not be calculated, genotype is not called
WARN 10:04:29,938 AnnotationUtils - Annotation will not be calculated, genotype is not called
WARN 10:04:29,939 AnnotationUtils - Annotation will not be calculated, genotype is not called
INFO 10:04:37,601 ProgressMeter - chr22:17412465 0.0 90.0 s 148.8 w 33.9% 4.4 m 2.9 m
INFO 10:05:07,602 ProgressMeter - chr22:18643131 0.0 120.0 s 198.4 w 36.3% 5.5 m 3.5 m
INFO 10:05:37,603 ProgressMeter - chr22:20133744 0.0 2.5 m 248.0 w 39.2% 6.4 m 3.9 m
INFO 10:06:07,604 ProgressMeter - chr22:22062452 0.0 3.0 m 297.6 w 43.0% 7.0 m 4.0 m
INFO 10:06:37,605 ProgressMeter - chr22:23818297 0.0 3.5 m 347.2 w 46.4% 7.5 m 4.0 m
INFO 10:07:07,606 ProgressMeter - chr22:25491290 0.0 4.0 m 396.8 w 49.7% 8.1 m 4.1 m
INFO 10:07:37,607 ProgressMeter - chr22:27044271 0.0 4.5 m 446.4 w 52.7% 8.5 m 4.0 m
INFO 10:08:07,608 ProgressMeter - chr22:28494980 0.0 5.0 m 496.1 w 55.5% 9.0 m 4.0 m
INFO 10:08:47,609 ProgressMeter - chr22:30866786 0.0 5.7 m 562.2 w 60.2% 9.4 m 3.8 m
INFO 10:09:27,610 ProgressMeter - chr22:32908950 0.0 6.3 m 628.3 w 64.1% 9.9 m 3.5 m
INFO 10:09:57,610 ProgressMeter - chr22:34451306 0.0 6.8 m 677.9 w 67.2% 10.2 m 3.3 m
INFO 10:10:27,611 ProgressMeter - chr22:36013343 0.0 7.3 m 727.5 w 70.2% 10.4 m 3.1 m
INFO 10:10:57,613 ProgressMeter - chr22:37387478 0.0 7.8 m 777.1 w 72.9% 10.7 m 2.9 m
INFO 10:11:27,614 ProgressMeter - chr22:38534891 0.0 8.3 m 826.8 w 75.1% 11.1 m 2.8 m
INFO 10:11:57,615 ProgressMeter - chr22:39910054 0.0 8.8 m 876.4 w 77.8% 11.4 m 2.5 m
INFO 10:12:27,616 ProgressMeter - chr22:41738463 0.0 9.3 m 926.0 w 81.4% 11.5 m 2.1 m
INFO 10:12:57,617 ProgressMeter - chr22:43113306 0.0 9.8 m 975.6 w 84.0% 11.7 m 112.0 s
INFO 10:13:27,618 ProgressMeter - chr22:44456937 0.0 10.3 m 1025.2 w 86.7% 11.9 m 95.0 s
INFO 10:13:57,619 ProgressMeter - chr22:45448656 0.0 10.8 m 1074.8 w 88.6% 12.2 m 83.0 s
INFO 10:14:27,620 ProgressMeter - chr22:46689073 0.0 11.3 m 1124.4 w 91.0% 12.5 m 67.0 s
INFO 10:14:57,621 ProgressMeter - chr22:48062438 0.0 11.8 m 1174.0 w 93.7% 12.6 m 47.0 s
INFO 10:15:27,622 ProgressMeter - chr22:49363910 0.0 12.3 m 1223.6 w 96.2% 12.8 m 29.0 s
INFO 10:15:57,623 ProgressMeter - chr22:50688233 0.0 12.8 m 1273.2 w 98.8% 13.0 m 9.0 s
INFO 10:16:12,379 VectorLoglessPairHMM - Time spent in setup for JNI call : 0.061128124000000006
INFO 10:16:12,379 PairHMM - Total compute time in PairHMM computeLikelihoods() : 22.846350295
INFO 10:16:12,380 HaplotypeCaller - Ran local assembly on 25679 active regions
INFO 10:16:12,434 ProgressMeter - done 5.1304566E7 13.1 m 15.0 s 100.0% 13.1 m 0.0 s
INFO 10:16:12,435 ProgressMeter - Total runtime 784.84 secs, 13.08 min, 0.22 hours
INFO 10:16:12,435 MicroScheduler - 727347 reads were filtered out during the traversal out of approximately 4410423 total reads (16.49%)
INFO 10:16:12,435 MicroScheduler - -> 2 reads (0.00% of total) failing BadCigarFilter
INFO 10:16:12,436 MicroScheduler - -> 669763 reads (15.19% of total) failing DuplicateReadFilter
INFO 10:16:12,436 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 10:16:12,436 MicroScheduler - -> 57582 reads (1.31% of total) failing HCMappingQualityFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 10:16:12,437 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 10:16:12,438 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
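In a log of this length the WARN entries are easy to miss, so it can help to pull them out and count them. The sketch below is an illustration only: the helper names are hypothetical, and it assumes every entry starts with a level, an HH:MM:SS,mmm timestamp, a component name, and " - ", as in the output above. It splits on that prefix rather than on newlines, so it also copes with a log whose line breaks were lost when pasted.

```python
import re
from collections import Counter

# Start of an entry: LEVEL, timestamp, component, then " - ".
STAMP = re.compile(
    r"\b(INFO|WARN|ERROR|DEBUG)\s+\d{2}:\d{2}:\d{2},\d{3}\s+(\S+)\s+-\s+"
)


def parse_entries(text):
    """Split log text into (level, component, message) tuples."""
    matches = list(STAMP.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse internal whitespace. Output that has no level/timestamp
        # prefix (e.g. the "Using AVX accelerated ..." line) stays attached
        # to the entry that precedes it.
        message = " ".join(text[m.end():end].split())
        entries.append((m.group(1), m.group(2), message))
    return entries


def tally_warnings(text):
    """Count identical WARN messages per component."""
    return Counter(
        (component, message)
        for level, component, message in parse_entries(text)
        if level == "WARN"
    )
```

Applied to the output above, `tally_warnings` would group the repeated StrandBiasTest and AnnotationUtils warnings so each distinct message appears once with its count.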


Created 2012-12-14 09:59:51 | Updated | Tags: unifiedgenotyper homopolymerrun
Comments (3)

How does GATK2 handle variants called in homopolymeric regions of the genome? Is this feature enabled and used during variant calling (with UG/HC), or should we do it separately with VariantFiltration? Is there a specific parameter to control this? Best, Raj
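For readers unfamiliar with the term, a homopolymer run is a stretch of identical bases next to a site. The sketch below illustrates that idea only; it is not GATK's implementation, and the function name, the 0-based coordinate, and the choice to ignore the base at the site itself are all assumptions made for this example.

```python
def flanking_run(reference, pos, base):
    """Length of the longer run of `base` directly left or right of `pos`.

    `reference` is a plain string of bases and `pos` is a 0-based index.
    The base at `pos` itself is not counted.
    """
    left = 0
    i = pos - 1
    while i >= 0 and reference[i] == base:
        left += 1
        i -= 1

    right = 0
    i = pos + 1
    while i < len(reference) and reference[i] == base:
        right += 1
        i += 1

    return max(left, right)
```

A site sitting beside a long run scores high under this definition, which is the kind of value one might threshold in a separate filtering step.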