Effect of cohort in HaplotypeCaller
Posted in Ask the GATK team | Last updated on 2013-10-28 20:34:07



I'm trying to measure how the composition of a cohort affects variant calling for an individual sample. My cohort includes 46 Caucasian samples and 1 Japanese sample (J1), all exome-seq data from 1KG. I want to check whether Japanese-specific variants get buried in such a cohort.

I'm only using chr22, and the average depth of coverage across all samples is around 10X. The input files to HC are the reduced BAM files. Here are the results and my questions:

1). The command:

java -Xmx4g -jar $gatkDir/GenomeAnalysisTK.jar -T HaplotypeCaller \
 -R $refGenome \
 --dbsnp $dbSNP \
 -stand_call_conf 50.0 \
 -stand_emit_conf 10.0 \
 -o $cohort.raw.var.vcf \
 -I $cohort.list

2). While the calls for the Caucasian samples look normal, all genotype calls for J1 are "./." (no-call).
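
To put a number on this, the no-call genotypes can be tallied per sample. The following is only a minimal sketch, not part of GATK: `count_nocalls` is a hypothetical helper name, and it assumes an uncompressed, tab-delimited VCF in which GT is the first FORMAT subfield.

```shell
# Hypothetical helper: per-sample count of no-call genotypes in a VCF.
# Assumes an uncompressed, tab-delimited VCF with GT as the first FORMAT subfield.
# Output columns: sample name, number of no-call genotypes, total records.
count_nocalls() {
  awk -F'\t' '
    /^##/     { next }                       # skip meta-information lines
    /^#CHROM/ {                              # header: sample names start at column 10
      for (i = 10; i <= NF; i++) name[i] = $i
      n = NF
      next
    }
    {
      total++
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                    # f[1] is the GT subfield
        if (f[1] == "./." || f[1] == ".") nocall[i]++
      }
    }
    END {
      for (i = 10; i <= n; i++)
        printf "%s\t%d\t%d\n", name[i], nocall[i], total
    }
  ' "$1"
}
```

Running `count_nocalls $cohort.raw.var.vcf` on the output from step 1 would print one line per sample, so J1 can be compared directly against the 46 Caucasian samples.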

3). I then ran HC on J1 alone. The resulting VCF file contains only the header and no variant records; i.e., the last line in the file is the following:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT jpt.ERR034603

HC finished without reporting any errors. Here is a portion of the HC output:

INFO  15:13:25,888 MicroScheduler - 136025 reads were filtered out during the traversal out of approximately 2686707 total reads (5.06%)
INFO  15:13:25,888 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO  15:13:25,889 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO  15:13:25,889 MicroScheduler -   -> 136025 reads (5.06% of total) failing HCMappingQualityFilter
INFO  15:13:25,889 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO  15:13:25,890 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO  15:13:25,890 MicroScheduler -   -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO  15:13:25,890 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter

4). I ran HC on one Caucasian sample alone, and the VCF file looks normal.

5). I then ran HC on a cohort of J1 and four other Japanese samples, and the calls for J1 look normal.

Could anyone explain 2, 3, and 4? Should I increase the proportion of Japanese samples in the Caucasian cohort in order to get calls for J1? Thanks a lot!

