Tagged with #haploid
1 documentation article | 1 announcement | 6 forum discussions



Created 2012-07-26 14:50:55 | Updated 2014-10-24 00:55:34 | Tags: unifiedgenotyper official ploidy haploid analyst intermediate
Comments (14)

In general, most GATK tools don't care about ploidy. The major exception is, of course, the variant calling step: the variant callers need to know what ploidy is assumed for a given sample in order to perform the appropriate calculations.

Ploidy-related functionalities

As of version 3.3, the HaplotypeCaller and GenotypeGVCFs are able to deal with non-diploid organisms (whether haploid or exotically polyploid). In the case of HaplotypeCaller, you need to specify the ploidy of your non-diploid sample with the -ploidy argument. HC can only deal with one ploidy at a time, so if you want to process different chromosomes with different ploidies (e.g. to call X and Y in males) you need to run them separately. On the bright side, you can combine the resulting files afterward. In particular, if you’re running the -ERC GVCF workflow, you’ll find that both CombineGVCFs and GenotypeGVCFs are able to handle mixed ploidies (between locations and between samples). Both tools are able to correctly work out the ploidy of any given sample at a given site based on the composition of the GT field, so they don’t require you to specify the -ploidy argument.
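
To make the idea of working out ploidy from the GT field concrete, here is a minimal sketch. It is not GATK's implementation; `ploidy_from_gt` is a hypothetical helper that simply counts the allele slots in a VCF GT string, assuming the standard `/` (unphased) and `|` (phased) separators.

```python
import re

def ploidy_from_gt(gt: str) -> int:
    """Return the number of allele slots in a VCF GT string.

    Hypothetical helper for illustration only; it is not part of GATK.
    Alleles are separated by '/' (unphased) or '|' (phased). A missing
    allele is written as '.', which still occupies one slot.
    """
    if not gt:
        raise ValueError("empty GT field")
    return len(re.split(r"[/|]", gt))

# The same male sample called on an autosome and on non-PAR chrX:
print(ploidy_from_gt("0/1"))      # diploid call
print(ploidy_from_gt("1"))        # haploid call
print(ploidy_from_gt("0/0/1/1"))  # tetraploid call
```

Because the ploidy can be read off each genotype this way, a combining or genotyping step does not need to be told the ploidy separately.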

For earlier versions (all the way to 2.0) the fallback option is UnifiedGenotyper, which also accepts the -ploidy argument.

Cases where ploidy needs to be specified

  1. Native variant calling in haploid or polyploid organisms.
  2. Pooled calling where many pooled organisms share a single barcode and hence are treated as a single "sample".
  3. Pooled validation/genotyping at known sites.

For normal organism ploidy, just set the -ploidy argument to the number of copies of each chromosome per organism (e.g. 1 for a haploid, 4 for a tetraploid). In the case of pooled sequencing experiments, set this argument to the number of chromosome copies per barcoded sample, i.e. (Ploidy per individual) * (Individuals in pool).
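
As a quick worked example of the pooled-ploidy formula, the sketch below computes the value to pass to -ploidy. The function name `pool_ploidy` is made up for illustration and is not a GATK utility.

```python
def pool_ploidy(ploidy_per_individual: int, individuals_in_pool: int) -> int:
    """Value to pass to -ploidy for one barcoded sample.

    Implements (Ploidy per individual) * (Individuals in pool).
    Hypothetical helper for illustration only.
    """
    if ploidy_per_individual < 1 or individuals_in_pool < 1:
        raise ValueError("both arguments must be positive integers")
    return ploidy_per_individual * individuals_in_pool

# Ten diploid individuals sharing a single barcode:
print(pool_ploidy(2, 10))  # 20

# One haploid organism sequenced on its own barcode:
print(pool_ploidy(1, 1))   # 1
```

Note that the pool size counts only the individuals sharing one barcode. Samples supplied as separate, individually barcoded inputs are not multiplied together.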

Important limitations

Several variant annotations are not appropriate for use with non-diploid cases. In particular, InbreedingCoeff will not be annotated on non-diploid calls. Annotations that do work and are supported in non-diploid use cases are the following: QUAL, QD, SB, FS, AC, AF, and Genotype annotations such as PL, AD, GT, etc.
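
One practical consequence for the PL annotation is that its length depends on ploidy. Under the VCF convention of one likelihood per unordered genotype, a site with n alleles and ploidy P has C(P + n - 1, n - 1) PL entries. The sketch below just evaluates that formula; `num_pl_values` is a hypothetical name, not a GATK function.

```python
from math import comb

def num_pl_values(ploidy: int, num_alleles: int) -> int:
    """Number of unordered genotypes, and hence PL entries, at a site.

    Counts the multisets of size `ploidy` drawn from `num_alleles`
    alleles (REF plus all ALTs). Illustrative helper only.
    """
    if ploidy < 1 or num_alleles < 1:
        raise ValueError("ploidy and num_alleles must be positive")
    return comb(ploidy + num_alleles - 1, num_alleles - 1)

print(num_pl_values(2, 2))  # diploid, biallelic: 3
print(num_pl_values(1, 2))  # haploid, biallelic: 2
print(num_pl_values(1, 3))  # haploid, two ALT alleles: 3
print(num_pl_values(4, 2))  # tetraploid, biallelic: 5
```

So a haploid call at a biallelic site carries two PL values rather than the three seen in diploid calls, and downstream scripts that assume three values per biallelic site will need adjusting.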

You should also be aware of the fundamental accuracy limitations of high ploidy calling. Calling low-frequency variants in a pool or in an organism with high ploidy is hard because these rare variants become almost indistinguishable from sequencing errors.
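
A back-of-the-envelope calculation shows why this gets hard. The sketch below compares the number of reads expected to support a variant present on a single chromosome copy with the number of reads expected to be errors. The depth and the 1% per-base error rate are illustrative assumptions, not GATK parameters, and the model ignores mapping bias and the fact that errors are spread over several alternate bases.

```python
def expected_alt_fraction(ploidy: int, alt_copies: int = 1) -> float:
    """Expected fraction of reads carrying an allele that is present on
    `alt_copies` of the `ploidy` chromosome copies in the sample."""
    if not 0 <= alt_copies <= ploidy:
        raise ValueError("alt_copies must be between 0 and ploidy")
    return alt_copies / ploidy

ASSUMED_ERROR_RATE = 0.01  # illustrative per-base error rate (assumption)
DEPTH = 100                # illustrative read depth (assumption)

for ploidy in (1, 2, 4, 20, 100):
    alt_reads = DEPTH * expected_alt_fraction(ploidy)
    err_reads = DEPTH * ASSUMED_ERROR_RATE
    print(f"ploidy {ploidy:>3}: ~{alt_reads:.1f} variant reads expected "
          f"vs ~{err_reads:.1f} error reads")
```

Under these assumptions a single-copy variant in a diploid is supported by about half the reads, while at ploidy 100 it is expected in about as many reads as sequencing errors alone would produce.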


Created 2014-08-27 18:39:39 | Updated 2014-12-16 03:19:38 | Tags: haplotypecaller ploidy haploid polyploid beta
Comments (14)

Until now, HaplotypeCaller was only capable of calling variants in diploid organisms due to some assumptions made in the underlying algorithms. I'm happy to announce that we now have a generalized version that is capable of handling any ploidy you specify at the command line!

This new feature, which we're calling "omniploidy", is technically still under development, but we think it's mature enough for the more adventurous to try out as a beta test ahead of the next official release. We'd especially love to get some feedback from people who work with non-diploids on a regular basis, so we're hoping that some of you microbiologists and assorted plant scientists will take it out for a spin and let us know how it behaves in your hands.

It's available in the latest nightly builds; just use the -ploidy argument to give it a whirl. If you have any questions or feedback, please post a comment on this article in the forum.

Caveat: the downstream tools involved in the new GVCF-based workflow (GenotypeGVCFs and CombineGVCFs) are not yet capable of handling non-diploid calls correctly -- but we're working on it.

UPDATE:

We have added omniploidy support to the GVCF handling tools, with the following limitations:

  • When running, you need to indicate the sample ploidy that was used to generate the GVCFs with -ploidy. As usual 2 is the default ploidy.

  • The system does not support mixed ploidy across samples or positions. An error message will be thrown if you attempt to genotype GVCFs that contain a mixture of ploidies, or that contain any genotype whose ploidy does not match the -ploidy argument.

LATEST UPDATE:

As of GATK version 3.3-0, the GVCF tools are capable of ad-hoc ploidy detection, and can handle mixed ploidies. See the release highlights for details.


Created 2015-07-17 22:01:12 | Updated 2015-07-17 22:14:12 | Tags: haplotypecaller ploidy haploid genotypegvcfs combinegvcfs chromosome-x
Comments (8)

Hi,

I'm attempting to run GenotypeGVCFs on a cohort of ~4200 human male samples with targeted sequencing. I'm following the current DNA-Seq guidelines for cohort genotyping, with GATK v3.4-0. For each sample, I ran HaplotypeCaller separately for diploid and haploid (i.e., chrX non-PAR) regions, specifying --ploidy 1 for the haploid regions, then combined the resulting two GVCFs with CombineGVCFs. I then combined the per-sample GVCFs into groups of 64 samples using CombineGVCFs. Finally, I ran GenotypeGVCFs with all samples separately for groups of ~100 small target intervals (baits). Every group of target intervals ran fine without error in about 4 hours with ~5GB of RAM, except for the non-PAR chrX regions, which were haploid for all samples.

For the haploid regions, GATK hangs on the very first base, slowly increasing memory usage, then eventually runs out of memory and exits. The estimated runtime keeps increasing without making any progress. The last run exited after 12 hours without making any progress. This happens no matter how much memory I specify (up to 128 GB).

Interestingly, a PAR region of chromosome X run with --ploidy 2 in HaplotypeCaller worked with no problem.

The input GVCF files to GenotypeGVCFs are uncompressed and were indexed by CombineGVCFs.

I'm using default settings for GenotypeGVCFs, except for the following:

--standard_min_confidence_threshold_for_calling 20 --standard_min_confidence_threshold_for_emitting 10 --dbsnp dbSNP142.20150416.GRCh37.for-GATK.chr1-MT.vcf.gz

I tried running GenotypeGVCFs with the latest v3.4-46 release, but the same problem occurred.

Below is example output:

INFO 10:57:51,803 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:57:51,810 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
INFO 10:57:51,811 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 10:57:51,811 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 10:57:51,818 HelpFormatter - Program Args: -T GenotypeGVCFs -R /tmp/12715944.hpc-pbs.hpcc.usc.edu/hs37m.fa --dbsnp dbSNP142.20150416.GRCh37.for-GATK.chr1-MT.vcf.gz --standard_min_confidence_threshold_for_calling 20 --standard_min_confidence_threshold_for_emitting 10 [LONG LIST OF VARIANT FILES OMITTED] --out gatk.hc.combined.genotyped.chunk117.vcf.gz -L split_117.intervals --log_to_file gatk.hc.combined.genotyped.chunk117.log
INFO 10:57:51,824 HelpFormatter - Executing as cedlund@hpc1130.m10g.hpcc.usc.edu on Linux 2.6.32-504.12.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13.
INFO 10:57:51,825 HelpFormatter - Date/Time: 2015/07/17 10:57:51
INFO 10:57:51,826 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:57:51,826 HelpFormatter - ---------------------------------------------------------------------------------
INFO 10:57:56,331 GenomeAnalysisEngine - Strictness is SILENT
INFO 10:57:56,671 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 10:59:03,550 IntervalUtils - Processing 154370 bp from intervals
WARN 10:59:03,615 IndexDictionaryUtils - Track dbsnp doesn't have a sequence dictionary built in, skipping dictionary validation
INFO 10:59:03,766 GenomeAnalysisEngine - Preparing for traversal
INFO 10:59:03,768 GenomeAnalysisEngine - Done preparing for traversal
INFO 10:59:03,768 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 10:59:03,769 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 10:59:03,770 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 10:59:04,283 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
INFO 10:59:57,328 ProgressMeter - X:51033766 216.0 53.0 s 68.9 h 0.2% 7.2 h 7.2 h
INFO 11:00:27,330 ProgressMeter - X:51035366 216.0 83.0 s 4.5 d 1.2% 111.5 m 110.1 m
INFO 11:00:57,353 ProgressMeter - X:51035366 216.0 113.0 s 6.1 d 1.2% 2.5 h 2.5 h
INFO 11:01:27,354 ProgressMeter - X:51035366 216.0 2.4 m 7.7 d 1.2% 3.2 h 3.2 h
INFO 11:01:57,356 ProgressMeter - X:51035366 216.0 2.9 m 9.3 d 1.2% 3.9 h 3.8 h
INFO 11:02:27,358 ProgressMeter - X:51035366 216.0 3.4 m 10.9 d 1.2% 4.5 h 4.5 h
INFO 11:02:57,882 ProgressMeter - X:51035366 216.0 3.9 m 12.5 d 1.2% 5.2 h 5.2 h
INFO 11:03:27,884 ProgressMeter - X:51035366 216.0 4.4 m 14.2 d 1.2% 5.9 h 5.8 h
INFO 11:03:57,885 ProgressMeter - X:51035366 216.0 4.9 m 15.8 d 1.2% 6.6 h 6.5 h
INFO 11:04:27,887 ProgressMeter - X:51035366 216.0 5.4 m 17.4 d 1.2% 7.3 h 7.2 h
INFO 11:04:58,976 ProgressMeter - X:51035366 216.0 5.9 m 19.0 d 1.2% 7.9 h 7.8 h
INFO 11:05:28,977 ProgressMeter - X:51035366 216.0 6.4 m 2.9 w 1.2% 8.6 h 8.5 h
INFO 11:05:58,979 ProgressMeter - X:51035366 216.0 6.9 m 3.2 w 1.2% 9.3 h 9.2 h
INFO 11:06:28,981 ProgressMeter - X:51035366 216.0 7.4 m 3.4 w 1.2% 10.0 h 9.8 h
INFO 11:06:58,982 ProgressMeter - X:51035366 216.0 7.9 m 3.6 w 1.2% 10.6 h 10.5 h
INFO 11:07:28,984 ProgressMeter - X:51035366 216.0 8.4 m 3.9 w 1.2% 11.3 h 11.2 h
INFO 11:07:58,986 ProgressMeter - X:51035366 216.0 8.9 m 4.1 w 1.2% 12.0 h 11.8 h
INFO 11:08:30,497 ProgressMeter - X:51035366 216.0 9.4 m 4.3 w 1.2% 12.7 h 12.5 h
INFO 11:09:30,568 ProgressMeter - X:51035366 216.0 10.4 m 4.8 w 1.2% 14.0 h 13.8 h
INFO 11:10:32,779 ProgressMeter - X:51035366 216.0 11.5 m 5.3 w 1.2% 15.4 h 15.2 h
INFO 11:11:33,479 ProgressMeter - X:51035366 216.0 12.5 m 5.7 w 1.2% 16.8 h 16.6 h
INFO 11:12:35,360 ProgressMeter - X:51035366 216.0 13.5 m 6.2 w 1.2% 18.2 h 17.9 h
INFO 11:13:35,445 ProgressMeter - X:51035366 216.0 14.5 m 6.7 w 1.2% 19.5 h 19.3 h
INFO 11:14:39,689 ProgressMeter - X:51035366 216.0 15.6 m 7.2 w 1.2% 20.9 h 20.7 h
INFO 11:15:40,505 ProgressMeter - X:51035366 216.0 16.6 m 7.6 w 1.2% 22.3 h 22.0 h
INFO 11:16:41,140 ProgressMeter - X:51035366 216.0 17.6 m 8.1 w 1.2% 23.7 h 23.4 h
INFO 11:17:41,956 ProgressMeter - X:51035366 216.0 18.6 m 8.6 w 1.2% 25.0 h 24.7 h
INFO 11:18:41,958 ProgressMeter - X:51035366 216.0 19.6 m 9.0 w 1.2% 26.4 h 26.0 h
INFO 11:19:44,493 ProgressMeter - X:51035366 216.0 20.7 m 9.5 w 1.2% 27.8 h 27.4 h
INFO 11:20:49,749 ProgressMeter - X:51035366 216.0 21.8 m 10.0 w 1.2% 29.2 h 28.8 h
INFO 11:21:53,414 ProgressMeter - X:51035366 216.0 22.8 m 10.5 w 1.2% 30.6 h 30.3 h
INFO 11:22:58,174 ProgressMeter - X:51035366 216.0 23.9 m 11.0 w 1.2% 32.1 h 31.7 h
INFO 11:24:01,211 ProgressMeter - X:51035366 216.0 25.0 m 11.5 w 1.2% 33.5 h 33.1 h
INFO 11:25:05,051 ProgressMeter - X:51035366 216.0 26.0 m 12.0 w 1.2% 34.9 h 34.5 h
INFO 11:26:07,782 ProgressMeter - X:51035366 216.0 27.1 m 12.4 w 1.2% 36.3 h 35.9 h
INFO 11:27:10,933 ProgressMeter - X:51035366 216.0 28.1 m 12.9 w 1.2% 37.8 h 37.3 h
INFO 11:28:20,854 ProgressMeter - X:51035366 216.0 29.3 m 13.5 w 1.2% 39.3 h 38.8 h
INFO 11:29:28,165 ProgressMeter - X:51035366 216.0 30.4 m 14.0 w 1.2% 40.8 h 40.3 h
INFO 11:30:28,575 ProgressMeter - X:51035366 216.0 31.4 m 14.4 w 1.2% 42.2 h 41.6 h
INFO 11:31:36,673 ProgressMeter - X:51035366 216.0 32.5 m 14.9 w 1.2% 43.7 h 43.1 h
INFO 11:32:45,497 ProgressMeter - X:51035366 216.0 33.7 m 15.5 w 1.2% 45.2 h 44.7 h
INFO 11:33:49,205 ProgressMeter - X:51035366 216.0 34.8 m 16.0 w 1.2% 46.7 h 46.1 h
INFO 11:34:49,226 ProgressMeter - X:51035366 216.0 35.8 m 16.4 w 1.2% 48.0 h 47.4 h
INFO 11:35:54,571 ProgressMeter - X:51035366 216.0 36.8 m 16.9 w 1.2% 49.5 h 48.8 h
INFO 11:36:59,402 ProgressMeter - X:51035366 216.0 37.9 m 17.4 w 1.2% 50.9 h 50.3 h
INFO 11:38:03,427 ProgressMeter - X:51035366 216.0 39.0 m 17.9 w 1.2% 52.3 h 51.7 h
INFO 11:39:12,036 ProgressMeter - X:51035366 216.0 40.1 m 18.4 w 1.2% 53.9 h 53.2 h
INFO 11:40:15,472 ProgressMeter - X:51035366 216.0 41.2 m 18.9 w 1.2% 55.3 h 54.6 h
INFO 11:41:22,184 ProgressMeter - X:51035366 216.0 42.3 m 19.4 w 1.2% 56.8 h 56.1 h
INFO 11:42:24,992 ProgressMeter - X:51035366 216.0 43.4 m 19.9 w 1.2% 58.2 h 57.5 h
INFO 11:43:30,745 ProgressMeter - X:51035366 216.0 44.4 m 20.4 w 1.2% 59.7 h 58.9 h
INFO 11:44:41,392 ProgressMeter - X:51035366 216.0 45.6 m 21.0 w 1.2% 61.3 h 60.5 h
INFO 11:45:51,136 ProgressMeter - X:51035366 216.0 46.8 m 21.5 w 1.2% 62.8 h 62.0 h
INFO 11:46:59,056 ProgressMeter - X:51035366 216.0 47.9 m 22.0 w 1.2% 64.3 h 63.5 h
INFO 11:48:09,266 ProgressMeter - X:51035366 216.0 49.1 m 22.5 w 1.2% 65.9 h 65.1 h
INFO 11:49:16,701 ProgressMeter - X:51035366 216.0 50.2 m 23.1 w 1.2% 67.4 h 66.6 h
INFO 11:50:24,150 ProgressMeter - X:51035366 216.0 51.3 m 23.6 w 1.2% 68.9 h 68.1 h
INFO 11:51:31,883 ProgressMeter - X:51035366 216.0 52.5 m 24.1 w 1.2% 70.5 h 69.6 h
INFO 11:52:40,234 ProgressMeter - X:51035366 216.0 53.6 m 24.6 w 1.2% 72.0 h 71.1 h
INFO 11:53:46,785 ProgressMeter - X:51035366 216.0 54.7 m 25.1 w 1.2% 73.5 h 72.6 h
INFO 11:54:53,194 ProgressMeter - X:51035366 216.0 55.8 m 25.6 w 1.2% 75.0 h 74.0 h

Here is an example of the GVCF for 3 samples in one of the problem haploid regions:

X 51035345 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:78:2:0,78 .:9:99:3:0,112
X 51035346 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035347 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035348 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035349 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035350 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035351 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035352 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035353 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035354 . T <NON_REF> . . END=51035355 GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:2:45:2:0,45 .:9:99:3:0,112
X 51035356 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:41:1:0,41 .:9:99:3:0,112
X 51035357 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:40:1:0,40 .:9:99:3:0,112
X 51035358 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:39:1:0,39 .:9:99:3:0,112
X 51035359 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:38:1:0,38 .:9:99:3:0,112
X 51035360 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:41:1:0,41 .:9:99:3:0,112
X 51035361 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:39:1:0,39 .:9:99:3:0,112
X 51035362 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:38:1:0,38 .:9:99:3:0,112
X 51035363 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:7:99:4:0,120 .:1:38:1:0,38 .:9:99:3:0,112
X 51035364 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:1:38:1:0,38 .:9:99:3:0,112
X 51035365 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035366 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035367 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035368 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035369 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035370 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035371 . A C,<NON_REF> . . DP=246;MQ=60.00 GT:AD:DP:GQ:MIN_DP:PL:SB .:0,4,0:4:99:.:126,0,126:0,0,1,3 .:.:2:72:2:0,72,72 .:.:9:99:3:0,112,112
X 51035372 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035373 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:4:90:4:0,90 .:2:72:2:0,72 .:9:99:3:0,112
X 51035374 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:2:45:2:0,45 .:2:72:2:0,72 .:9:99:3:0,112
X 51035375 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:2:45:2:0,45 .:2:72:2:0,72 .:9:99:3:0,112
X 51035376 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:2:45:2:0,45 .:2:54:2:0,54 .:9:99:3:0,112
X 51035377 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:2:45:2:0,45 .:2:71:2:0,71 .:9:99:3:0,112
X 51035378 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:18:99:3:0,105 .:2:71:2:0,71 .:9:99:3:0,112
X 51035379 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:18:99:3:0,105 .:2:81:2:0,81 .:9:99:3:0,112
X 51035380 . G <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:18:99:3:0,105 .:2:78:2:0,78 .:9:99:3:0,112
X 51035381 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:18:99:3:0,105 .:2:45:2:0,45 .:9:99:3:0,112
X 51035382 . T <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:18:99:3:0,105 .:2:45:2:0,45 .:9:99:3:0,112

Any help is greatly appreciated. Please let me know if you need any other information.

Kindest regards, Chris


Created 2015-03-27 12:16:02 | Updated | Tags: haploid combinegvcfs
Comments (5)

I have been trying to use CombineGVCFs on gVCF files produced by HaplotypeCaller in GVCF mode. The output VCF file doesn't seem to have any data in the genotype (GT) field, just a dot:

chr1 95849 . T <NON_REF> . . END=95850 GT:DP:GQ:MIN_DP:PL .:48:89:47:0,90 .:100:99:3:0,106
chr1 95851 . T <NON_REF> . . END=95856 GT:DP:GQ:MIN_DP:PL .:48:44:47:0,45 .:100:99:3:0,106
chr1 95857 . GAA G,GA,<NON_REF> . . DP=50;MQ=60.23;MQ0=0 GT:AD:DP:MIN_DP:PL:SB .:0,6,41,0:47:.:1150,935,0,1090:0,0,18,23 .:.:100:3:0,106,106,106
chr1 95858 . A <NON_REF> . . END=95859 GT:DP:GQ:MIN_DP:PL . .:100:99:3:0,106
chr1 95860 . A <NON_REF> . . END=96535 GT:DP:GQ:MIN_DP:PL .:44:99:27:0,1038 .:100:99:3:0,106
chr1 96536 . A G,<NON_REF> . . DP=95;MQ=57.24;MQ0=0 GT:AD:DP:MIN_DP:PL:SB .:0,25,0:25:.:792,0,792:0,0,13,12 .:.:70:70:0,0,0
chr1 96537 . G <NON_REF> . . END=96823 GT:DP:GQ:MIN_DP:PL .:37:99:23:0,380 .:68:99:51:0,374
chr1 96824 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL .:31:25:31:0,26 .:51:57:51:0,58

This is the command I used:

java -Xmx45g -Djava.io.tmpdir=/home/LANPARK/mboursnell/javatempdir -jar /opt/gatk/GenomeAnalysisTK.jar -R /home/genetics/strep_equi/strep_equi.fasta -T CombineGVCFs -V 17-1-2-5_NL_S12_L001_R1_001.gVCF -V 17-1-2-6_NL_S9_L001_R1_001.gVCF -o combined_1a.vcf -S STRICT


Created 2013-07-30 09:04:36 | Updated | Tags: unifiedgenotyper ploidy haploid
Comments (3)

Hi,

Just wanted to confirm: I have data from 4 spores of a yeast (haploid) tetrad. If I want to call variants using all 4 spores (4 BAM files), do I need to set -ploidy to 1 or to 4 (Number of samples in each pool * Sample Ploidy)?

This is the code I am using:

java -d64 -Xms1g -Xmx4g -jar GenomeAnalysisTK.jar -glm SNP -nt 52 -R genome.fasta -T UnifiedGenotyper -I $basename"_A.realigned.bam" -I $basename"_B.realigned.bam" -I $basename"_C.realigned.bam" -I $basename"_D.realigned.bam" -ploidy 4 -o $basename.snps.vcf -stand_call_conf 25.0 -stand_emit_conf 10.0

Thank you.


Created 2013-07-16 17:10:42 | Updated | Tags: haploid
Comments (2)

Hi,

I'm working on SNP calling in a bacterium. I don't have a set of known SNPs, so prior to recalibration I generate a mask file from all the data. My question is, what should I put for the following two options:

--snpHets --indelHets

Because the bacterium is haploid (and I specify --ploidy 1), it seems like these options should be set to zero, as there are no heterozygous loci. But I worry that if I set them to zero, it won't work as expected. Any advice? So far I have just been setting them to 0.001.

Thanks, Gavin


Created 2012-10-01 16:29:36 | Updated 2012-10-01 17:06:04 | Tags: ploidy haploid
Comments (1)

Hello,

Does the GATK team have any recommendations for filtering SNP data for haploid genomes? Our team works with microbial eukaryotes, both haploid and diploid and we have used the GATK v3 best practices for filtering for the latter. [VQSR was not possible, since we do not have access to a truth/high confidence SNP set.]

Thanks, Mika


Created 2012-07-24 07:53:52 | Updated 2012-07-24 13:03:08 | Tags: unifiedgenotyper haploid
Comments (16)

Dear GATK team,

I know that in the past GATK was not suitable for haploid genomes. I wanted to ask whether this has changed since then, and whether it is now possible to use GATK for haploid genomes.

Thanks a lot, Gilgi