Tagged with #randomlysplitvariants 1 documentation article | 1 announcement | 1 forum discussion

A new tool has been released!

Check out the documentation at RandomlySplitVariants.

GATK 3.1 was released on March 18, 2014. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history

Haplotype Caller

• Added the ability for the Haplotype Caller to use hardware-based optimizations, which can be enabled with --pair_hmm_implementation VECTOR_LOGLESS_CACHING. Please see the 3.1 Version Highlights for more details about expected speedups and some background on the collaboration that made these possible.
• Fixed bugs in computing the weights of edges in the assembly graph. These bugs were causing bad genotypes to be output when running the Haplotype Caller over multiple samples simultaneously (as opposed to creating gVCFs in the new recommended pipeline, which was working as expected).

Variant Recalibrator

• Fixed an issue where output could be non-deterministic with very large data sets.

CalculateGenotypePosteriors

• Fixed several bugs where bad input was causing the tool to crash instead of gracefully exiting with an error message.

Miscellaneous

• RandomlySplitVariants can now split its output across more than 2 files.
• FastaAlternateReferenceMaker can now output heterozygous sites using IUPAC ambiguity encoding.
• Picard, Tribble, and Variant jars updated to version 1.109.1722.
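
To illustrate what IUPAC ambiguity encoding of a heterozygous site means, here is a minimal Python sketch. It is not the FastaAlternateReferenceMaker implementation; the function name `iupac_code` and the restriction to biallelic SNP genotypes are assumptions made for the example. The code table itself is the standard IUPAC nucleotide notation for two-base combinations.

```python
# Standard IUPAC nucleotide ambiguity codes for pairs of distinct bases.
IUPAC_HET = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def iupac_code(allele1, allele2):
    """Return the one-letter code for a diploid SNP genotype.

    Hypothetical helper for illustration: homozygous genotypes return the
    base itself, heterozygous genotypes return the IUPAC ambiguity code.
    """
    a, b = allele1.upper(), allele2.upper()
    if len(a) != 1 or len(b) != 1 or a not in "ACGT" or b not in "ACGT":
        raise ValueError("only single-base A/C/G/T alleles are supported")
    if a == b:
        return a
    return IUPAC_HET[frozenset((a, b))]


if __name__ == "__main__":
    # An A/G heterozygous site is written as R; a C/C site stays C.
    print(iupac_code("A", "G"), iupac_code("C", "C"))
```

Allele order does not matter here, since the lookup key is an unordered pair.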

When I use the SelectVariants -fraction option more than once, I always get the same variant sites. The same happens with RandomlySplitVariants. Is there a way to take a random fraction of the VCF that is not the same every time, but truly random (meaning it picks different sites on each run, of course with a certain probability of including sites in common)?
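
Getting the same sites on every run is what one would expect if the random number generator starts from the same seed each time. That is an assumption about the cause, not something confirmed in this thread. The Python sketch below shows the general effect on a made-up list of sites; `sample_fraction` and the site names are hypothetical and have nothing to do with GATK's own code.

```python
import random


def sample_fraction(records, fraction, seed=None):
    """Keep each record independently with probability `fraction`.

    With a fixed seed the same records are chosen on every call;
    with seed=None Python seeds the generator from system entropy,
    so repeated calls will usually choose different records.
    """
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < fraction]


if __name__ == "__main__":
    # Made-up site identifiers standing in for VCF records.
    sites = ["chr1:%d" % pos for pos in range(1000, 1100)]

    run1 = sample_fraction(sites, 0.3, seed=42)
    run2 = sample_fraction(sites, 0.3, seed=42)
    print("fixed seed, identical picks:", run1 == run2)

    run3 = sample_fraction(sites, 0.3)  # seeded from system entropy
    run4 = sample_fraction(sites, 0.3)
    print("sites shared by two unseeded runs:", len(set(run3) & set(run4)))
```

If the cause is indeed a fixed seed, it may be worth checking whether your GATK version lists an engine-level option for a non-deterministic random seed in its command-line documentation (I recall GATK 3 documenting one as --nonDeterministicRandomSeed, but please verify against your version).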