Batch Merging QScript


Warning: the material on this page is considered out of date by the GSA team as of June 7, 2011. Please see Merging batched call sets for the most recent version.

Batch Merging Pipeline Explained

The batch merging pipeline takes a list of VCF files (containing the SNPs called in each batch), a list of BAM files (all BAM files from every batch), a reference, the path to your GATK Sting repository, and the desired output file. The output is a fully merged VCF. The pipeline takes all pass-filter bi-allelic sites from the batch VCFs and tests for the alternate allele within each BAM file, producing a set of genotype likelihoods at every variant site for every individual. These genotype-likelihood VCFs are then merged, and the Unified Genotyper engine estimates the allele frequency (and calculates genotypes) as though all samples had been called together. Because this final merging and genotyping step does not use the BAM files, read-based annotations (QD/SB/HaplotypeScore) are not assessed.
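To make the idea of estimating allele frequency from merged genotype likelihoods concrete, here is a minimal Python sketch. It is not the Unified Genotyper's actual model: it substitutes a simple expectation-maximization estimate under Hardy-Weinberg equilibrium, and the function name, the linear-scale likelihood tuples, and the iteration count are all assumptions made for illustration.

```python
def estimate_alt_frequency(likelihoods, iterations=50, eps=1e-6):
    """EM estimate of the alternate-allele frequency at one bi-allelic site.

    likelihoods: one (L_homref, L_het, L_homalt) tuple per sample, on a
    linear (not log) scale. This is an illustrative stand-in, not the
    Unified Genotyper's own allele frequency calculation.
    """
    f = 0.5  # starting guess for the alternate-allele frequency
    for _ in range(iterations):
        f = min(max(f, eps), 1.0 - eps)  # keep the prior away from 0 and 1
        # Hardy-Weinberg genotype priors for 0, 1 and 2 alternate alleles
        prior = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
        expected_alt = 0.0
        informative = 0
        for lik in likelihoods:
            post = [l * p for l, p in zip(lik, prior)]
            total = sum(post)
            if total == 0:
                continue  # sample carries no information at this site
            informative += 1
            # expected number of alternate alleles carried by this sample
            expected_alt += (post[1] + 2 * post[2]) / total
        if informative == 0:
            return f
        f = expected_alt / (2 * informative)
    return f
```

The point of the sketch is that every sample contributes to a single frequency estimate per site, which is what "as though all samples had been called together" amounts to; no reads are consulted at this stage, only the likelihoods.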

This is not a discovery tool: it only fills in genotypes at bi-allelic sites that have already been discovered and have passed filtering.
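The site-selection rule can be sketched in a few lines of Python. This is a hedged illustration rather than the QScript's own code: the function names are hypothetical, the column positions follow the standard VCF layout, and treating an unfiltered site (FILTER of ".") as passing is an assumption that can be switched off.

```python
def is_pass_biallelic(vcf_line, accept_unfiltered=True):
    """Return True for a VCF data line that passed filters and has one ALT allele."""
    if not vcf_line.strip() or vcf_line.startswith("#"):
        return False  # skip blank lines and header lines
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # not a complete VCF record
    alt, filt = fields[4], fields[6]
    # Assumption: "." (no filtering applied) counts as passing unless disabled.
    passed = filt == "PASS" or (accept_unfiltered and filt == ".")
    # Bi-allelic means exactly one alternate allele: no comma, not missing.
    biallelic = alt != "." and "," not in alt
    return passed and biallelic


def select_sites(vcf_lines):
    """Collect the unique (chrom, pos, ref, alt) sites to genotype across batches."""
    sites = set()
    for line in vcf_lines:
        if is_pass_biallelic(line):
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            sites.add((chrom, int(pos), ref, alt))
    return sorted(sites)
```

Feeding the lines of every batch VCF through `select_sites` gives the union of candidate sites; filtered or multi-allelic records never enter that set, which is why the pipeline cannot discover anything new.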

The Batch Merging QScript (BMQS) is still under development, with the aim of reducing total runtime and compute burden. At the moment, it is supported only in conjunction with LSF (and not OGE/SGE).

Running the Batch Merging QScript

Running the Batch Merging QScript requires familiarity with Queue. The BMQS can be found at

/path/to/Sting/scala/qscript/playground/BatchMerge.q

Running the script with -h lists all of its arguments.

Example Command

java -Djava.io.tmpdir=/path/to/tmp/dir -jar /path/to/Sting/dist/Queue.jar \
-S /path/to/Sting/scala/qscript/playground/BatchMerge.q \
-vcfs /path/to/vcf.list \
-bams /path/to/bams.list \
-ref /path/to/reference.fasta \
-batch /path/to/batched_output.vcf \
-sting /path/to/GATK/Sting/directory \
[other Queue-specific arguments, see Queue documentation]

where vcf.list and bams.list are newline-delimited lists of file paths to the batch VCFs and the sample BAMs, respectively.
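The list files can be written by hand, but a small helper avoids stray blank lines and relative paths. The following Python sketch is one way to do it; the helper names are hypothetical, and resolving each entry to an absolute path is an assumption (it tends to be safer when jobs run on cluster nodes with a different working directory), not a documented requirement.

```python
from pathlib import Path


def write_path_list(paths, list_file):
    """Write one absolute file path per line and return the lines written."""
    lines = [str(Path(p).resolve()) for p in paths]
    Path(list_file).write_text("\n".join(lines) + "\n")
    return lines


def read_path_list(list_file):
    """Read a newline-delimited list of paths back, ignoring blank lines."""
    text = Path(list_file).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `write_path_list(sorted(Path("/path/to/batches").glob("*.vcf")), "vcf.list")` would build the VCF list from a directory of batch call sets, where `/path/to/batches` is a placeholder for wherever your batch VCFs live.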
