SVAltAlign.q is a sample Queue script that is part of Genome STRiP.
This script realigns previously unmapped reads against putative alternate alleles generated from a VCF file describing a set of variants to be genotyped. The output is a merged BAM file that contains the alignments to the alternate alleles. These alternate allele alignments are then used as input to genotyping.
-vcf <input-vcf-file> : A VCF file containing descriptions of the
structural variations. : Only records for structural variations with precise
breakpoints will be processed.
-I <bam-file> : The set of input BAM files containing records to realign.
-md <directory> : The metadata directory containing metadata about the
input data set.
-R <fasta-file> : Reference sequence. : An indexed fasta file containing
the reference sequence. The fasta file must be indexed with samtools faidx
or the equivalent.
-altAlleleFlankLength <n> : The length of flanking sequence from the
reference genome used during realignment (default 200).
-alignUnmappedMates <boolean> : Whether to align unmapped mates of mapped
reads to the alternate alleles (default true). : If false, then unmapped reads
with a POS field will be ignored.
-configFile <configuration-file> : This file contains values for
specialized settings that do not normally need to be changed. : A default
configuration file is provided in conf/genstrip_parameters.txt.
-O <bam-file> : The default output for this pipeline is a single merged BAM file for all input BAM files and all alternate alleles. : The sequence identifier for an alternate allele includes
N, where N is the index of the alternate allele in the VCF file (i.e. the first alternate allele is allele 1).
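Before launching the pipeline it can help to check the inputs described above. The following is a minimal shell sketch, not part of Genome STRiP: the helper names check_reference_index and bam_args are hypothetical, and the check assumes the reference index sits next to the fasta file as <fasta-file>.fai, which is the convention used by samtools faidx.

```shell
# Hypothetical helper (not part of Genome STRiP): succeed only if the
# reference fasta and its index both exist. Assumes the index is stored
# next to the fasta as <fasta>.fai, the samtools faidx convention.
check_reference_index() {
    fasta="$1"
    [ -f "$fasta" ] && [ -f "$fasta.fai" ]
}

# Hypothetical helper: print one "-I <bam>" pair per input BAM file on a
# single line. Paths containing spaces are not handled in this sketch.
bam_args() {
    args=""
    for bam in "$@"; do
        args="$args -I $bam"
    done
    # Strip the leading space before printing.
    printf '%s\n' "${args# }"
}
```

For example, bam_args input1.bam input2.bam prints the -I arguments in the form used by the Queue command line, and check_reference_index Homo_sapiens_assembly18.fasta returns a non-zero status if the .fai file is missing.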
The SVAltAlign.q script is run through Queue.
Because Genome STRiP is a third-party GATK library, the Queue command line must be invoked explicitly, as shown in the example below.
java -Xmx2g -cp Queue.jar:SVToolkit.jar:GenomeAnalysisTK.jar \
    org.broadinstitute.sting.queue.QCommandLine \
    -S SVAltAlign.q \
    -S SVQScript.q \
    -gatk GenomeAnalysisTK.jar \
    -cp SVToolkit.jar:GenomeAnalysisTK.jar \
    -configFile /path/to/svtoolkit/conf/genstrip_parameters.txt \
    -tempDir /path/to/tmp/dir \
    -md metadata \
    -R Homo_sapiens_assembly18.fasta \
    -vcf input.vcf \
    -I input1.bam -I input2.bam \
    -O output.bam \
    -run \
    -bsub \
    -jobQueue gsa \
    -jobProject 1KG \
    -jobLogDir logs
Queue typically requires the following arguments to run Genome STRiP pipelines.
-run : Actually run the pipeline (default is to do a dry run).
-S <queue-script> : Script to run. : The base script SVQScript.q from the
SVToolkit should also be specified with a separate -S argument.
-gatk <jar-file> : The path to the GATK jar file.
-cp <classpath> : The java classpath to use for pipeline commands. This
must include SVToolkit.jar and GenomeAnalysisTK.jar. : Note: Both -cp
arguments are required in the example command. The first -cp argument is for
the invocation of Queue itself, the second -cp argument is for the invocation
of pipeline processes that will be run by Queue.
-tempDir <directory> : Path to a directory to use for temporary files.
-bsub : Use LSF to submit jobs.
-jobQueue <queue-name> : LSF queue to use.
-jobProject <project-name> : LSF project to use for accounting.
-jobLogDir <directory> : Directory for LSF log files.
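Because Queue only performs a dry run unless -run is given, one cautious workflow is to invoke the same arguments twice: first without -run to check the pipeline, then with it. The sketch below illustrates this with a hypothetical shell function, sv_alt_align_args, which is not part of Genome STRiP or Queue. The file names are the placeholders from the example command, and the LSF options are left out for brevity.

```shell
# Hypothetical wrapper (not part of Genome STRiP or Queue): print the Queue
# arguments for SVAltAlign.q. Pass "run" as the first argument to append -run;
# any other value leaves Queue in its default dry-run mode.
# File names are the placeholders from the example command.
sv_alt_align_args() {
    mode="$1"
    args="-S SVAltAlign.q -S SVQScript.q"
    args="$args -gatk GenomeAnalysisTK.jar"
    # Second -cp: classpath for the pipeline processes started by Queue.
    args="$args -cp SVToolkit.jar:GenomeAnalysisTK.jar"
    args="$args -md metadata -R Homo_sapiens_assembly18.fasta"
    args="$args -vcf input.vcf -I input1.bam -I input2.bam -O output.bam"
    if [ "$mode" = "run" ]; then
        args="$args -run"
    fi
    printf '%s\n' "$args"
}

# Usage (commented out so that the sketch has no side effects):
#   CP=Queue.jar:SVToolkit.jar:GenomeAnalysisTK.jar   # first -cp: Queue itself
#   java -Xmx2g -cp "$CP" org.broadinstitute.sting.queue.QCommandLine $(sv_alt_align_args dry)
#   java -Xmx2g -cp "$CP" org.broadinstitute.sting.queue.QCommandLine $(sv_alt_align_args run)
```

The unquoted command substitution in the usage lines relies on shell word splitting, so this form only suits arguments without spaces.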