SVAltAlign.q is a sample Queue script that is part of Genome STRiP.
This script realigns previously unmapped reads against putative alternate alleles generated from a VCF file describing a set of variants to be genotyped. The output is a merged BAM file that contains these alignments to the alternate alleles. These alternate allele alignments are then used as input to genotyping.
-vcf <input-vcf-file> : A VCF file containing descriptions of the
structural variations. Only records for structural variations with precise
breakpoints will be processed.
-I <bam-file> : The set of input BAM files containing records to realign.
-md <directory> : The metadata directory containing metadata about the
input data set.
-R <fasta-file> : Reference sequence. An indexed fasta file containing
the reference sequence. The fasta file must be indexed with samtools faidx
or the equivalent.
-altAlleleFlankLength <n> : The length of flanking sequence from the
reference genome used during realignment (default 200).
-alignUnmappedMates <boolean> : Whether to align unmapped mates of mapped
reads to the alternate alleles (default true). If false, then unmapped reads
with a POS field will be ignored.
-configFile <configuration-file> : This file contains values for
specialized settings that do not normally need to be changed. A default
configuration file is provided in conf/genstrip_parameters.txt.
-O <bam-file> : The default output for this pipeline is a single merged
BAM file for all input BAM files and all alternate alleles. The sequence
identifier for an alternate allele is VariantID_N, where N is the index of the
alternate allele in the VCF file (i.e. the first alternate allele is allele
1).
The SVAltAlign.q script is run through Queue.
Because Genome STRiP is a third-party GATK library, the Queue command line must be invoked explicitly, as shown in the example below.
java -Xmx2g -cp Queue.jar:SVToolkit.jar:GenomeAnalysisTK.jar \
org.broadinstitute.sting.queue.QCommandLine \
-S SVAltAlign.q \
-S SVQScript.q \
-gatk GenomeAnalysisTK.jar \
-cp SVToolkit.jar:GenomeAnalysisTK.jar \
-configFile /path/to/svtoolkit/conf/genstrip_parameters.txt \
-tempDir /path/to/tmp/dir \
-md metadata \
-R Homo_sapiens_assembly18.fasta \
-vcf input.vcf \
-I input1.bam -I input2.bam \
-O output.bam \
-run \
-bsub \
-jobQueue gsa \
-jobProject 1KG \
-jobLogDir logs
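Before submitting a command like the one above, it can help to confirm that the inputs exist and that the reference has an index. The Python sketch below is illustrative only and is not part of Genome STRiP. The function name missing_inputs is hypothetical, and the sketch assumes the index is the .fai file that samtools faidx writes next to the fasta file; an index produced by an equivalent tool under another name would not be detected.

```python
import os


def missing_inputs(reference, vcf, bams, metadata_dir):
    """Return the required paths that could not be found on disk.

    Hypothetical helper: checks the -R, -vcf, -I and -md inputs only.
    """
    missing = []
    # Assumption: the reference index is <fasta>.fai, as written by samtools faidx.
    required_files = [reference, reference + ".fai", vcf] + list(bams)
    for path in required_files:
        if not os.path.isfile(path):
            missing.append(path)
    # The metadata location is a directory, not a file.
    if not os.path.isdir(metadata_dir):
        missing.append(metadata_dir)
    return missing
```

An empty list means every checked path was found; any returned path is worth fixing before adding -run to the command.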
Queue typically requires the following arguments to run Genome STRiP pipelines.
-run : Actually run the pipeline (default is to do a dry run).
-S <queue-script> : Script to run. The base script SVQScript.q from the
SVToolkit should also be specified with a separate -S argument.
-gatk <jar-file> : The path to the GATK jar file.
-cp <classpath> : The java classpath to use for pipeline commands. This
must include SVToolkit.jar and GenomeAnalysisTK.jar. Note: Both -cp
arguments are required in the example command. The first -cp argument is for
the invocation of Queue itself, the second -cp argument is for the invocation
of pipeline processes that will be run by Queue.
-tempDir <directory> : Path to a directory to use for temporary files.
-bsub : Use LSF to submit jobs.
-jobQueue <queue-name> : LSF queue to use.
-jobProject <project-name> : LSF project to use for accounting.
-jobLogDir <directory> : Directory for LSF log files.
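The arguments above can be assembled programmatically, which makes the two distinct -cp values and the dry-run default easier to see. The following Python sketch is one possible way to build the example command as an argument list. The function name build_svaltalign_command and its parameters are hypothetical, the jar names are taken from the example command and assumed to be resolvable from the working directory, and nothing here is part of Genome STRiP or Queue.

```python
def build_svaltalign_command(vcf, bams, output_bam, reference, metadata_dir,
                             config_file, temp_dir, run=False, lsf=None):
    """Build the Queue command line for SVAltAlign.q as a list of arguments.

    Hypothetical helper. `lsf`, if given, is a dict with the keys
    "queue", "project" and "log_dir".
    """
    # First -cp: classpath for the JVM that runs Queue itself.
    queue_classpath = ["Queue.jar", "SVToolkit.jar", "GenomeAnalysisTK.jar"]
    # Second -cp: classpath for the pipeline processes that Queue launches.
    pipeline_classpath = ["SVToolkit.jar", "GenomeAnalysisTK.jar"]

    cmd = [
        "java", "-Xmx2g", "-cp", ":".join(queue_classpath),
        "org.broadinstitute.sting.queue.QCommandLine",
        "-S", "SVAltAlign.q",
        "-S", "SVQScript.q",
        "-gatk", "GenomeAnalysisTK.jar",
        "-cp", ":".join(pipeline_classpath),
        "-configFile", config_file,
        "-tempDir", temp_dir,
        "-md", metadata_dir,
        "-R", reference,
        "-vcf", vcf,
    ]
    # One -I argument per input BAM file.
    for bam in bams:
        cmd += ["-I", bam]
    cmd += ["-O", output_bam]

    # Without -run, Queue only performs a dry run.
    if run:
        cmd.append("-run")
    if lsf is not None:
        cmd += ["-bsub",
                "-jobQueue", lsf["queue"],
                "-jobProject", lsf["project"],
                "-jobLogDir", lsf["log_dir"]]
    return cmd
```

Calling the function with run left at False yields a dry-run command that can be inspected first; the resulting list can be passed to subprocess.run once the dry run looks right.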