SVGenotyper walker
Posted in GenomeSTRiP Documentation | Last updated on 2012-09-21 23:07:40


1. Introduction

The SVGenotyper walker traverses a VCF file to compute genotypes for structural variations. This walker is the main component of the SVGenotyper pipeline.

Currently, only genotyping of deletions relative to the reference is implemented.

2. Inputs / Arguments

  • -I <bam-file> : The set of input BAM files.

  • -runDirectory <directory> : The directory where auxiliary output files will be written (default is the current directory).

  • -md <directory> : The metadata directory containing metadata about the input data set. See SVPreprocess.

  • -R <fasta-file> : Reference sequence. An indexed fasta file containing the reference sequence that the input BAM files were aligned against. The fasta file must be indexed with 'samtools faidx' or the equivalent.

  • -genomeMaskFile <mask-file> : Mask file that describes the alignability of the reference sequence. See Genome Mask Files.

  • -configFile <configuration-file> : A file of specialized settings that do not normally need to be changed. A default configuration file is provided in conf/genstrip_parameters.txt.

  • -sample <sample-ID> : The sample to genotype (or list of samples if multiple arguments are supplied). By default, genotypes are computed for all samples present in the input BAM files.

  • -sampleList <file> : A file containing the list of samples to genotype (one sample ID per line).

  • -altAlleleAlignments <bam-file> : A BAM file containing alignments to the alternate alleles of events present in the input VCF file. These alternate alignments should be computed by the SVAltAlign pipeline.

  • -partitionName <string> : This specifies the name of the partition being computed during parallel runs. The output files will be prefixed with the name of the partition.

  • -partition <partition-spec> : Describes the subset of the VCF file to process. The format is "records:N-M", where N and M are the 1-based indexes of the first and last records in the range of the input VCF file that will be processed.
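As an illustration of the "records:N-M" format, the sketch below prints one partition spec per chunk for a parallel run. The function name partition_specs and the fixed-size chunking scheme are assumptions made for this example and are not part of Genome STRiP; only the spec format comes from the description above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Genome STRiP): print one "records:N-M"
# spec per line, covering records 1..total in chunks of at most `size`.
partition_specs() {
    total=$1
    size=$2
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        # Clamp the last chunk to the final record.
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        echo "records:${start}-${end}"
        start=$((end + 1))
    done
}

# Example: a VCF with 250 records, split into chunks of 100.
partition_specs 250 100
```

Each printed spec could then be passed to -partition, with a matching -partitionName chosen per chunk. One possible way to obtain the record count is to count the non-header lines of the input VCF, for example with grep -vc '^#' input.sites.vcf.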

3. Outputs

  • -O <vcf-file> : The main output is a VCF file containing genotypes for structural variation sites from the input VCF file.

Depending on settings in the configuration file, this walker will also produce a number of auxiliary output files. These files are mostly useful for debugging. The content and format of these files are subject to change.

4. Running

Currently, this walker needs to be invoked through a special wrapper around the GATK command line interface. This wrapper accepts all of the standard GATK command line options. An example is shown below.

The input VCF file should be passed as a GATK ROD (reference ordered datum) file. This walker also requires the -BTI argument to be passed to the GATK engine.

java -Xmx4g -cp SVToolkit.jar:GenomeAnalysisTK.jar \
    org.broadinstitute.sv.main.SVGenotyper \
    -T SVGenotyper \
    -configFile conf/genstrip_parameters.txt \
    -md metadata \
    -R Homo_sapiens_assembly18.fasta \
    -genomeMaskFile Homo_sapiens_assembly18.mask.36.fasta \
    -altAlignments alt_allele_alignments.bam \
    -B:input,VCF input.sites.vcf \
    -BTI \
    -I input1.bam -I input2.bam \
    -O output.genotypes.vcf \
    -runDirectory run1
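
Before launching a long run, it can help to confirm that the input files are in place, including the .fai index that 'samtools faidx' writes next to the reference fasta. The following is a minimal sketch of such a check; the helper name require_files is an assumption for illustration and is not provided by the SVToolkit.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SVToolkit): return 0 if every path
# given is an existing regular file; otherwise list the missing paths on
# stderr and return 1.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example using the file names from the command above.
ref=Homo_sapiens_assembly18.fasta
if require_files "$ref" "$ref.fai" input.sites.vcf input1.bam input2.bam; then
    echo "inputs look present"
else
    echo "fix the missing inputs before running SVGenotyper" >&2
fi
```

This only checks that the files exist; it does not validate their contents or confirm that the BAM files were aligned against the same reference.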

5. Dependencies

The SV genotyping code uses some R scripts. R must be installed and the Rscript executable must be on your PATH to run this walker.
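
A quick way to verify this dependency is to ask the shell whether it can resolve Rscript. The sketch below uses the POSIX command -v builtin; the helper name have_tool is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the named executable is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool Rscript; then
    echo "Rscript found at: $(command -v Rscript)"
else
    echo "Rscript not found on PATH; install R before running SVGenotyper" >&2
fi
```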

