## SVDiscovery walker

Posted in GenomeSTRiP Documentation on 2012-09-12 18:47:08 | Last updated on 2012-09-21 23:08:38

### 1. Introduction

The SVDiscovery walker traverses a set of BAM files to perform structural variation discovery. This walker is the main component of the SVDiscovery pipeline.

Currently, only discovery of deletions relative to the reference is implemented.

### 2. Inputs / Arguments

• -I <bam-file> : The set of input BAM files. Repeat the argument once per file.

• -runDirectory <directory> : The directory where auxiliary output files will be written (default is the current directory).

• -md <directory> : The directory containing metadata about the input data set (see SVPreprocess).

• -R <fasta-file> : The reference sequence: an indexed FASTA file containing the reference that the input BAM files were aligned against. The FASTA file must be indexed with `samtools faidx` or the equivalent.

• -genomeMaskFile <mask-file> : A mask file describing the alignability of the reference sequence. See Genome Mask Files.

• -configFile <configuration-file> : A file of specialized settings that do not normally need to be changed. A default configuration file is provided in conf/genstrip_parameters.txt.

• -partitionName <string> : The name of the partition being computed during parallel runs. Output files will be prefixed with the partition name.

• -searchLocus <interval> : The genomic locus to search. Only structural variations that fit within the specified locus are output. If non-overlapping search loci are used, the union of the discovered variants should be non-redundant.

• -searchWindow <interval> : The interval used when searching the input BAM files. It is typically larger than the search locus, to avoid missing events due to boundary effects, and should typically be set to the same value as the GATK -L argument.

• -searchMinimumSize <size> : The minimum length of a deletion event for it to be included in the output.

• -searchMaximumSize <size> : The maximum length of a deletion event for it to be included in the output.
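For parallel runs, the relationship between -partitionName, -searchLocus and -searchWindow can be illustrated with a small shell function. The function name `make_partitions`, the `P0001`-style partition names, and the padding value are illustrative assumptions, not GenomeSTRiP conventions or recommendations; the only properties taken from the argument descriptions are that the search loci do not overlap and that each search window extends beyond its locus.

```shell
#!/bin/sh
# Hypothetical helper, not part of GenomeSTRiP: split one chromosome into
# non-overlapping search loci and pad each locus to form its search window.
# Usage: make_partitions CHROM CHROM_LENGTH LOCUS_SIZE PADDING
# Prints one line per partition: <partitionName> <searchLocus> <searchWindow>
make_partitions() {
    chrom=$1 length=$2 size=$3 pad=$4
    start=1
    n=1
    while [ "$start" -le "$length" ]; do
        # Non-overlapping locus, truncated at the end of the chromosome.
        end=$((start + size - 1))
        if [ "$end" -gt "$length" ]; then end=$length; fi
        # Window = locus padded on both sides, clamped to chromosome bounds.
        wstart=$((start - pad))
        if [ "$wstart" -lt 1 ]; then wstart=1; fi
        wend=$((end + pad))
        if [ "$wend" -gt "$length" ]; then wend=$length; fi
        printf 'P%04d %s:%d-%d %s:%d-%d\n' "$n" \
            "$chrom" "$start" "$end" "$chrom" "$wstart" "$wend"
        start=$((end + 1))
        n=$((n + 1))
    done
}
```

For a hypothetical 2.5 Mb chromosome, `make_partitions chr20 2500000 1000000 10000` prints three partitions, the first being `P0001 chr20:1-1000000 chr20:1-1010000`. Because the loci tile the chromosome without overlap, the union of the per-partition results should not contain duplicates, while the padded windows let each partition see reads that straddle its locus boundaries.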

### 3. Outputs

• -O <vcf-file> : The main output, a VCF file describing the variant sites, with annotations on the evidence that each site is variable. The output VCF must be filtered on these annotations to select a final set of high-specificity variants.
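The annotation-based filtering step can be sketched with awk. This page does not list the annotation names SVDiscovery writes, so the INFO key and threshold are parameters here, and `SCORE` in the usage line is a hypothetical placeholder; substitute the annotations and cutoffs appropriate to your data.

```shell
#!/bin/sh
# Hypothetical helper: copy VCF header lines through unchanged and keep only
# records whose INFO column (column 8) contains KEY=<number> with
# number >= MIN. Records that lack KEY are dropped.
# Usage: filter_vcf_by_info KEY MIN < in.vcf > out.vcf
filter_vcf_by_info() {
    awk -v key="$1" -v min="$2" 'BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }
        {
            n = split($8, fields, ";")
            for (i = 1; i <= n; i++) {
                eq = index(fields[i], "=")
                if (eq > 0 && substr(fields[i], 1, eq - 1) == key) {
                    if (substr(fields[i], eq + 1) + 0 >= min + 0) print
                    next
                }
            }
        }'
}
```

For example, `filter_vcf_by_info SCORE 10 < output.sites.vcf > filtered.sites.vcf`. A single threshold keeps the sketch short; a real filter will usually combine several annotations.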

Depending on settings in the configuration file, this walker will also produce a number of auxiliary output files. These files are mostly useful for debugging; their content and format are subject to change.

### 4. Running

Currently, this walker needs to be invoked through a special wrapper around the GATK command line interface. This wrapper accepts all of the standard GATK command line options. An example is shown below.

```shell
java -Xmx4g -cp SVToolkit.jar:GenomeAnalysisTK.jar \
    org.broadinstitute.sv.main.SVCommandLine \
    -T SVDiscovery \
    -configFile conf/genstrip_parameters.txt \
    -R Homo_sapiens_assembly18.fasta \
    -I input1.bam -I input2.bam \
    -O output.sites.vcf \
    -runDirectory run1 \
    -searchMinimumSize 100 \
    -searchMaximumSize 1000000 \
    -searchLocus chr20:1-1000000 \
    -L chr20:1-1000000 \
    -searchWindow chr20:1-1000000
```
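When the walker is run in parallel, each partition gets its own -partitionName, -searchLocus and -searchWindow, with -L set to the same interval as -searchWindow as recommended in the argument descriptions. The hypothetical helper below turns lines of the form `<partitionName> <searchLocus> <searchWindow>` into those partition-specific arguments; the function name and the per-partition output name `<partitionName>.discovery.vcf` are assumptions made for illustration. The remaining arguments (reference, BAM files, configuration file, and so on) stay the same for every partition.

```shell
#!/bin/sh
# Hypothetical helper: read "<partitionName> <searchLocus> <searchWindow>"
# lines on stdin and print the partition-specific SVDiscovery arguments,
# one partition per output line. -L mirrors -searchWindow.
partition_args() {
    while read -r name locus window; do
        # Skip blank input lines.
        if [ -z "$name" ]; then continue; fi
        printf '%s\n' "-partitionName $name -searchLocus $locus -searchWindow $window -L $window -O $name.discovery.vcf"
    done
}
```

For example, `printf 'P0001 chr20:1-1000000 chr20:1-1010000\n' | partition_args` prints `-partitionName P0001 -searchLocus chr20:1-1000000 -searchWindow chr20:1-1010000 -L chr20:1-1010000 -O P0001.discovery.vcf`, which can be appended to the fixed part of the command line for each job.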


### 5. Dependencies

The SVDiscovery code uses R scripts, so R must be installed and the Rscript executable must be on your PATH to run this walker.
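A pre-flight check for the R dependency lets a job fail quickly instead of partway through a long run. This is a minimal sketch using the POSIX `command -v` builtin; the function name `require_command` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named executable (default: Rscript)
# is on PATH; otherwise print an error to stderr and return 1.
require_command() {
    cmd=${1:-Rscript}
    if command -v "$cmd" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: $cmd not found on PATH" >&2
    return 1
}
```

Placing `require_command Rscript || exit 1` at the top of a job script stops the job before SVDiscovery starts if R is missing.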