## Realigner Target Creator

For a complete, detailed argument reference, refer to the GATK documentation page for RealignerTargetCreator.

## Indel Realigner

For a complete, detailed argument reference, refer to the GATK documentation page for IndelRealigner.

# Running the Indel Realigner only at known sites

While we recommend running the Indel Realigner over an aggregated BAM using the full Smith-Waterman alignment algorithm, it also works on a single lane of sequencing data when run in knowns-only mode (--consensusDeterminationModel KNOWNS_ONLY). Novel sites will not be cleaned up, but the majority of a single individual's short indels will already have been seen in dbSNP and/or 1000 Genomes. The known-only, lane-level realignment strategy suits large-scale projects (e.g. 1000 Genomes) where computation time is severely constrained. The commands below adapt the example arguments from above for known-only, lane-level cleaning.

The RealignerTargetCreator step needs to be run only once for a given set of known indels; as long as that set doesn't change, the output.intervals file produced below never needs to be recalculated.

java -Xmx1g -jar /path/to/GenomeAnalysisTK.jar \
-T RealignerTargetCreator \
-R /path/to/reference.fasta \
-o /path/to/output.intervals \
-known /path/to/indel_calls.vcf
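
Because the intervals file only needs regenerating when the known-indels set changes, the step can be guarded by a freshness check. The following is a minimal sketch, not part of GATK: the function names `needs_rebuild` and `make_intervals`, the environment variables, and the use of file modification times as a proxy for "the known set changed" are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: variable names and the timestamp check are assumptions,
# not part of GATK. Paths are the placeholders used in this section.
JAVA=${JAVA:-java}
GATK_JAR=${GATK_JAR:-/path/to/GenomeAnalysisTK.jar}
REF=${REF:-/path/to/reference.fasta}

# Succeeds (exit 0) when the intervals file is missing or older than
# the known-indels VCF, i.e. when it should be regenerated.
# $1 = intervals file, $2 = known-indels VCF
needs_rebuild() {
    [ ! -e "$1" ] || [ "$2" -nt "$1" ]
}

# Run RealignerTargetCreator only when needs_rebuild says so.
# $1 = intervals file, $2 = known-indels VCF
make_intervals() {
    if needs_rebuild "$1" "$2"; then
        "$JAVA" -Xmx1g -jar "$GATK_JAR" \
            -T RealignerTargetCreator \
            -R "$REF" \
            -o "$1" \
            -known "$2"
    fi
}
```

Under these assumptions, calling `make_intervals /path/to/output.intervals /path/to/indel_calls.vcf` would do nothing when the intervals file is already newer than the VCF. A modification-time check is only a heuristic; a checksum of the VCF would be a stricter alternative.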


The IndelRealigner step needs to be run on every bam file.

java -Xmx4g -Djava.io.tmpdir=/path/to/tmpdir \
-jar /path/to/GenomeAnalysisTK.jar \
-I <lane-level.bam> \
-R <ref.fasta> \
-T IndelRealigner \
-targetIntervals <intervalListFromStep1Above.intervals> \
-o <realignedBam.bam> \
-known /path/to/indel_calls.vcf \
--consensusDeterminationModel KNOWNS_ONLY \
-LOD 0.4
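
Since IndelRealigner has to be run once per BAM while the intervals file is shared, the per-lane runs can be wrapped in a small loop. This is a rough sketch under stated assumptions: the variable names, the `realign_one` and `realigned_name` helpers, and the `.realigned.bam` output naming are hypothetical and not prescribed by GATK; the GATK arguments themselves are the ones shown above.

```shell
#!/bin/sh
# Sketch only: helper names, variables and output naming are assumptions.
# Default paths are the placeholders used in this section.
JAVA=${JAVA:-java}
GATK_JAR=${GATK_JAR:-/path/to/GenomeAnalysisTK.jar}
REF=${REF:-/path/to/reference.fasta}
INTERVALS=${INTERVALS:-/path/to/output.intervals}
KNOWN=${KNOWN:-/path/to/indel_calls.vcf}
GATK_TMPDIR=${GATK_TMPDIR:-/path/to/tmpdir}

# Map lane1.bam -> lane1.realigned.bam (hypothetical naming convention).
realigned_name() {
    printf '%s.realigned.bam\n' "${1%.bam}"
}

# Run known-sites-only realignment on a single lane-level BAM ($1).
realign_one() {
    "$JAVA" -Xmx4g -Djava.io.tmpdir="$GATK_TMPDIR" \
        -jar "$GATK_JAR" \
        -T IndelRealigner \
        -R "$REF" \
        -I "$1" \
        -targetIntervals "$INTERVALS" \
        -known "$KNOWN" \
        --consensusDeterminationModel KNOWNS_ONLY \
        -LOD 0.4 \
        -o "$(realigned_name "$1")"
}

# Reuse the same intervals file for every BAM given on the command line;
# stop at the first failure.
for bam in "$@"; do
    realign_one "$bam" || exit 1
done
```

Saved as a script (the name is up to you), it would be invoked with the lane-level BAM files as arguments. Setting `JAVA=echo` in the environment prints each command instead of running it, which is a convenient way to check the arguments first.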