Local Realignment around Indels
Posted in Methods and Algorithms on 2012-07-23 16:48:55 | Last updated on 2012-09-30 23:35:55


Comments (115)

Realigner Target Creator

For a complete, detailed argument reference, refer to the GATK document page here.

Indel Realigner

For a complete, detailed argument reference, refer to the GATK document page here.

Running the Indel Realigner only at known sites

While we advocate for using the Indel Realigner over an aggregated bam using the full Smith-Waterman alignment algorithm, it will work for just a single lane of sequencing data when run in -knownsOnly mode. Novel sites obviously won't be cleaned up, but the majority of a single individual's short indels will already have been seen in dbSNP and/or 1000 Genomes. One would employ the known-only/lane-level realignment strategy in a large-scale project (e.g. 1000 Genomes) where computation time is severely constrained and limited. We modify the example arguments from above to reflect the command-lines necessary for known-only/lane-level cleaning.

 java -Xmx1g -jar /path/to/GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R /path/to/reference.fasta \
  -o /path/to/output.intervals \
  -known /path/to/indel_calls.vcf

The IndelRealigner step needs to be run on every bam file.

java -Xmx4g -Djava.io.tmpdir=/path/to/tmpdir \
  -jar /path/to/GenomeAnalysisTK.jar \
  -I  \
  -R  \
  -T IndelRealigner \
  -targetIntervals  \
  -o  \
  -known /path/to/indel_calls.vcf
  --consensusDeterminationModel KNOWNS_ONLY \
  -LOD 0.4

Return to top Comment on this article in the forum