(howto) Prepare a reference for use with BWA and GATK
Posted in Tutorials on 2013-06-17 20:44:09 | Last updated on 2015-03-30 22:10:15


Objective

Prepare a reference sequence so that it is suitable for use with BWA and GATK.

Prerequisites

  • Installed BWA
  • Installed SAMtools
  • Installed Picard

Steps

  1. Generate the BWA index
  2. Generate the FASTA file index
  3. Generate the sequence dictionary
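
As a rough overview, the three steps can be chained in one small wrapper function. This is only a sketch: prepare_reference and the PICARD_JAR variable are hypothetical names introduced here, and the function assumes that bwa, samtools and java are on your PATH.

```shell
# prepare_reference is a hypothetical wrapper, not part of any of the tools.
# PICARD_JAR is an assumed variable holding the path to picard.jar;
# it falls back to picard.jar in the current directory.
prepare_reference() {
    local ref="$1"
    local dict="${ref%.*}.dict"   # reference.fa -> reference.dict

    # 1. BWA index
    bwa index -a bwtsw "$ref" || return 1

    # 2. FASTA file index (writes reference.fa.fai)
    samtools faidx "$ref" || return 1

    # 3. Sequence dictionary (writes reference.dict)
    java -jar "${PICARD_JAR:-picard.jar}" CreateSequenceDictionary \
        REFERENCE="$ref" \
        OUTPUT="$dict" || return 1
}

# Usage:
# prepare_reference reference.fa
```

Each step stops the function on failure, so a broken index is not silently followed by the next command.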

1. Generate the BWA index

Action

Run the following BWA command:

bwa index -a bwtsw reference.fa

where -a bwtsw selects the indexing algorithm that can handle large references such as the whole human genome.

Expected Result

This creates a collection of files next to the reference (reference.fa.amb, .ann, .bwt, .pac and .sa) that BWA uses to perform the alignment.
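
As a quick sanity check, the sketch below tests that the index files BWA normally writes alongside the reference are present. The function name check_bwa_index is hypothetical; it is not a BWA command.

```shell
# check_bwa_index is a hypothetical helper, not part of BWA.
# It reports any missing index file and returns non-zero if one is absent.
check_bwa_index() {
    local ref="$1" missing=0 ext
    for ext in amb ann bwt pac sa; do
        if [ ! -f "${ref}.${ext}" ]; then
            echo "missing: ${ref}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (assumes reference.fa was indexed in the current directory):
# check_bwa_index reference.fa && echo "BWA index looks complete"
```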


2. Generate the FASTA file index

Action

Run the following SAMtools command:

samtools faidx reference.fa 

Expected Result

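This creates a file called reference.fa.fai, with one record per line for each contig in the FASTA reference file. Each record consists of five tab-separated fields: the contig name, its size in bases, its location (the byte offset of the contig's first base in the file), basesPerLine, and bytesPerLine (which includes the line terminator).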
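
To make the record layout concrete, this sketch prints each contig's name and size from the tab-separated index and then the total reference length. The function name fai_summary is hypothetical; it is not a SAMtools command.

```shell
# fai_summary is a hypothetical helper, not part of SAMtools.
# Column 1 of the .fai file is the contig name, column 2 its size in bases.
fai_summary() {
    awk -F'\t' '{ print $1 "\t" $2; total += $2 }
                END { printf "total\t%.0f\n", total }' "$1"
}

# Usage (assumes the index sits next to reference.fa):
# fai_summary reference.fa.fai
```

The total is printed with %.0f rather than %d because some awk implementations cap %d at 32-bit values, which a human-sized genome exceeds.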


3. Generate the sequence dictionary

Action

Run the following Picard command:

java -jar picard.jar CreateSequenceDictionary \
    REFERENCE=reference.fa \
    OUTPUT=reference.dict

Note that this is the newer syntax, for Picard versions that bundle all tools in a single picard.jar. Older versions shipped each tool as a separate jar, so you would call e.g. java -jar CreateSequenceDictionary.jar directly.

Expected Result

This creates a file called reference.dict formatted like a SAM header, describing the contents of your reference FASTA file.
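
Because the dictionary is laid out like a SAM header, its @SQ lines can be read with ordinary text tools. The sketch below pulls the contig name (SN: tag) and length (LN: tag) out of each @SQ line so they can be compared against the FASTA index. The function name dict_contigs is hypothetical; it is not a Picard command.

```shell
# dict_contigs is a hypothetical helper, not part of Picard.
# It prints "name<TAB>length" for every @SQ line, taken from the
# SN: and LN: tags wherever they appear on the line.
dict_contigs() {
    awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len  = substr($i, 4)
        }
        print name "\t" len
    }' "$1"
}

# Usage in bash (assumes both files sit next to reference.fa);
# no diff output means the dictionary and the index agree:
# dict_contigs reference.dict | diff - <(cut -f1,2 reference.fa.fai)
```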

