## (howto) Prepare a reference for use with BWA and GATK

Posted in Tutorials on 2013-06-17 20:44:09 | Last updated on 2015-09-24 13:10:42

#### Objective

Prepare a reference sequence so that it is suitable for use with BWA and GATK.

#### Prerequisites

• Installed BWA
• Installed SAMtools
• Installed Picard
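
Before starting, it can help to confirm that the tools are reachable from your shell. The following is a minimal sketch, not part of the original tutorial: the function name check_tools is hypothetical, and because Picard is distributed as a jar file, the check looks for java rather than for Picard itself.

```shell
# Report whether each named command can be found on PATH.
# Prints one line per command; returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For this tutorial you would call it as check_tools bwa samtools java, and separately confirm that you know where picard.jar is located.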

#### Steps

1. Generate the BWA index
2. Generate the FASTA file index
3. Generate the sequence dictionary
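
The three steps can also be collected into a small wrapper. This is only a sketch under stated assumptions: the function names run_or_echo and prepare_reference and the RUN variable are hypothetical, picard.jar is assumed to be in the current directory unless a path is given, and the dictionary name is derived by replacing the FASTA extension with .dict. By default it only prints the commands so you can review them first.

```shell
# Run a command when RUN=1; otherwise just print it (dry run, the default).
run_or_echo() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Issue the three preparation commands for one FASTA file.
# $1: reference FASTA; $2: path to picard.jar (defaults to ./picard.jar, an assumption).
prepare_reference() {
    ref=$1
    picard_jar=${2:-picard.jar}
    dict="${ref%.*}.dict"   # reference.fa -> reference.dict

    run_or_echo bwa index -a bwtsw "$ref" &&
    run_or_echo samtools faidx "$ref" &&
    run_or_echo java -jar "$picard_jar" CreateSequenceDictionary \
        "REFERENCE=$ref" "OUTPUT=$dict"
}
```

Calling prepare_reference reference.fa prints the three commands described in the sections below; setting RUN=1 beforehand executes them instead, stopping at the first step that fails.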

### 1. Generate the BWA index

#### Action

Run the following BWA command:

bwa index -a bwtsw reference.fa


Here, -a bwtsw selects the BWT-SW indexing algorithm, which can handle large references such as the whole human genome.

#### Expected Result

This creates a collection of index files alongside the reference (reference.fa.amb, .ann, .bwt, .pac and .sa) that BWA uses to perform the alignment.
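
A quick way to confirm that this step completed is to check for the index files. The sketch below assumes the five suffixes written by bwa 0.7.x (.amb, .ann, .bwt, .pac, .sa), which other versions may not share; the function name is hypothetical.

```shell
# List any expected BWA index files that are missing (or empty)
# next to the given FASTA. Prints nothing when all five are present.
missing_bwa_index_files() {
    ref=$1
    for ext in amb ann bwt pac sa; do
        [ -s "$ref.$ext" ] || echo "$ref.$ext"
    done
}
```

Running missing_bwa_index_files reference.fa after indexing should print nothing; any path it prints points to a file that was not written.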

### 2. Generate the FASTA file index

#### Action

Run the following SAMtools command:

samtools faidx reference.fa


#### Expected Result

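This creates a file called reference.fa.fai, with one tab-separated record per line for each contig in the FASTA reference file. Each record consists of the contig name, its size in bases, the location of its first base as a byte offset into the file, the number of bases per line (basesPerLine) and the number of bytes per line including the line terminator (bytesPerLine).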
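
Because the index is plain tab-separated text, it is easy to inspect with standard tools. As an illustrative sketch (the function name fai_contigs is hypothetical), the following prints the name and length of each contig:

```shell
# Print "name<TAB>length" for each contig listed in a .fai file.
# Columns (tab-separated): 1=name, 2=length, 3=byte offset,
# 4=bases per line, 5=bytes per line.
fai_contigs() {
    awk -F '\t' '{ print $1 "\t" $2 }' "$1"
}
```

For example, fai_contigs reference.fa.fai lists every contig in the reference in file order, which is a convenient sanity check on contig names before alignment.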

### 3. Generate the sequence dictionary

#### Action

Run the following Picard command:

java -jar picard.jar CreateSequenceDictionary \
REFERENCE=reference.fa \
OUTPUT=reference.dict


Note that this is the syntax for recent versions of Picard, in which all the tools are bundled into a single picard.jar. Older versions shipped each tool as a separate jar, so you would call e.g. java -jar CreateSequenceDictionary.jar directly.

#### Expected Result

This creates a file called reference.dict formatted like a SAM header, describing the contents of your reference FASTA file.
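
Since the dictionary and the FASTA index describe the same contigs, one optional sanity check is to confirm that they agree. The sketch below is not part of the original tutorial: the function names are hypothetical, and it assumes each @SQ line carries the contig name in an SN: field and its length in an LN: field, as in a SAM header.

```shell
# Print "name<TAB>length" for each @SQ line of a sequence dictionary.
dict_contigs() {
    awk -F '\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4)
            }
            print name "\t" len
        }
    ' "$1"
}

# Succeed only if the dictionary and the .fai list the same contigs,
# in the same order, with the same lengths.
dict_matches_fai() {
    dict=$1
    fai=$2
    [ "$(dict_contigs "$dict")" = "$(cut -f 1,2 "$fai")" ]
}
```

A call such as dict_matches_fai reference.dict reference.fa.fai returns success when the two files are consistent, which suggests both were generated from the same version of the reference.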
