(howto) Prepare a reference for use with BWA and GATK
Posted in Tutorials | Last updated on 2013-07-03 00:56:06


Objective

Prepare a reference sequence so that it is suitable for use with BWA and GATK.

Prerequisites

  • Installed BWA
  • Installed SAMtools
  • Installed Picard
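
Before starting, it can help to confirm that the tools are reachable from the shell. The following is a minimal sketch, not part of the original tutorial: the function name require_tools is hypothetical, and it assumes bwa, samtools and java are expected on your PATH. Picard is distributed as jar files, so only java is checked here.

```shell
# Hypothetical helper: fail if any of the named commands is not on PATH.
require_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, running require_tools bwa samtools java reports each missing command on standard error and returns a non-zero status if any is absent.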

Steps

  1. Generate the BWA index
  2. Generate the FASTA file index
  3. Generate the sequence dictionary

1. Generate the BWA index

Action

Run the following BWA command:

bwa index -a bwtsw reference.fa 

where -a bwtsw selects the BWT-SW indexing algorithm, which can handle large references such as the whole human genome.

Expected Result

This creates a collection of index files, named after reference.fa, that BWA uses to perform the alignment.
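
As a quick sanity check, you can confirm that the index files were written. This is a sketch under an assumption: recent BWA releases write five files with the extensions .amb, .ann, .bwt, .pac and .sa next to the reference, and older versions may produce a different set. The function name check_bwa_index is hypothetical.

```shell
# Hypothetical helper: succeed only if all five expected BWA index files
# exist and are non-empty. The extension list is an assumption based on
# recent BWA releases.
check_bwa_index() {
    ref="$1"
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${ref}.${ext}" ]; then
            echo "missing or empty: ${ref}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

After indexing, check_bwa_index reference.fa returns zero when every expected file is present.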


2. Generate the FASTA file index

Action

Run the following SAMtools command:

samtools faidx reference.fa 

Expected Result

This creates a file called reference.fa.fai, with one record per line for each contig in the FASTA reference file. Each record consists of five tab-separated fields: the contig name, its size in bases, the location (byte offset) of its first base in the file, basesPerLine and bytesPerLine.
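
Because the index is plain tab-separated text, it is easy to inspect. The sketch below (the function name summarize_fai is hypothetical) prints each contig name with its size, followed by the total number of bases, using only the first two fields described above.

```shell
# Hypothetical helper: print "name<TAB>size" for each contig in a .fai file,
# then a final "total<TAB>sum" line.
summarize_fai() {
    awk -F '\t' '
        { print $1 "\t" $2; total += $2 }
        END { printf "total\t%.0f\n", total }
    ' "$1"
}
```

Running summarize_fai reference.fa.fai is a quick way to confirm that the contigs and sizes match what you expect from the reference.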


3. Generate the sequence dictionary

Action

Run the following Picard command:

java -jar CreateSequenceDictionary.jar \
    REFERENCE=reference.fa \
    OUTPUT=reference.dict

Expected Result

This creates a file called reference.dict formatted like a SAM header, describing the contents of your reference FASTA file.
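
Since the dictionary and the FASTA index describe the same contigs, one optional check is to compare them. This is a sketch rather than an official validation step: it assumes the dictionary follows the usual SAM header layout, with tab-separated @SQ lines carrying SN: (name) and LN: (length) tags, and the function name dict_matches_fai is hypothetical.

```shell
# Hypothetical helper: succeed only if the .dict and .fai files list the same
# contig names and lengths in the same order.
dict_matches_fai() {
    dict="$1"
    fai="$2"
    # Pull the SN: and LN: values out of every @SQ line of the dictionary.
    from_dict=$(awk -F '\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len = substr($i, 4)
        }
        print name "\t" len
    }' "$dict")
    # The first two columns of the .fai are the contig name and size.
    from_fai=$(cut -f 1,2 "$fai")
    [ "$from_dict" = "$from_fai" ]
}
```

For example, dict_matches_fai reference.dict reference.fa.fai returns zero when the two files agree.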

