This software package is no longer supported and information on this page is provided for archival purposes only.

SWAP454 Help

Prerequisites for running the SWAP454 SNP Caller:

  • POSIX-compatible operating system on a supported chipset (x86_64, ia64, i686).
  • GCC and Make installed.
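
A quick way to confirm the build prerequisites before compiling is to check that the tools are on the PATH. The sketch below is illustrative only; the function name missing_tools is hypothetical and is not part of the SWAP454 package.

```shell
# Print the name of every requested tool that cannot be found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Possible use before building:
#   missing_tools gcc make || echo "install the tools listed above first"
```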

 

Compiling SWAP454 SNP


Download the SWAP454 SNP calling package, using the main CRD download page.

Here are the steps to take to build the program:

  • tar -xzf Swap454_SNPCallingPipeline_v1.2.tgz
  • cd Swap454
  • ./configure
  • make

Modules and scripts referred to in the rest of this document can be found in this directory, or in the 454 directory within it.

Data


  • 454 data should be in SFF format prior to SNP calling.

  • If your sample is barcoded, make sure the barcode adapters are properly trimmed before running this analysis. The best way to do this is with the sfffile program included in the 454 software package.

 

Modify Quality Scores


  • These steps can be skipped entirely if you trust the quality scores given in the SFF file. As a rule of thumb, if the sequence was generated using 454 software v1.1.* or later, the 454 quality scores are accurate enough to use as is. However, if the sequence was generated with an earlier version of the software, follow these steps to recalculate the quality scores.

  • If you choose to skip these steps, you can use the sffinfo program in the 454 software package to create a file of sequences and a file of quality scores that can be used for the alignment steps below.

Instructions:

MakeSffInfo

MakeSffInfo SFF=your_reads_file.sff FASTA=your_reads_file.fasta

Output:
your_reads_file.String.name= Compressed file containing all read names in sff.
your_reads_file.Short.clip_*= Binary files containing trimming information for vector and adapter.
your_reads_file.Basevec.bases= Base calls for reads in vector format.
your_reads_file.Qualvec.quals= Base qualities for reads in vector format.
your_reads_file.Shortvec.correspondence=
your_reads_file.Shortvec.flows= Flow information in vector format.
your_reads_file.Info= Info about the sff file required by the pipeline.
your_reads_file.fasta= 454 read sequences in fasta format

WARNING: your_reads_file.sff must end with the .sff (lowercase required) extension!
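
Because of the extension requirement in the warning above, it can help to check file names before calling MakeSffInfo. This is a minimal sketch; both function names are hypothetical helpers and not part of the package.

```shell
# Return 0 only if the file name ends in a lowercase ".sff".
has_sff_extension() {
    case "$1" in
        *.sff) return 0 ;;
        *)     return 1 ;;
    esac
}

# For names such as reads.SFF, print the same name with a lowercase extension.
# Prints nothing and returns 1 if the name has no SFF extension at all.
suggest_sff_name() {
    case "$1" in
        *.sff)          echo "$1" ;;
        *.[sS][fF][fF]) echo "${1%.*}.sff" ;;
        *)              return 1 ;;
    esac
}

# Possible use:
#   has_sff_extension reads.SFF || mv reads.SFF "$(suggest_sff_name reads.SFF)"
```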

QualsFromSff

QualsFromSff SFFINFO=your_reads_file TABLE=200base.phredtable SAVE=True QUAL=your_reads_file.qual

For short reads (GS20 model sequencer), the 100base.phredtable should be used. For longer reads (FLX model sequencer) 200base.phredtable should be used. There is not yet a phredtable for the XLR model sequencer.
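
The table choice described above can be captured in a small helper. This is a dry-run sketch that only prints the QualsFromSff command shown on this page rather than running it; the function names and the use of GS20 and FLX as argument values are assumptions made for illustration.

```shell
# Map a sequencer model to the phredtable named on this page.
phredtable_for() {
    case "$1" in
        GS20) echo "100base.phredtable" ;;
        FLX)  echo "200base.phredtable" ;;
        *)    echo "no phredtable available for '$1'" >&2; return 1 ;;
    esac
}

# Print (do not run) the QualsFromSff command for a read-set prefix and model.
quals_from_sff_cmd() {
    table=$(phredtable_for "$2") || return 1
    echo "QualsFromSff SFFINFO=$1 TABLE=$table SAVE=True QUAL=$1.qual"
}

# Possible use:
#   quals_from_sff_cmd your_reads_file FLX
```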

In addition to producing recalculated quality scores, QualsFromSff regenerates all info files to reflect the new scores.

Output:
your_reads_file.String.name= Compressed file containing all read names in sff.
your_reads_file.Short.clip_*= Binary files containing trimming information for vector and adapter.
your_reads_file.Basevec.bases= Base calls for reads in vector format.
your_reads_file.Qualvec.quals= Base qualities for reads in vector format.
your_reads_file.Shortvec.correspondence=
your_reads_file.Shortvec.flows= Flow information in vector format.
your_reads_file.Info= Info about the sff file required by the pipeline.
your_reads_file.qual= 454 read qualities in fasta format

Align Reads

evalfastaNum.pl

evalfastaNum.pl your_reads_file.fasta your_reference_file.fasta

Usage: evalfasta.pl mydata.fasta reference.fasta

Evaluates mydata.fasta by aligning it to reference.fasta.
Will produce the following file with the prefix mydata:
mydata.qltout (long text output with all the alignments).
Note that the "mydata" prefix includes any directories.

This program first creates a lookup table of your reference file, and then produces alignments of the reads to the reference.

Output:
your_reads_file.orig.qltout= Original QueryLookupTable alignment files.
your_reads_file.qltout= Realigned QueryLookupTable alignment files.
your_reads_file.unplaced.fasta= Fasta file containing reads that did not align to the reference.
your_reads_file.bestaligns= Binary format file containing the best alignments.
your_reads_file.otheraligns= Binary format file containing other alignments.
your_reference_file.fasta.lookuptable.fastb= Reference fasta represented in binary format.
your_reference_file.fasta.lookuptable.fastamb=
your_reference_file.fasta.lookuptable.lookup= Lookup table for your reference fasta.

Create Coverage Map

MapNQSCoverage

MapNQSCoverage FASTA=your_reads_file.fasta QUAL=your_reads_file.qual QLTOUT=your_reads_file.qltout REF=your_reference_file.fasta O=your_reads_file MIN_QUAL=20 NQ=15

The Neighborhood Quality Score (NQS) parameters (MIN_QUAL, NQ) can be modified. For example, to bypass quality filtering, set both parameters to 0. For very high stringency quality filtering (minimizing false positives at the cost of sensitivity), setting both values to 30 would be appropriate.
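
This page does not spell out how the two thresholds are applied. The sketch below assumes the commonly described neighborhood quality standard, in which the base itself must have a quality of at least MIN_QUAL and every neighboring base within a fixed window must have a quality of at least NQ. The window of 5 bases on each side, the clipping of the window at read ends, and the function name nqs_accept are all assumptions; MapNQSCoverage may apply different or additional criteria.

```shell
# nqs_accept "QUALS" POS MIN_QUAL NQ [FLANK]
# QUALS is a space-separated list of base qualities for one read,
# POS is the 1-based index of the base being tested.
# FLANK (assumed default: 5) is the number of neighbors checked on each side.
nqs_accept() {
    echo "$1" | awk -v pos="$2" -v min_qual="$3" -v nq="$4" -v flank="${5:-5}" '
    {
        if (pos < 1 || pos > NF) { print "reject"; exit }
        # The base itself must reach MIN_QUAL.
        if ($pos < min_qual)     { print "reject"; exit }
        # Clip the neighborhood at the ends of the read (an assumption).
        lo = pos - flank; if (lo < 1)  lo = 1
        hi = pos + flank; if (hi > NF) hi = NF
        # Every neighbor in the window must reach NQ.
        for (i = lo; i <= hi; i++)
            if (i != pos && $i < nq) { print "reject"; exit }
        print "accept"
    }'
}
```

Under these assumptions, setting both thresholds to 0 accepts every base, which matches the bypass behavior described above.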

Multiple read sets can be given to this program at once. For each read set you must have a fasta, qual, and qltout file. For example:
MapNQSCoverage FASTA="{your_reads_file_1.fasta,your_reads_file_2.fasta,your_reads_file_3.fasta}" QUAL="{your_reads_file_1.qual,your_reads_file_2.qual,your_reads_file_3.qual}" QLTOUT="{your_reads_file_1.qltout,your_reads_file_2.qltout,your_reads_file_3.qltout}" REF=your_reference_file.fasta O=your_reads_file MIN_QUAL=20 NQ=15
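
Typing the three brace lists by hand is error-prone when there are many read sets. The sketch below builds them from a list of read-set prefixes and prints the resulting command as a dry run. The function names are hypothetical, and it assumes every read set uses the same prefix for its fasta, qual, and qltout files, as in the example above.

```shell
# brace_list SUFFIX PREFIX...  ->  {PREFIX1SUFFIX,PREFIX2SUFFIX,...}
brace_list() {
    suffix=$1; shift
    list=""
    for prefix in "$@"; do
        # Add a comma only between entries, never before the first one.
        list="${list:+$list,}$prefix$suffix"
    done
    echo "{$list}"
}

# map_nqs_coverage_cmd REF OUT PREFIX...
# Prints (does not run) a MapNQSCoverage command with the thresholds shown above.
map_nqs_coverage_cmd() {
    ref=$1; out=$2; shift 2
    echo "MapNQSCoverage FASTA=\"$(brace_list .fasta "$@")\" QUAL=\"$(brace_list .qual "$@")\" QLTOUT=\"$(brace_list .qltout "$@")\" REF=$ref O=$out MIN_QUAL=20 NQ=15"
}

# Possible use:
#   map_nqs_coverage_cmd your_reference_file.fasta your_reads_file \
#       your_reads_file_1 your_reads_file_2 your_reads_file_3
```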


Output:
your_reads_file.coverage_map= CoverageMap file that contains information on read coverage for every base in the reference sequence.
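
The coverage map can be inspected with standard text tools. The sketch below lists positions where only a small fraction of the aligned reads were accepted (that is, passed the NQS thresholds). It assumes the file is whitespace-delimited with one reference position per row and the columns in the order given in the column list on this page, so that cov is the 19th column and accept is the 20th; verify this against your own output before relying on it. The function name low_acceptance is hypothetical.

```shell
# low_acceptance FILE MIN_FRACTION
# Prints pos, cov, accept and accept/cov for rows below MIN_FRACTION.
# Rows whose first field is not a number (such as a header row) are skipped.
low_acceptance() {
    awk -v min_frac="$2" '
    $1 ~ /^[0-9]+$/ && $19 > 0 {
        frac = $20 / $19
        if (frac < min_frac)
            printf "%s\t%d\t%d\t%.3f\n", $1, $19, $20, frac
    }' "$1"
}

# Possible use:
#   low_acceptance your_coverage_map_file 0.5
```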

This table describes each column in the coverage_map output file:

Column  Description
pos     Position in the reference
ref     Consensus base of the reference
A       Number of reads with the A nucleotide at this position aligning in the forward direction
C       Number of reads with the C nucleotide at this position aligning in the forward direction
G       Number of reads with the G nucleotide at this position aligning in the forward direction
T       Number of reads with the T nucleotide at this position aligning in the forward direction
rA      Number of reads with the A nucleotide at this position aligning in the reverse direction
rC      Number of reads with the C nucleotide at this position aligning in the reverse direction
rG      Number of reads with the G nucleotide at this position aligning in the reverse direction
rT      Number of reads with the T nucleotide at this position aligning in the reverse direction
aA      Same as "A", but counting only accepted reads
aC      Same as "C", but counting only accepted reads
aG      Same as "G", but counting only accepted reads
aT      Same as "T", but counting only accepted reads
arA     Same as "rA", but counting only accepted reads
arC     Same as "rC", but counting only accepted reads
arG     Same as "rG", but counting only accepted reads
arT     Same as "rT", but counting only accepted reads
cov     Total number of reads aligning at this position
accept  Total number of accepted reads aligning at this position
iA      Number of A insertions after this position in the forward direction
iC      Number of C insertions after this position in the forward direction
iG      Number of G insertions after this position in the forward direction
iT      Number of T insertions after this position in the forward direction
irA     Number of A insertions after this position in the reverse direction
irC     Number of C insertions after this position in the reverse direction
irG     Number of G insertions after this position in the reverse direction
irT     Number of T insertions after this position in the reverse direction
sumI    Total number of reads with insertions after this position
D       Total number of reads with deletions at this position aligning in the forward direction
rD      Total number of reads with deletions at this position aligning in the reverse direction
sumD    Total number of deletions at this position


"Accepted" reads are reads that pass the specified NQS thresholds (MIN_QUAL, NQ).

Call SNPs

CallPolymorphismsFromMap

CallPolymorphismsFromMap IN=your_reads_file.coverage_map MIN_RATIO=0.66 MIN_READS=2 NEED_RC=True

This program will parse the coverage map file to return SNP calls. You will want to modify some of these parameters depending on your data.

MIN_RATIO= The minimum fraction of reads that must differ from the reference for a SNP to be called. A value of 0.66 is appropriate for haploid data, 0.10 for diploid data, and 0.00 for single-read SNP calling.
MIN_READS= The minimum number of reads that must differ from the reference for a SNP to be called. A value of 2 is appropriate for multi-read SNP calling. For single-read SNP calling, set this value to 1.
NEED_RC= Toggles the requirement that reads be seen in both alignment directions before a SNP is called. Set this to True for multi-read SNP calling and to False for single-read SNP calling.
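
To make the interaction of the three parameters concrete, the sketch below applies them to the read counts for a single candidate allele. It illustrates the thresholds as described above and is not the program's actual implementation. In particular, it assumes that the thresholds are inclusive (>=) and that the ratio is taken over the total read count passed in. The function name passes_thresholds is hypothetical.

```shell
# passes_thresholds ALT_FW ALT_RC TOTAL MIN_RATIO MIN_READS NEED_RC
# ALT_FW / ALT_RC: forward / reverse reads that differ from the reference.
# TOTAL: all reads considered at the position.
passes_thresholds() {
    awk -v fw="$1" -v rc="$2" -v total="$3" \
        -v min_ratio="$4" -v min_reads="$5" -v need_rc="$6" '
    BEGIN {
        alt = fw + rc
        # Enough differing reads, and a large enough share of all reads.
        ok = (total > 0 && alt >= min_reads && alt / total >= min_ratio)
        # Optionally require support from both alignment directions.
        if (need_rc == "True" && (fw == 0 || rc == 0)) ok = 0
        if (ok) print "call"; else print "no call"
    }'
}

# Possible use:
#   passes_thresholds 8 4 15 0.66 2 True      # both strands, 12 of 15 reads
#   passes_thresholds 1 0 1 0.00 1 False      # single-read setting
```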

Output is written to STDOUT.

Here is an example line of the SNP calling output, followed by what each section means:
0 3076 G A 0.0194% 15376/ 66 acc (335/ 8 rej): A: 3/ 0 ( 0/ 0) C: 5/ 0 ( 1/ 0) G: 15367/ 66 (334/ 8) T: 1/ 0 ( 0/ 0) iA: 0/ 0 iC: 0/ 0 iG: 0/ 0 iT: 0/ 0 Del: 0/0 F

0= reference number
3076= position in the reference
G= nucleotide in reference (dash means deletion)
A= nucleotide in sample (454 data)
0.0194%= percent of reads disagreeing with the reference
15376/ 66 acc= total forward/total reverse reads accepted
(335/ 8 rej):= total forward/total reverse reads rejected
A: 3/ 0 ( 0/ 0)= accepted forward reads calling A / accepted reverse reads calling A (rejected reads calling A: forward/reverse)
C: 5/ 0 ( 1/ 0)= same as above for C calls
G: 15367/ 66 (334/ 8)= same as above for G calls
T: 1/ 0 ( 0/ 0)= same as above for T calls
iA: 0/ 0= forward/reverse reads indicating an A inserted after this position
iC: 0/ 0= same as above for C insertions
iG: 0/ 0= same as above for G insertions
iT: 0/ 0= same as above for T insertions
Del: 0/0= forward/reverse reads indicating a deletion of this base
F= true or false. This is only informative if a truth file is given to the program
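
As a worked check of the example line, 3 accepted reads call A out of 15376 + 66 = 15442 accepted reads, and 3 / 15442 is 0.0194%, which matches the percentage field. The sketch below recomputes that figure from an output line. It assumes the field layout of the example line shown above and that the percentage is taken over accepted reads calling the sample nucleotide, which is what the example's numbers suggest. The function name sample_allele_percent is hypothetical.

```shell
# sample_allele_percent "LINE"
# Recomputes the percentage of accepted reads that call the sample nucleotide.
sample_allele_percent() {
    echo "$1" | awk '
    {
        # Turn "15376/ 66" and "(335/ 8 rej):" into plain whitespace-separated fields.
        gsub("[/():]", " ")
        accepted = $6 + $7               # forward + reverse accepted reads
        if (accepted <= 0) exit
        for (i = 12; i < NF; i++) {
            if ($i == $4) {              # per-nucleotide label matching the sample call
                alt = $(i + 1) + $(i + 2)
                printf "%.4f%%\n", 100 * alt / accepted
                exit
            }
        }
    }'
}
```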