Basic walkers

From GSA



Print reads

Outputs all reads in the input stream, either to the console or to a SAM or BAM file.

Input

Reads and reference. Optionally, reads can be filtered by platform (--platform) or by maximum allowed read length (--maxReadLength).
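The two optional filters can be pictured as a predicate applied to each read before it is printed. The sketch below is a minimal illustration under stated assumptions: the `Read` record and the `filter_reads` function are hypothetical stand-ins, not GATK types, and the exact-string platform comparison is an assumption. The real --platform matching rule may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Read:
    """Hypothetical minimal read record, not a real BAM/GATK type."""
    name: str
    platform: str   # sequencing platform, e.g. taken from the read group (assumption)
    sequence: str


def filter_reads(reads: Iterable[Read],
                 platform: Optional[str] = None,
                 max_read_length: Optional[int] = None) -> Iterator[Read]:
    """Yield only the reads that pass the optional platform and length filters."""
    for read in reads:
        # Exact platform match; the real --platform matching rule may differ.
        if platform is not None and read.platform != platform:
            continue
        # Drop reads longer than the maximum; a read of exactly that length is kept.
        if max_read_length is not None and len(read.sequence) > max_read_length:
            continue
        yield read
```

Leaving both arguments as `None` passes every read through unchanged, which matches the default behaviour of printing all reads.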

Output

Reads from the input BAM file(s) can be written to the console (the default), to a file in SAM format (--out), or to another BAM file (--outputBamFile).

Example: Printing reads to the console

java -jar GenomeAnalysisTK.jar -T PrintReads -I resources/exampleBAM.bam -R resources/exampleFASTA.fasta 

Example: Merging BAM files

The GATK can merge BAM files on the fly as it traverses them. To merge BAM files, pass one -I argument for each BAM file to merge, and use --outputBamFile to write the merged reads to a new BAM file.

java -jar GenomeAnalysisTK.jar -T PrintReads -I <first>.bam -I <next>.bam -R <your>.fasta --outputBamFile <merged>.bam
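The GATK's own merging code is not shown on this page. As an illustration of the general idea only, merging while traversing can be done as a lazy k-way merge of coordinate-sorted inputs. In this sketch, the `(contig_index, start, name)` tuple used as a read and the `merge_sorted_reads` function are hypothetical, and each input is assumed to be already coordinate-sorted.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical read record: (contig_index, start_position, read_name).
# Tuples compare element-wise, so this ordering is coordinate order.
Read = Tuple[int, int, str]


def merge_sorted_reads(*bam_streams: Iterable[Read]) -> Iterator[Read]:
    """Lazily merge several coordinate-sorted read streams into one sorted stream.

    heapq.merge keeps only one pending read per input in its heap, so merged
    reads are produced while the inputs are traversed, not after loading them.
    """
    return heapq.merge(*bam_streams)
```

If any input is not coordinate-sorted, the merged stream will not be sorted either. The heap-based merge preserves order only when every input is already in order.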


Pileup

Prints the alignment in pileup format. In this format, each line represents one genomic position and consists of the chromosome name, coordinate, reference base, read bases, read base qualities, and alignment mapping qualities. Information on matches, mismatches, indels, strand, mapping quality, and the start and end of a read is all encoded in the read base column. In this column, a dot stands for a match to the reference base on the forward strand, a comma for a match on the reverse strand, 'ACGTN' for a mismatch on the forward strand, and 'acgtn' for a mismatch on the reverse strand.
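The match/mismatch symbols described above can be decoded one character at a time. This is a minimal sketch: `classify_read_base` is a hypothetical helper, and it deliberately ignores the indel patterns and the '^'/'$' markers.

```python
from typing import Tuple


def classify_read_base(symbol: str, ref_base: str) -> Tuple[str, str, bool]:
    """Decode one match/mismatch symbol from the read base column.

    Returns (base, strand, is_match); strand is '+' for forward, '-' for reverse.
    Indel patterns and the '^'/'$' markers are not handled here.
    """
    if len(symbol) != 1:
        raise ValueError(f"expected a single character, got {symbol!r}")
    if symbol == '.':          # match, forward strand
        return ref_base.upper(), '+', True
    if symbol == ',':          # match, reverse strand
        return ref_base.upper(), '-', True
    if symbol in 'ACGTN':      # mismatch, forward strand
        return symbol, '+', False
    if symbol in 'acgtn':      # mismatch, reverse strand
        return symbol.upper(), '-', False
    raise ValueError(f"not a match/mismatch symbol: {symbol!r}")
```

For a match, the base is taken from the reference, because '.' and ',' carry only strand information.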

A pattern '\+[0-9]+[ACGTNacgtn]+' indicates an insertion between this reference position and the next. The integer gives the length of the insertion and is followed by the inserted sequence. Similarly, a pattern '-[0-9]+[ACGTNacgtn]+' represents a deletion from the reference. Also in the read base column, the symbol '^' marks the start of a read segment, that is, a contiguous subsequence of the read delimited by N, S, or H CIGAR operations. The ASCII code of the character following '^', minus 33, gives the mapping quality. The symbol '$' marks the end of a read segment.
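Because the integer in an indel pattern says exactly how many bases follow, a tokenizer should read the length first and then consume that many characters. Matching the regular expression greedily would swallow any mismatch base that happens to follow the indel. The sketch below is a hypothetical `tokenize_read_bases` helper that covers only the symbols described on this page and rejects anything else.

```python
from typing import List, Optional, Tuple, Union

Token = Tuple[str, Optional[Union[int, str]]]


def tokenize_read_bases(column: str) -> List[Token]:
    """Split a pileup read base column into (kind, value) tokens.

    kind is 'start' (value: mapping quality), 'end' (value: None),
    'insertion' or 'deletion' (value: the indel sequence), or 'base'
    (value: a match/mismatch symbol such as '.', ',', 'A' or 'a').
    """
    tokens: List[Token] = []
    i = 0
    while i < len(column):
        c = column[i]
        if c == '^':
            if i + 1 >= len(column):
                raise ValueError("'^' at end of column without a quality character")
            # Mapping quality is the ASCII code of the next character minus 33.
            tokens.append(('start', ord(column[i + 1]) - 33))
            i += 2
        elif c == '$':
            tokens.append(('end', None))
            i += 1
        elif c in '+-':
            # Read the decimal length, then take exactly that many bases, so a
            # mismatch base that follows the indel is not swallowed into it.
            j = i + 1
            while j < len(column) and column[j] in '0123456789':
                j += 1
            if j == i + 1:
                raise ValueError(f"indel without a length at offset {i}")
            length = int(column[i + 1:j])
            seq = column[j:j + length]
            if len(seq) != length:
                raise ValueError(f"truncated indel at offset {i}")
            tokens.append(('insertion' if c == '+' else 'deletion', seq))
            i = j + length
        elif c in '.,ACGTNacgtn':
            tokens.append(('base', c))
            i += 1
        else:
            raise ValueError(f"unexpected symbol {c!r} at offset {i}")
    return tokens
```

For example, in the column `^I.,+2AGt$` the 'I' after '^' decodes to mapping quality 73 - 33 = 40. The insertion is `AG`, and the trailing 't' is a separate reverse-strand mismatch rather than part of the insertion.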

Input

Reads, reference, and optionally reference-ordered data of any type. Options that control the formatting and verbosity of the output include --alwaysShowSecondBase and --showIndelPileups.

Output

A pileup of the input data, plus any reference-ordered data available at that locus.

Example

java -jar GenomeAnalysisTK.jar -I resources/exampleBAM.bam -R resources/exampleFASTA.fasta -T Pileup 
chr1 200 a A B ! 
chr1 201 c C B ! 
chr1 202 c C B ! 
chr1 203 c C B ! 
chr1 204 t T C ! 
chr1 205 a A @ ! 
chr1 206 a A C ! 
chr1 207 c C ? ! 
chr1 208 c C A ! 
chr1 209 c C A ! 
chr1 210 t T B ! 
chr1 211 a A C ! 
chr1 212 a A B ! 
chr1 213 c C B ! 
chr1 214 c C < ! 
chr1 215 c C 6 ! 
chr1 216 t T 3 ! 
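Each line of the example output above has six whitespace-separated columns: contig, position, reference base, read bases, base qualities, and mapping qualities. The page describes the ASCII-minus-33 encoding only for the character after '^'. Applying the same encoding to the last two columns is an assumption in this sketch. `PileupLine`, `decode_qualities` and `parse_pileup_line` are hypothetical names.

```python
from typing import List, NamedTuple


class PileupLine(NamedTuple):
    contig: str
    position: int
    ref_base: str
    read_bases: str
    base_quals: List[int]
    mapping_quals: List[int]


def decode_qualities(chars: str) -> List[int]:
    # Assumption: each character is a quality encoded as its ASCII code minus 33.
    return [ord(c) - 33 for c in chars]


def parse_pileup_line(line: str) -> PileupLine:
    """Parse one whitespace-separated, six-column pileup line."""
    contig, pos, ref, bases, quals, mapqs = line.split()
    return PileupLine(contig, int(pos), ref, bases,
                      decode_qualities(quals), decode_qualities(mapqs))
```

Under this assumption, the line `chr1 204 t T C !` decodes to a base quality of 67 - 33 = 34 and a mapping quality of 0. Every line in the example ends in '!', so the single read covering this region would have mapping quality 0.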