Basic walkers
Print reads
Outputs all reads in the input stream to the console, to a SAM file, or to another BAM file.
Input
Reads and reference. Optionally, reads can be filtered by platform (--platform) or by maximum allowed read length (--maxReadLength).
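The two optional read filters can be pictured as a predicate applied to each read before it is printed. The sketch below is a minimal Python illustration, not GATK code. The `Read` record and the `filter_reads` function are hypothetical. It also assumes two behaviors the text does not spell out: platforms are compared case-insensitively, and a read passes the length filter when its length is at most the `--maxReadLength` value.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Read:
    """Hypothetical minimal read record (not a GATK or htsjdk type)."""
    name: str
    platform: str  # platform of the read's read group, e.g. "ILLUMINA"
    bases: str


def filter_reads(
    reads: Iterable[Read],
    platform: Optional[str] = None,
    max_read_length: Optional[int] = None,
) -> Iterator[Read]:
    """Yield only the reads that pass both optional filters.

    Assumptions: platforms match case-insensitively, and a read passes the
    length filter when len(bases) <= max_read_length.
    """
    for read in reads:
        # Platform filter (analogous to --platform); skipped when not set.
        if platform is not None and read.platform.upper() != platform.upper():
            continue
        # Length filter (analogous to --maxReadLength); skipped when not set.
        if max_read_length is not None and len(read.bases) > max_read_length:
            continue
        yield read


reads = [
    Read("r1", "ILLUMINA", "ACGTACGT"),
    Read("r2", "SOLID", "ACGT"),
    Read("r3", "ILLUMINA", "ACGTACGTACGT"),
]
print([r.name for r in filter_reads(reads, platform="illumina", max_read_length=10)])  # prints ['r1']
```

Because `filter_reads` is a generator, reads are filtered one at a time as they stream past. No read list has to be held in memory.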
Output
The reads from the input BAM file can be written to the console (default), to an output file in SAM format (--out), or to another BAM file (--outputBamFile).
Example: Printing reads to the console
java -jar GenomeAnalysisTK.jar -T PrintReads -I resources/exampleBAM.bam -R resources/exampleFASTA.fasta
Example: Merging BAM files
The GATK can merge BAM files dynamically as it traverses them. To merge BAM files, specify one -I argument for each input BAM file, and use the --outputBamFile argument to write the merged reads to a new BAM file.
java -jar GenomeAnalysisTK.jar -T PrintReads -I <first>.bam -I <next>.bam -R <your>.fasta --outputBamFile <merged>.bam
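Merging during traversal can be understood as a k-way merge of coordinate-sorted streams. At each step, the reader emits whichever input holds the next read in genomic order, so the inputs are never concatenated and re-sorted. The Python sketch below illustrates that idea with `heapq.merge` and is not the GATK's actual merging code. It assumes each input is already coordinate-sorted and models reads as hypothetical `(contig_index, start, name)` tuples.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical read record: (contig_index, start_position, read_name).
Record = Tuple[int, int, str]


def merge_sorted_streams(streams: List[Iterable[Record]]) -> Iterator[Record]:
    """Lazily merge coordinate-sorted read streams into one sorted stream.

    heapq.merge keeps only the next pending record from each input in a
    heap, so records are produced on demand rather than after reading
    every input in full.
    """
    # Order by (contig, position); ties keep the order of the input streams.
    return heapq.merge(*streams, key=lambda rec: (rec[0], rec[1]))


first = [(0, 100, "a1"), (0, 250, "a2"), (1, 10, "a3")]
second = [(0, 120, "b1"), (1, 5, "b2")]
for record in merge_sorted_streams([first, second]):
    print(record)
```

The merge is correct only if every input is itself sorted by the same key. That is why coordinate-sorted BAM files can be merged in a single pass.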
Pileup
Prints the alignment in the pileup format. In the pileup format, each line represents one genomic position and consists of the chromosome name, coordinate, reference base, read bases, read base qualities, and mapping qualities. Information on matches, mismatches, indels, strand, mapping quality, and the start and end of a read is all encoded in the read base column. In this column, a dot stands for a match to the reference base on the forward strand, a comma for a match on the reverse strand, 'ACGTN' for a mismatch on the forward strand, and 'acgtn' for a mismatch on the reverse strand.
The pattern '\+[0-9]+[ACGTNacgtn]+' indicates an insertion between this reference position and the next one. The integer gives the length of the insertion and is followed by the inserted sequence. Similarly, the pattern '-[0-9]+[ACGTNacgtn]+' represents a deletion from the reference. Also in the read base column, the symbol '^' marks the start of a read segment, which is a contiguous subsequence of the read separated by 'N', 'S', or 'H' CIGAR operations. The ASCII code of the character following '^', minus 33, gives the mapping quality. The symbol '$' marks the end of a read segment.
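The read base column can be tokenized from left to right. The sketch below is a hypothetical Python helper; `decode_read_bases` is not part of the GATK. It handles only the symbols described above. For an indel, it reads the length first and then consumes exactly that many bases, because the inserted or deleted sequence can be followed directly by bases that belong to the next read.

```python
from typing import List, Tuple, Union

Event = Tuple[str, Union[str, int]]


def decode_read_bases(column: str) -> List[Event]:
    """Tokenize a pileup read-base column into (kind, value) events."""
    events: List[Event] = []
    i = 0
    while i < len(column):
        ch = column[i]
        if ch == "^":
            # Segment start; the next character is the mapping quality + 33.
            if i + 1 >= len(column):
                raise ValueError("'^' must be followed by a mapping-quality character")
            events.append(("start", ord(column[i + 1]) - 33))
            i += 2
            continue
        if ch in "+-":
            # Indel: a decimal length, then exactly that many bases.
            j = i + 1
            while j < len(column) and column[j].isdigit():
                j += 1
            if j == i + 1:
                raise ValueError(f"missing indel length at offset {i}")
            length = int(column[i + 1:j])
            kind = "insertion" if ch == "+" else "deletion"
            events.append((kind, column[j:j + length]))
            i = j + length
            continue
        # All remaining symbols are single characters.
        if ch == "$":
            events.append(("end", ""))
        elif ch == ".":
            events.append(("match", "forward"))
        elif ch == ",":
            events.append(("match", "reverse"))
        elif ch in "ACGTN":
            events.append(("mismatch_forward", ch))
        elif ch in "acgtn":
            events.append(("mismatch_reverse", ch))
        else:
            raise ValueError(f"unexpected symbol {ch!r} at offset {i}")
        i += 1
    return events


print(decode_read_bases("^I.,A+2AGa-1c$"))
```

For example, `decode_read_bases("^I.,A+2AGa-1c$")` yields the following events in order:

- a segment start with mapping quality 40 (ASCII 73 minus 33)
- a forward match and a reverse match
- a forward mismatch `A`
- a 2-base insertion `AG`
- a reverse mismatch `a`
- a 1-base deletion `c`
- a segment end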
Input
Reads, reference, and optionally reference-ordered data of any type. Several options exist to control the formatting and verbosity of output: --alwaysShowSecondBase and --showIndelPileups.
Output
A pileup of the input data, plus any reference-ordered data available at that locus.
Example
java -jar GenomeAnalysisTK.jar -I resources/exampleBAM.bam -R resources/exampleFASTA.fasta -T Pileup
chr1 200 a A B !
chr1 201 c C B !
chr1 202 c C B !
chr1 203 c C B !
chr1 204 t T C !
chr1 205 a A @ !
chr1 206 a A C !
chr1 207 c C ? !
chr1 208 c C A !
chr1 209 c C A !
chr1 210 t T B !
chr1 211 a A C !
chr1 212 a A B !
chr1 213 c C B !
chr1 214 c C < !
chr1 215 c C 6 !
chr1 216 t T 3 !
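Each line of this output can be split into the six columns of the pileup format. The sketch below is a hypothetical Python parser and is not part of the GATK. It assumes the two quality columns use Phred+33 encoding, which is the same ASCII-minus-33 offset used for the '^' mapping-quality character. It accepts either tabs or spaces between columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PileupRecord:
    chrom: str
    pos: int
    ref_base: str
    read_bases: str
    base_quals: List[int]
    map_quals: List[int]


def phred33(encoded: str) -> List[int]:
    """Decode a quality string: each character is a Phred score plus 33."""
    return [ord(c) - 33 for c in encoded]


def parse_pileup_line(line: str) -> PileupRecord:
    # split() with no argument accepts tabs or runs of spaces between columns.
    chrom, pos, ref, bases, quals, map_quals = line.split()
    return PileupRecord(chrom, int(pos), ref, bases, phred33(quals), phred33(map_quals))


record = parse_pileup_line("chr1 214 c C < !")
print(record.pos, record.base_quals, record.map_quals)  # prints: 214 [27] [0]
```

Under this assumption, the '!' in the last column of every example line decodes to a mapping quality of 0.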
