This tool provides simple, powerful read clipping capabilities to remove low-quality strings of bases, sections of reads, and reads containing user-provided sequences.
It lets the user clip bases that have poor quality scores, that match particular sequences, or that were generated by particular machine cycles.
Input: any number of BAM files.
Output: a new BAM file containing all of the reads from the input BAMs with the user-specified clipping operation applied to each read.
Example summary statistics:
Number of examined reads 13
Number of clipped reads 13
Percent of clipped reads 100.00
Number of examined bases 988
Number of clipped bases 126
Percent of clipped bases 12.75
Number of quality-score clipped bases 126
Number of range clipped bases 0
Number of sequence clipped bases 0
For example, given this read:
314KGAAXX090507:1:19:1420:1123#0 16 chrM 3116 29 76M * * *
TAGGACCCGGGCCCCCCTCCCCAATCCTCCAACGCATATAGCGGCCGCGCCTTCCCCCGTAAATGATATCATCTCA
#################4?6/?2135;;;'1/=/<'B9;12;68?A79@,@==@9?=AAA3;A@B;A?B54;?ABA
If we are clipping reads with -QT 10 and -CR WRITE_NS, we get:
314KGAAXX090507:1:19:1420:1123#0 16 chrM 3116 29 76M * * *
NNNNNNNNNNNNNNNNNTCCCCAATCCTCCAACGCATATAGCGGCCGCGCCTTCCCCCGTAAATGATATCATCTCA
#################4?6/?2135;;;'1/=/<'B9;12;68?A79@,@==@9?=AAA3;A@B;A?B54;?ABA
Whereas with -CR WRITE_Q0S:
314KGAAXX090507:1:19:1420:1123#0 16 chrM 3116 29 76M * * *
TAGGACCCGGGCCCCCCTCCCCAATCCTCCAACGCATATAGCGGCCGCGCCTTCCCCCGTAAATGATATCATCTCA
!!!!!!!!!!!!!!!!!4?6/?2135;;;'1/=/<'B9;12;68?A79@,@==@9?=AAA3;A@B;A?B54;?ABA
Or -CR SOFTCLIP_BASES:
314KGAAXX090507:1:19:1420:1123#0 16 chrM 3133 29 17S59M * * *
TAGGACCCGGGCCCCCCTCCCCAATCCTCCAACGCATATAGCGGCCGCGCCTTCCCCCGTAAATGATATCATCTCA
#################4?6/?2135;;;'1/=/<'B9;12;68?A79@,@==@9?=AAA3;A@B;A?B54;?ABA
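The three representations shown above can be reproduced with a short Python sketch. It is illustrative only and is not ClipReads code: the function name `apply_clip` and its arguments are hypothetical, it only handles a clip at the start of the stored read, and it assumes a plain `<n>M` CIGAR and Phred+33 quality encoding.

```python
def apply_clip(seq, quals, pos, n_clip, representation):
    """Clip the first n_clip stored bases of a fully matched read.

    Hypothetical helper (not part of GATK). Returns (seq, quals, pos, cigar).
    """
    kept = len(seq) - n_clip
    if representation == "WRITE_NS":
        # Overwrite clipped bases with N; qualities and alignment are unchanged.
        return "N" * n_clip + seq[n_clip:], quals, pos, f"{len(seq)}M"
    if representation == "WRITE_Q0S":
        # Set clipped qualities to Q0 ('!' in Phred+33); bases are unchanged.
        return seq, "!" * n_clip + quals[n_clip:], pos, f"{len(seq)}M"
    if representation == "SOFTCLIP_BASES":
        # Keep bases and qualities; record the clip in the CIGAR and
        # move the alignment start past the clipped bases.
        return seq, quals, pos + n_clip, f"{n_clip}S{kept}M"
    raise ValueError(f"representation not covered by this sketch: {representation}")


# The read from the example above (17 low-quality bases, then 59 more).
seq = "TAGGACCCGGGCCCCCC" + "TCCCCAATCCTCCAACGCATATAGCGGCCGCGCCTTCCCCCGTAAATGATATCATCTCA"
quals = "#" * 17 + "4?6/?2135;;;'1/=/<'B9;12;68?A79@,@==@9?=AAA3;A@B;A?B54;?ABA"

for rep in ("WRITE_NS", "WRITE_Q0S", "SOFTCLIP_BASES"):
    new_seq, new_quals, new_pos, cigar = apply_clip(seq, quals, 3116, 17, rep)
    print(rep, new_pos, cigar)
    print(new_seq)
    print(new_quals)
```

Only the soft-clip case changes the alignment fields: the start moves from 3116 to 3133 and the CIGAR becomes 17S59M, while the other two representations rewrite bases or qualities in place.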
-T ClipReads -I my.bam -I your.bam -o my_and_your.clipped.bam -R Homo_sapiens_assembly18.fasta \
-XF seqsToClip.fasta -X CCCCC -CT "1-5,11-15" -QT 10
This Read Filter is automatically applied to the data by the Engine before processing by ClipReads.
The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).
This table summarizes the command-line arguments that are specific to this tool. For details, see the list below the table.
| Name | Type | Default value | Summary |
|---|---|---|---|
| Optional | |||
| --clipRepresentation | ClippingRepresentation | WRITE_NS | How should we actually clip the bases? |
| --clipSequence | String[] | NA | Remove sequences within reads matching this sequence |
| --clipSequencesFile | String | NA | Remove sequences within reads matching the sequences in this FASTA file |
| --cyclesToTrim | String | NA | String indicating machine cycles to clip from the reads |
| --out | StingSAMFileWriter | stdout | Write BAM output here |
| --outputStatistics | PrintStream | NA | Write output statistics to this file |
| --qTrimmingThreshold | int | -1 | If provided, the Q-score clipper will be applied |
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.
How should we actually clip the bases? The different values for this argument determine how ClipReads applies clips to the reads. This can range from writing Ns over the clipped bases to hard clipping away the bases from the BAM.
The --clipRepresentation argument is an enumerated type (ClippingRepresentation), which can have one of the following values:
Remove sequences within reads matching this sequence. Clips bases from the reads matching the provided SEQ. Can be provided any number of times on the command line.
Remove sequences within reads matching the sequences in this FASTA file. Reads the sequences in the provided FASTA file, and clips any bases that exactly match any of the sequences in the file.
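As a rough illustration of exact-match sequence clipping, the Python sketch below masks every occurrence of each pattern with Ns. The function `mask_sequences` is hypothetical, and the sketch makes assumptions the text above does not confirm: it searches only the read as stored (no reverse-complement matching), counts overlapping matches, and always writes Ns.

```python
def mask_sequences(read, patterns):
    """Replace every base covered by an exact match of any pattern with N.

    Hypothetical helper for illustration; not the ClipReads implementation.
    """
    clipped = [False] * len(read)
    for pattern in patterns:
        if not pattern:
            continue  # an empty pattern would match everywhere; skip it
        start = read.find(pattern)
        while start != -1:
            for i in range(start, start + len(pattern)):
                clipped[i] = True
            # Advance by one base so overlapping matches are also found.
            start = read.find(pattern, start + 1)
    return "".join("N" if is_clipped else base
                   for base, is_clipped in zip(read, clipped))


print(mask_sequences("ACCCCCCT", ["CCCCC"]))
```

Patterns supplied on the command line and patterns read from a FASTA file could both be passed in the same `patterns` list under this sketch.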
String indicating machine cycles to clip from the reads. Clips machine cycles from the read. Accepts a string of ranges of the form start1-end1,start2-end2, etc. For each start/end pair, removes bases in machine cycles from start to end, inclusive. These are 1-based values (positions). For example, 1-5,10-12 clips the first 5 bases, and then three bases at cycles 10, 11, and 12.
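The range format described above can be parsed with a few lines of Python. This is a minimal sketch with hypothetical function names (`parse_cycles`, `cycles_to_clip`); it assumes that ranges extending past the end of the read are simply truncated, which the text above does not state.

```python
def parse_cycles(spec):
    """Parse 'start1-end1,start2-end2' into a list of 1-based inclusive pairs."""
    ranges = []
    for part in spec.split(","):
        start_text, end_text = part.strip().split("-")
        start, end = int(start_text), int(end_text)
        if start < 1 or end < start:
            raise ValueError(f"invalid cycle range: {part!r}")
        ranges.append((start, end))
    return ranges


def cycles_to_clip(spec, read_length):
    """Return the sorted 1-based machine cycles selected by spec."""
    cycles = set()
    for start, end in parse_cycles(spec):
        # Assumption: cycles beyond the read length are ignored.
        cycles.update(range(start, min(end, read_length) + 1))
    return sorted(cycles)


print(cycles_to_clip("1-5,10-12", 76))  # [1, 2, 3, 4, 5, 10, 11, 12]
```

The sketch stops at cycle numbers. Mapping cycles onto positions in the stored read, which differs for reverse-strand reads, is left out.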
Write BAM output here. The output SAM/BAM file will be written here.
Write output statistics to this file. If provided, ClipReads will write summary statistics about the clipping operations applied to the reads to this file.
If provided, the Q-score clipper will be applied. If a value > 0 is provided, then the quality score based read clipper will be applied to the reads using this quality score threshold.
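The rule used by the Q-score clipper is not spelled out here. One common approach is BWA-style trimming, which walks inward from the last machine cycle, keeps a running sum of (threshold - quality), and clips at the point where that sum is largest. The Python sketch below assumes that rule purely for illustration; `quality_clip_length` is a hypothetical name, and qualities are assumed to be Phred+33 encoded.

```python
def quality_clip_length(quals_in_cycle_order, threshold, offset=33):
    """Number of bases to clip from the last-cycle end of a read.

    Assumed BWA-style running-sum rule; not taken from the ClipReads source.
    """
    if threshold <= 0:
        return 0  # the clipper is only applied for thresholds > 0
    running, best, n_clip = 0, 0, 0
    for count, ch in enumerate(reversed(quals_in_cycle_order), start=1):
        running += threshold - (ord(ch) - offset)
        if running < 0:
            break  # good bases now outweigh the bad ones seen so far
        if running > best:
            best, n_clip = running, count
    return n_clip


# Qualities of the example read as stored in the BAM record. The read is on
# the reverse strand (flag 16), so machine-cycle order is the reverse.
stored = "#" * 17 + "4?6/?2135;;;'1/=/<'B9;12;68?A79@,@==@9?=AAA3;A@B;A?B54;?ABA"
print(quality_clip_length(stored[::-1], 10))  # 17
```

Under this rule an isolated low-quality base in the middle of a read is not clipped, because the running sum has already turned negative before the scan reaches it.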
GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.