| Name | Summary |
|---|---|
| CommandLineGATK | All command line parameters accepted by all tools in the GATK. |
| Name | Summary |
|---|---|
| BaseCoverageDistribution | Simple walker to plot the coverage distribution per base |
| CallableLoci | Emits a data file containing information about callable, uncallable, poorly mapped, and other parts of the genome |
| CheckAlignment | Validates consistency of the aligner interface |
| CheckPileup | At every locus in the input set, compares the pileup data (reference base, aligned base from each overlapping read, and quality score) to the reference pileup data generated by samtools. |
| CompareBAM | Given two BAMs with different read groups, it compares them based on ReduceReads metrics. |
| CompareCallableLoci | Test routine for new VariantContext object |
| CountBases | Walks over the input data set, calculating the number of bases seen for diagnostic purposes. |
| CountIntervals | Count contiguous regions in an interval list. |
| CountLoci | Walks over the input data set, calculating the total number of covered loci for diagnostic purposes. |
| CountMales | Walks over the input data set, calculating the number of reads seen from male samples for diagnostic purposes. |
| CountRODs | Prints out counts of the number of reference ordered data objects encountered. |
| CountRODsByRef | Prints out counts of the number of reference ordered data objects encountered along the reference. |
| CountReadEvents | Walks over the input data set, counting the number of read events (from the CIGAR operator) |
| CountReads | Walks over the input data set, calculating the number of reads seen for diagnostic purposes. |
| CountTerminusEvent | Walks over the input data set, counting the number of reads ending in insertions/deletions or soft-clips |
| CoveredByNSamplesSites | Print intervals file with all the variant sites for which most of the samples have good coverage |
| DepthOfCoverage | Toolbox for assessing sequence coverage by a wide array of metrics, partitioned by sample, read group, or library |
| DiagnoseTargets | Analyzes coverage distribution and validates read mates for a given interval and sample. |
| DiffObjects | A generic engine for comparing tree-structured objects |
| ErrorRatePerCycle | Computes the read error rate per position in read (in the original 5'->3' orientation that the read had coming off the machine). Emits a GATKReport containing readgroup, cycle, mismatches, counts, qual, and error rate for each read group in the input BAMs, for the first-of-pair reads only. |
| FastaStats | Calculate basic statistics about the reference sequence itself |
| FindCoveredIntervals | Outputs a list of intervals that are covered above a given threshold. |
| FlagStat | A reimplementation of the 'samtools flagstat' subcommand in the GATK |
| GCContentByInterval | Walks along reference and calculates the GC content for each interval. |
| Pileup | Emulates the samtools pileup command to print aligned reads |
| PrintRODs | Prints out all of the RODs in the input data set. |
| QCRef | Quality control for the reference fasta |
| ReadClippingStats | Walks over the input reads, printing out statistics about the read length, number of clipping events, and length of the clipping to the output stream. |
| ReadGroupProperties | Emits a GATKReport containing read group, sample, library, platform, center, sequencing data, paired end status, simple read type name (e.g. |
| ReadLengthDistribution | Outputs the read lengths of all the reads in a file. |
| RecalibrationPerformance | Evaluate the performance of the base recalibration process |
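
To illustrate the kind of per-interval statistic that a tool such as GCContentByInterval reports, here is a minimal Python sketch. It is not GATK code: the function names, the 1-based inclusive interval convention, and the choice to ignore ambiguous bases such as `N` are all assumptions made for the example.

```python
def gc_fraction(sequence):
    """Fraction of G/C among the unambiguous A/C/G/T bases (assumption: N is ignored)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_by_interval(reference, intervals):
    """GC fraction for each (start, end) interval, 1-based inclusive (assumed convention)."""
    return {(start, end): gc_fraction(reference[start - 1:end])
            for start, end in intervals}


# Toy reference sequence, invented for the example.
reference = "ACGTGGCCNNAT"
for interval, gc in gc_by_interval(reference, [(1, 4), (5, 8), (9, 12)]).items():
    print(interval, gc)
```

The real tool walks along a reference FASTA restricted to an interval list; the sketch only shows the arithmetic on an in-memory string.
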
| Name | Summary |
|---|---|
| BaseRecalibrator | First pass of the base quality score recalibration -- Generates recalibration table based on various user-specified covariates (such as read group, reported quality score, machine cycle, and nucleotide context). |
| ClipReads | This tool provides simple, powerful read clipping capabilities to remove low quality strings of bases, sections of reads, and reads containing user-provided sequences. |
| IndelRealigner | Performs local realignment of reads to correct misalignments due to the presence of indels. |
| LeftAlignIndels | Left-aligns indels from reads in a bam file. |
| PrintReads | Renders, in SAM/BAM format, all reads from the input data set in the order in which they appear in the input file. |
| ReadAdaptorTrimmer | Utility tool to blindly strip base adaptors. |
| RealignerTargetCreator | Emits intervals for the Local Indel Realigner to target for realignment. |
| ReduceReads | Reduces the BAM file using read based compression that keeps only essential information for variant calling |
| SplitSamFile | Divides the input data set into separate BAM files, one for each sample in the input data set. |
| Name | Summary |
|---|---|
| ApplyRecalibration | Applies cuts to the input vcf file (by adding filter lines) to achieve the desired novel truth sensitivity levels which were specified during VariantRecalibration |
| BeagleOutputToVCF | Takes files produced by Beagle imputation engine and creates a vcf with modified annotations. |
| GATKPaperGenotyper | A simple Bayesian genotyper that outputs a text-based call format. |
| HaplotypeCaller | Call SNPs and indels simultaneously via local de-novo assembly of haplotypes in an active region. |
| PhaseByTransmission | Computes the most likely genotype combination and phases trios and parent/child pairs |
| ProduceBeagleInput | Converts the input VCF into a format accepted by the Beagle imputation/analysis program. |
| ReadBackedPhasing | Walks along all variant ROD loci, caching a user-defined window of VariantContext sites, and then finishes phasing them when they go out of range (using upstream and downstream reads). |
| UnifiedGenotyper | A variant caller which unifies the approaches of several disparate callers -- Works for single-sample and multi-sample data. |
| VariantRecalibrator | Create a Gaussian mixture model by looking at the annotation values over a high quality subset of the input call set and then evaluate all input variants. |
| VariantsToBeagleUnphased | Produces an input file to Beagle imputation engine, listing unphased, hard-called genotypes for a single sample in input variant file. |
| Name | Summary |
|---|---|
| CatVariants | Concatenates VCF files of non-overlapping genome intervals, all with the same set of samples |
| CombineVariants | Combines VCF records from different sources. |
| FilterLiftedVariants | Filters a lifted-over VCF file for ref bases that have been changed. |
| GenotypeConcordance | Genotype concordance (per-sample and aggregate counts and frequencies, NRD/NRS and site allele overlaps) between two callsets |
| HaplotypeResolver | Haplotype-based resolution of variants in 2 different eval files. |
| LeftAlignAndTrimVariants | Left-aligns indels from a variants file. |
| LiftoverVariants | Lifts a VCF file over from one build to another. |
| RandomlySplitVariants | Takes a VCF file, randomly splits variants into two different sets, and outputs 2 new VCFs with the results. |
| RegenotypeVariants | Regenotypes the variants from a VCF. |
| SelectHeaders | Selects headers from a VCF source. |
| SelectVariants | Selects variants from a VCF source. |
| VariantAnnotator | Annotates variant calls with context information. |
| VariantEval | General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv ratios, and a lot more) |
| VariantFiltration | Filters variant calls using a number of user-selectable, parameterizable criteria. |
| VariantsToAllelicPrimitives | Takes alleles from a variants file and breaks them up (if possible) into more basic/primitive alleles. |
| VariantsToBinaryPed | Converts a VCF file to a binary plink Ped file (.bed/.bim/.fam) |
| VariantsToTable | Emits specific fields from a VCF file to a tab-delimited table |
| VariantsToVCF | Converts variants from other file formats to VCF format. |
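
As a rough illustration of what "emitting specific fields from a VCF file to a tab-delimited table" involves, the following Python sketch pulls selected fixed columns and INFO keys out of a single VCF data line. It is a simplified stand-in, not the VariantsToTable implementation: the function name, the `NA` placeholder for missing values, and the `true` value for flag-style INFO keys are assumptions of the example.

```python
# The eight fixed VCF columns, in order.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def vcf_line_to_row(line, fields, missing="NA"):
    """Return the requested fixed columns / INFO keys as one tab-delimited row."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, parts))

    # Parse INFO entries of the form KEY=VALUE or bare flag KEY.
    info = {}
    for entry in record.get("INFO", "").split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else "true"  # assumption: flags print as "true"

    # Fixed columns take precedence over INFO keys with the same name.
    return "\t".join(record.get(f, info.get(f, missing)) for f in fields)


# Invented example record.
line = "1\t100\trs1\tA\tG\t50.0\tPASS\tDP=10;AC=1;DB"
print(vcf_line_to_row(line, ["CHROM", "POS", "DP", "DB", "AF"]))
```

The sketch ignores genotype columns and header lines, which a real conversion would also have to handle.
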
| Name | Summary |
|---|---|
| ListAnnotations | Utility program to print a list of available annotations |
| Name | Summary |
|---|---|
| FastaAlternateReferenceMaker | Generates an alternative reference sequence over the specified interval. |
| FastaReferenceMaker | Renders a new reference in FASTA format consisting of only those loci provided in the input data set. |
| Name | Summary |
|---|---|
| GenotypeAndValidate | Genotypes a dataset and validates the calls of another dataset using the Unified Genotyper. |
| ValidateVariants | Validates a VCF file with an extra strict set of criteria. |
| ValidationAmplicons | Creates FASTA sequences for use in Sequenom or PCR utilities for site amplification and subsequent validation |
| ValidationSiteSelector | Randomly selects VCF records according to specified options. |
| VariantValidationAssessor | Annotates a validation (from Sequenom for example) VCF with QC metrics (HW-equilibrium, % failed probes) |
GATK Engine arguments that filter or transform incoming SAM/BAM data files
| Name | Summary |
|---|---|
| BadCigarFilter | Filter out reads with wonky cigar strings. |
| BadMateFilter | Filter out reads whose mate maps to a different contig. |
| DuplicateReadFilter | Filter out duplicate reads. |
| FailsVendorQualityCheckFilter | Filter out reads that fail the vendor quality check. |
| HCMappingQualityFilter | Filter out reads with low mapping qualities. |
| MalformedReadFilter | Filter out malformed reads. |
| MappingQualityFilter | Filter out reads with low mapping qualities. |
| MappingQualityUnavailableFilter | Filter out reads whose mapping quality is unavailable. |
| MappingQualityZeroFilter | Filter out mapping quality zero reads. |
| MateSameStrandFilter | Filter out reads that are not paired, have their mate unmapped, are duplicates, fail vendor quality check or both mate and read are on the same strand. |
| MaxInsertSizeFilter | Filter out reads that exceed a given max insert size |
| MissingReadGroupFilter | Filter out reads without read groups. |
| NoOriginalQualityScoresFilter | Filter out reads that don't have an original base quality score tag (usually added by BQSR) |
| NotPrimaryAlignmentFilter | Filter out reads that are not primary alignments. |
| Platform454Filter | Filter out 454 reads. |
| PlatformFilter | Filter out PL matching reads. |
| PlatformUnitFilter | Filter out reads that have blacklisted platform unit tags. |
| ReadGroupBlackListFilter | Removes records matching the read group tag and exact match string. |
| ReadLengthFilter | Filters out reads whose length is >= some value or < some value. |
| ReadNameFilter | Filter out all reads except those with this read name |
| ReadStrandFilter | Filters out reads whose strand is negative or positive |
| ReassignMappingQualityFilter | A read filter (transformer) that sets all reads mapping quality to a given value. |
| ReassignOneMappingQualityFilter | A read filter (transformer) that changes a given read mapping quality to a different value. |
| SampleFilter | Filter out all reads except those with this sample |
| SingleReadGroupFilter | Only use reads from the specified read group. |
| UnmappedReadFilter | Filter out unmapped reads. |
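
Conceptually, each read filter in the table above is a predicate that decides whether a read should be dropped before a tool sees it. The Python sketch below models that idea with SAM flag bits (the bit values come from the SAM format specification). The `Read` class, the function names, and the threshold are hypothetical and exist only for this example; they are not GATK or htsjdk APIs.

```python
from dataclasses import dataclass

# SAM flag bits, as defined in the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_NOT_PRIMARY = 0x100
FLAG_DUPLICATE = 0x400


@dataclass
class Read:
    """Hypothetical minimal read record (not a GATK class)."""
    name: str
    flag: int
    mapq: int


# Each filter returns True when the read should be filtered OUT.
def duplicate_filter(read):
    return bool(read.flag & FLAG_DUPLICATE)


def unmapped_filter(read):
    return bool(read.flag & FLAG_UNMAPPED)


def not_primary_alignment_filter(read):
    return bool(read.flag & FLAG_NOT_PRIMARY)


def mapping_quality_filter(min_mapq):
    """Build a filter that drops reads with MAPQ below min_mapq (assumed threshold)."""
    def filter_out(read):
        return read.mapq < min_mapq
    return filter_out


def apply_filters(reads, filters):
    """Keep only the reads that no filter rejects."""
    return [r for r in reads if not any(f(r) for f in filters)]


# Invented reads for illustration.
reads = [
    Read("a", 0, 60),
    Read("b", FLAG_DUPLICATE, 60),
    Read("c", FLAG_UNMAPPED, 0),
    Read("d", 0, 5),
]
kept = apply_filters(reads, [duplicate_filter, unmapped_filter, mapping_quality_filter(10)])
print([r.name for r in kept])
```

Composing filters as a list of predicates keeps each rule independent, which matches how the filters are listed individually in the table.
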
Tribble codecs for reading reference ordered data (ROD) files such as VCF or BED
| Name | Summary |
|---|---|
| BeagleCodec | Codec for Beagle imputation engine |
| BedTableCodec | The standard table codec that expects loci as contig start stop, not contig:start-stop |
| RawHapMapCodec | A codec for the file types produced by the HapMap consortium |
| RefSeqCodec | Allows for reading in RefSeq information |
| SAMPileupCodec | Decoder for SAM pileup data. |
| SAMReadCodec | Decodes a simple SAM text string. |
| TableCodec | Reads tab-delimited tabular text files |
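
The BedTableCodec entry distinguishes between loci written as separate `contig start stop` columns and loci written as a single `contig:start-stop` string. The Python sketch below parses both spellings into the same tuple. The function names are hypothetical, and the sketch deliberately does no coordinate conversion (for example between 0-based and 1-based systems); it only shows the difference in text layout.

```python
def parse_columns(line):
    """Parse a locus given as whitespace-separated 'contig start stop' columns."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    contig, start, stop = fields[:3]
    return contig, int(start), int(stop)


def parse_interval_string(text):
    """Parse a locus given as 'contig:start-stop' or 'contig:position'."""
    # rpartition splits on the LAST colon, so contig names containing ':' still work.
    contig, sep, span = text.strip().rpartition(":")
    if not sep or not contig:
        raise ValueError(f"not a contig:start-stop string: {text!r}")
    start, _, stop = span.partition("-")
    return contig, int(start), int(stop) if stop else int(start)


print(parse_columns("chr1\t100\t200\tsome_annotation"))
print(parse_interval_string("chr1:100-200"))
```
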
| Name | Summary |
|---|---|
| AssessReducedQuals | Emits intervals in which the differences between the original and reduced bam quals are bigger than epsilon (unless the quals of the reduced bam are above a sufficient threshold) |
| DownsampleReadsQC | |
Errors caused by incorrect user behavior, such as bad files, bad arguments, etc.
| Name | Summary |
|---|---|
| ArgumentException | Generic class for handling misc parsing exceptions. |
| ArgumentsAreMutuallyExclusiveException | An exception indicating that mutually exclusive options have been passed in the same command line. |
| DynamicClassResolutionException | Class for handling common failures of dynamic class resolution. |
| InvalidArgumentException | An exception for undefined arguments. |
| InvalidArgumentValueException | An exception for values whose format is invalid. |
| MissingArgumentException | An exception indicating that some required arguments are missing. |
| MissingArgumentValueException | Specifies that a value was missing when attempting to populate an argument. |
| TooManyValuesForArgumentException | An exception indicating that too many values have been provided for the given argument. |
| UnknownEnumeratedValueException | An exception for when an argument doesn't match any of the enumerated options for that var type |
| UnmatchedArgumentException | An exception for values that can't be matched with any argument. |
| UserException | Represents the common user errors detected by Sting / GATK. Root class for all GATK user errors, as well as the container for errors themselves. |
Annotations available to VariantAnnotator and the variant callers (some restrictions apply)
| Name | Summary |
|---|---|
| AlleleBalance | The allele balance (fraction of ref bases over ref + alt bases) across all biallelic het-called samples |
| AlleleBalanceBySample | Allele balance per sample |
| BaseCounts | Count of A, C, G, T bases across all samples |
| BaseQualityRankSumTest | U-based z-approximation from the Mann-Whitney Rank Sum Test for base qualities |
| ChromosomeCounts | Allele counts and frequency for each ALT allele and total number of alleles in called genotypes |
| ClippingRankSumTest | U-based z-approximation from the Mann-Whitney Rank Sum Test for reads with clipped bases |
| Coverage | Total (unfiltered) depth over all samples. |
| DepthPerAlleleBySample | The depth of coverage of each allele per sample |
| FisherStrand | Phred-scaled p-value using Fisher's Exact Test to detect strand bias |
| GCContent | GC content of the reference around the given site |
| HaplotypeScore | Consistency of the site with two (and only two) segregating haplotypes. |
| HardyWeinberg | Hardy-Weinberg test for disequilibrium |
| HomopolymerRun | Largest contiguous homopolymer run of the variant allele |
| InbreedingCoeff | Likelihood-based (using PL field) test for the inbreeding among samples. |
| LowMQ | Triplet annotation: fraction of MAPQ == 0, MAPQ < 10, and count of all mapped reads |
| MVLikelihoodRatio | Likelihood of being a Mendelian Violation |
| MappingQualityRankSumTest | U-based z-approximation from the Mann-Whitney Rank Sum Test for mapping qualities |
| MappingQualityZero | Total count across all samples of mapping quality zero reads |
| MappingQualityZeroBySample | Count for each sample of mapping quality zero reads |
| NBaseCount | The number of N bases, counting only SOLiD data |
| QualByDepth | Variant confidence (from the QUAL field) / unfiltered depth of non-reference samples. |
| RMSMappingQuality | Root Mean Square of the mapping quality of the reads across all samples. |
| ReadPosRankSumTest | U-based z-approximation from the Mann-Whitney Rank Sum Test for the distance from the end of the read for reads with the alternate allele |
| SampleList | List all of the polymorphic samples. |
| SnpEff | A set of genomic annotations based on the output of the SnpEff variant effect predictor tool |
| SpanningDeletions | Fraction of reads containing spanning deletions at this site |
| TandemRepeatAnnotator | Annotates variants that are composed of tandem repeats |
| TransmissionDisequilibriumTest | Wittkowski transmission disequilibrium test |
| VariantType | Assigns a roughly correct category of the variant type (SNP, MNP, insertion, deletion, etc.) |
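
Several of the annotation summaries above describe simple arithmetic. The Python sketch below works through four of them using the definitions given in the table: allele balance as ref / (ref + alt), root mean square of mapping qualities, QUAL divided by depth, and Phred scaling of a p-value (-10 * log10(p)). These are simplified, single-site illustrations with hypothetical function names; the actual annotations aggregate over samples and reads in ways this sketch does not attempt to reproduce.

```python
import math


def allele_balance(ref_count, alt_count):
    """Fraction of ref bases over ref + alt bases; None when there is no coverage."""
    total = ref_count + alt_count
    return ref_count / total if total else None


def rms_mapping_quality(mapqs):
    """Root mean square of a list of mapping qualities; None for an empty list."""
    if not mapqs:
        return None
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))


def qual_by_depth(qual, depth):
    """Variant confidence (QUAL) divided by depth; None when depth is zero."""
    return qual / depth if depth else None


def phred_scale(p_value):
    """Phred-scale a probability: -10 * log10(p)."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -10 * math.log10(p_value)


# Invented numbers for illustration.
print(allele_balance(30, 10))
print(rms_mapping_quality([60, 60, 20]))
print(qual_by_depth(500.0, 40))
print(phred_scale(0.001))
```

Returning `None` for zero coverage or zero depth is a choice made for the sketch so that division by zero never occurs.
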
GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.