Tool Documentation Index 3.1-1-g07a4bf8


Name Summary
CommandLineGATK All command line parameters accepted by all tools in the GATK.

Name Summary
AnalyzeCovariates Tool to analyze and evaluate base recalibration tables.
BaseCoverageDistribution Simple walker to plot the coverage distribution per base
CallableLoci Emits a data file containing information about callable, uncallable, poorly mapped, and other parts of the genome

CheckAlignment Validates consistency of the aligner interface
CheckPileup Compare GATK's internal pileup to a reference Samtools pileup
CompareCallableLoci Test routine for new VariantContext object
CountBases Walks over the input data set, calculating the number of bases seen for diagnostic purposes.
CountIntervals Count contiguous regions in an interval list.
CountLoci Walks over the input data set, calculating the total number of covered loci for diagnostic purposes.
CountMales Walks over the input data set, calculating the number of reads seen from male samples for diagnostic purposes.
CountRODs Prints out counts of the number of reference ordered data objects encountered.
CountRODsByRef Prints out counts of the number of reference ordered data objects encountered along the reference.
CountReadEvents Walks over the input data set, counting the number of read events (from the CIGAR operator)
CountReads Walks over the input data set, calculating the number of reads seen for diagnostic purposes.
CountTerminusEvent Walks over the input data set, counting the number of reads ending in insertions/deletions or soft-clips
CoveredByNSamplesSites Print intervals file with all the variant sites for which most of the samples have good coverage
DepthOfCoverage Assess sequence coverage by a wide array of metrics, partitioned by sample, read group, or library
DiagnoseTargets Analyzes coverage distribution and validates read mates for a given interval and sample.
DiffObjects A generic engine for comparing tree-structured objects
ErrorRatePerCycle Compute the read error rate per position
FastaStats Calculate basic statistics about the reference sequence itself
FindCoveredIntervals Outputs a list of intervals that are covered above a given threshold.
FlagStat A reimplementation of the 'samtools flagstat' subcommand in the GATK
GCContentByInterval Walks along reference and calculates the GC content for each interval.
Pileup Emulates the samtools pileup command to print aligned reads
PrintRODs Prints out all of the RODs in the input data set.
QCRef Quality control for the reference fasta
QualifyMissingIntervals Walks along reference and calculates a few metrics for each interval.
ReadClippingStats Read clipping statistics for all reads.
ReadGroupProperties Emits a GATKReport containing read group, sample, library, platform, center, sequencing data, paired end status, simple read type name (e.g.
ReadLengthDistribution Outputs the read lengths of all the reads in a file.
SimulateReadsForVariants Generates simulated reads for variants
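
Several of the counting walkers above (CountReadEvents, CountTerminusEvent) summarize reads by their CIGAR operators. The following is a minimal Python sketch of that idea, not the GATK implementation: the function names are hypothetical, reads are represented only by their CIGAR strings, and the exact rules the real walkers apply (for example, how hard clips at the end of a read are treated) are assumptions.

```python
import re
from collections import Counter

# One CIGAR element is a length followed by an operator from the SAM specification.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5S90M' into (length, operator) pairs."""
    return [(int(length), op) for length, op in CIGAR_ELEMENT.findall(cigar)]


def count_read_events(cigars):
    """Count how many times each CIGAR operator occurs across all reads."""
    counts = Counter()
    for cigar in cigars:
        for _length, op in parse_cigar(cigar):
            counts[op] += 1
    return counts


def count_terminus_events(cigars):
    """Count reads whose last CIGAR element is an indel (I/D) or a soft clip (S).

    Assumption: only the final element is inspected; the real walker may
    apply additional rules.
    """
    indel_ends = 0
    softclip_ends = 0
    for cigar in cigars:
        elements = parse_cigar(cigar)
        if not elements:
            continue  # '*' means no CIGAR is available for this read
        last_op = elements[-1][1]
        if last_op in ("I", "D"):
            indel_ends += 1
        elif last_op == "S":
            softclip_ends += 1
    return indel_ends, softclip_ends
```

Calling count_read_events on a list of CIGAR strings returns a Counter keyed by operator; count_terminus_events returns a pair (reads ending in an indel, reads ending in a soft clip).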

Name Summary
BaseRecalibrator First pass of the base quality score recalibration -- Generates recalibration table based on various user-specified covariates (such as read group, reported quality score, machine cycle, and nucleotide context).
ClipReads Read clipping based on quality, position or sequence matching
IndelRealigner Performs local realignment of reads to correct misalignments due to the presence of indels.
LeftAlignIndels Left-aligns indels from reads in a bam file.
PrintReads Renders, in SAM/BAM format, all reads from the input data set in the order in which they appear in the input file.
ReadAdaptorTrimmer Utility tool to blindly strip base adaptors.
RealignerTargetCreator Emits intervals for the Local Indel Realigner to target for realignment.
SplitNCigarReads Splits reads that contain Ns in their cigar string (e.g.
SplitSamFile Divides the input data set into separate BAM files, one for each sample in the input data set.

Name Summary
ApplyRecalibration Applies cuts to the input vcf file (by adding filter lines) to achieve the desired novel truth sensitivity levels which were specified during VariantRecalibration
BeagleOutputToVCF Takes files produced by Beagle imputation engine and creates a vcf with modified annotations.
GATKPaperGenotyper A simple Bayesian genotyper, that outputs a text based call format.
HaplotypeCaller Call SNPs and indels simultaneously via local de-novo assembly of haplotypes in an active region.
PhaseByTransmission Computes the most likely genotype combination and phases trios and parent/child pairs
ProduceBeagleInput Converts the input VCF into a format accepted by the Beagle imputation/analysis program.
ReadBackedPhasing Walks along all variant ROD loci, caching a user-defined window of VariantContext sites, and then finishes phasing them when they go out of range (using upstream and downstream reads).
UnifiedGenotyper A variant caller which unifies the approaches of several disparate callers -- Works for single-sample and multi-sample data.
VariantRecalibrator Create a Gaussian mixture model by looking at the annotation values over a high quality subset of the input call set and then evaluate all input variants.
VariantsToBeagleUnphased Produces an input file to Beagle imputation engine, listing unphased, hard-called genotypes for a single sample in input variant file.

Name Summary
CalculateGenotypePosteriors Calculates genotype posterior likelihoods given panel data
CatVariants Concatenates VCF files of non-overlapped genome intervals, all with the same set of samples
CombineGVCFs Combines any number of gVCF files that were produced by the Haplotype Caller into a single joint gVCF file.
CombineVariants Combines VCF records from different sources.
FilterLiftedVariants Filters a lifted-over VCF file for ref bases that have been changed.
GenotypeConcordance Genotype concordance (per-sample and aggregate counts and frequencies, NRD/NRS and site allele overlaps) between two callsets
GenotypeGVCFs Genotypes any number of gVCF files that were produced by the Haplotype Caller into a single joint VCF file.
HaplotypeResolver Haplotype-based resolution of variants in 2 different eval files.
LeftAlignAndTrimVariants Left-aligns indels from a variants file.
LiftoverVariants Lifts a VCF file over from one build to another.
RandomlySplitVariants Takes a VCF file, randomly splits variants into two different sets, and outputs 2 new VCFs with the results.
RegenotypeVariants Regenotypes the variants from a VCF.
SelectHeaders Selects headers from a VCF source.
SelectVariants Selects variants from a VCF source.
VariantAnnotator Annotates variant calls with context information.
VariantEval General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv ratios, and a lot more)
VariantFiltration Filters variant calls using a number of user-selectable, parameterizable criteria.
VariantsToAllelicPrimitives Takes alleles from a variants file and breaks them up (if possible) into more basic/primitive alleles.
VariantsToBinaryPed Converts a VCF file to a binary plink Ped file (.bed/.bim/.fam)
VariantsToTable Emits specific fields from a VCF file to a tab-delimited table
VariantsToVCF Converts variants from other file formats to VCF format.
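
To illustrate what "emitting specific fields from a VCF file" into a table can look like, here is a small Python sketch. It is not the VariantsToTable implementation: the function name, the "NA" placeholder for missing values, and the "true" value used for INFO flags are all assumptions made for this example. It handles only the eight fixed VCF columns and INFO keys, not per-sample genotype fields.

```python
# The eight fixed columns of a VCF data line, in order.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def vcf_to_table(lines, fields, missing="NA"):
    """Return tab-separated rows (header first) with the requested fields.

    A field is looked up among the fixed VCF columns first and among the
    INFO keys otherwise. 'missing' is a hypothetical placeholder value.
    """
    rows = ["\t".join(fields)]
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the header line, and blank lines
        parts = line.rstrip("\n").split("\t")
        record = dict(zip(VCF_COLUMNS, parts))
        info = {}
        for entry in record.get("INFO", "").split(";"):
            if not entry or entry == ".":
                continue
            key, sep, value = entry.partition("=")
            info[key] = value if sep else "true"  # flags carry no value
        values = []
        for field in fields:
            if field in VCF_COLUMNS:
                values.append(record.get(field, missing))
            else:
                values.append(info.get(field, missing))
        rows.append("\t".join(values))
    return rows
```

The returned rows can be written out with "\n".join(rows); requesting an INFO key that a record lacks yields the placeholder rather than an error.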

Name Summary
ListAnnotations Utility program to print a list of available annotations

Name Summary
FastaAlternateReferenceMaker Generates an alternative reference sequence over the specified interval.
FastaReferenceMaker Renders a new reference in FASTA format consisting of only those loci provided in the input data set.

Name Summary
GenotypeAndValidate Genotypes a dataset and validates the calls of another dataset using the Unified Genotyper.
ValidateVariants Validates a VCF file with an extra strict set of criteria.
ValidationAmplicons Creates FASTA sequences for use in Sequenom or PCR utilities for site amplification and subsequent validation
ValidationSiteSelector Randomly selects VCF records according to specified options.
VariantValidationAssessor Annotates a validation (from Sequenom for example) VCF with QC metrics (HW-equilibrium, % failed probes)

GATK Engine arguments that filter or transform incoming SAM/BAM data files

Name Summary
BadCigarFilter Filter out reads with invalid CIGAR strings.
BadMateFilter Filter out reads whose mate maps to a different contig.
DuplicateReadFilter Filter out duplicate reads.
FailsVendorQualityCheckFilter Filter out reads that fail the vendor quality check.
HCMappingQualityFilter Filter out reads with low mapping qualities.
LibraryReadFilter Only use reads from the specified library
MalformedReadFilter Filter out malformed reads.
MappingQualityFilter Filter out reads with low mapping qualities.
MappingQualityUnavailableFilter Filter out reads whose mapping quality is unavailable.
MappingQualityZeroFilter Filter out mapping quality zero reads.
MateSameStrandFilter Filter out reads that are not paired, have their mate unmapped, are duplicates, fail vendor quality check or both mate and read are in the same strand.
MaxInsertSizeFilter Filter out reads that exceed a given max insert size
MissingReadGroupFilter Filter out reads without read groups.
NoOriginalQualityScoresFilter Filter out reads that don't have an original base quality score tag (usually added by BQSR)
NotPrimaryAlignmentFilter Filter out reads that are not primary alignments.
Platform454Filter Filter out 454 reads.
PlatformFilter Filter out PL matching reads.
PlatformUnitFilter Filter out reads that have blacklisted platform unit tags.
ReadGroupBlackListFilter Removes records matching the read group tag and exact match string.
ReadLengthFilter Filters out reads whose length is >= some value or < some value.
ReadNameFilter Filter out all reads except those with this read name
ReadStrandFilter Filters out reads whose strand is negative or positive
ReassignMappingQualityFilter A read filter (transformer) that sets the mapping quality of all reads to a given value.
ReassignOneMappingQualityFilter A read filter (transformer) that changes a given read mapping quality to a different value.
SampleFilter Filter out all reads except those with this sample
SingleReadGroupFilter Only use reads from the specified read group.
UnmappedReadFilter Filter out unmapped reads.
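
Many of the read filters listed above correspond to simple tests on a read's SAM FLAG and MAPQ fields. The sketch below shows that correspondence in Python. It is an illustration only: the registry, the short filter names, and the dictionary-based read representation are hypothetical, while the flag bits and the MAPQ value 255 ("unavailable") come from the SAM specification.

```python
# Flag bits and the "unavailable" MAPQ value, as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_NOT_PRIMARY = 0x100
FLAG_FAILS_VENDOR_QC = 0x200
FLAG_DUPLICATE = 0x400
MAPQ_UNAVAILABLE = 255

# Hypothetical registry: each predicate returns True when the read should be dropped.
READ_FILTERS = {
    "UnmappedRead": lambda read: bool(read["flag"] & FLAG_UNMAPPED),
    "NotPrimaryAlignment": lambda read: bool(read["flag"] & FLAG_NOT_PRIMARY),
    "FailsVendorQualityCheck": lambda read: bool(read["flag"] & FLAG_FAILS_VENDOR_QC),
    "DuplicateRead": lambda read: bool(read["flag"] & FLAG_DUPLICATE),
    "MappingQualityZero": lambda read: read["mapq"] == 0,
    "MappingQualityUnavailable": lambda read: read["mapq"] == MAPQ_UNAVAILABLE,
}


def apply_read_filters(reads, filter_names):
    """Keep only the reads that none of the selected filters rejects."""
    selected = [READ_FILTERS[name] for name in filter_names]
    return [read for read in reads if not any(rejects(read) for rejects in selected)]
```

Each read here is a dictionary with integer "flag" and "mapq" entries; a read is removed as soon as any one of the selected predicates matches it.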

Tribble codecs for reading reference ordered data (ROD) files such as VCF or BED

Name Summary
BeagleCodec Codec for Beagle imputation engine
BedTableCodec The standard table codec that expects loci as contig start stop, not contig:start-stop
RawHapMapCodec A codec for the file types produced by the HapMap consortium
RefSeqCodec Allows for reading in RefSeq information
SAMPileupCodec Decoder for SAM pileup data.
SAMReadCodec Decodes a simple SAM text string.
TableCodec Reads tab-delimited tabular text files
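
The BedTableCodec entry above distinguishes two ways of writing a locus: as separate columns (contig start stop) or as a single token (contig:start-stop). A minimal Python sketch of parsing both forms follows. The function names are hypothetical, and the sketch performs no coordinate-system conversion (for example between 0-based and 1-based positions), which a real codec may need to handle.

```python
def parse_column_locus(line):
    """Parse a locus written as whitespace-separated 'contig start stop' columns."""
    contig, start, stop = line.split()[:3]
    return contig, int(start), int(stop)


def parse_interval_locus(text):
    """Parse a locus written as a single 'contig:start-stop' token.

    rpartition splits on the last ':' so that contig names which
    themselves contain ':' are kept intact.
    """
    contig, _, span = text.strip().rpartition(":")
    start, _, stop = span.partition("-")
    return contig, int(start), int(stop)
```

Both functions return the same (contig, start, stop) tuple, so downstream code does not need to know which notation the input file used.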

Annotations available to VariantAnnotator and the variant callers (some restrictions apply)

Name Summary
AlleleBalance Allele balance across all samples
AlleleBalanceBySample Allele balance per sample
BaseCounts Count of A, C, G, T bases across all samples
BaseQualityRankSumTest U-based z-approximation from the Mann-Whitney Rank Sum Test for base qualities
ChromosomeCounts Allele counts and frequency for each ALT allele and total number of alleles in called genotypes
ClippingRankSumTest U-based z-approximation from the Mann-Whitney Rank Sum Test for reads with clipped bases
Coverage Total (unfiltered) depth over all samples.
DepthPerAlleleBySample The depth of coverage of each allele per sample
DepthPerSampleHC The depth of coverage for informative reads for each sample.
FisherStrand Phred-scaled p-value using Fisher's Exact Test to detect strand bias
GCContent GC content of the reference around the given site
HaplotypeScore Consistency of the site with two (and only two) segregating haplotypes.
HardyWeinberg Hardy-Weinberg test for disequilibrium
HomopolymerRun Largest contiguous homopolymer run of the variant allele
InbreedingCoeff Likelihood-based (using PL field) test for the inbreeding among samples.
LikelihoodRankSumTest U-based z-approximation from the Mann-Whitney Rank Sum Test contrasting the likelihoods of reads to their most likely haplotypes.
LowMQ Triplet annotation: fraction of MAPQ == 0, MAPQ < 10, and count of all mapped reads
MVLikelihoodRatio Likelihood of being a Mendelian Violation
MappingQualityRankSumTest U-based z-approximation from the Mann-Whitney Rank Sum Test for mapping qualities
MappingQualityZero Total count across all samples of mapping quality zero reads
MappingQualityZeroBySample Count for each sample of mapping quality zero reads
NBaseCount The number of N bases, counting only SOLiD data
QualByDepth Variant confidence (from the QUAL field) / unfiltered depth of non-reference samples.
RMSMappingQuality Root Mean Square of the mapping quality of the reads across all samples.
ReadPosRankSumTest U-based z-approximation from the Mann-Whitney Rank Sum Test for the distance from the end of the read for reads with the alternate allele
SampleList List all of the polymorphic samples.
SnpEff A set of genomic annotations based on the output of the SnpEff variant effect predictor tool
SpanningDeletions Fraction of reads containing spanning deletions at this site
StrandBiasBySample Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias
StrandOddsRatio Symmetric Odds Ratio to detect strand bias
TandemRepeatAnnotator Annotates variants that are composed of tandem repeats
TransmissionDisequilibriumTest Wittkowski transmission disequilibrium test
VariantType Assigns a roughly correct category of the variant type (SNP, MNP, insertion, deletion, etc.)
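
A few of the annotations above are described by short formulas: QualByDepth is a variant confidence divided by a depth, RMSMappingQuality is a root mean square, and FisherStrand reports a Phred-scaled p-value. The Python sketch below writes those formulas out directly from the summaries. It is a simplified illustration, not the GATK code: the function names are hypothetical, and details such as which reads or samples contribute to each input are assumptions left to the caller.

```python
import math


def qual_by_depth(qual, depth):
    """Variant confidence (QUAL) divided by depth; None when depth is zero."""
    if depth == 0:
        return None
    return qual / depth


def rms_mapping_quality(mapping_qualities):
    """Root mean square of a list of mapping qualities; None for an empty list."""
    if not mapping_qualities:
        return None
    mean_square = sum(q * q for q in mapping_qualities) / len(mapping_qualities)
    return math.sqrt(mean_square)


def phred_scale(p_value):
    """Convert a probability in (0, 1] to the Phred scale: -10 * log10(p)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in the interval (0, 1]")
    return -10.0 * math.log10(p_value)
```

On the Phred scale, smaller p-values give larger scores, so a strand-bias p-value of 0.001 maps to a score of about 30 while a p-value of 1 maps to 0.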

Name Summary
ErrorThrowing A walker that simply throws errors.

Errors caused by incorrect user behavior, such as bad files, bad arguments, etc.

Name Summary
ArgumentException Generic class for handling misc parsing exceptions.
ArgumentValueOutOfRangeException
ArgumentsAreMutuallyExclusiveException An exception indicating that mutually exclusive options have been passed in the same command line.
DynamicClassResolutionException Class for handling common failures of dynamic class resolution
InvalidArgumentException An exception for undefined arguments.
InvalidArgumentValueException An exception for values whose format is invalid.
MissingArgumentException An exception indicating that some required arguments are missing.
MissingArgumentValueException Specifies that a value was missing when attempting to populate an argument.
TooManyValuesForArgumentException An exception indicating that too many values have been provided for the given argument.
UnknownEnumeratedValueException An exception for when an argument doesn't match any of the enumerated options for that variable type
UnmatchedArgumentException An exception for values that can't be matched with any argument.
UserException Represents the common user errors detected by Sting / GATK. Root class for all GATK user errors, as well as the container for the errors themselves


GATK version 3.1-1-g07a4bf8 built at 2014/03/18 07:00:36.