Tool Documentation Index 3.3-0-g37228af

Name Summary
CommandLineGATK All command line parameters accepted by all tools in the GATK.

Name Summary
AnalyzeCovariates Tool to analyze and evaluate base recalibration tables.
BaseCoverageDistribution Simple walker to plot the coverage distribution per base
CallableLoci Emits a data file containing information about callable, uncallable, poorly mapped, and other parts of the genome
CheckPileup Compare GATK's internal pileup to a reference Samtools pileup
CompareCallableLoci Test routine for new VariantContext object
CountBases Walks over the input data set, calculating the number of bases seen for diagnostic purposes.
CountIntervals Count contiguous regions in an interval list.
CountLoci Walks over the input data set, calculating the total number of covered loci for diagnostic purposes.
CountMales Walks over the input data set, calculating the number of reads seen from male samples for diagnostic purposes.
CountRODs Prints out counts of the number of reference ordered data objects encountered.
CountRODsByRef Prints out counts of the number of reference ordered data objects encountered along the reference.
CountReadEvents Walks over the input data set, counting the number of read events (from the CIGAR operator)
CountReads Walks over the input data set, calculating the number of reads seen for diagnostic purposes.
CountTerminusEvent Walks over the input data set, counting the number of reads ending in insertions/deletions or soft-clips
CoveredByNSamplesSites Print intervals file with all the variant sites for which most of the samples have good coverage
DepthOfCoverage Assess sequence coverage by a wide array of metrics, partitioned by sample, read group, or library
DiagnoseTargets Analyzes coverage distribution and validates read mates for a given interval and sample.
DiffObjects A generic engine for comparing tree-structured objects
ErrorRatePerCycle Compute the read error rate per position
FastaStats Calculate basic statistics about the reference sequence itself
FindCoveredIntervals Outputs a list of intervals that are covered above a given threshold.
FlagStat A reimplementation of the 'samtools flagstat' subcommand in the GATK
GCContentByInterval Walks along reference and calculates the GC content for each interval.
Pileup Emulates the samtools pileup command to print aligned reads
PrintRODs Prints out all of the RODs in the input data set.
QCRef Quality control for the reference fasta
QualifyMissingIntervals Walks along reference and calculates a few metrics for each interval.
ReadClippingStats Read clipping statistics for all reads.
ReadGroupProperties Emits a GATKReport containing read group, sample, library, platform, center, sequencing data, paired end status, simple read type name (e.g.
ReadLengthDistribution Outputs the read lengths of all the reads in a file.
SimulateReadsForVariants Generates simulated reads for variants
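
As a rough illustration of the kind of metric GCContentByInterval reports, GC content is the fraction of G and C bases in a stretch of sequence. The sketch below is not GATK code: the function name `gc_fraction` is hypothetical, and skipping ambiguous bases such as N is an assumption made for the example.

```python
def gc_fraction(sequence):
    """Return the fraction of G/C bases among the A/C/G/T bases of `sequence`.

    Hypothetical helper for illustration only. Ambiguous bases such as N are
    skipped, which is an assumption and not necessarily what GATK does.
    """
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        # No unambiguous bases, so there is nothing to measure.
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


# Example: six unambiguous bases, four of which are G or C.
print(gc_fraction("ACGTNNGC"))
```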

Name Summary
BaseRecalibrator First pass of the base quality score recalibration -- Generates recalibration table based on various user-specified covariates (such as read group, reported quality score, machine cycle, and nucleotide context).
ClipReads Read clipping based on quality, position or sequence matching
IndelRealigner Performs local realignment of reads to correct misalignments due to the presence of indels.
LeftAlignIndels Left-aligns indels from reads in a bam file.
PrintReads Renders, in SAM/BAM format, all reads from the input data set in the order in which they appear in the input file.
ReadAdaptorTrimmer Utility tool to blindly strip base adaptors.
RealignerTargetCreator Emits intervals for the Local Indel Realigner to target for realignment.
SplitNCigarReads Splits reads that contain Ns in their cigar string (e.g.
SplitSamFile Divides the input data set into separate BAM files, one for each sample in the input data set.

Name Summary
ApplyRecalibration Applies cuts to the input vcf file (by adding filter lines) to achieve the desired novel truth sensitivity levels which were specified during VariantRecalibration
BeagleOutputToVCF Takes files produced by Beagle imputation engine and creates a vcf with modified annotations.
GenotypeGVCFs Genotypes any number of gVCF files that were produced by the Haplotype Caller into a single joint VCF file.
HaplotypeCaller Call SNPs and indels simultaneously via local re-assembly of haplotypes in an active region.
PhaseByTransmission Computes the most likely genotype combination and phases trios and parent/child pairs
ProduceBeagleInput Converts the input VCF into a format accepted by the Beagle imputation/analysis program.
ReadBackedPhasing Walks along all variant ROD loci, caching a user-defined window of VariantContext sites, and then finishes phasing them when they go out of range (using upstream and downstream reads).
UnifiedGenotyper A variant caller which unifies the approaches of several disparate callers -- Works for single-sample and multi-sample data.
VariantRecalibrator Create a Gaussian mixture model by looking at the annotation values over a high quality subset of the input call set and then evaluate all input variants.
VariantsToBeagleUnphased Produces an input file to Beagle imputation engine, listing unphased, hard-called genotypes for a single sample in input variant file.

Name Summary
CalculateGenotypePosteriors Calculates genotype posterior likelihoods given panel data
CatVariants Concatenates VCF files of non-overlapped genome intervals, all with the same set of samples
CombineGVCFs Combines any number of gVCF files that were produced by the Haplotype Caller into a single joint gVCF file.
CombineVariants Combines VCF records from different sources.
FilterLiftedVariants Filters a lifted-over VCF file for ref bases that have been changed.
GenotypeConcordance Genotype concordance (per-sample and aggregate counts and frequencies, NRD/NRS and site allele overlaps) between two callsets
HaplotypeResolver Haplotype-based resolution of variants in 2 different eval files.
LeftAlignAndTrimVariants Left-aligns indels from a variants file.
LiftoverVariants Lifts a VCF file over from one build to another.
RandomlySplitVariants Takes a VCF file, randomly splits variants into two different sets, and outputs 2 new VCFs with the results.
RegenotypeVariants Regenotypes the variants from a VCF.
SelectHeaders Selects headers from a VCF source.
SelectVariants Selects variants from a VCF source.
VariantAnnotator Annotates variant calls with context information.
VariantEval General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv ratios, and a lot more)
VariantFiltration Filters variant calls using a number of user-selectable, parameterizable criteria.
VariantsToAllelicPrimitives Takes alleles from a variants file and breaks them up (if possible) into more basic/primitive alleles.
VariantsToBinaryPed Converts a VCF file to a binary plink Ped file (.bed/.bim/.fam)
VariantsToTable Emits specific fields from a VCF file to a tab-delimited table
VariantsToVCF Converts variants from other file formats to VCF format.

Name Summary
ListAnnotations Utility program to print a list of available annotations

Name Summary
FastaAlternateReferenceMaker Generates an alternative reference sequence over the specified interval.
FastaReferenceMaker Renders a new reference in FASTA format consisting of only those loci provided in the input data set.

Name Summary
GenotypeAndValidate Genotypes a dataset and validates the calls of another dataset using the Unified Genotyper.
ValidateVariants Validates a VCF file with an extra strict set of criteria.
ValidationSiteSelector Randomly selects VCF records according to specified options.
VariantValidationAssessor Annotates a validation (from Sequenom for example) VCF with QC metrics (HW-equilibrium, % failed probes)

GATK Engine arguments that filter or transform incoming SAM/BAM data files

Name Summary
BadCigarFilter Filter out reads with wonky cigar strings.
BadMateFilter Filter out reads whose mate maps to a different contig.
DuplicateReadFilter Filter out duplicate reads.
FailsVendorQualityCheckFilter Filter out reads that fail the vendor quality check.
HCMappingQualityFilter Filter out reads with low mapping qualities.
LibraryReadFilter Only use reads from the specified library
MalformedReadFilter Filter out malformed reads.
MappingQualityFilter Filter out reads with low mapping qualities.
MappingQualityUnavailableFilter Filter out reads with no mapping quality information.
MappingQualityZeroFilter Filter out mapping quality zero reads.
MateSameStrandFilter Filter out reads that are not paired, have their mate unmapped, are duplicates, fail vendor quality check, or whose mate is on the same strand as the read.
MaxInsertSizeFilter Filter out reads that exceed a given max insert size
MissingReadGroupFilter Filter out reads without read groups.
NoOriginalQualityScoresFilter Filter out reads that don't have an original base quality score tag (usually added by BQSR)
NotPrimaryAlignmentFilter Filter out reads that are not primary alignments.
Platform454Filter Filter out 454 reads.
PlatformFilter Filter out PL matching reads.
PlatformUnitFilter Filter out reads that have blacklisted platform unit tags.
ReadGroupBlackListFilter Removes records matching the read group tag and exact match string.
ReadLengthFilter Filters out reads whose length is >= some value or < some value.
ReadNameFilter Filter out all reads except those with this read name
ReadStrandFilter Filters out reads whose strand is negative or positive
ReassignMappingQualityFilter A read filter (transformer) that sets all reads mapping quality to a given value.
ReassignOneMappingQualityFilter A read filter (transformer) that changes a given read mapping quality to a different value.
SampleFilter Filter out all reads except those with this sample
SingleReadGroupFilter Only use reads from the specified read group.
UnmappedReadFilter Filter out unmapped reads.
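
Several of the filters above amount to a yes/no test on a read's SAM flag or mapping quality. The sketch below shows that idea in Python under stated assumptions: the flag bit values come from the SAM format specification, but the function `is_filtered_out` and its `min_mapq` parameter are hypothetical and do not reproduce GATK's implementation or its command-line arguments.

```python
# SAM flag bits, as defined in the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_NOT_PRIMARY = 0x100
FLAG_FAILS_VENDOR_QC = 0x200
FLAG_DUPLICATE = 0x400


def is_filtered_out(flag, mapq, min_mapq=1):
    """Return True if a read with this SAM flag and MAPQ would be dropped.

    Hypothetical predicate that mimics the combined effect of flag-based
    filters (unmapped, not primary, vendor QC failure, duplicate) plus a
    mapping quality threshold. `min_mapq` is an illustrative parameter,
    not a GATK argument.
    """
    excluded_bits = (FLAG_UNMAPPED | FLAG_NOT_PRIMARY
                     | FLAG_FAILS_VENDOR_QC | FLAG_DUPLICATE)
    if flag & excluded_bits:
        return True
    return mapq < min_mapq


# (name, SAM flag, MAPQ) triples; the values are made up for the example.
reads = [
    ("read1", 99, 60),    # paired, mapped, no excluded bits set
    ("read2", 1123, 60),  # same as read1 plus the duplicate bit (1024)
    ("read3", 99, 0),     # mapping quality zero
]
kept = [name for name, flag, mapq in reads if not is_filtered_out(flag, mapq)]
print(kept)
```

Expressing each filter as a predicate over a single read keeps the filters independent of one another, so they can be combined in any order.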

Tribble codecs for reading reference ordered data (ROD) files such as VCF or BED

Name Summary
BeagleCodec Codec for Beagle imputation engine
BedTableCodec The standard table codec that expects loci as contig start stop, not contig:start-stop
RawHapMapCodec A codec for the file types produced by the HapMap consortium
RefSeqCodec Allows for reading in RefSeq information
SAMPileupCodec Decoder for SAM pileup data.
SAMReadCodec Decodes a simple SAM text string.
TableCodec Reads tab-delimited tabular text files
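
The BedTableCodec entry distinguishes two ways of writing a locus: three separate columns (contig start stop) versus a single interval string (contig:start-stop). The Python sketch below parses both into the same tuple so the difference is visible. The function names are hypothetical, and the sketch deliberately ignores coordinate conventions (for example 0-based versus 1-based starts); it only illustrates the textual layouts.

```python
def parse_table_locus(line):
    """Parse a whitespace-separated 'contig start stop' line into a tuple."""
    contig, start, stop = line.split()[:3]
    return contig, int(start), int(stop)


def parse_interval_locus(text):
    """Parse a 'contig:start-stop' string into the same tuple shape.

    The split is done on the last ':' so that contig names that themselves
    contain a colon are kept intact.
    """
    contig, _, span = text.rpartition(":")
    start, _, stop = span.partition("-")
    return contig, int(start), int(stop)


# Both layouts describe the same locus in this made-up example.
print(parse_table_locus("chr1\t100\t200"))
print(parse_interval_locus("chr1:100-200"))
```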

Annotations available to VariantAnnotator and the variant callers (some restrictions apply)

Name Summary
AlleleBalance Allele balance across all samples
AlleleBalanceBySample Allele balance per sample
AlleleCountBySample Allele count and frequency expectation per sample
BaseCounts Count of A, C, G, T bases across all samples
BaseQualityRankSumTest Rank Sum Test of REF vs.
ChromosomeCounts Counts and frequency of alleles in called genotypes
ClippingRankSumTest Rank Sum Test for hard-clipped bases on REF vs.
Coverage Total depth of coverage per sample (in FORMAT) and over all samples (in INFO).
DepthPerAlleleBySample Depth of coverage of each allele per sample
DepthPerSampleHC Depth of informative coverage for each sample.
FisherStrand Strand bias estimated using Fisher's Exact Test
GCContent GC content of the reference around the given site
GenotypeSummaries Genotype summary statistics
HaplotypeScore Consistency of the site with strictly two segregating haplotypes
HardyWeinberg Hardy-Weinberg test for transmission disequilibrium
HomopolymerRun Largest contiguous homopolymer run of the variant allele
InbreedingCoeff Likelihood-based test for the inbreeding among samples
LikelihoodRankSumTest Rank Sum Test of per-read likelihoods of REF vs.
LowMQ Proportion of low quality reads
MVLikelihoodRatio Likelihood of being a Mendelian Violation
MappingQualityRankSumTest Rank Sum Test for mapping qualities of REF vs.
MappingQualityZero Count of all reads with MAPQ = 0 across all samples
MappingQualityZeroBySample Count of reads with mapping quality zero for each sample
NBaseCount Percentage of N bases
PossibleDeNovo Existence of a de novo mutation in at least one of the given families
QualByDepth Variant confidence normalized by unfiltered depth of variant samples
RMSMappingQuality Root Mean Square of the mapping quality of reads across all samples.
ReadPosRankSumTest Rank Sum Test for relative positioning of REF vs.
SampleList List of samples that are polymorphic at a given site
SnpEff Top effect from SnpEff functional predictions
SpanningDeletions Fraction of reads containing spanning deletions
StrandBiasBySample Number of forward and reverse reads that support REF and ALT alleles
StrandOddsRatio Strand bias estimated by the Symmetric Odds Ratio test
TandemRepeatAnnotator Tandem repeat unit composition and counts per allele
TransmissionDisequilibriumTest Wittkowski transmission disequilibrium test
VariantType General category of variant
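
Some of these annotations are simple summary statistics. RMSMappingQuality, for instance, is described above as the root mean square of read mapping qualities. A minimal Python sketch of that formula follows; the function name is hypothetical, and which reads GATK actually includes in the calculation is not specified here.

```python
import math


def rms_mapping_quality(mapqs):
    """Return the root mean square of a list of mapping qualities.

    Illustrative only: returns 0.0 for an empty list, which is an
    assumption made for the example.
    """
    if not mapqs:
        return 0.0
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))


# Made-up mapping qualities for the reads overlapping one site.
print(rms_mapping_quality([60, 60, 29, 0]))
```

Because the values are squared before averaging, the result is pulled toward the high-quality reads more strongly than a plain mean would be.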

Name Summary
ErrorThrowing A walker that simply throws errors.
GATKPaperGenotyper A simple Bayesian genotyper, that outputs a text based call format.

Errors caused by incorrect user behavior, such as bad files, bad arguments, etc.

Name Summary
ArgumentException Generic class for handling misc parsing exceptions.
ArgumentsAreMutuallyExclusiveException An exception indicating that mutually exclusive options have been passed in the same command line.
DynamicClassResolutionException Class for handling common failures of dynamic class resolution
InvalidArgumentException An exception for undefined arguments.
InvalidArgumentValueException An exception for values whose format is invalid.
MissingArgumentException An exception indicating that some required arguments are missing.
MissingArgumentValueException Specifies that a value was missing when attempting to populate an argument.
TooManyValuesForArgumentException An exception indicating that too many values have been provided for the given argument.
UnknownEnumeratedValueException An exception for when an argument doesn't match any of the enumerated options for that var type
UnmatchedArgumentException An exception for values that can't be matched with any argument.
UserException Represents the common user errors detected by GATK. Root class for all GATK user errors, as well as the container for the errors themselves.

GATK version 3.3-0-g37228af built at 2014/10/24 14:40:51.