Technical Documentation Index 2.5-2-gdb4546e


Name Summary
CommandLineGATK All command line parameters accepted by all tools in the GATK.

Name Summary
BaseCoverageDistribution Simple walker to plot the coverage distribution per base
CallableLoci Emits a data file containing information about callable, uncallable, poorly mapped, and other parts of the genome
CheckAlignment Validates consistency of the aligner interface
CheckPileup At every locus in the input set, compares the pileup data (reference base, aligned base from each overlapping read, and quality score) to the reference pileup data generated by samtools.
CompareBAM Given two BAMs with different read groups, compares them based on ReduceReads metrics.
CompareCallableLoci Test routine for new VariantContext object
CountBases Walks over the input data set, calculating the number of bases seen for diagnostic purposes.
CountIntervals Count contiguous regions in an interval list.
CountLoci Walks over the input data set, calculating the total number of covered loci for diagnostic purposes.
CountMales Walks over the input data set, calculating the number of reads seen from male samples for diagnostic purposes.
CountRODs Prints out counts of the number of reference ordered data objects encountered.
CountRODsByRef Prints out counts of the number of reference ordered data objects encountered along the reference.
CountReadEvents Walks over the input data set, counting the number of read events (from the CIGAR operator)
CountReads Walks over the input data set, calculating the number of reads seen for diagnostic purposes.
CountTerminusEvent Walks over the input data set, counting the number of reads ending in insertions/deletions or soft-clips
CoveredByNSamplesSites Prints an intervals file listing all variant sites at which most of the samples have good coverage
DepthOfCoverage Toolbox for assessing sequence coverage by a wide array of metrics, partitioned by sample, read group, or library
DiagnoseTargets Analyzes coverage distribution and validates read mates for a given interval and sample.
DiffObjects A generic engine for comparing tree-structured objects
ErrorRatePerCycle Computes the read error rate per position in the read (in the original 5'->3' orientation the read had coming off the machine). Emits a GATKReport containing read group, cycle, mismatches, counts, qual, and error rate for each read group in the input BAMs, for first-of-pair reads only.
FastaStats Calculate basic statistics about the reference sequence itself
FindCoveredIntervals Outputs a list of intervals that are covered above a given threshold.
FlagStat A reimplementation of the 'samtools flagstat' subcommand in the GATK
GCContentByInterval Walks along reference and calculates the GC content for each interval.
Pileup Emulates the samtools pileup command to print aligned reads
PrintRODs Prints out all of the RODs in the input data set.
QCRef Quality control for the reference fasta
ReadClippingStats Walks over the input reads, printing out statistics about the read length, number of clipping events, and length of the clipping to the output stream.
ReadGroupProperties Emits a GATKReport containing read group, sample, library, platform, center, sequencing data, paired end status, simple read type name (e.g.
ReadLengthDistribution Outputs the read lengths of all the reads in a file.
RecalibrationPerformance Evaluate the performance of the base recalibration process
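
The per-cycle tally behind ErrorRatePerCycle can be illustrated with a short Python sketch. It is not GATK's implementation: each read is assumed to arrive as a hypothetical tuple (is_first_of_pair, is_reverse_strand, read_bases, ref_bases) with read and reference bases already paired position by position (no indels), base-quality handling is omitted, and the error rate is taken simply as mismatches / counts. Reverse-strand reads are flipped so that cycle 0 is the first base the machine read.

```python
from collections import defaultdict


def error_rate_per_cycle(reads):
    """Return {cycle: (mismatches, counts, error_rate)} for first-of-pair reads.

    Each read is a hypothetical tuple
        (is_first_of_pair, is_reverse_strand, read_bases, ref_bases)
    where read_bases and ref_bases are equal-length strings in reference
    (left-to-right) orientation; indels are not modeled.
    """
    mismatches = defaultdict(int)
    counts = defaultdict(int)
    for first_of_pair, reverse, read_bases, ref_bases in reads:
        if not first_of_pair:
            continue  # only first-of-pair reads are tallied
        if len(read_bases) != len(ref_bases):
            raise ValueError("read and reference bases must be the same length")
        n = len(read_bases)
        for offset, (base, ref_base) in enumerate(zip(read_bases, ref_bases)):
            # Undo the reverse orientation applied at alignment time so the
            # cycle reflects the original 5'->3' machine orientation.
            cycle = n - 1 - offset if reverse else offset
            counts[cycle] += 1
            if base != ref_base:
                mismatches[cycle] += 1
    return {c: (mismatches[c], counts[c], mismatches[c] / counts[c])
            for c in sorted(counts)}
```

A mismatch at the last aligned position of a reverse-strand read is therefore counted at the same cycle as a mismatch at the first aligned position of a forward-strand read.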

Name Summary
BaseRecalibrator First pass of the base quality score recalibration -- Generates recalibration table based on various user-specified covariates (such as read group, reported quality score, machine cycle, and nucleotide context).
ClipReads This tool provides simple, powerful read clipping capabilities to remove low quality strings of bases, sections of reads, and reads containing user-provided sequences.
IndelRealigner Performs local realignment of reads to correct misalignments due to the presence of indels.
LeftAlignIndels Left-aligns indels from reads in a BAM file.
PrintReads Renders, in SAM/BAM format, all reads from the input data set in the order in which they appear in the input file.
ReadAdaptorTrimmer Utility tool to blindly strip base adaptors.
RealignerTargetCreator Emits intervals for the Local Indel Realigner to target for realignment.
ReduceReads Reduces the BAM file using read-based compression that keeps only essential information for variant calling
SplitSamFile Divides the input data set into separate BAM files, one for each sample in the input data set.
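
Left-aligning an indel means moving it to the leftmost position that yields the same sequence. A minimal Python sketch for a single deletion against a reference string follows; the function name and the 0-based coordinates are assumptions, and GATK's LeftAlignIndels works on read CIGARs rather than on this simplified representation.

```python
def left_align_deletion(ref, start, length):
    """Return the leftmost 0-based start of a deletion equivalent to
    removing ref[start:start + length] (hypothetical helper)."""
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie within the reference")
    # Deleting ref[start-1:start-1+length] instead of ref[start:start+length]
    # leaves the same haplotype exactly when the base just left of the
    # deletion equals the last deleted base, so slide left while that holds.
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start
```

For example, deleting one `CA` unit from `GCACACAT` at index 3 is reported at index 1, the leftmost equivalent position.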

Name Summary
ApplyRecalibration Applies cuts to the input VCF file (by adding filter lines) to achieve the desired novel truth sensitivity levels that were specified during variant recalibration
BeagleOutputToVCF Takes files produced by the Beagle imputation engine and creates a VCF with modified annotations.
GATKPaperGenotyper A simple Bayesian genotyper that outputs a text-based call format.
HaplotypeCaller Call SNPs and indels simultaneously via local de novo assembly of haplotypes in an active region.
PhaseByTransmission Computes the most likely genotype combination and phases trios and parent/child pairs
ProduceBeagleInput Converts the input VCF into a format accepted by the Beagle imputation/analysis program.
ReadBackedPhasing Walks along all variant ROD loci, caching a user-defined window of VariantContext sites, and then finishes phasing them when they go out of range (using upstream and downstream reads).
UnifiedGenotyper A variant caller which unifies the approaches of several disparate callers -- Works for single-sample and multi-sample data.
VariantRecalibrator Create a Gaussian mixture model by looking at the annotation values over a high-quality subset of the input call set, and then evaluate all input variants.
VariantsToBeagleUnphased Produces an input file to Beagle imputation engine, listing unphased, hard-called genotypes for a single sample in input variant file.
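
The cut ApplyRecalibration makes for a requested truth sensitivity can be pictured as choosing a score threshold that keeps the desired fraction of truth-set sites. The sketch below assumes each truth site already carries a single recalibrated score (VQSLOD-like) and ignores how GATK actually builds its tranches, so treat it as an illustration of the idea only; the function name is hypothetical.

```python
import math


def sensitivity_cutoff(truth_scores, target_sensitivity):
    """Return the highest threshold that keeps at least target_sensitivity
    of the truth-set sites (sites scoring below it would be filtered).

    truth_scores: recalibrated scores of the truth-set sites (assumed input).
    target_sensitivity: fraction in (0, 1], e.g. 0.99.
    """
    if not truth_scores:
        raise ValueError("need at least one truth-set score")
    if not 0.0 < target_sensitivity <= 1.0:
        raise ValueError("target_sensitivity must be in (0, 1]")
    ranked = sorted(truth_scores, reverse=True)
    # Smallest number of truth sites that must pass to reach the target.
    needed = math.ceil(target_sensitivity * len(ranked))
    return ranked[needed - 1]
```

Because sites scoring exactly at the threshold pass, ties can push the achieved sensitivity above the requested level.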

Name Summary
CatVariants Concatenates VCF files of non-overlapping genome intervals, all with the same set of samples
CombineVariants Combines VCF records from different sources.
FilterLiftedVariants Filters a lifted-over VCF file for ref bases that have been changed.
GenotypeConcordance Genotype concordance (per-sample and aggregate counts and frequencies, NRD/NRS and site allele overlaps) between two callsets
HaplotypeResolver Haplotype-based resolution of variants in 2 different eval files.
LeftAlignAndTrimVariants Left-aligns indels from a variants file.
LiftoverVariants Lifts a VCF file over from one build to another.
RandomlySplitVariants Takes a VCF file, randomly splits variants into two different sets, and outputs 2 new VCFs with the results.
RegenotypeVariants Regenotypes the variants from a VCF.
SelectHeaders Selects headers from a VCF source.
SelectVariants Selects variants from a VCF source.
VariantAnnotator Annotates variant calls with context information.
VariantEval General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv ratios, and a lot more)
VariantFiltration Filters variant calls using a number of user-selectable, parameterizable criteria.
VariantsToAllelicPrimitives Takes alleles from a variants file and breaks them up (if possible) into more basic/primitive alleles.
VariantsToBinaryPed Converts a VCF file to a binary PLINK PED file (.bed/.bim/.fam)
VariantsToTable Emits specific fields from a VCF file to a tab-delimited table
VariantsToVCF Converts variants from other file formats to VCF format.
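
Breaking a multi-nucleotide polymorphism into primitive SNPs, as VariantsToAllelicPrimitives describes, comes down to comparing REF and ALT base by base. The sketch handles only the equal-length case and assumes 1-based VCF-style positions; the function name is hypothetical and indels are not decomposed.

```python
def mnp_to_snps(pos, ref, alt):
    """Split an equal-length REF/ALT pair starting at 1-based `pos` into
    (position, ref_base, alt_base) SNPs at the positions that differ."""
    if len(ref) != len(alt):
        raise ValueError("this sketch only decomposes equal-length alleles")
    return [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]
```

Positions where REF and ALT agree are simply dropped, so `ACGT` -> `ATGA` at position 100 yields SNPs at 101 and 103.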

Name Summary
ListAnnotations Utility program to print a list of available annotations

Name Summary
FastaAlternateReferenceMaker Generates an alternative reference sequence over the specified interval.
FastaReferenceMaker Renders a new reference in FASTA format consisting of only those loci provided in the input data set.
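
In the simplest case, an alternate reference like the one FastaAlternateReferenceMaker produces amounts to substituting ALT bases into the reference. The following sketch applies SNPs only, assumes 1-based positions relative to the start of the given sequence, and checks each REF base against the reference; the function name is hypothetical, and indels and FASTA output are not handled.

```python
def apply_snps(ref, snps):
    """Return ref with each (pos, ref_base, alt_base) SNP substituted.

    Positions are 1-based relative to the start of ref.
    """
    seq = list(ref)
    for pos, ref_base, alt_base in snps:
        i = pos - 1
        if not 0 <= i < len(ref):
            raise ValueError(f"position {pos} is outside the sequence")
        # Check against the original reference, not the edited copy.
        if ref[i] != ref_base:
            raise ValueError(f"REF mismatch at {pos}: expected {ref[i]}, got {ref_base}")
        seq[i] = alt_base
    return "".join(seq)
```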

Name Summary
GenotypeAndValidate Genotypes a dataset and validates the calls of another dataset using the Unified Genotyper.
ValidateVariants Validates a VCF file with an extra strict set of criteria.
ValidationAmplicons Creates FASTA sequences for use in Sequenom or PCR utilities for site amplification and subsequent validation
ValidationSiteSelector Randomly selects VCF records according to specified options.
VariantValidationAssessor Annotates a validation (from Sequenom for example) VCF with QC metrics (HW-equilibrium, % failed probes)

GATK Engine arguments that filter or transform incoming SAM/BAM data files

Name Summary
BadCigarFilter Filter out reads with wonky cigar strings.
BadMateFilter Filter out reads whose mate maps to a different contig.
DuplicateReadFilter Filter out duplicate reads.
FailsVendorQualityCheckFilter Filter out reads that fail the vendor quality check.
HCMappingQualityFilter Filter out reads with low mapping qualities.
MalformedReadFilter Filter out malformed reads.
MappingQualityFilter Filter out reads with low mapping qualities.
MappingQualityUnavailableFilter Filter out reads whose mapping quality is unavailable (MAPQ 255).
MappingQualityZeroFilter Filter out mapping quality zero reads.
MateSameStrandFilter Filter out reads that are not paired, have their mate unmapped, are duplicates, fail the vendor quality check, or have the read and its mate on the same strand.
MaxInsertSizeFilter Filter out reads that exceed a given max insert size
MissingReadGroupFilter Filter out reads without read groups.
NoOriginalQualityScoresFilter Filter out reads that lack the original base quality score (OQ) tag, which is usually added by BQSR.
NotPrimaryAlignmentFilter Filter out reads that are not primary alignments.
Platform454Filter Filter out 454 reads.
PlatformFilter Filter out reads whose read-group platform (PL) matches a given name.
PlatformUnitFilter Filter out reads that have blacklisted platform unit tags.
ReadGroupBlackListFilter Removes records matching the read group tag and exact match string.
ReadLengthFilter Filters out reads whose length is >= some value or < some value.
ReadNameFilter Filter out all reads except those with this read name
ReadStrandFilter Filters out reads on a chosen strand (negative or positive)
ReassignMappingQualityFilter A read filter (transformer) that sets the mapping quality of all reads to a given value.
ReassignOneMappingQualityFilter A read filter (transformer) that changes reads with one given mapping quality to a different value.
SampleFilter Filter out all reads except those with this sample
SingleReadGroupFilter Only use reads from the specified read group.
UnmappedReadFilter Filter out unmapped reads.
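
Several of the filters above reduce to tests on the SAM FLAG bits and the MAPQ field. The Python sketch below combines a few of them into one predicate; the flag values and the meaning of MAPQ 255 ("unavailable") come from the SAM specification, while the function name and the default minimum mapping quality of 20 are illustrative assumptions, not GATK defaults.

```python
# Bit values of the FLAG field, as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400
# In the SAM specification, MAPQ 255 means the mapping quality is unavailable.
MAPQ_UNAVAILABLE = 255


def keep_read(flag, mapq, min_mapq=20):
    """Return True if a read survives this hypothetical filter chain."""
    if flag & FLAG_UNMAPPED:        # cf. UnmappedReadFilter
        return False
    if flag & FLAG_SECONDARY:       # cf. NotPrimaryAlignmentFilter
        return False
    if flag & FLAG_QC_FAIL:         # cf. FailsVendorQualityCheckFilter
        return False
    if flag & FLAG_DUPLICATE:       # cf. DuplicateReadFilter
        return False
    if mapq == MAPQ_UNAVAILABLE:    # no usable mapping quality
        return False
    return mapq >= min_mapq         # cf. MappingQualityFilter / MappingQualityZeroFilter
```

Flags unrelated to these checks, such as 0x10 (reverse strand), do not affect the result.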

Tribble codecs for reading reference ordered data (ROD) files such as VCF or BED

Name Summary
BeagleCodec Codec for Beagle imputation engine
BedTableCodec The standard table codec that expects loci as contig start stop, not contig:start-stop
RawHapMapCodec A codec for the file types produced by the HapMap consortium
RefSeqCodec Allows for reading in RefSeq information
SAMPileupCodec Decoder for SAM pileup data.
SAMReadCodec Decodes a simple SAM text string.
TableCodec Reads tab-delimited tabular text files
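
As the BedTableCodec summary implies, the standard table format writes a locus as a single `contig:start-stop` field. A minimal parser for a tab-delimited line whose first column holds such a locus might look like this; the function name is hypothetical, a bare `contig:pos` is assumed to mean a single-base locus, and no coordinate-system conversion is performed.

```python
def parse_table_line(line):
    """Split a tab-delimited line into (contig, start, stop, remaining_fields)."""
    fields = line.rstrip("\n").split("\t")
    # rpartition keeps contig names that themselves contain ':' intact.
    contig, sep, span = fields[0].rpartition(":")
    if not sep or not contig:
        raise ValueError(f"not a contig:start-stop locus: {fields[0]!r}")
    start_text, dash, stop_text = span.partition("-")
    start = int(start_text)
    stop = int(stop_text) if dash else start  # 'contig:pos' -> single base
    return contig, start, stop, fields[1:]
```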

Name Summary
AssessReducedQuals Emits intervals in which the differences between the original and reduced BAM quals are bigger than epsilon (unless the quals of the reduced BAM are above a sufficient threshold)
DownsampleReadsQC

Errors caused by incorrect user behavior, such as bad files, bad arguments, etc.

Name Summary
ArgumentException Generic class for handling misc parsing exceptions.
ArgumentsAreMutuallyExclusiveException An exception indicating that mutually exclusive options have been passed in the same command line.
DynamicClassResolutionException Class for handling common failures of dynamic class resolution.
InvalidArgumentException An exception for undefined arguments.
InvalidArgumentValueException An exception for values whose format is invalid.
MissingArgumentException An exception indicating that some required arguments are missing.
MissingArgumentValueException Specifies that a value was missing when attempting to populate an argument.
TooManyValuesForArgumentException An exception indicating that too many values have been provided for the given argument.
UnknownEnumeratedValueException An exception for when an argument doesn't match any of the enumerated options for that variable type
UnmatchedArgumentException An exception for values that can't be matched with any argument.
UserException Represents the common user errors detected by Sting/GATK. Root class for all GATK user errors, as well as the container for the errors themselves.

Annotations available to VariantAnnotator and the variant callers (some restrictions apply)

Name Summary
AlleleBalance The allele balance (fraction of ref bases over ref + alt bases) across all biallelic het-called samples
AlleleBalanceBySample Allele balance per sample
BaseCounts Count of A, C, G, T bases across all samples
BaseQualityRankSumTest U-based z-approximation from the Mann-Whitney Rank Sum Test for base qualities
ChromosomeCounts Allele counts and frequency for each ALT allele and total number of alleles in called genotypes
ClippingRankSumTest U-based z-approximation from the Mann-Whitney Rank Sum Test for reads with clipped bases
Coverage Total (unfiltered) depth over all samples.
DepthPerAlleleBySample The depth of coverage of each allele per sample
FisherStrand Phred-scaled p-value using Fisher's Exact Test to detect strand bias
GCContent GC content of the reference around the given site
HaplotypeScore Consistency of the site with two (and only two) segregating haplotypes.
HardyWeinberg Hardy-Weinberg test for disequilibrium
HomopolymerRun Largest contiguous homopolymer run of the variant allele
InbreedingCoeff Likelihood-based test (using the PL field) for inbreeding among samples.
LowMQ Triplet annotation: fraction of MAPQ == 0, MAPQ < 10, and count of all mapped reads
MVLikelihoodRatio Likelihood of being a Mendelian Violation
MappingQualityRankSumTest U-based z-approximation from the Mann-Whitney Rank Sum Test for mapping qualities
MappingQualityZero Total count across all samples of mapping quality zero reads
MappingQualityZeroBySample Count for each sample of mapping quality zero reads
NBaseCount The number of N bases, counting only SOLiD data
QualByDepth Variant confidence (from the QUAL field) / unfiltered depth of non-reference samples.
RMSMappingQuality Root Mean Square of the mapping quality of the reads across all samples.
ReadPosRankSumTest U-based z-approximation from the Mann-Whitney Rank Sum Test for the distance from the end of the read for reads with the alternate allele
SampleList List all of the polymorphic samples.
SnpEff A set of genomic annotations based on the output of the SnpEff variant effect predictor tool
SpanningDeletions Fraction of reads containing spanning deletions at this site
TandemRepeatAnnotator Annotates variants that are composed of tandem repeats
TransmissionDisequilibriumTest Wittkowski transmission disequilibrium test
VariantType Assigns a roughly correct category of the variant type (SNP, MNP, insertion, deletion, etc.)
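
Several annotation summaries above are compact formulas. The Python sketch below spells out four of them under simplifying assumptions: per-sample depths and read counts are passed in directly, QualByDepth divides QUAL by the summed depth of samples flagged as non-reference, AlleleBalance pools ref and alt counts over het samples, and FisherStrand applies a two-sided Fisher's exact test to a 2x2 table of allele by strand. Function names and inputs are hypothetical, and GATK's own implementations may add refinements not shown here.

```python
import math


def qual_by_depth(qual, depths, is_non_ref):
    """QUAL divided by the summed depth of non-reference samples."""
    depth = sum(d for d, nonref in zip(depths, is_non_ref) if nonref)
    return qual / depth if depth > 0 else None


def rms_mapping_quality(mapqs):
    """Root mean square of the reads' mapping qualities."""
    if not mapqs:
        return None
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))


def allele_balance(het_counts):
    """ref / (ref + alt), pooled over (ref_count, alt_count) pairs of het samples."""
    ref = sum(r for r, _ in het_counts)
    total = sum(r + a for r, a in het_counts)
    return ref / total if total > 0 else None


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table with the same margins that is no more likely than the
    # observed one; the tiny tolerance guards against floating-point ties.
    total = sum(pv for pv in map(prob, range(lo, hi + 1)) if pv <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fisher_strand(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scaled strand-bias p-value: -10 * log10(p)."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return max(0.0, -10.0 * math.log10(p))
```

With all reference reads on one strand and all alternate reads on the other (`fisher_strand(10, 0, 0, 10)`), p is 2 / C(20, 10), giving a large Phred-scaled value; a perfectly balanced table gives a value near zero.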


GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.