Tagged with #gatkdocs
2 documentation articles | 0 announcements | 6 forum discussions

Created 2012-08-15 17:00:11 | Updated 2013-03-25 18:25:50 | Tags: gatkdocs developer walkers intermediate

Comments (0)

The GATK discovers walker documentation by reading it from the Javadoc, Sun's convention for documenting packages and classes. This page gives an extremely brief explanation of how to write Javadoc; more information on writing Javadoc comments can be found in Sun's documentation.

1. Adding walker and package descriptions to the help text

The GATK's build system uses the javadoc parser to extract the javadoc for classes and packages and embed the contents of that javadoc in the help system. If you add Javadoc to your package or walker, it will automatically appear in the help. The javadoc parser will pick up on 'standard' javadoc comments, such as the following, taken from PrintReadsWalker:

/**
 * This walker prints out the input reads in SAM format.  Alternatively, the walker can write reads into a specified BAM file.
 */

You can add javadoc to your package by creating a special file, package-info.java, in the package directory. This file should consist of the javadoc for your package plus a package descriptor line. One such example follows:

/**
 * @help.display.name Miscellaneous walkers (experimental)
 */
package org.broadinstitute.sting.playground.gatk.walkers;

Additionally, the GATK provides three extra custom tags for overriding the information that ultimately makes it into the help.

  • @help.display.name Changes the name of the package as it appears in help. Note that the name of the walker cannot be changed as it is required to be passed verbatim to the -T argument.

  • @help.summary Changes the description which appears in the right-hand column of the help text. This is useful if you'd like to provide a more concise description of the walker that should appear in the help.

  • @help.description Changes the description which appears at the bottom of the help text when -T <your walker> --help is specified. This is useful if you'd like to present a more complete description of your walker.
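As a rough illustration of what these tags look like in practice, the sketch below holds a walker-style Javadoc comment as a string and pulls out the value of a given tag. The comment text, the class name HelpTagSketch and the helper tagValue are all hypothetical; the GATK build relies on the javadoc parser, not on a regular expression like this one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelpTagSketch {
    // Javadoc text as it might appear above a walker class (illustrative content only).
    static final String JAVADOC = String.join("\n",
        "/**",
        " * Prints reads, optionally filtering them first.",
        " *",
        " * @help.summary Prints reads in SAM or BAM format.",
        " * @help.description Longer text shown by -T <your walker> --help.",
        " */");

    // Hypothetical helper: returns the text after the given tag on its line,
    // or null when the tag does not occur in the comment.
    static String tagValue(String javadoc, String tag) {
        Pattern p = Pattern.compile(
            "^\\s*\\*\\s*@" + Pattern.quote(tag) + "\\s+(.*)$", Pattern.MULTILINE);
        Matcher m = p.matcher(javadoc);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(tagValue(JAVADOC, "help.summary"));
        System.out.println(tagValue(JAVADOC, "help.description"));
    }
}
```

A tag that is absent (here, help.display.name) yields null, which corresponds to the help system falling back to its default text.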

2. Hiding experimental walkers (use sparingly, please!)

Walkers can be hidden from the documentation system by adding the @Hidden annotation to the top of each walker. @Hidden walkers can still be run from the command-line, but their documentation will not be visible to end users. Please use this functionality sparingly to avoid walkers with hidden command-line options that are required for production use.
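The effect of @Hidden can be sketched with a stand-in annotation. Everything below is an assumption made for illustration: the nested Hidden annotation only imitates the GATK's own, and the walker classes and the documented helper are hypothetical. The point is simply that a hidden class still exists and can be used, but is skipped when the list of documented walkers is built.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class HiddenSketch {
    // Stand-in for the GATK's @Hidden annotation so the sketch compiles on its own.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Hidden {}

    // A walker that should appear in the help.
    static class StableWalker {}

    // An experimental walker that should stay out of the help.
    @Hidden
    static class ExperimentalWalker {}

    // Hypothetical helper: keeps only the classes that are not marked @Hidden.
    static List<String> documented(Class<?>... walkers) {
        List<String> names = new ArrayList<>();
        for (Class<?> walker : walkers) {
            if (!walker.isAnnotationPresent(Hidden.class)) {
                names.add(walker.getSimpleName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(documented(StableWalker.class, ExperimentalWalker.class));
    }
}
```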

3. Disabling building of help

Because the building of our help text is actually heavyweight and can dramatically increase compile time on some systems, we have a mechanism to disable help generation.

Compile with the following command:

ant -Ddisable.help=true

to disable generation of help.

Created 2012-08-11 05:31:52 | Updated 2012-10-18 15:35:49 | Tags: gatkdocs developer walkers intermediate

Comments (0)

The GATKDocs are what we call "Technical Documentation" in the Guide section of this website. The HTML pages are generated automatically at build time from specific blocks of documentation in the source code.

The best place to look for example documentation for a GATK walker is GATKDocsExample walker in org.broadinstitute.sting.gatk.examples. This is available here.

Below is the reproduction of that file from August 11, 2011:

/**
 * [Short one sentence description of this walker]
 * <p>
 * [Functionality of this walker]
 * </p>
 * <h2>Input</h2>
 * <p>
 * [Input description]
 * </p>
 * <h2>Output</h2>
 * <p>
 * [Output description]
 * </p>
 * <h2>Examples</h2>
 *    java
 *      -jar GenomeAnalysisTK.jar
 *      -T $WalkerName
 * @category Walker Category
 * @author Your Name
 * @since Date created
 */
public class GATKDocsExample extends RodWalker<Integer, Integer> {
    /**
     * Put detailed documentation about the argument here.  No need to duplicate the summary information
     * in doc annotation field, as that will be added before this text in the documentation page.
     * Notes:
     * <ul>
     *     <li>This field can contain HTML as a normal javadoc</li>
     *     <li>Don't include information about the default value, as gatkdocs adds this automatically</li>
     *     <li>Try your best to describe in detail the behavior of the argument, as ultimately confusing
     *          docs here will just result in user posts on the forum</li>
     * </ul>
     */
    @Argument(fullName="full", shortName="short", doc="Brief summary of argument [~ 80 characters of text]", required=false)
    private boolean myWalkerArgument = false;

    public Integer map(RefMetaDataTracker tracker, ReferenceContext ref, AlignmentContext context) { return 0; }
    public Integer reduceInit() { return 0; }
    public Integer reduce(Integer value, Integer sum) { return value + sum; }
    public void onTraversalDone(Integer result) { }
}

Created 2016-02-27 23:55:23 | Updated | Tags: haplotypecaller gatkdocs variantfiltration vcf developer gatk variant-calling

Comments (2)

In a discussion about using ERC, you provide some example VCF output lines like:

20  10000442   .   T  <NON_REF>  .   .   .   GT:AD:CD:DP:GQ:PL  0/0:56,0:56:56:99:0,168,2095
20  10000443   .   A  <NON_REF>  .   .   .   GT:AD:CD:DP:GQ:PL  0/0:56,0:56:56:99:0,169,2089
20  10000444   .   A  <NON_REF>  .   .   .   GT:AD:CD:DP:GQ:PL  0/0:56,0:56:56:99:0,168,2093

and say:

  "For each reference base not part of a variant call we emit a VCF record with the special symbolic allele <NON_REF> indicating the call is between the reference base and any possible non-reference allele that might be segregating at this site,"

and

  "Note that there's no site-level QUAL field value. We discussed this internally and since the QUAL is the probability that the site is polymorphic, all of the QUAL field values should be 0 here, so we decided to drop it."

While the formal Variant Call Format 4.2 Specification says:

  "ALT - alternate base(s): Comma separated list of alternate non-reference alleles called on at least one of the samples. Options are base Strings made up of the bases A,C,G,T,N,*, (case insensitive) or an angle-bracketed ID String (“<ID>”) or a breakend replacement string as described in the section on breakends,"

and

  "QUAL - quality: Phred-scaled quality score for the assertion made in ALT. i.e. −10log10 prob(call in ALT is wrong). If ALT is ‘.’ (no variant) then this is −10log10 prob(variant), and if ALT is not ‘.’ this is −10log10 prob(no variant). If unknown, the missing value should be specified."

The issue is subtle, but introduces problems with downstream processing of HaplotypeCaller generated VCF files containing reference calls. The use of the symbol "<NON_REF>" instead of "." for reference calls is a little confusing, but I also see the logic of that. More seriously: According to the VCFv4.2 specs, QUAL is NOT always a measure of "the probability that the site is polymorphic". Perhaps when a variant is called, but not when a site is called as non-variant. All those QUAL field values should NOT be 0 there. It is about the quality and correctness of the call itself. It is defined as the PHRED score of the probability that the assertion made in ALT is wrong. So if ALT asserts "<NON_REF>" or "." CONFIDENTLY (meaning a low probability that the assertion is wrong), then the QUAL PHRED score should be HIGH, not ZERO. A confident call should have a high QUAL score, whether variant or monomorphic. That is the intention of the specification. If otherwise: filtering out low quality records from a non-compliant VCF output file by removing those with low QUAL scores, for instance, would also filter out all the high quality reference calls.
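To make the arithmetic concrete, here is a small sketch of the Phred scaling quoted from the specification, −10log10 of the probability that the assertion is wrong. The class and method names are hypothetical and the probabilities are made-up inputs, not values from any real VCF.

```java
public class PhredSketch {
    // Phred-scaled score for the probability that an assertion is wrong: -10 * log10(p).
    static double phred(double errorProbability) {
        return -10.0 * Math.log10(errorProbability);
    }

    public static void main(String[] args) {
        // A confident call: only a 1-in-10,000 chance that the assertion is wrong.
        System.out.println(phred(1e-4));
        // An unreliable call: a 90% chance that the assertion is wrong.
        System.out.println(phred(0.9));
    }
}
```

Under this reading, a confident call gets a high score and an unreliable one a score near zero, whether the record asserts a variant or a reference site.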

I have previously been in touch with the writers/keepers of the VCF Specs (now over at samtools) and they confirm this interpretation of QUAL scores as the correct one for VCFv4.2 (and other versions) compliant output. It's unfortunately couched in mathematical double negatives, but look at it carefully and you will see the correctness of this reading. As one who uses tools from many sources, I hope you will address this discrepancy. It complicates interoperability.

-- Fred P.

Created 2016-01-23 05:08:59 | Updated | Tags: gatkdocs gatk error

Comments (2)


I tried to look up the documentation for certain tools, such as CalculateGenotypePosteriors, but when I try to access it, I just get a blank page. This is the link I am trying to access: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_CalculateGenotypePosteriors.php If this is not the correct link, please direct me to the right one. I noticed this was also happening to some other tools.

Thanks, Alva

Created 2016-01-18 15:25:20 | Updated | Tags: gatkdocs

Comments (1)


None of the GATK documentation pages for the GATK tools seem to be loading. Is there something going on with the Broad Institute website? Is this the case with anyone else? If not, please let me know so I can check things on my end.

Thank you!

Created 2013-06-11 15:36:13 | Updated | Tags: gatkdocs

Comments (4)

There are at least two images in the Guide Book that stop Adobe and give an error message. One is on page 94, the other is on page 130. Adobe says "Try..." and then gives a URL with a 32-character random alphanumeric string. LOL, I am human: I cannot cut and paste it (it vanishes when I try), and I cannot possibly type 32 random characters without error. Could you post good URLs to these two images? Thanks

Created 2013-06-11 15:29:21 | Updated | Tags: gatkdocs

Comments (4)

Having the "Table of Contents" in the back of the GATK Guide Book caused me considerable time and trouble. I spent hours scanning through pages, looking for topics, over several days, before I found the Table of Contents in the back. I would be surprised if I am the only one this has happened to. Could I respectfully request that you move it to the front?

Created 2012-11-06 16:13:18 | Updated | Tags: selectvariants gatkdocs genotypeconcordance

Comments (4)

I'm looking to find all the entries that change between two calls to UG on the same data. I would like to find all the entries where the calls in the variant track are different from those in the comparison track. So in effect I want those entries that would not result from using -conc in SelectVariants. From the documentation it is unclear if the -disc option does this:

A site is considered discordant if there exists some sample in the variant track that has a non-reference genotype and either the site isn't present in this track, the sample isn't present in this track, or the sample is called reference in this track.

What if the comp is HOM_VAR and the variant track is HET? Or if they are both HET but disagree on the specific allele?
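For reference, the quoted sentence can be transcribed literally into a per-sample predicate. This is only a sketch of that one sentence, not SelectVariants' actual code; it assumes that "this track" means the comparison track, folds "site absent" and "sample absent" into a single ABSENT value, and ignores no-calls. All names are hypothetical.

```java
public class DiscordanceSketch {
    // Simplified genotype states; ABSENT covers a missing site or a missing sample.
    enum Call { ABSENT, HOM_REF, HET, HOM_VAR }

    // Literal reading of the quoted rule for one sample: the variant track is
    // non-reference and the comparison track is missing or called reference.
    static boolean isDiscordant(Call variant, Call comp) {
        boolean variantNonRef = variant == Call.HET || variant == Call.HOM_VAR;
        boolean compMissingOrRef = comp == Call.ABSENT || comp == Call.HOM_REF;
        return variantNonRef && compMissingOrRef;
    }

    public static void main(String[] args) {
        // HET in the variant track, HOM_VAR in the comparison track.
        System.out.println(isDiscordant(Call.HET, Call.HOM_VAR));
        // HET in the variant track, reference in the comparison track.
        System.out.println(isDiscordant(Call.HET, Call.HOM_REF));
    }
}
```

If the documentation sentence is the whole rule, a HET versus HOM_VAR disagreement would not count as discordant under this reading, which is exactly the case the question asks about.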