Phone home
Contents |
Introduction and Motivation
Since September, 2010, the GATK has had a phone-home feature that sends us details about each GATK run via the Broad filesystem (within the Broad) and Amazon's S3 cloud storage service (outside the Broad). This feature is enabled by default.
The information provided by the phone-home feature is critical in driving improvements to the GATK:
- By recording detailed information about each error that occurs, it enables GATK developers to identify and fix previously-unknown bugs in the GATK. We are constantly monitoring the errors our users encounter and do our best to fix those errors that are caused by bugs in our code.
- It allows us to better understand how the GATK is used in practice and adjust our documentation and development goals for common use cases.
- It gives us a picture of which versions of the GATK are in use over time, and how successful we've been at encouraging users to migrate from obsolete or broken versions of the GATK to newer, improved versions.
- It tells us which tools are most commonly used, allowing us to monitor the adoption of newly-released tools and abandonment of outdated tools.
- It provides us with a sense of the overall size of our user base and the major organizations/institutions using the GATK.
Example GATK Run Report
Below are two example GATK Run Reports showing exactly what information is sent to us each time the GATK phones home.
A successful run:
<GATK-run-report> <id>D7D31ULwTSxlAwnEOSmW6Z4PawXwMxEz</id> <start-time>2012/03/10 20.21.19</start-time> <end-time>2012/03/10 20.21.19</end-time> <run-time>0</run-time> <walker-name>CountReads</walker-name> <svn-version>1.4-483-g63ecdb2</svn-version> <total-memory>85000192</total-memory> <max-memory>129957888</max-memory> <user-name>depristo</user-name> <host-name>10.0.1.10</host-name> <java>Apple Inc.-1.6.0_26</java> <machine>Mac OS X-x86_64</machine> <iterations>105</iterations> </GATK-run-report>
A run where an exception has occurred:
<GATK-run-report>
<id>yX3AnltsqIlXH9kAQqTWHQUd8CQ5bikz</id>
<exception>
<message>Failed to parse Genome Location string: 20:10,000,000-10,000,001x</message>
<stacktrace class="java.util.ArrayList">
<string>org.broadinstitute.sting.utils.GenomeLocParser.parseGenomeLoc(GenomeLocParser.java:377)</string>
<string>org.broadinstitute.sting.utils.interval.IntervalUtils.parseIntervalArguments(IntervalUtils.java:82)</string>
<string>org.broadinstitute.sting.commandline.IntervalBinding.getIntervals(IntervalBinding.java:106)</string>
<string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.loadIntervals(GenomeAnalysisEngine.java:618)</string>
<string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.initializeIntervals(GenomeAnalysisEngine.java:585)</string>
<string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:231)</string>
<string>org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:128)</string>
<string>org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)</string>
<string>org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)</string>
<string>org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:92)</string>
</stacktrace>
<cause>
<message>Position: '10,000,001x' contains invalid chars.</message>
<stacktrace class="java.util.ArrayList">
<string>org.broadinstitute.sting.utils.GenomeLocParser.parsePosition(GenomeLocParser.java:411)</string>
<string>org.broadinstitute.sting.utils.GenomeLocParser.parseGenomeLoc(GenomeLocParser.java:374)</string>
<string>org.broadinstitute.sting.utils.interval.IntervalUtils.parseIntervalArguments(IntervalUtils.java:82)</string>
<string>org.broadinstitute.sting.commandline.IntervalBinding.getIntervals(IntervalBinding.java:106)</string>
<string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.loadIntervals(GenomeAnalysisEngine.java:618)</string>
<string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.initializeIntervals(GenomeAnalysisEngine.java:585)</string>
<string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:231)</string>
<string>org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:128)</string>
<string>org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)</string>
<string>org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)</string>
<string>org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:92)</string>
</stacktrace>
<is-user-exception>false</is-user-exception>
</cause>
<is-user-exception>true</is-user-exception>
</exception>
<start-time>2012/03/10 20.19.52</start-time>
<end-time>2012/03/10 20.19.52</end-time>
<run-time>0</run-time>
<walker-name>CountReads</walker-name>
<svn-version>1.4-483-g63ecdb2</svn-version>
<total-memory>85000192</total-memory>
<max-memory>129957888</max-memory>
<user-name>depristo</user-name>
<host-name>10.0.1.10</host-name>
<java>Apple Inc.-1.6.0_26</java>
<machine>Mac OS X-x86_64</machine>
<iterations>0</iterations>
</GATK-run-report>
Note that as of GATK 1.5 we no longer collect information about the command-line executed, the working directory, or tmp directory.
Disabling Phone Home
The GATK is currently in the process of evolving to require interaction with Amazon S3 as a normal part of each run. For this reason, and because the information contained in the GATK run reports is so critical in driving improvements to the GATK, we strongly discourage our users from disabling the phone-home feature.
At the same time, we recognize that some of our users do have legitimate reasons for needing to run the GATK with phone-home disabled, and we don't wish to make it impossible for these users to run the GATK. Examples of legitimate reasons include:
- Technical reasons: your local network might have restrictions in place that don't allow the GATK to access external resources, or you might need to run the GATK in a network-less environment.
- Organizational reasons: your organization's policies might forbid the dissemination of one or more pieces of information contained in the GATK run report.
For such users we have provided an "-et NO_ET" option in the GATK to disable the phone-home feature. To use this option in GATK 1.5 and later, you need to contact us to request a key. Instructions for doing so are below.
How to Obtain a GATK Key
To obtain a GATK key, email gsahelp@broadinstitute.org with the subject line GATK Key Request. In your email, please provide us with the reason(s) you need to run the GATK with the phone-home feature disabled. Please be as explicit as possible about why you cannot participate in the phone home feature. Note that the key you will be issued is linked to the email address you used to contact us.
Running the GATK With a Key
Running the GATK with a key is simple -- you just need to append a "-K your.key" argument to your customary command line, where "your.key" is the path to the key file you obtained from us:
java -jar dist/GenomeAnalysisTK.jar \ -T PrintReads \ -I public/testdata/exampleBAM.bam \ -R public/testdata/exampleFASTA.fasta \ -et NO_ET \ -K your.key
The -K argument is only necessary when running the GATK with the NO_ET option.
Troubleshooting GATK Keys
Corrupt/Unreadable/Revoked Keys
If you get an error message from the GATK saying that your key is corrupt, unreadable, or has been revoked, please email gsahelp@broadinstitute.org to ask for a replacement key.
GATK Public Key Not Found
If you get an error message stating that the GATK public key could not be located or read, then something is likely wrong with your build of the GATK. If you're running the binary release, try re-downloading the latest version from our FTP server. If you're compiling from source, try doing an "ant clean" and re-compiling. If all else fails, please ask for help on our getsatisfaction forum.
Working with the GATK Report DB
All of the Broad runs are archived in /humgen/gsa-hpprojects/GATK/reports/archive/, and just submitted (not yet archived) individual reports are in /humgen/gsa-hpprojects/GATK/reports/submitted for each <jobid>.xml.gz. You can use the analyzeRunReports.py script to access these reports, such as:
python python/analyzeRunReports.py xml /humgen/gsa-hpprojects/GATK/reports/archive/*.gz | less
Will format as XML all archive gz files and pipe the output to less. You can use less (or grep, or any other tool) to identify individual reports by id, etc. If you want to work with the report in an R friendly format, you can use table instead of xml for the analyzeRunReports.py script.