1. What it is and how it helps us improve the GATK

Since September 2010, the GATK has had a "phone-home" feature that sends us information about each GATK run, via the Broad filesystem (within the Broad) or Amazon's S3 cloud storage service (outside the Broad). This feature is enabled by default.

The information provided by the phone-home feature is critical in driving improvements to the GATK:

  • By recording detailed information about each error that occurs, it enables GATK developers to identify and fix previously unknown bugs in the GATK. We continuously monitor the errors our users encounter and do our best to fix those caused by bugs in our code.
  • It allows us to better understand how the GATK is used in practice and adjust our documentation and development goals for common use cases.
  • It gives us a picture of which versions of the GATK are in use over time, and how successful we've been at encouraging users to migrate from obsolete or broken versions of the GATK to newer, improved versions.
  • It tells us which tools are most commonly used, allowing us to monitor the adoption of newly released tools and the abandonment of outdated ones.
  • It provides us with a sense of the overall size of our user base and the major organizations/institutions using the GATK.

2. What information is sent to us

Below are two example GATK Run Reports showing exactly what information is sent to us each time the GATK phones home.

A successful run:

<GATK-run-report>
    <id>D7D31ULwTSxlAwnEOSmW6Z4PawXwMxEz</id>
    <start-time>2012/03/10 20.21.19</start-time>
    <end-time>2012/03/10 20.21.19</end-time>
    <run-time>0</run-time>
    <walker-name>CountReads</walker-name>
    <svn-version>1.4-483-g63ecdb2</svn-version>
    <total-memory>85000192</total-memory>
    <max-memory>129957888</max-memory>
    <user-name>depristo</user-name>
    <host-name>10.0.1.10</host-name>
    <java>Apple Inc.-1.6.0_26</java>
    <machine>Mac OS X-x86_64</machine>
    <iterations>105</iterations>
</GATK-run-report>

A run where an exception has occurred:

<GATK-run-report>
   <id>yX3AnltsqIlXH9kAQqTWHQUd8CQ5bikz</id>   
   <exception>
      <message>Failed to parse Genome Location string: 20:10,000,000-10,000,001x</message>
      <stacktrace class="java.util.ArrayList"> 
         <string>org.broadinstitute.sting.utils.GenomeLocParser.parseGenomeLoc(GenomeLocParser.java:377)</string>
         <string>org.broadinstitute.sting.utils.interval.IntervalUtils.parseIntervalArguments(IntervalUtils.java:82)</string>
         <string>org.broadinstitute.sting.commandline.IntervalBinding.getIntervals(IntervalBinding.java:106)</string>
         <string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.loadIntervals(GenomeAnalysisEngine.java:618)</string>
         <string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.initializeIntervals(GenomeAnalysisEngine.java:585)</string>
         <string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:231)</string>
         <string>org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:128)</string>
         <string>org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)</string>
         <string>org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)</string>
         <string>org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:92)</string>
      </stacktrace>
      <cause>
         <message>Position: &apos;10,000,001x&apos; contains invalid chars.</message>
         <stacktrace class="java.util.ArrayList">
            <string>org.broadinstitute.sting.utils.GenomeLocParser.parsePosition(GenomeLocParser.java:411)</string>
            <string>org.broadinstitute.sting.utils.GenomeLocParser.parseGenomeLoc(GenomeLocParser.java:374)</string>
            <string>org.broadinstitute.sting.utils.interval.IntervalUtils.parseIntervalArguments(IntervalUtils.java:82)</string>
            <string>org.broadinstitute.sting.commandline.IntervalBinding.getIntervals(IntervalBinding.java:106)</string>
            <string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.loadIntervals(GenomeAnalysisEngine.java:618)</string>
            <string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.initializeIntervals(GenomeAnalysisEngine.java:585)</string>
            <string>org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:231)</string>
            <string>org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:128)</string>
            <string>org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)</string>
            <string>org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)</string>
            <string>org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:92)</string>
         </stacktrace>
         <is-user-exception>false</is-user-exception>
      </cause>
      <is-user-exception>true</is-user-exception>
   </exception>
   <start-time>2012/03/10 20.19.52</start-time>
   <end-time>2012/03/10 20.19.52</end-time>
   <run-time>0</run-time>
   <walker-name>CountReads</walker-name>
   <svn-version>1.4-483-g63ecdb2</svn-version>
   <total-memory>85000192</total-memory>
   <max-memory>129957888</max-memory>
   <user-name>depristo</user-name>
   <host-name>10.0.1.10</host-name>
   <java>Apple Inc.-1.6.0_26</java>
   <machine>Mac OS X-x86_64</machine>
   <iterations>0</iterations>
</GATK-run-report>
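Both reports are plain XML, so they can be inspected with the JDK's built-in DOM parser. The following is a minimal sketch, not part of the GATK, that assumes only the element names visible in the examples above (such as `walker-name`, `exception` and `is-user-exception`); the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper for inspecting a GATK run report; not part of the GATK. */
public class RunReportReader {

    /** Parses report XML held in a string, wrapping parser errors as unchecked exceptions. */
    private static Element root(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable run report", e);
        }
    }

    /** Returns the first direct child element with the given name, or null if there is none. */
    private static Element directChild(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals(name)) {
                return (Element) child;
            }
        }
        return null;
    }

    /** Collects the simple top-level fields (id, walker-name, user-name, ...), skipping <exception>. */
    public static Map<String, String> topLevelFields(String xml) {
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList children = root(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && !child.getNodeName().equals("exception")) {
                fields.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return fields;
    }

    /** Outer exception message, or null for a run that completed without an exception. */
    public static String exceptionMessage(String xml) {
        Element exception = directChild(root(xml), "exception");
        Element message = exception == null ? null : directChild(exception, "message");
        return message == null ? null : message.getTextContent().trim();
    }

    /** Outer is-user-exception flag (not the nested cause's), or null if there is no exception. */
    public static Boolean isUserException(String xml) {
        Element exception = directChild(root(xml), "exception");
        Element flag = exception == null ? null : directChild(exception, "is-user-exception");
        return flag == null ? null : Boolean.valueOf(flag.getTextContent().trim());
    }

    public static void main(String[] args) {
        String report = "<GATK-run-report>"
                + "<id>D7D31ULwTSxlAwnEOSmW6Z4PawXwMxEz</id>"
                + "<walker-name>CountReads</walker-name>"
                + "<iterations>105</iterations>"
                + "</GATK-run-report>";
        System.out.println(topLevelFields(report));
        System.out.println(isUserException(report)); // null: this run had no exception
    }
}
```

Reading direct children instead of calling getElementsByTagName matters for the exception report: a descendant search would find the cause's `is-user-exception` (false) first, because it precedes the outer flag (true) in document order.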

Note that as of GATK 1.5 we no longer collect information about the command line executed, the working directory, or the tmp directory.
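The environment fields in the reports have the same shape as values any Java program can read from its own JVM. The sketch below assumes, without confirmation from the GATK source, that `user-name`, `java`, `machine`, `total-memory` and `max-memory` correspond to the standard system properties and `Runtime` methods shown; `RunEnvironment` is a hypothetical name. Running it gives a rough preview of what these fields would contain on your machine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: standard JVM values that yield strings shaped like the
 * environment fields of a run report. ASSUMPTION: the GATK may derive them differently.
 */
public class RunEnvironment {

    public static Map<String, String> describe() {
        Runtime runtime = Runtime.getRuntime();
        Map<String, String> fields = new LinkedHashMap<>();
        // Heap currently reserved by the JVM, and the most it may grow to (roughly the -Xmx limit), in bytes.
        fields.put("total-memory", Long.toString(runtime.totalMemory()));
        fields.put("max-memory", Long.toString(runtime.maxMemory()));
        // Login name of the account running the JVM, like "depristo" in the examples.
        fields.put("user-name", System.getProperty("user.name"));
        // Vendor and version, shaped like "Apple Inc.-1.6.0_26".
        fields.put("java", System.getProperty("java.vendor") + "-" + System.getProperty("java.version"));
        // Operating system and architecture, shaped like "Mac OS X-x86_64".
        fields.put("machine", System.getProperty("os.name") + "-" + System.getProperty("os.arch"));
        return fields;
    }

    public static void main(String[] args) {
        describe().forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```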

3. Disabling Phone Home

The GATK is evolving to require interaction with Amazon S3 as a normal part of each run. For this reason, and because the information in the GATK run reports is so critical to improving the GATK, we strongly discourage our users from disabling the phone-home feature.

At the same time, we recognize that some of our users do have legitimate reasons for needing to run the GATK with phone-home disabled, and we don't wish to make it impossible for these users to run the GATK.

Examples of legitimate reasons for disabling Phone Home

  • Technical reasons: Your local network might have restrictions in place that don't allow the GATK to access external resources, or you might need to run the GATK in a network-less environment.

  • Organizational reasons: Your organization's policies might forbid the dissemination of one or more pieces of information contained in the GATK run report.

For such users, the GATK provides an -et NO_ET option that disables the phone-home feature. In GATK 1.5 and later, using this option requires a key, which you must request from us. Instructions for doing so are below.

How to obtain and use a GATK key

To obtain a GATK key, please fill out the request form.

Running the GATK with a key is simple: append -et NO_ET -K your.key to your usual command line, where your.key is the path to the key file you obtained from us:

java -jar dist/GenomeAnalysisTK.jar \
    -T PrintReads \
    -I public/testdata/exampleBAM.bam \
    -R public/testdata/exampleFASTA.fasta \
    -et NO_ET \
    -K your.key

The -K argument is only necessary when running the GATK with the NO_ET option.

Troubleshooting key-related problems

  • Corrupt/Unreadable/Revoked Keys

If you get an error message from the GATK saying that your key is corrupt, unreadable, or has been revoked, please email gsahelp@broadinstitute.org to ask for a replacement key.

  • GATK Public Key Not Found

If you get an error message stating that the GATK public key could not be located or read, something is likely wrong with your build of the GATK. If you're running the binary release, try downloading it again. If you're compiling from source, try running ant clean and recompiling. If all else fails, please ask for help on our community forum.
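The troubleshooting advice above suggests that a key is checked against a public key built into the GATK. The sketch below illustrates that general idea with the JDK's standard java.security APIs: an issuer signs some text with a private key, and a program holding only the matching public key can confirm the signature. It is a generic illustration under that assumption, not the GATK's actual key format or verification code; the class name, the SHA256withRSA choice and the example address are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Generic public-key signature demo; NOT the GATK's real key format or verification code. */
public class KeySignatureDemo {

    private static final String ALGORITHM = "SHA256withRSA";

    /** Issuer side: a fresh RSA key pair (the private half never leaves the issuer). */
    public static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Issuer side: sign the text the key is issued for, e.g. a hypothetical e-mail address. */
    public static byte[] sign(String issuedTo, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(privateKey);
            signer.update(issuedTo.getBytes(StandardCharsets.UTF_8));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Program side: only the public key, bundled with the program, is needed to check a signature. */
    public static boolean verify(String issuedTo, byte[] signature, PublicKey publicKey) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(publicKey);
            verifier.update(issuedTo.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            return false; // a malformed signature is treated like an invalid one
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] signature = sign("user@example.org", pair.getPrivate());
        System.out.println(verify("user@example.org", signature, pair.getPublic()));    // true
        System.out.println(verify("someone@example.org", signature, pair.getPublic())); // false
    }
}
```

A missing or unreadable public key leaves the program with nothing to verify against, which fits the advice to re-download or rebuild. Revocation would need a separate check that this sketch does not model.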

What does GSA use Phone Home data for?

We use the phone-home data for three main purposes. First, we monitor the incoming logs for errors that occur in the GATK and proactively fix them in the codebase. Second, we track usage of the GATK as a whole, and of specific GATK versions, so that we can show funding agencies and other potential supporters how widely the GATK is used. Finally, we monitor adoption rates of specific GATK tools to understand how quickly new tools reach our users. Many of these analyses require us to aggregate the data by unique user, which is why we still collect the username of the individual who ran the GATK (as you can see in the plots). Examples of all three uses are shown in the Tableau graphs below, which are updated each night and sent to GATK team members each morning for review.
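Aggregating "by unique user" means counting each username once per group, however many runs it produced. A minimal sketch, assuming the reports have already been parsed into a hypothetical `Run` record holding the `user-name`, `svn-version` and `walker-name` fields; it is not the GSA team's actual pipeline, and the users in the example are invented.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical sketch of "aggregate by unique user"; not the GSA team's actual pipeline. */
public class UsageSummary {

    /** Minimal stand-in for one parsed run report (user-name, svn-version, walker-name). */
    public record Run(String userName, String version, String walkerName) {}

    /** Counts distinct users per group, so a user with many runs is counted only once. */
    public static Map<String, Integer> distinctUsersBy(List<Run> runs, Function<Run, String> groupKey) {
        Map<String, Set<String>> usersByGroup = new TreeMap<>();
        for (Run run : runs) {
            usersByGroup.computeIfAbsent(groupKey.apply(run), k -> new TreeSet<>()).add(run.userName());
        }
        Map<String, Integer> counts = new TreeMap<>();
        usersByGroup.forEach((group, users) -> counts.put(group, users.size()));
        return counts;
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(
                new Run("alice", "1.4-483-g63ecdb2", "CountReads"),
                new Run("alice", "1.4-483-g63ecdb2", "PrintReads"),
                new Run("bob", "1.4-483-g63ecdb2", "CountReads"),
                new Run("bob", "2.0-34-g07bda93", "BaseRecalibrator"));
        System.out.println(distinctUsersBy(runs, Run::version));
        // {1.4-483-g63ecdb2=2, 2.0-34-g07bda93=1}
        System.out.println(distinctUsersBy(runs, Run::walkerName));
        // {BaseRecalibrator=1, CountReads=2, PrintReads=1}
    }
}
```

Grouping by version answers the migration question, and grouping by tool answers the adoption question, with the same counting logic.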

Comments (1)

Hi, when I run BaseRecalibrator with the following command:

java -Xmx4g -jar /usr/bin/GenomeAnalysisTK.jar -T BaseRecalibrator -I realignedBam.bam  -R /data1/human_g1k_v37.fasta --knownSites /data1/snp132.vcf -o recalibration_report.grp

I get the following error:

INFO  07:15:53,380 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,380 HttpMethodDirector - Retrying request 
INFO  07:15:53,386 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,387 HttpMethodDirector - Retrying request 
INFO  07:15:53,393 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,393 HttpMethodDirector - Retrying request 
INFO  07:15:53,398 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,398 HttpMethodDirector - Retrying request 
INFO  07:15:53,405 HttpMethodDirector - I/O exception (javax.net.ssl.SSLException) caught when processing request: Unrecognized SSL message, plaintext connection? 
INFO  07:15:53,405 HttpMethodDirector - Retrying request 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.0-34-g07bda93): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:
##### ERROR          Name        FeatureType   Documentation
##### ERROR          BCF2     VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_bcf2_BCF2Codec.html
##### ERROR        BEAGLE      BeagleFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_beagle_BeagleCodec.html
##### ERROR           BED         BEDFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_bed_BEDCodec.html
##### ERROR      BEDTABLE       TableFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_table_BedTableCodec.html
##### ERROR EXAMPLEBINARY            Feature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_example_ExampleBinaryCodec.html
##### ERROR      GELITEXT    GeliTextFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_gelitext_GeliTextCodec.html
##### ERROR      OLDDBSNP    OldDbSNPFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broad_tribble_dbsnp_OldDbSNPCodec.html
##### ERROR     RAWHAPMAP   RawHapMapFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_hapmap_RawHapMapCodec.html
##### ERROR        REFSEQ      RefSeqFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_refseq_RefSeqCodec.html
##### ERROR     SAMPILEUP   SAMPileupFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html
##### ERROR       SAMREAD     SAMReadFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_samread_SAMReadCodec.html
##### ERROR         TABLE       TableFeature   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_table_TableCodec.html
##### ERROR           VCF     VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_vcf_VCFCodec.html
##### ERROR          VCF3     VariantContext   http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_vcf_VCF3Codec.html
##### ERROR ------------------------------------------------------------------------------------------
Comments (9)

Dear GATK team,

The command I used:

 java -Xmx4g -jar /usr/local/src/gatk/GenomeAnalysisTK-2.0-0-g4c0ffd4/GenomeAnalysisTK.jar   -T BaseRecalibrator   -I my_merged_lane1.bam -R reference_genome.fasta  -knownSites my_snps.bed  -o recal_data_lane1.grp

It seemed to work, but I got the following:

WARN  23:15:29,538 RestStorageService - Error Response: PUT '/GATK_Run_Reports/7wP8pzXrHgtLBjkDcJtGqpqLljupw7aN.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 343, Content-MD5: gmCP98zrgZqoNpLiyKC2+w==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 82608ff7cceb819aa83692e2c8a0b6fb, Date: Thu, 25 Oct 2012 21:15:28 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:rX//6IdcVn7cmu+vh3BM1OdRuG0=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-71.29.1.el6.x86_64; amd64; en; JVM 1.6.0_17), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 44927B6871494046, x-amz-id-2: 6PtxgxrhMcCLXlVmKviqkFWT+jHmrg/hOvEqtJ1Z160m9O7aoxTYVnNq/OSGMkg9, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 25 Oct 2012 20:43:55 GMT, Connection: close, Server: AmazonS3] 
WARN  23:15:30,558 RestStorageService - Adjusted time offset in response to RequestTimeTooSkewed error. Local machine and S3 server disagree on the time by approximately -1894 seconds. Retrying connection. 
INFO  23:15:31,373 GATKRunReport - Uploaded run statistics report to AWS S3 

And for -T PrintReads I got:

WARN  01:32:59,987 RestStorageService - Error Response: PUT '/GATK_Run_Reports/rDrSv8aayCQwBzzwAC3R4P1NNhU8eNtF.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 347, Content-MD5: /HRSKSCV6FIXe/03JWlMwQ==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: fc7452292095e852177bfd3725694cc1, Date: Thu, 25 Oct 2012 23:32:58 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:ye/pDLSwNPHgH2kNUHKWCAt3YQ4=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-71.29.1.el6.x86_64; amd64; en; JVM 1.6.0_17), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 11DE8B74B2E028F3, x-amz-id-2: /mV+Znu5Rq1j8tub42Y4CNz2lD5npQrtgFDM2OL5Tap3Whtt4rL4KOJLFtqhgNbA, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 25 Oct 2012 23:01:26 GMT, Connection: close, Server: AmazonS3]
WARN  01:33:00,975 RestStorageService - Adjusted time offset in response to RequestTimeTooSkewed error. Local machine and S3 server disagree on the time by approximately -1893 seconds. Retrying connection.

I would be grateful if you could let me know whether I'm doing something wrong, and whether I should be worried about this.

Thanks a lot.