Tagged with #test
3 documentation articles | 0 announcements | 0 forum discussions


Comments (0)

1. Testing core walkers is critical

Most GATK walkers are really too complex to easily test using the standard unit test framework. It's just not feasible to make artificial read piles and then extrapolate from simple tests passing whether the system as a whole is working correctly. However, we need some way to determine whether changes to the core of the GATK are altering the expected output of complex walkers like BaseRecalibrator or SingleSampleGenotyper. In additional to correctness, we want to make sure that the performance of key walkers isn't degrading over time, so that calling snps, cleaning indels, etc., isn't slowly creeping down over time. Since we are now using a bamboo server to automatically build and run unit tests (as well as measure their runtimes) we want to put as many good walker tests into the test framework so we capture performance metrics over time.

2. The WalkerTest framework

To make this testing process easier, we've created a WalkerTest framework that lets you invoke the GATK using command-line GATK commands in the JUnit system and test for changes in your output files by comparing the current ant build results to previous run via an MD5 sum. It's a bit coarse grain, but it will work to ensure that changes to key walkers are detected quickly by the system, and authors can either update the expected MD5s or go track down bugs.

The system is fairly straightforward to use. Ultimately we will end up with JUnit style tests in the unit testing structure. In the piece of code below, we have a piece of code that checks the MD5 of the SingleSampleGenotyper's GELI text output at LOD 3 and LOD 10.

package org.broadinstitute.sting.gatk.walkers.genotyper;

import org.broadinstitute.sting.WalkerTest;
import org.junit.Test;

import java.util.HashMap;
import java.util.Map;
import java.util.Arrays;

public class SingleSampleGenotyperTest extends WalkerTest {
    @Test
    public void testLOD() {
        HashMap<Double, String> e = new HashMap<Double, String>();
        e.put( 10.0, "e4c51dca6f1fa999f4399b7412829534" );
        e.put( 3.0, "d804c24d49669235e3660e92e664ba1a" );

        for ( Map.Entry<Double, String> entry : e.entrySet() ) {
            WalkerTest.WalkerTestSpec spec = new WalkerTest.WalkerTestSpec(
                   "-T SingleSampleGenotyper -R /broad/1KG/reference/human_b36_both.fasta -I /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.SLX.bam -varout %s --variant_output_format GELI -L 1:10,000,000-11,000,000 -m EMPIRICAL -lod " + entry.getKey(), 1,
                    Arrays.asList(entry.getValue()));
            executeTest("testLOD", spec);
        }
    }
}

The fundamental piece here is to inherit from WalkerTest. This gives you access to the executeTest() function that consumes a WalkerTestSpec:

    public WalkerTestSpec(String args, int nOutputFiles, List<String> md5s)

The WalkerTestSpec takes regular, command-line style GATK arguments describing what you want to run, the number of output files the walker will generate, and your expected MD5s for each of these output files. The args string can contain %s String.format specifications, and for each of the nOutputFiles, the executeTest() function will (1) generate a tmp file for output and (2) call String.format on your args to fill in the tmp output files in your arguments string. For example, in the above argument string varout is followed by %s, so our single SingleSampleGenotyper output is the variant output file.

3. Example output

When you add a walkerTest inherited unit test to the GATK, and then build test, you'll see output that looks like:

[junit] WARN  13:29:50,068 WalkerTest - -------------------------------------------------------------------------------- 
[junit] WARN  13:29:50,068 WalkerTest - -------------------------------------------------------------------------------- 
[junit] WARN  13:29:50,069 WalkerTest - Executing test testLOD with GATK arguments: -T SingleSampleGenotyper -R /broad/1KG/reference/human_b36_both.fasta -I /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.SLX.bam -varout /tmp/walktest.tmp_param.05524470250256847817.tmp --variant_output_format GELI -L 1:10,000,000-11,000,000 -m EMPIRICAL -lod 3.0
[junit]  
[junit] WARN  13:29:50,069 WalkerTest - Executing test testLOD with GATK arguments: -T SingleSampleGenotyper -R /broad/1KG/reference/human_b36_both.fasta -I /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.SLX.bam -varout /tmp/walktest.tmp_param.05524470250256847817.tmp --variant_output_format GELI -L 1:10,000,000-11,000,000 -m EMPIRICAL -lod 3.0
[junit]  
[junit] WARN  13:30:39,407 WalkerTest - Checking MD5 for /tmp/walktest.tmp_param.05524470250256847817.tmp [calculated=d804c24d49669235e3660e92e664ba1a, expected=d804c24d49669235e3660e92e664ba1a] 
[junit] WARN  13:30:39,407 WalkerTest - Checking MD5 for /tmp/walktest.tmp_param.05524470250256847817.tmp [calculated=d804c24d49669235e3660e92e664ba1a, expected=d804c24d49669235e3660e92e664ba1a] 
[junit] WARN  13:30:39,408 WalkerTest -   => testLOD PASSED 
[junit] WARN  13:30:39,408 WalkerTest -   => testLOD PASSED 
[junit] WARN  13:30:39,409 WalkerTest - -------------------------------------------------------------------------------- 
[junit] WARN  13:30:39,409 WalkerTest - -------------------------------------------------------------------------------- 
[junit] WARN  13:30:39,409 WalkerTest - Executing test testLOD with GATK arguments: -T SingleSampleGenotyper -R /broad/1KG/reference/human_b36_both.fasta -I /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.SLX.bam -varout /tmp/walktest.tmp_param.03852477489430798188.tmp --variant_output_format GELI -L 1:10,000,000-11,000,000 -m EMPIRICAL -lod 10.0
[junit]  
[junit] WARN  13:30:39,409 WalkerTest - Executing test testLOD with GATK arguments: -T SingleSampleGenotyper -R /broad/1KG/reference/human_b36_both.fasta -I /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.SLX.bam -varout /tmp/walktest.tmp_param.03852477489430798188.tmp --variant_output_format GELI -L 1:10,000,000-11,000,000 -m EMPIRICAL -lod 10.0
[junit]  
[junit] WARN  13:31:30,213 WalkerTest - Checking MD5 for /tmp/walktest.tmp_param.03852477489430798188.tmp [calculated=e4c51dca6f1fa999f4399b7412829534, expected=e4c51dca6f1fa999f4399b7412829534] 
[junit] WARN  13:31:30,213 WalkerTest - Checking MD5 for /tmp/walktest.tmp_param.03852477489430798188.tmp [calculated=e4c51dca6f1fa999f4399b7412829534, expected=e4c51dca6f1fa999f4399b7412829534] 
[junit] WARN  13:31:30,213 WalkerTest -   => testLOD PASSED 
[junit] WARN  13:31:30,213 WalkerTest -   => testLOD PASSED 
[junit] WARN  13:31:30,214 SingleSampleGenotyperTest -  
[junit] WARN  13:31:30,214 SingleSampleGenotyperTest -  

4. Recommended location for GATK testing data

We keep all of the permenant GATK testing data in:

/humgen/gsa-scr1/GATK_Data/Validation_Data/

A good set of data to use for walker testing is the CEU daughter data from 1000 Genomes:

gsa2 ~/dev/GenomeAnalysisTK/trunk > ls -ltr /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_1*.bam /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_1*.calls
-rw-rw-r--+ 1 depristo wga  51M 2009-09-03 07:56 /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.SLX.bam
-rw-rw-r--+ 1 depristo wga 185K 2009-09-04 13:21 /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.SLX.lod5.variants.geli.calls
-rw-rw-r--+ 1 depristo wga 164M 2009-09-04 13:22 /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.SLX.lod5.genotypes.geli.calls
-rw-rw-r--+ 1 depristo wga  24M 2009-09-04 15:00 /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.SOLID.bam
-rw-rw-r--+ 1 depristo wga  12M 2009-09-04 15:01 /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.454.bam
-rw-r--r--+ 1 depristo wga  91M 2009-09-04 15:02 /humgen/gsa-scr1/GATK_Data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.allTechs.bam

5. Test dependencies

The tests depend on a variety of input files, that are generally constrained to three mount points on the internal Broad network:

*/seq/
*/humgen/1kg/
*/humgen/gsa-hpprojects/GATK/Data/Validation_Data/

To run the unit and integration tests you'll have to have access to these files. They may have different mount points on your machine (say, if you're running remotely over the VPN and have mounted the directories on your own machine).

6. MD5 database and comparing MD5 results

Every file that generates an MD5 sum as part of the WalkerTest framework will be copied to <MD5>. integrationtest in the integrationtests subdirectory of the GATK trunk. This MD5 database of results enables you to easily examine the results of an integration test as well as compare the results of a test before/after a code change. For example, below is an example test for the UnifiedGenotyper that, due to a code change, where the output VCF differs from the VCF with the expected MD5 value in the test code itself. The test provides provides the path to the two results files as well as a diff command to compare expected to the observed MD5:

[junit] --------------------------------------------------------------------------------    
[junit] Executing test testParameter[-genotype] with GATK arguments: -T UnifiedGenotyper -R /broad/1KG/reference/human_b36_both.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.SLX.bam -varout /tmp/walktest.tmp_param.05997727998894311741.tmp -L 1:10,000,000-10,010,000 -genotype    
[junit] ##### MD5 file is up to date: integrationtests/ab20d4953b13c3fc3060d12c7c6fe29d.integrationtest    
[junit] Checking MD5 for /tmp/walktest.tmp_param.05997727998894311741.tmp [calculated=ab20d4953b13c3fc3060d12c7c6fe29d, expected=0ac7ab893a3f550cb1b8c34f28baedf6]    
[junit] ##### Test testParameter[-genotype] is going fail #####    
[junit] ##### Path to expected   file (MD5=0ac7ab893a3f550cb1b8c34f28baedf6): integrationtests/0ac7ab893a3f550cb1b8c34f28baedf6.integrationtest    
[junit] ##### Path to calculated file (MD5=ab20d4953b13c3fc3060d12c7c6fe29d): integrationtests/ab20d4953b13c3fc3060d12c7c6fe29d.integrationtest    
[junit] ##### Diff command: diff integrationtests/0ac7ab893a3f550cb1b8c34f28baedf6.integrationtest integrationtests/ab20d4953b13c3fc3060d12c7c6fe29d.integrationtest

Examining the diff we see a few lines that have changed the DP count in the new code

> diff integrationtests/0ac7ab893a3f550cb1b8c34f28baedf6.integrationtest integrationtests/ab20d4953b13c3fc3060d12c7c6fe29d.integrationtest  | head
385,387c385,387
< 1     10000345        .       A       .       106.54  .       AN=2;DP=33;Dels=0.00;MQ=89.17;MQ0=0;SB=-10.00   GT:DP:GL:GQ     0/0:25:-0.09,-7.57,-75.74:74.78
< 1     10000346        .       A       .       103.75  .       AN=2;DP=31;Dels=0.00;MQ=88.85;MQ0=0;SB=-10.00   GT:DP:GL:GQ     0/0:24:-0.07,-7.27,-76.00:71.99
< 1     10000347        .       A       .       109.79  .       AN=2;DP=31;Dels=0.00;MQ=88.85;MQ0=0;SB=-10.00   GT:DP:GL:GQ     0/0:26:-0.05,-7.85,-84.74:78.04
---
> 1     10000345        .       A       .       106.54  .       AN=2;DP=32;Dels=0.00;MQ=89.50;MQ0=0;SB=-10.00   GT:DP:GL:GQ     0/0:25:-0.09,-7.57,-75.74:74.78
> 1     10000346        .       A       .       103.75  .       AN=2;DP=30;Dels=0.00;MQ=89.18;MQ0=0;SB=-10.00   GT:DP:GL:GQ     0/0:24:-0.07,-7.27,-76.00:71.99
> 1     10000347        .       A       .       109.79  .       AN=2;DP=30;Dels=0.00;MQ=89.18;MQ0=0;SB=-10.00   GT:DP:GL:GQ     0/0:26:-0.05,-7.85,-84.74:78

Whether this is the expected change is up to you to decide, but the system makes it as easy as possible to see the consequences of your code change.

7. Testing for Exceptions

The walker test framework supports an additional syntax for ensuring that a particular java Exception is thrown when a walker executes using a simple alternate version of the WalkerSpec object. Rather than specifying the MD5 of the result, you can provide a single subclass of Exception.class and the testing framework will ensure that when the walker runs an instance (class or subclass) of your expected exception is thrown. The system also flags if no exception is thrown.

For example, the following code tests that the GATK can detect and error out when incompatible VCF and FASTA files are given:

@Test public void fail8() { executeTest("hg18lex-v-b36", test(lexHG18, callsB36)); }

private WalkerTest.WalkerTestSpec test(String ref, String vcf) {
    return new WalkerTest.WalkerTestSpec("-T VariantsToTable -M 10 -B:two,vcf "
            + vcf + " -F POS,CHROM -R "
            + ref +  " -o %s",
            1, UserException.IncompatibleSequenceDictionaries.class);

}

During the integration test this looks like:

[junit] Executing test hg18lex-v-b36 with GATK arguments: -T VariantsToTable -M 10 -B:two,vcf /humgen/gsa-hpprojects/GATK/data/Validation_Data/lowpass.N3.chr1.raw.vcf -F POS,CHROM -R /humgen/gsa-hpprojects/GATK/data/Validation_Data/lexFasta/lex.hg18.fasta -o /tmp/walktest.tmp_param.05541601616101756852.tmp -l WARN -et NO_ET
[junit]    [junit] Wanted exception class org.broadinstitute.sting.utils.exceptions.UserException$IncompatibleSequenceDictionaries, saw class org.broadinstitute.sting.utils.exceptions.UserException$IncompatibleSequenceDictionaries
[junit]   => hg18lex-v-b36 PASSED

8. Miscellaneous information

  • Please do not put any extremely long tests in the regular ant build test target. We are currently splitting the system into fast and slow tests so that unit tests can be run in \< 3 minutes while saving a test target for long-running regression tests. More information on that will be posted.

  • An expected MG5 string of "" means don't check for equality between the calculated and expected MD5s. Useful if you are just writing a new test and don't know the true output.

  • Overload parameterize() { return true; } if you want the system to just run your calculations, not throw an error if your MD5s don't match, across all tests

  • If your tests all of a sudden stop giving equality MD5s, you can just (1) look at the .tmp output files directly or (2) grab the printed GATK command-line options and explore what is happening.

  • You can always run a GATK walker on the command line and then run md5sum on its output files to obtain, outside of the testing framework, the MD5 expected results.

  • Don't worry about the duplication of lines in the output ; it's just an annoyance of having two global loggers. Eventually we'll bug fix this away.

Comments (0)

1. What is DiffEngine?

DiffEngine is a summarizing difference engine that allows you to compare two structured files -- such as BAMs and VCFs -- to find what are the differences between them. This is primarily useful in regression testing or optimization, where you want to ensure that the differences are those that you expect and not any others.

2. The summarized differences

The GATK contains a summarizing difference engine called DiffEngine that compares hierarchical data structures to emit:

  • A list of specific differences between the two data structures. This is similar to saying the value in field A in record 1 in file F differs from the value in field A in record 1 in file G.

  • A summarized list of differences ordered by frequency of the difference. This output is similar to saying field A differed in 50 records between files F and G.

3. The DiffObjects walker

The GATK contains a private walker called DiffObjects that allows you access to the DiffEngine capabilities on the command line. Simply provide the walker with the master and test files and it will emit summarized differences for you.

4. Understanding the output

The DiffEngine system compares to two hierarchical data structures for specific differences in the values of named nodes. Suppose I have two trees:

Tree1=(A=1 B=(C=2 D=3)) 
Tree2=(A=1 B=(C=3 D=3 E=4))
Tree3=(A=1 B=(C=4 D=3 E=4))

where every node in the tree is named, or is a raw value (here all leaf values are integers). The DiffEngine traverses these data structures by name, identifies equivalent nodes by fully qualified names (Tree1.A is distinct from Tree2.A, and determines where their values are equal (Tree1.A=1, Tree2.A=1, so they are).

These itemized differences are listed as:

Tree1.B.C=2 != Tree2.B.C=3
Tree1.B.C=2 != Tree3.B.C=4
Tree2.B.C=3 != Tree3.B.C=4
Tree1.B.E=MISSING != Tree2.B.E=4

This conceptually very similar to the output of the unix command line tool diff. What's nice about DiffEngine though is that it computes similarity among the itemized differences and displays the count of differences names in the system. In the above example, the field C is not equal three times, while the missing E in Tree1 occurs only once. So the summary is:

*.B.C : 3
*.B.E : 1

where the * operator indicates that any named field matches. This output is sorted by counts, and provides an immediate picture of the commonly occurring differences between the files.

Below is a detailed example of two VCF fields that differ because of a bug in the AC, AF, and AN counting routines, detected by the integrationtest integration (more below). You can see that in the although there are many specific instances of these differences between the two files, the summarized differences provide an immediate picture that the AC, AF, and AN fields are the major causes of the differences.

[testng] path                                                              count
[testng] *.*.*.AC                                                         6
[testng] *.*.*.AF                                                         6
[testng] *.*.*.AN                                                         6
[testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000000.AC  1
[testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000000.AF  1
[testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000000.AN  1
[testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000117.AC  1
[testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000117.AF  1
[testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000117.AN  1
[testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000211.AC  1
[testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000211.AF  1
[testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000211.AN  1
[testng] 64b991fd3850f83614518f7d71f0532f.integrationtest.20:10000598.AC  1

5. Integration tests

The DiffEngine codebase that supports these calculations is integrated into the integrationtest framework, so that when a test fails the system automatically summarizes the differences between the master MD5 file and the failing MD5 file, if it is an understood type. When failing you will see in the integration test logs not only the basic information, but the detailed DiffEngine output.

For example, in the output below I broke the GATK BAQ calculation and the integration test DiffEngine clearly identifies that all of the records differ in their BQ tag value in the two BAM files:

/humgen/1kg/reference/human_b36_both.fasta -I /humgen/gsa-hpprojects/GATK/data/Validation_Data/NA12878.1kg.p2.chr1_10mb_11_mb.allTechs.bam -o /var/folders/Us/UsMJ3xRrFVyuDXWkUos1xkC43FQ/-Tmp-/walktest.tmp_param.05785205687740257584.tmp -L 1:10,000,000-10,100,000 -baq RECALCULATE -et NO_ET
   [testng] WARN  22:59:22,875 TextFormattingUtils - Unable to load help text.  Help output will be sparse.
   [testng] WARN  22:59:22,875 TextFormattingUtils - Unable to load help text.  Help output will be sparse.
   [testng] ##### MD5 file is up to date: integrationtests/e5147656858fc4a5f470177b94b1fc1b.integrationtest
   [testng] Checking MD5 for /var/folders/Us/UsMJ3xRrFVyuDXWkUos1xkC43FQ/-Tmp-/walktest.tmp_param.05785205687740257584.tmp [calculated=e5147656858fc4a5f470177b94b1fc1b, expected=4ac691bde1ba1301a59857694fda6ae2]
   [testng] ##### Test testPrintReadsRecalBAQ is going fail #####
   [testng] ##### Path to expected   file (MD5=4ac691bde1ba1301a59857694fda6ae2): integrationtests/4ac691bde1ba1301a59857694fda6ae2.integrationtest
   [testng] ##### Path to calculated file (MD5=e5147656858fc4a5f470177b94b1fc1b): integrationtests/e5147656858fc4a5f470177b94b1fc1b.integrationtest
   [testng] ##### Diff command: diff integrationtests/4ac691bde1ba1301a59857694fda6ae2.integrationtest integrationtests/e5147656858fc4a5f470177b94b1fc1b.integrationtest
   [testng] ##:GATKReport.v0.1 diffences : Summarized differences between the master and test files.
   [testng] See http://www.broadinstitute.org/gsa/wiki/index.php/DiffObjectsWalker_and_SummarizedDifferences for more information
   [testng] Difference                                                                               NumberOfOccurrences
   [testng] *.*.*.BQ                                                                                 895
   [testng] 4ac691bde1ba1301a59857694fda6ae2.integrationtest.-XAE_0002_FC205W7AAXX:2:266:272:361.BQ  1
   [testng] 4ac691bde1ba1301a59857694fda6ae2.integrationtest.-XAE_0002_FC205W7AAXX:5:245:474:254.BQ  1
   [testng] 4ac691bde1ba1301a59857694fda6ae2.integrationtest.-XAE_0002_FC205W7AAXX:5:255:178:160.BQ  1
   [testng] 4ac691bde1ba1301a59857694fda6ae2.integrationtest.-XAE_0002_FC205W7AAXX:6:158:682:495.BQ  1
   [testng] 4ac691bde1ba1301a59857694fda6ae2.integrationtest.-XAE_0002_FC205W7AAXX:6:195:591:884.BQ  1
   [testng] 4ac691bde1ba1301a59857694fda6ae2.integrationtest.-XAE_0002_FC205W7AAXX:7:165:236:848.BQ  1
   [testng] 4ac691bde1ba1301a59857694fda6ae2.integrationtest.-XAE_0002_FC205W7AAXX:7:191:223:910.BQ  1
   [testng] 4ac691bde1ba1301a59857694fda6ae2.integrationtest.-XAE_0002_FC205W7AAXX:7:286:279:434.BQ  1
   [testng] 4ac691bde1ba1301a59857694fda6ae2.integrationtest.-XAF_0002_FC205Y7AAXX:2:106:516:354.BQ  1
   [testng] 4ac691bde1ba1301a59857694fda6ae2.integrationtest.-XAF_0002_FC205Y7AAXX:3:102:580:518.BQ  1
   [testng]
   [testng] Note that the above list is not comprehensive.  At most 20 lines of output, and 10 specific differences will be listed.  Please use -T DiffObjects -R public/testdata/exampleFASTA.fasta -m integrationtests/4ac691bde1ba1301a59857694fda6ae2.integrationtest -t integrationtests/e5147656858fc4a5f470177b94b1fc1b.integrationtest to explore the differences more freely

6. Adding your own DiffableObjects to the system

The system dynamically finds all classes that implement the following simple interface:

public interface DiffableReader {
    @Ensures("result != null")
    /**
     * Return the name of this DiffableReader type.  For example, the VCF reader returns 'VCF' and the
     * bam reader 'BAM'
     */
    public String getName();

    @Ensures("result != null")
    @Requires("file != null")
    /**
     * Read up to maxElementsToRead DiffElements from file, and return them.
     */
    public DiffElement readFromFile(File file, int maxElementsToRead);

    /**
     * Return true if the file can be read into DiffElement objects with this reader. This should
     * be uniquely true/false for all readers, as the system will use the first reader that can read the
     * file.  This routine should never throw an exception.  The VCF reader, for example, looks at the
     * first line of the file for the ##format=VCF4.1 header, and the BAM reader for the BAM_MAGIC value
     * @param file
     * @return
     */
    @Requires("file != null")
    public boolean canRead(File file);

See the VCF and BAMDiffableReaders for example implementations. If you extend this to a new object types both the DiffObjects walker and the integrationtest framework will automatically work with your new file type.

Comments (12)

Objective

Test that Queue is correctly installed, and that the supporting tools like Java are in your path.

Prerequisites

  • Basic familiarity with the command-line environment
  • Understand what is a PATH variable
  • GATK installed
  • Queue downloaded and placed on path

Steps

  1. Invoke the Queue usage/help message
  2. Troubleshooting

1. Invoke the Queue usage/help message

The command we're going to run is a very simple command that asks Queue to print out a list of available command-line arguments and options. It is so simple that it will ALWAYS work if your Queue package is installed correctly.

Note that this command is also helpful when you're trying to remember something like the right spelling or short name for an argument and for whatever reason you don't have access to the web-based documentation.

Action

Type the following command:

java -jar <path to Queue.jar> --help

replacing the <path to Queue.jar> bit with the path you have set up in your command-line environment.

Expected Result

You should see usage output similar to the following:

usage: java -jar Queue.jar -S <script> [-jobPrefix <job_name_prefix>] [-jobQueue <job_queue>] [-jobProject <job_project>]
       [-jobSGDir <job_scatter_gather_directory>] [-memLimit <default_memory_limit>] [-runDir <run_directory>] [-tempDir
       <temp_directory>] [-emailHost <emailSmtpHost>] [-emailPort <emailSmtpPort>] [-emailTLS] [-emailSSL] [-emailUser
       <emailUsername>] [-emailPass <emailPassword>] [-emailPassFile <emailPasswordFile>] [-bsub] [-run] [-dot <dot_graph>]
       [-expandedDot <expanded_dot_graph>] [-startFromScratch] [-status] [-statusFrom <status_email_from>] [-statusTo
       <status_email_to>] [-keepIntermediates] [-retry <retry_failed>] [-l <logging_level>] [-log <log_to_file>] [-quiet]
       [-debug] [-h]

 -S,--script <script>                                                      QScript scala file
 -jobPrefix,--job_name_prefix <job_name_prefix>                            Default name prefix for compute farm jobs.
 -jobQueue,--job_queue <job_queue>                                         Default queue for compute farm jobs.
 -jobProject,--job_project <job_project>                                   Default project for compute farm jobs.
 -jobSGDir,--job_scatter_gather_directory <job_scatter_gather_directory>   Default directory to place scatter gather
                                                                           output for compute farm jobs.
 -memLimit,--default_memory_limit <default_memory_limit>                   Default memory limit for jobs, in gigabytes.
 -runDir,--run_directory <run_directory>                                   Root directory to run functions from.
 -tempDir,--temp_directory <temp_directory>                                Temp directory to pass to functions.
 -emailHost,--emailSmtpHost <emailSmtpHost>                                Email SMTP host. Defaults to localhost.
 -emailPort,--emailSmtpPort <emailSmtpPort>                                Email SMTP port. Defaults to 465 for ssl,
                                                                           otherwise 25.
 -emailTLS,--emailUseTLS                                                   Email should use TLS. Defaults to false.
 -emailSSL,--emailUseSSL                                                   Email should use SSL. Defaults to false.
 -emailUser,--emailUsername <emailUsername>                                Email SMTP username. Defaults to none.
 -emailPass,--emailPassword <emailPassword>                                Email SMTP password. Defaults to none. Not
                                                                           secure! See emailPassFile.
 -emailPassFile,--emailPasswordFile <emailPasswordFile>                    Email SMTP password file. Defaults to none.
 -bsub,--bsub_all_jobs                                                     Use bsub to submit jobs
 -run,--run_scripts                                                        Run QScripts.  Without this flag set only
                                                                           performs a dry run.
 -dot,--dot_graph <dot_graph>                                              Outputs the queue graph to a .dot file.  See:
                                                                           http://en.wikipedia.org/wiki/DOT_language
 -expandedDot,--expanded_dot_graph <expanded_dot_graph>                    Outputs the queue graph of scatter gather to
                                                                           a .dot file.  Otherwise overwrites the
                                                                           dot_graph
 -startFromScratch,--start_from_scratch                                    Runs all command line functions even if the
                                                                           outputs were previously output successfully.
 -status,--status                                                          Get status of jobs for the qscript
 -statusFrom,--status_email_from <status_email_from>                       Email address to send emails from upon
                                                                           completion or on error.
 -statusTo,--status_email_to <status_email_to>                             Email address to send emails to upon
                                                                           completion or on error.
 -keepIntermediates,--keep_intermediate_outputs                            After a successful run keep the outputs of
                                                                           any Function marked as intermediate.
 -retry,--retry_failed <retry_failed>                                      Retry the specified number of times after a
                                                                           command fails.  Defaults to no retries.
 -l,--logging_level <logging_level>                                        Set the minimum level of logging, i.e.
                                                                           setting INFO get's you INFO up to FATAL,
                                                                           setting ERROR gets you ERROR and FATAL level
                                                                           logging.
 -log,--log_to_file <log_to_file>                                          Set the logging location
 -quiet,--quiet_output_mode                                                Set the logging to quiet mode, no output to
                                                                           stdout
 -debug,--debug_mode                                                       Set the logging file string to include a lot
                                                                           of debugging information (SLOW!)
 -h,--help                                                                 Generate this help message

If you see this message, your Queue installation is ok. You're good to go! If you don't see this message, and instead get an error message, proceed to the next section on troubleshooting.


2. Troubleshooting

Let's try to figure out what's not working.

Action

First, make sure that your Java version is at least 1.6, by typing the following command:

java -version

Expected Result

You should see something similar to the following text:

java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)  

Remedial actions

If the version is less then 1.6, install the newest version of Java onto the system. If you instead see something like

java: Command not found  

make sure that java is installed on your machine, and that your PATH variable contains the path to the java executables.

On a Mac running OS X 10.5+, you may need to run /Applications/Utilities/Java Preferences.app and drag Java SE 6 to the top to make your machine run version 1.6, even if it has been installed.

No posts found with the requested search criteria.
No posts found with the requested search criteria.