Latest posts

The last GATK 3.x release of the year 2015 has arrived!

The major feature in GATK 3.5 is the eagerly awaited MuTect2 (beta version), which brings somatic SNP and Indel calling to GATK. This is just the beginning of GATK’s scope expansion into the somatic variant domain, so expect some exciting news about copy number variation in the next few weeks! Meanwhile, more on MuTect2 awesomeness below.

In addition, we’ve got all sorts of variant context annotation-related treats for you in the 3.5 goodie bag -- both new annotations and new capabilities for existing annotations, listed below.

In the variant manipulation space, we enhanced or fixed functionality in several tools including LeftAlignAndTrimVariants, FastaAlternateReferenceMaker and VariantEval modules. And in the variant calling/genotyping space, we’ve made some performance improvements across the board to HaplotypeCaller and GenotypeGVCFs (mostly by cutting out crud and making the code more efficient) including a few improvements specifically for haploids. Read the detailed release notes for more on these changes. Note that GenotypeGVCFs will now emit no-calls at sites where RGQ=0 in acknowledgment of the fact that those sites are essentially uncallable.
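To make the RGQ note concrete, here is a minimal Python sketch of what "emit a no-call where RGQ=0" looks like on a single VCF sample column. It illustrates the output behavior described above and is not GATK's own code. It assumes RGQ appears as a per-sample FORMAT key next to GT, and the helper name `apply_rgq_no_call` is made up for this example.

```python
def apply_rgq_no_call(format_keys, sample_values):
    """Return the sample column with GT turned into a no-call when RGQ is 0.

    format_keys   -- the VCF FORMAT column, e.g. "GT:DP:RGQ"
    sample_values -- the matching sample column, e.g. "0/0:0:0"
    """
    keys = format_keys.split(":")
    values = sample_values.split(":")
    fields = dict(zip(keys, values))

    if fields.get("RGQ") == "0" and "GT" in fields:
        genotype = fields["GT"]
        # Keep the original separator and ploidy: "0/0" -> "./.", "0" -> "."
        separator = "|" if "|" in genotype else "/"
        ploidy = len(genotype.replace("|", "/").split("/"))
        fields["GT"] = separator.join(["."] * ploidy)

    # VCF allows trailing sample fields to be dropped, so only emit
    # the keys that actually had a value in the input.
    return ":".join(fields[key] for key in keys[:len(values)])


print(apply_rgq_no_call("GT:DP:RGQ", "0/0:0:0"))    # ./.:0:0
print(apply_rgq_no_call("GT:DP:RGQ", "0/0:25:60"))  # 0/0:25:60
print(apply_rgq_no_call("GT:DP:RGQ", "0:3:0"))      # .:3:0 (haploid)
```

The haploid case is included only to show that the no-call keeps the ploidy of the original genotype in this sketch.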

We’ve got good news for you if you’re the type who worries about disk space (whether by temperament or by necessity): we finally have CRAM support -- and some recommendations for keeping the output of BQSR down to reasonable file sizes, detailed below.

Finally, be sure to check out the detailed release notes for the usual variety show of minor features (including a new Queue job runner that enables local parallelism), bug fixes and deprecation notices (a few tools have been removed from the codebase, in the spirit of slimming down ahead of the holiday season).


GATK 3.5 was released on November 25, 2015. Itemized changes are listed below. For more details, see the user-friendly version highlights.


The slide decks presented on Day 1 of the BroadE GATK workshop on November 18 are now available at this Google Drive link:

The workshop handout document (agenda and talk abstracts) is available at this link:


We are scheduled to do 2 hands-on modules at the BroadE GATK workshop at the Broad Institute this Thursday, Nov. 19.

The tutorial materials for each module include the following:

  • Workshop dataset and an appendix document containing detailed instructions for preparing for the workshop
  • Tutorial worksheet containing the actual exercises done in the workshop

Attendees will receive a printout of the worksheets for the module for which they are registered. We will not provide printouts of the appendix documents.

If you are registered to attend, you must download the materials and follow the instructions before the workshop starts; otherwise you will not be able to follow along and your workshop experience will be unsatisfying. We certainly don't want that to happen, so be sure to do your homework as described below the fold. Be sure to identify correctly which workshop you are registered for!


As announced recently, we are doing a workshop at the Broad Institute in Cambridge this coming November 18-19 (description below the fold).

Registration for the workshop is now live here.

Registration will close Oct. 30 (this Friday) so don't wait to sign up!


Yay, it's workshop planning season again!

USA: Cambridge, MA - November 18-19, 2015

Another local BroadE workshop at the Broad Institute for our Boston Metro Area peeps (and anyone who cares to travel to us on short notice). This will be the last workshop of 2015 and the first to cover the upcoming GATK version 3.5 -- which we really have to release soon now! As usual, the topic will be GATK Best Practices, with all the latest updates, some fresh material including two somatic variant discovery talks, and two completely revamped hands-on tutorial sessions. Registration will open on Monday, October 26 and stay open for one week. We'll post a link to the registration site when it goes live.

More below the fold: our preliminary workshop lineup for the 2016 Winter & Spring season.


Geraldine Van der Auwera presented this talk as part of the Broad Institute's Medical and Population Genetics (MPG) Primers series.

This talk provides a high-level overview of the workflow for performing variant discovery on high-throughput sequencing data, as described in the GATK Best Practices and implemented in the Broad's production pipelines.

The following points are emphasized in this presentation:

  • Informational content of data file formats and flow of information throughout the pipeline
  • Concepts involved in the data transformations (processing steps and analysis methods)
  • Motivation and key mechanics of the GVCF workflow for scalable joint variant discovery
  • Relation of the GATK Best Practices to the Broad's production pipeline implementation

The presentation slide deck is available at this link.


Great poster session this morning at ASHG; Sheila and I got a lot of good questions about the Best Practices, and the GVCF workflow in particular. Our punchline: "This is how ExAC got done". It's super effective!

Preview of the poster after the fold. You can get the full-sized PDF here.


Are you excited? We sure are. Especially in the secondary sense, defined as "stimulated to activity; brisk". We're presenting a poster on Thursday and a 90-minute workshop on Friday, but neither is ready yet. Good thing the weather this weekend is crappy; if we were missing out on proper New England fall foliage / leaf-peeping weather we'd be pretty cheesed off.

But we ain't afraid of no deadline -- we'll be ready. We developed a completely new workshop tutorial for the occasion, and we're going to have a big room full of people rocking GVCFs. It's going to be epic. The tutorial data bundle, sans worksheet (because that's the part that's not quite ready yet) is already available here (not a direct link to the data because we want you to read the part about the homework). It does have an appendix document with installation instructions and some context info about the tutorial objectives, which you must read through (and act on) before the workshop, if you're attending.

If you're at ASHG but you can't make it to the workshop, you can still come see our poster, which covers the same topic (the GVCF workflow part of the Best Practices), but flatter and less interactive. Although Sheila and I will be there to answer questions one on one, so in that sense it will be more interactive. Just with less keyboard action. So, Thursday 8 Oct between 12 and 1 pm at the Bioinformatics and Genomic Technology session in the Exhibit Hall, Level 1; Convention Center, poster #1664/T. Be there. We'll talk.

We'll also be around at the Broad Institute Genomic Services booth in the Exhibit Hall (booth #1720, right around the corner from Qiagen). Not sure yet when we'll be there, but send me a private message if you'd like to chat and we can figure out a time.

See you there!


We are scheduled to do a hands-on workshop at the ASHG 2015 meeting in Baltimore (see below for program details).

The workshop files are available here:

Attendees will receive a printout of the worksheet at the workshop. We will not provide printouts of the appendix document.

If you are registered to attend, you must download the materials and follow the instructions before the workshop starts; otherwise you will not be able to follow along and your workshop experience will be unsatisfying. We certainly don't want that to happen, so be sure to do your homework as follows:

Get the files: 5 minutes

  • Download the 52 MB zip file to the laptop you will bring to the workshop, and save it where you want the tutorial files to sit.
  • Open the zip file; this will create a directory called ASGH15_GATK.
  • Open the ASGH15_GATK directory, look at the contents and read the extremely brief README.txt file.

Refresh your memory of the topic and scientific context of the workshop: 10-20 minutes

  • Open the PDF document called ASHG2015Tutorial-Appendix.
  • Read the introduction. If any of the content is especially new or confusing to you, consider following the links included to the additional documentation on our website. We will cover this content briefly at the start of the tutorial, but the more you prepare, the better your workshop experience will be.

Install and test software: 30-45 minutes

Go over the second part of the ASHG2015Tutorial-Appendix document to acquaint yourself with the technical requirements of the workshop and follow the detailed installation instructions. In particular:

  • Make sure you have the correct version of Java installed (Java 7 / JRE 1.7).
  • Download all 3 software packages (GATK, Samtools, IGV) to the laptop you will bring to the workshop.
  • Copy the program files of all three to the ASGH15_GATK directory.
  • Run the test commands for GATK and Samtools, and launch IGV.
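
If you'd like a quick sanity check before working through the appendix, the Java and tool checks above can be sketched in a few lines of Python. This is only a rough helper under some assumptions: `java` and `samtools` are expected to be reachable on your PATH (the appendix may have you run them from the tutorial directory instead), and the function names are invented for this example. The appendix's own test commands remain the reference.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    'java version "1.7.0_80"' gives 7; 'openjdk version "11.0.2"' gives 11.
    Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Older releases use the 1.x scheme, where x is the major version.
    if first == 1 and second is not None:
        return int(second)
    return first


def missing_tools(tools=("java", "samtools")):
    """Return the names in `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def installed_java_major():
    """Run `java -version` and return the major version, or None."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"],
                            capture_output=True, text=True)
    return parse_java_major(result.stderr)


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("Java major version:", installed_java_major())
```

For this workshop you would want the reported major version to be 7. IGV is left out of the check because it is launched as a desktop application rather than called from the command line.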

If you are new to the command line or GATK, we strongly recommend reviewing the sections of the document about tool syntax (pages 8-10) and practicing basic Unix commands. You may also benefit from spending some time gaining familiarity with IGV.


