Better late than never, here is the now-traditional "Highlights" document for GATK version 3.0, which was released two weeks ago. It will be a very short one since we've already gone over the new features in detail in separate articles --but it's worth having a recap of everything in one place. So here goes.
We are delighted to present our new Best Practices workflow for variant calling in which multisample calling is replaced by a winning combination of single-sample calling in gVCF mode and joint genotyping analysis. This allows us to both bypass performance issues and solve the so-called "N+1 problem" in one fell swoop. For full details of why and how this works, please see this document. In the near future, we will update our Best Practices page to make it clear that the new workflow is now the recommended way to go for calling variants on cohorts of samples. We've already received some pretty glowing feedback from early adopters, so be sure to try it out for yourself!
All the cool kids were doing it, so we had to join the party. It took a few months of experimentation, a couple of new tools and some tweaks to the HaplotypeCaller, but you can now call variants on RNAseq with GATK! This document details our Best Practices recommendations for doing so, along with a non-trivial number of caveats that you should keep in mind as you go.
Nice try, but no. This tool is obsolete now that we have the gVCF/reference model pipeline (see above). Note that this means that GATK 3.0 will not support BAM files that were processed using ReduceReads!
We've switched the build system from Ant to Maven, which should make it much easier to use GATK as a library against which you can develop your own tools. And on a related note, we're also making significant changes to the internal structure of the GATK codebase. Hopefully this will not have too much impact on external projects, but there will be a doc very shortly describing how the new build system works and how the codebase is structured.
For reasons that will be made clear in the near future, we decided to hold the previously announced hardware optimizations until version 3.1, which will be released very soon. Stay tuned!
The purpose of the ReduceReads algorithm was to enable joint analysis of large cohorts by the UnifiedGenotyper. The new workflow for joint discovery, which involves doing a single-sample pass with the HaplotypeCaller in gVCF mode followed by a joint analysis on multiple sample gVCFs, renders the compression step obsolete.
In addition, based on our most recent analyses, we have come to the conclusion that the quality of variant calls made on BAMs compressed with ReduceReads is inferior to the standard targeted by GATK tools. In comparison, the results obtained with the new workflow are far superior.
For these reasons, we have made the difficult decision to remove the ReduceReads tool from version 3.0 of the toolkit. To be clear, reduced BAMs will NOT be supported in GATK 3.0.
We realize that this may cause some disruption to your existing workflows, and for that we apologize. Please understand that we are driven to provide tools that produce the best possible results. Now that all the data is in, we have found that the best results cannot be achieved with reduced BAMs, so we feel that the best thing to do is to remove this inferior tool from the toolkit, and promote the new tools.
As always we welcome your comments, and we look forward to showing you how the new calling workflow will yield superior results.