With my impeccable sense of timing, I am leaving for a 10-day vacation on the day we release the big new version.
To be blunt (because I am in a hurry to catch a plane): there will be no one assigned to answer forum questions. Other GATK team members may find it in their hearts to answer questions, but they have busy schedules, so don't count on it.
My apologies for the inconvenience! I will be back at my post on the 17th.
In the meantime, please enjoy the new release. I hope the new docs I posted today help you make sense of the new features. You can also have a look at the slide decks we presented this week at Mahidol University in Bangkok, Thailand; they contain more background on the new features (world premiere on the road, whoo) and are available in the GSA Dropbox.
Okay, we realize the name's a bit of a mouthful, and we're willing to tweak it if anyone has any good ideas. But never mind that. It's difficult to overstate the importance of this new approach to joint variant discovery (but I'll give it a shot), so we're really stoked to finally be able to share the details of how it is going to work in practice.
You're probably going to be surprised at how simple it is to use (not that it was particularly easy to actually build, mind you). The gory details are in the new document here, but here's an overview of how it looks within the Best Practices workflow you all know and (hopefully) love:
We’re excited to introduce our Best Practices recommendations for calling variants on RNAseq data. These recommendations are based on our classic DNA-focused Best Practices, with some key differences in the early data processing steps, as well as in the calling step.
[A sneak peek at the upcoming release notes]
GATK 3.0 will (hopefully) be released on March 5, 2014. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
One important change for those who prefer to build from source is that we now use Maven instead of Ant. See the relevant documentation for building the GATK with our new build system.
The purpose of the ReduceReads algorithm was to enable joint analysis of large cohorts by the UnifiedGenotyper. The new workflow for joint discovery, which involves doing a single-sample pass with the HaplotypeCaller in gVCF mode followed by a joint analysis of the per-sample gVCFs, renders the compression step obsolete.
In addition, based on our most recent analyses, we have come to the conclusion that the quality of variant calls made on BAMs compressed with ReduceReads falls short of the standard we target for GATK tools. In comparison, the results obtained with the new workflow are far superior.
For these reasons, we have made the difficult decision to remove the ReduceReads tool from version 3.0 of the toolkit. To be clear, reduced BAMs will NOT be supported in GATK 3.0.
We realize that this may cause some disruption to your existing workflows, and for that we apologize. Please understand that we are driven to provide tools that produce the best possible results. Now that all the data is in, we have found that the best results cannot be achieved with reduced BAMs, so we feel that the best thing to do is to remove this inferior tool from the toolkit, and promote the new tools.
As always, we welcome your comments, and we look forward to showing you how the new calling workflow will yield superior results.
This week, three lucky GATK team members are teaching an invited workshop at Mahidol University in Bangkok, Thailand! The slide decks for each day will be available at the start of the day here in the GSA Dropbox. After the workshop, all materials will be available in the Presentations section of the GATK website as usual.
When people sign up for an account on the forum, the signup form includes the following question:
"What does SNP stand for?"
This is simply one of many ways we try to fight forum spam and bogus accounts. We don't really require people to answer correctly, but it's surprisingly good at tripping up spam bots and even some human spammers (sad, sad creatures that they are).
Most people know the answer or make a pretty good guess (allowing for occasional language issues). On the other hand, some folks clearly don't have the time to go into such a complicated question (/snark) and simply mash the keyboard in irritation, yielding such enlightening responses as "fhstdgf" (with local variations due to language-specific keyboard layouts).
My favorites, though, are the really creative answers we get from the (presumed) sysadmins and assorted IT support staff who are tasked with deploying GATK to their institution's clusters and user workstations, but haven't got the foggiest idea what a SNP might be in our particular jargon-filled domain, and therefore propose their own interpretations.
Previously, we covered the spirit of GATK 3.0 (what our intentions are for this new release, and what we’re hoping to achieve). Let’s now have a look at the top three features you can look forward to in 3.0, in no particular order:
We're admittedly a little late bringing you the details of the GATK 3.0 features; our excuse is entirely meteorological. To wit, here is the view from GATK Support & Outreach HQ earlier this afternoon:
And the GATKmobile:
The lovely, very festive coating of crystalline H2O you see in these images unfortunately entails a significant amount of time spent shoveling, which equates to time not spent writing about the wondrous new features of the upcoming GATK release.
We'll do our best to catch up promptly, if the weather gods allow it.
If you're at AGBT 2014 and you have any questions about the upcoming GATK 3.0 release, be sure to find Mauricio Carneiro (@mauricinho on Twitter) and ask him all about it.
To make it easy (we hear it's a pretty big crowd), Mauricio will be in BioTeam Suite #186 (sponsored by Intel) on Friday the 15th from 3:30 to 4pm. He'll be there specifically to represent the GATK dev team and answer questions about GATK 3.0, so don't be shy!
Mauricio is easy to identify: a friendly-looking guy with a ponytail. See his bio on the dev team page or his personal webpage for photos and more details about his background (can you believe he used to play Starcraft and Quake professionally?).