This Thursday, the team enjoyed an array of Belgian beers generously offered by Ewan Birney as a reward for the integration of CRAM support into GATK (version 3.5 and up). We feel the Birn!
Our thanks also to the Samtools team who led the way, Vadim Zalunin from EBI who contributed the lion's share of the code toward this effort, and various people besides who helped make this happen.
This week we are teaching a couple of GATK Best Practices workshops in Sydney (one Opera House, check) and Melbourne, Australia. Joining me are Methods Team developers Mark and Megan.
The presentation slides from Day 1 as well as the materials for the Day 2 hands-on tutorial are available at this Google Drive link.
Learned so far: the fierceness of the Australian sun is no laughing matter.
Cancer pipelines, copy number variation, GATK in the Cloud... We have lots of exciting new features coming down the pipe, and you'll be hearing more about all of them in the weeks and months to come. There are one or two announcements that I guarantee will knock some socks off!
Yet in all that excitement (alright, I'm the excited one, you're just trying to see through the thick mist of vaporware I just spouted) let's not forget that we currently have in hand a toolkit that is used by thousands of you every day, containing dozens of tools that perform complicated operations on ridiculously large datasets.
Are they perfect? No. They're pretty darn good, if we do say so ourselves, but they have their quirks. Sometimes even bugs. Aah! The humanity!
So let's talk about problems. Let's talk about what trips people up and bogs down genomic research on a day-to-day basis. Setting aside the big stuff, like compute resources, scaling and performance blockers, which are all interesting topics for another time... Let's talk about all the insultingly small, aggravating, pulling-your-hair-out, this-should-work-why-isn't-this-stupid-thing-working kind of problems.
As the scope and user base of the GATK have expanded dramatically over the past few years, I've been empowered to put together a team that is dedicated exclusively to support and outreach activities (writing the docs, answering questions in the forum, organizing workshops and so on) for GATK and its close cousins such as Picard Tools. This takes most of the support burden off of the development team, allowing them to focus on improving and extending the GATK tools, while we work closely with them to provide a high level of support to everyone in our user community.
So I'm very pleased to introduce the members of this team as it stands in early 2016 (newest first). As you'll see we have a mix of backgrounds, skills and experiences that allow us to approach GATK from the point of view of users, rather than developers. In the days and weeks to come, as we flex our blogging muscles, each of us will write about our experiences with GATK from that point of view, highlighting pitfalls, solutions and useful tips & tricks. We hope this will be helpful and we look forward to feedback.
Without further ado, I give you the 2016 GATK support team:
It's unnaturally not-cold outside and ducks are frolicking on the Charles River instead of heading south, but despite that, I'm told it's time to shut down the shop and take a break for the holidays.
The Broad Institute mothership will be closed from tomorrow, Wednesday 12/24, and will reopen on Monday 1/4 of the new year (that would be 2016). Accordingly, we GATK support elves will enjoy some rest and relaxation, and we hope that you will do the same (as applicable to your local culture, calendar and more or less amenable PIs).
In the meantime, be aware that the forum will be mostly unattended; there is no guarantee that any questions will be answered until January 4th. We will be back with great enthusiasm, many answers and possibly (definitely!) some exciting new announcements at the start of the new year.
Should you wish to send us a holiday card (aka the nicest possible way to make a feature request), please write to:
Data Sciences And Data Engineering
75 Ames Street, 7th floor
Cambridge, MA 02142
If you include a return address we'd be happy to return the courtesy.
Happy holidays to all!
The hands-on tutorial files and presentation slides from the Dec 8, 2015 workshop at VIB in Gent, Belgium are available at this link:
Please be advised that the forum will be unattended for the duration of the Thanksgiving holiday, 26-29 Nov. During that time, we'll only answer questions if we need an excuse to escape from in-laws or screaming babies (which is worse is a matter of some debate).
Regular service will resume Monday 30 Nov.
Thanksgiving message below the fold.
The last GATK 3.x release of the year 2015 has arrived!
The major feature in GATK 3.5 is the eagerly awaited MuTect2 (beta version), which brings somatic SNP and Indel calling to GATK. This is just the beginning of GATK’s scope expansion into the somatic variant domain, so expect some exciting news about copy number variation in the next few weeks! Meanwhile, more on MuTect2 awesomeness below.
In addition, we’ve got all sorts of variant context annotation-related treats for you in the 3.5 goodie bag -- both new annotations and new capabilities for existing annotations, listed below.
In the variant manipulation space, we enhanced or fixed functionality in several tools including LeftAlignAndTrimVariants, FastaAlternateReferenceMaker and VariantEval modules. And in the variant calling/genotyping space, we’ve made some performance improvements across the board to HaplotypeCaller and GenotypeGVCFs (mostly by cutting out crud and making the code more efficient) including a few improvements specifically for haploids. Read the detailed release notes for more on these changes. Note that GenotypeGVCFs will now emit no-calls at sites where RGQ=0 in acknowledgment of the fact that those sites are essentially uncallable.
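To make the RGQ=0 behavior concrete, here is a minimal Python sketch of the rule as described above. It is not GATK code: the function name `reported_genotype` and the idea of passing the genotype as a plain string are assumptions made purely for illustration. It only shows the logic of "if the reference genotype quality is zero, report a no-call of the same ploidy instead of the original call".

```python
import re
from typing import Optional


def reported_genotype(gt: str, rgq: Optional[int]) -> str:
    """Illustrative only (hypothetical helper, not part of GATK).

    gt  -- genotype string as it would appear in a VCF GT field, e.g. "0/0" or "0"
    rgq -- reference genotype quality for the site, or None if not annotated
    """
    if rgq == 0:
        # The site is essentially uncallable: replace each allele with "."
        # while keeping the ploidy of the original call ("0/0" -> "./.", "0" -> ".").
        n_alleles = len(re.split(r"[/|]", gt))
        return "/".join(["."] * n_alleles)
    # Any other RGQ value (or no RGQ at all) leaves the call untouched.
    return gt


print(reported_genotype("0/0", 0))   # diploid site with RGQ=0
print(reported_genotype("0/0", 45))  # confident hom-ref site
print(reported_genotype("0", 0))     # haploid site with RGQ=0
```

In practice this means that downstream scripts which count hom-ref genotypes should expect some of those sites to show up as no-calls after upgrading to 3.5.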
We’ve got good news for you if you’re the type who worries about disk space (whether by temperament or by necessity): we finally have CRAM support -- and some recommendations for keeping the output of BQSR down to reasonable file sizes, detailed below.
Finally, be sure to check out the detailed release notes for the usual variety show of minor features (including a new Queue job runner that enables local parallelism), bug fixes and deprecation notices (a few tools have been removed from the codebase, in the spirit of slimming down ahead of the holiday season).
GATK 3.5 was released on November 25, 2015. Itemized changes are listed below. For more details, see the user-friendly version highlights.
These are the materials that were presented at the November 2015 GATK workshop at the Broad Institute in Cambridge, MA.
|Material|Location|
|---|---|
|Slide decks presented on Day 1|Google Drive Folder|
|Workshop handout document (agenda and talk abstracts)|PDF on Google Drive|
|Variant Discovery Tutorial (Day 2 AM) (=ASHG15 Tutorial)|Google Drive Folder|
|Callset Filtering and Evaluation Tutorial (Day 2 PM)|Google Drive Folder|