Registration is now LIVE for our upcoming BroadE Workshop: Best Practices for Variant Calling with the GATK.
WHEN: Thursday, March 19 & Friday, March 20, 2015
10:00 AM - 5:00 PM (Lecture, March 19)
2:00 PM - 5:00 PM (Optional Tutorial, March 20)
WHERE: Broad Institute
Auditorium (lecture)/Yellowstone (Tutorial)
415 Main Street
Cambridge, Massachusetts 02142
Registration closes February 27 at 5:00 PM.
Notification of acceptance or wait list status sent by March 4.
In case you were wondering why responses on the forum have been slow... We've been dealing with this crap.
Fig. 1: GATK support technician reassigned to fire hydrant clearing duty, monitoring the forum while waiting enthusiastically for the next round.
Remember kids, keeping fire hydrants clear of snow saves lives. Also, the plow truck is both ally and enemy in this fight. Who do you think keeps piling snow on the poor defenseless hydrant at the end of the street? Hate the plow truck. Love the plow truck. I'm so confused.
We're going to be doing two back-to-back workshops in Edinburgh and Cambridge (the original, accept no substitutes) later this spring, on April 20-21 and 23-24 respectively. The program for both will be our typical one-day marathon of Best Practices lectures, followed by a half-day of lectures on supplemental topics (QC, non-humans, etc.) and a half-day of hands-on sessions where beginners can get their hands dirty with some real data.
Cheers to our hosts and we hope to see lots of you there!
bonus points to whoever gets the title reference -- and sings it in the correct tune
So you may have heard the US Northeast is getting a little bit of snow. Here's what it looked like this morning at GATK Support HQ, during a relative lull that allowed me to do a first round of clearing:
Not too bad so far, but it looks like it's going to get worse before it gets better. Round two is going to suck.
Anyway, the Broad is shut down for the duration of the state of emergency, and we are all at home waiting out the snowpocalypse. The GATK forum will be mostly unattended while we hunker down and sip hot cocoa with marshmallows. Assuming the power stays on and we're able to dig ourselves out of the snow when it's all over, normal service should resume by end of day Wednesday.
It's a shiny New Year and the forum, like the rest of Broad, is back to active status, so bring it on! It might take us a day or two to mop up the questions that came in during the break so we appreciate your patience as always (although thanks to superuser @pdexheimer there's a bunch that are already resolved, yay).
In the next few days we'll hopefully have some hot new announcements for you, so please keep an eye on this space.
Despite the conspicuous lack of snow in the Boston area, my calendar insists that it's time for our annual winter/holiday break, which sees (almost) the entire Broad shut down from December 24th to January 2nd (inclusive). So, starting tonight (5 pm EST), the forum will be unattended until we resume normal service on Monday, January 5th of the new year, 2015.
It's been a busy end of year and we didn't quite get to the end of our service queue, so I apologize to those of you who may still have pending questions or requests. The number of people using our tools -- and the diversity of applications they're being used for (hello, bizarre non-model organisms) -- just keeps increasing, which is a source of great joy for us but also of significant challenges. We've got some changes coming that should help us meet that ever-growing demand more efficiently in future. In the meantime, we're really grateful for the patience and trust that you have all been showing us as we find our way through this fairly new territory.
We've also got some big changes coming in terms of the GATK itself, so please stay tuned for an announcement (or series of announcements) early in the new year. It's hard not to spill the beans so I'm not even going to drop any hints, but I can tell you it's going to be really exciting ;)
With that, I'll leave you to your holiday preparations, because you couldn't possibly still be working, right? Why are you even reading this? Shoo, go away! Go make a snowman, play with your kids, call your folks to say hi, or something like that. We'll see you next year!
P.S. If you live in a part of the world where it's not a holiday season, um, sorry? But hey, the "go play with your kids/ call your folks to say hi" exhortation is applicable at any time of year, really.
The Best Practices recommendations for Variant Quality Score Recalibration have been slightly updated to use the new(ish) StrandOddsRatio (SOR) annotation, which complements FisherStrand (FS) as an indicator of strand bias. Note that SOR is only available in GATK version 3.3-0 and above.
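For the curious, here is how we understand SOR to be computed: it starts from the same 2x2 table as FS (reads supporting the reference and alternate alleles, split by forward and reverse strand), adds a pseudocount of 1 to each cell, and combines a symmetrical odds ratio with the strand balance of each allele. The bash sketch below (awk does the floating-point math) is a minimal illustration of that formula, not GATK's own code; the `sor` function name is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch (not GATK code) of the StrandOddsRatio formula as we understand it.
# Arguments: ref_fwd ref_rev alt_fwd alt_rev (read counts per allele and strand).
# Prints SOR rounded to three decimals.
sor() {
  awk -v rf="$1" -v rr="$2" -v af="$3" -v ar="$4" 'BEGIN {
    # Pseudocount of 1 per cell avoids division by zero and log(0).
    rf += 1; rr += 1; af += 1; ar += 1
    # Odds ratio of the 2x2 table, made symmetrical so R and 1/R score alike.
    r = (rf * ar) / (rr * af)
    sym = r + 1 / r
    # Strand balance of ref and alt reads: min/max, so each lies in (0, 1].
    ref_ratio = (rf < rr ? rf : rr) / (rf > rr ? rf : rr)
    alt_ratio = (af < ar ? af : ar) / (af > ar ? af : ar)
    printf "%.3f\n", log(sym) + log(ref_ratio) - log(alt_ratio)
  }'
}

sor 10 10 10 10   # balanced strands for both alleles → 0.693 (ln 2)
sor 10 10 20 0    # alt reads only on the forward strand: much higher SOR
```

Higher values point to stronger strand bias; the cutoffs you should actually use belong to the VQSR and hard-filtering docs, not to this sketch.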
While we were at it, we also reconciled some inconsistencies between the tutorial and the FAQ document. As a reminder, if you ever find differences between parameters given in the VQSR docs, let us know, but FYI the FAQ is the ultimate source of truth=true. Note also that the command line example given in the VariantRecalibrator tool doc tends to be out of date, because it can only be updated with the next release (due to a limitation of the tool doc generation system) and, well, we often forget to do it in time -- so it should never be used as a reference for Best Practices parameter values, as indicated in the caveat right underneath it, which no one ever reads.
Speaking of caveats, there's no such thing as too much repetition of the fact that whole genomes and exomes have subtle differences that require some tweaks to your command lines. In the case of VQSR, that means dropping Coverage (DP) from your VQSR command lines if you're working with exomes.
Finally, keep in mind that the values we recommend for tranches are really just examples; if there's one setting you should freely experiment with, that's the one. You can specify as many tranche cuts as you want to get really fine resolution.
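To make the exome-vs-genome difference and the free-form tranches concrete, here is a minimal bash sketch that assembles the annotation (`-an`) and tranche (`-tranche`) arguments for a VariantRecalibrator SNP run. The annotation list and the tranche values are illustrative assumptions, not an authoritative copy of the Best Practices (the FAQ stays the source of truth), and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that build VariantRecalibrator arguments.
# The annotation list is illustrative only; check the VQSR FAQ for current values.

# Usage: vqsr_annotation_args genome|exome
vqsr_annotation_args() {
  local data_type="$1"
  local annotations=(QD MQ MQRankSum ReadPosRankSum FS SOR)
  # Coverage (DP) is dropped for exomes; it is only kept for whole genomes.
  if [ "$data_type" != "exome" ]; then
    annotations+=(DP)
  fi
  local args=()
  local a
  for a in "${annotations[@]}"; do
    args+=(-an "$a")
  done
  printf '%s\n' "${args[*]}"
}

# Usage: vqsr_tranche_args 100.0 99.9 99.0 90.0 ...
# Pass as many tranche cuts as you like for finer resolution.
vqsr_tranche_args() {
  local args=()
  local t
  for t in "$@"; do
    args+=(-tranche "$t")
  done
  printf '%s\n' "${args[*]}"
}

vqsr_annotation_args exome   # → -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR
vqsr_tranche_args 100.0 99.9 99.5 99.0 98.0 95.0 90.0
```

The output is meant to be spliced unquoted into the command line, e.g. `$(vqsr_annotation_args exome)`; that relies on word splitting, which is safe here only because none of the values contain spaces.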
Warning: the following content may shock or distress our more sensitive users, as we discuss the cold-blooded elimination of some tools from the GATK.
Alright, now that I've got your attention (hopefully -- if not, what does it take?), here's the deal. We've reached a point where the GATK is a widely, even massively, used toolkit (thanks to you, dear users). And it's pretty darn robust -- it's what the Broad's Genomics Platform uses in production to churn out exomes like there's no tomorrow. But it has technical limitations that are 1) a frequent source of pain on your end and 2) increasingly hampering development of new methods on our end.
The good news is that we have a plan for addressing (read: blasting away) these limitations. But part of this plan will involve streamlining GATK by getting rid of tools that are not useful or are inferior to alternative tools from other packages that we're not trying to compete with (e.g. Picard tools).
Some tools that are safe from elimination: all the tools used in the Best Practices, and a couple of utilities that we use a lot ourselves. But everything else is up for review -- and that's where you come in: we need your input to decide what to keep, what to throw away, and what to consider rewriting from scratch (yep, this is an option).
This link will take you to a SurveyMonkey page that lists the tools currently on the chopping block:
Act now to save your favorite non-BP tools! Or help us get rid of the crud. Whichever way you want to look at it, we appreciate your feedback!
We're advertising this job on behalf of our colleagues in the cancer analysis team. See the overview below the fold. If you're interested, please message me (@Geraldine_VdAuwera) or apply on the Broad's careers page (search for job requisition number 1591).
We all know that HaplotypeCaller analyses can take a long time. IBM is now providing a native implementation of the PairHMM algorithm that leverages the new hardware available in their POWER8 systems. This optimization currently works on the following systems: Ubuntu 14 and RHEL 7 on POWER8.
To take advantage of this optimization, you need to do the following:
Here is an example for running on a P8 system with Ubuntu:
java -Xmx32g -Djava.library.path=/path/to/PairHMM_P8_Ubuntu -jar $GATK_PATH/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R $REFERENCE -I $INPUT_BAM --dbsnp $SNP_VCF \
    -stand_emit_conf 10 -stand_call_conf 50 \
    --pair_hmm_implementation VECTOR_LOGLESS_CACHING \
    -o $OUTPUT_VCF
You can expect a speedup in the range of 1-1.7x depending on the hardware, OS and test cases.
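If the same pipeline script runs on both POWER8 and other machines, one option is to add the native library path only when running on POWER. The bash sketch below assumes that `uname -m` reports `ppc64le` or `ppc64` on POWER8 systems; the `pairhmm_jvm_opts` function and the library directory are hypothetical placeholders, and on any other architecture the helper simply prints nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JVM option pointing at the IBM PairHMM library,
# but only on POWER systems. Assumes `uname -m` reports ppc64le (or ppc64) there.
# Usage: pairhmm_jvm_opts LIB_DIR [ARCH]   (ARCH defaults to `uname -m`)
pairhmm_jvm_opts() {
  local lib_dir="$1"
  local arch="${2:-$(uname -m)}"
  case "$arch" in
    ppc64le|ppc64) printf '%s\n' "-Djava.library.path=$lib_dir" ;;
    *) ;;  # other architectures: no extra option
  esac
}

pairhmm_jvm_opts /path/to/PairHMM_P8_Ubuntu ppc64le   # → -Djava.library.path=/path/to/PairHMM_P8_Ubuntu
pairhmm_jvm_opts /path/to/PairHMM_P8_Ubuntu x86_64    # prints nothing
```

It could then be used as `java -Xmx32g $(pairhmm_jvm_opts /path/to/PairHMM_P8_Ubuntu) -jar ...` with the rest of the command unchanged; the unquoted substitution assumes the library path contains no spaces.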
If you have any questions or issues (aside from downloading the file), please contact Yinhue Cheng at IBM (firstname.lastname@example.org).