Until now, HaplotypeCaller was only capable of calling variants in diploid organisms due to some assumptions made in the underlying algorithms. I'm happy to announce that we now have a generalized version that is capable of handling any ploidy you specify at the command line!
This new feature, which we're calling "omniploidy", is technically still under development, but we think it's mature enough for the more adventurous to try out as a beta test ahead of the next official release. We'd especially love to get some feedback from people who work with non-diploids on a regular basis, so we're hoping that some of you microbiologists and assorted plant scientists will take it out for a spin and let us know how it behaves in your hands.
It's available in the latest nightly builds; just use the -ploidy argument to give it a whirl. If you have any questions or feedback, please post a comment on this article in the forum.
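For anyone scripting their runs, here's a minimal Python sketch of what a tetraploid invocation might look like. The -ploidy flag is the one described above, and the other arguments follow the usual GATK 3 command-line conventions. The file names (ref.fasta, sample.bam, calls.vcf), the ploidy of 4 and the helper function haplotypecaller_cmd are purely illustrative assumptions, not an official recipe.

```python
import shlex


def haplotypecaller_cmd(ref, bam, out_vcf, ploidy, jar="GenomeAnalysisTK.jar"):
    """Build a HaplotypeCaller command line as a list of arguments.

    Hypothetical helper: all paths are placeholders chosen for illustration.
    """
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return [
        "java", "-jar", jar,
        "-T", "HaplotypeCaller",
        "-R", ref,
        "-I", bam,
        "-ploidy", str(ploidy),  # the new argument from this announcement
        "-o", out_vcf,
    ]


# Example: a tetraploid sample (the value 4 is just an assumption for the demo).
cmd = haplotypecaller_cmd("ref.fasta", "sample.bam", "calls.vcf", ploidy=4)

# Print the command in a copy-pasteable form instead of running it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list rather than a single string makes it easy to hand over to subprocess.run later, or to loop over several ploidy values if you want to compare results.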
Caveat: the downstream tools involved in the new GVCF-based workflow (GenotypeGVCFs and CombineGVCFs) are not yet capable of handling non-diploid calls correctly -- but we're working on it.
Our partners at Appistry will be doing a webinar on RNAseq analysis next Thursday. The webinar will include a live presentation of the complete pipeline for RNAseq analysis, as well as question time open to all participants. As usual, it's free and open to all; you just need to register at Appistry's website. Check it out!
Here's my abstract for the upcoming Genome Science UK meeting in Oxford, where I'll be talking about our hot new workflow for variant discovery. The slide deck will be posted in the Presentations section as usual after the conference.
Better late than never (right?), here are the version highlights for GATK 3.2. Overall, this release is essentially a collection of bug fixes and incremental improvements that we wanted to push out to not keep folks waiting while we're working on the next big features. Most of the bug fixes are related to the HaplotypeCaller and its "reference confidence model" mode (which you may know as -ERC GVCF). But there are also a few noteworthy improvements/changes in other tools which I'll go over below.
Creative minds, your attention please -- our licensing partners, Appistry, are holding a competition! From the Appistry Pipeline Challenge webpage:
The Appistry Pipeline Challenge will reward and support one winning proposal for a creative pipeline that will make a difference in clinical research and precision medicine. We’ll provide the winner with a complete NGS analysis package valued at $70,000 including commercial-grade bioinformatics tools for variant calling and somatic mutation analysis, software and hardware for developing and executing pipelines at scale, and a year’s worth of support so that you can make your idea a reality.
There's more detailed information on the contest webpage of course, and if that doesn't answer all your questions, Appistry is holding a live webinar tomorrow (Thursday 7/24) to give additional details and answer questions.
The deadline for submissions is August 15th, so if you have an idea, better get on it!
GATK 3.2 was released on July 14, 2014. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history
Ladies, gentlemen and everyone else (this is a judgment-free zone), it's officially summertime in the northern hemisphere. Depending on who and where you are, this can mean no more classes, no more exams, and more quality time with your loved ones -- or extra expense getting someone to keep your offspring out of your way (hello summer camp). It is that hallowed time of year when academics put down the burdens of teaching and administrative duties and can finally get some science done. For many, it also means conference season, e.g. meeting up in Spain with a bunch of colleagues to argue about obscure methodological details over many a glass of tinto de verano. It's a hard, hard life.
A group of us just got back from sunny Belgium* where we held a GATK workshop at the invitation of the Royal Institute of Natural Sciences in Brussels. Now we're looking ahead to the next big dates on the horizon, and I thought I'd share them here in case some of you can join us. Or in case you would like to invite us over to give talks or workshops... (seriously, private-message me if you're interested in hosting a GATK workshop).
* This is not irony, it was really beautiful the whole time. Until the two non-Belgians left, and then boom! Downpour for three days. Typical.
As you can see below, our dance card is all clear for the summer itself but starting September it gets pretty busy.
The presentation slides are available on Dropbox at this link:
After the workshop, these materials as well as the hands-on tutorial will be posted in the Presentations section of the website.
We discovered today that we made an error in the documentation article that describes the RNAseq Best Practices workflow. The error is not critical but is likely to cause an increased rate of false positive calls in your dataset.
The error was made in the description of the "Split & Trim" pre-processing step. We originally wrote that you need to reassign mapping qualities to 60 using the ReassignMappingQuality read filter. However, this causes all MAPQs in the file to be reassigned to 60, whereas what you want to do is reassign MAPQs only for good alignments which STAR identifies with MAPQ 255. This is done with a different read filter, called ReassignOneMappingQuality. The correct command is therefore:
java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ref.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
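To make the difference between the two filters concrete, here's a toy Python sketch of the behavior described above. It is not GATK code: the function names and the list of MAPQ values are made up for illustration, and the only values taken from this post are 255 (what STAR assigns to good alignments) and 60 (the target MAPQ).

```python
def reassign_all(mapqs, new_mapq=60):
    """Mimic the wrong filter as described: every read ends up with new_mapq."""
    return [new_mapq for _ in mapqs]


def reassign_one(mapqs, from_mapq=255, to_mapq=60):
    """Mimic the correct filter: only reads at from_mapq are changed (-RMQF / -RMQT)."""
    return [to_mapq if q == from_mapq else q for q in mapqs]


# Illustrative MAPQs for five reads: two good alignments (255) and three poor ones.
star_mapqs = [255, 3, 1, 255, 0]

print(reassign_all(star_mapqs))  # [60, 60, 60, 60, 60]
print(reassign_one(star_mapqs))  # [60, 3, 1, 60, 0]
```

With the wrong filter, the three poorly mapped reads are promoted to MAPQ 60 and become indistinguishable from the good ones. With the correct filter, they keep their low MAPQ and can still be discarded downstream.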
In our hands, the rate of false positive (FP) calls rises from 4% to 8% when the wrong filter is used. We don't see a significant number of false negatives (lost true positives) with the bad command; in fact, a few more true positives show up in its results. So basically the effect is to increase sensitivity excessively, at the expense of specificity, because poorly mapped reads are taken into account with a "good" mapping quality, where they would normally be discarded.
This effect will be stronger in datasets with lower overall quality, so your results may vary. Let us know if you observe any really dramatic effects, but we don't expect that to happen.
To be clear, we do recommend re-processing your data if you can, but if that is not an option, keep in mind how this affects the rate of false positive discovery in your data.
We apologize for this error (which has now been corrected in the documentation) and for the inconvenience it may cause you.
Calling all Belgians! (and immediate neighbors)
In case you didn't hear of this through your local institutions, I'm excited to announce that we are doing a GATK workshop in Belgium in two weeks (June 24-26 to be precise). The workshop, which is open and free to the scientific community, will be held at the Royal Institute of Natural Sciences in Brussels.
This is SUPER EXCITING to me because as a small child I spent many hours drooling in front of the Institute Museum's stunningly beautiful Iguanodons, likely leaving grubby handprints all over the glass cases, to the shame and annoyance of my parents. I also happen to have attended the Lycee Emile Jacqmain which is located in the same park, right next to the Museum (also within a stone's throw of the more recently added European Parliament) so for me this is a real trip into the past. Complete with dinosaurs!
That said, I expect you may find this workshop exciting for very different reasons, such as learning how the GATK can empower your research and hearing about the latest cutting-edge developments that you can expect for version 3.2.
See this website or the attached flyer for practical details (but note that the exact daily program may be slightly different than announced due to the latest changes in GATK) and be sure to register (it's required for admission!) by emailing cvangestel at naturalsciences.be with your name and affiliation.
Please note that the hands-on sessions (to be held on the third day) are already filled to capacity. The tutorial materials will be available on our website in the days following the workshop.