Appistry Pipeline Contest Jul 23

Creative minds, your attention please -- our licensing partners, Appistry, are holding a competition! From the Appistry Pipeline Challenge webpage:

The Appistry Pipeline Challenge will reward and support one winning proposal for a creative pipeline that will make a difference in clinical research and precision medicine. We’ll provide the winner with a complete NGS analysis package valued at $70,000 including commercial-grade bioinformatics tools for variant calling and somatic mutation analysis, software and hardware for developing and executing pipelines at scale, and a year’s worth of support so that you can make your idea a reality.

There's more detailed information on the contest webpage of course, and if that doesn't answer all your questions, Appistry is holding a live webinar tomorrow (Thursday 7/24) to give additional details and answer questions.

The deadline for submissions is August 15th, so if you have an idea, better get on it!

Release notes for GATK version 3.2 Jul 15

GATK 3.2 was released on July 14, 2014. Highlights are listed below. Read the detailed version history overview here:


Tue 15 Jul 2014

Meetings, workshops, conferences, oh my! (carbon footprint) Jul 2

Ladies, gentlemen and everyone else (this is a judgment-free zone), it's officially summertime in the northern hemisphere. Depending on who and where you are, this can mean no more classes, no more exams, and more quality time with your loved ones -- or extra expense getting someone to keep your offspring out of your way (hello summer camp). It is that hallowed time of year when academics put down the burdens of teaching and administrative duties and can finally get some science done. For many, it also means conference season, e.g. meeting up in Spain with a bunch of colleagues to argue about obscure methodological details over many a glass of tinto de verano. It's a hard, hard life.

A group of us just got back from sunny Belgium* where we held a GATK workshop at the invitation of the Royal Institute of Natural Sciences in Brussels. Now we're looking ahead to the next big dates on the horizon, and I thought I'd share them here in case some of you can join us. Or in case you would like to invite us over to give talks or workshops... (seriously, private-message me if you're interested in hosting a GATK workshop).

* This is not irony, it was really beautiful the whole time. Until the two non-Belgians left, and then boom! Downpour for three days. Typical.

As you can see below, our dance card is all clear for the summer itself but starting September it gets pretty busy.


Slides from the June 2014 GATK workshop in Brussels, Belgium Jun 25

The presentation slides are available on Dropbox at this link:

After the workshop, these materials as well as the hands-on tutorial will be posted in the Presentations section of the website.

Documentation error in RNAseq workflow Jun 11

We discovered today that we made an error in the documentation article that describes the RNAseq Best Practices workflow. The error is not critical, but it is likely to increase the rate of false positive calls in your dataset.

The error was made in the description of the "Split & Trim" pre-processing step. We originally wrote that you need to reassign mapping qualities to 60 using the ReassignMappingQuality read filter. However, that filter reassigns every MAPQ in the file to 60, whereas the goal is to reassign MAPQs only for good alignments, which STAR marks with MAPQ 255. This is done with a different read filter, called ReassignOneMappingQuality. The correct command is therefore:

java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R ref.fasta -I dedupped.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

In our hands, the rate of false positive (FP) calls rises from 4% to 8% when the wrong filter is used. We don't see any significant number of false negatives (lost true positives) with the bad command; if anything, a few more true positives show up in its results. So basically the effect is to increase sensitivity excessively, at the expense of specificity, because poorly mapped reads are taken into account with a "good" mapping quality where they would normally be discarded.
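To make the difference between the two filters concrete, here is a small Python sketch. It is not GATK code: the function names, the example MAPQ values and the minimum-MAPQ threshold of 20 are all assumptions made up for illustration. It only mimics the behavior described above, where one filter overwrites every MAPQ and the other rewrites only reads carrying MAPQ 255.

```python
# Illustrative stand-ins for the two read filters; NOT actual GATK code.
# Reads are modeled simply as a list of integer MAPQ values.

def reassign_all(mapqs, to_quality=60):
    """Mimic ReassignMappingQuality: every read ends up with the same MAPQ."""
    return [to_quality for _ in mapqs]

def reassign_one(mapqs, from_quality=255, to_quality=60):
    """Mimic ReassignOneMappingQuality (-RMQF 255 -RMQT 60):
    only reads whose MAPQ equals from_quality are changed."""
    return [to_quality if q == from_quality else q for q in mapqs]

def count_passing(mapqs, min_mapq=20):
    """Count reads that would survive a hypothetical minimum-MAPQ cutoff."""
    return sum(q >= min_mapq for q in mapqs)

# Hypothetical input: 255 marks good alignments, low values stand in
# for poorly mapped reads.
star_mapqs = [255, 3, 255, 1, 0]

wrong = reassign_all(star_mapqs)   # [60, 60, 60, 60, 60]
right = reassign_one(star_mapqs)   # [60, 3, 60, 1, 0]

print(count_passing(wrong))  # 5: poorly mapped reads now look "good"
print(count_passing(right))  # 2: only the good alignments are kept
```

With the wrong filter, all five reads clear the cutoff, which is the mechanism behind the extra false positives; with the correct one, the poorly mapped reads keep their low MAPQ and can still be discarded downstream.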

This effect will be stronger in datasets with lower overall quality, so your results may vary. Let us know if you observe any really dramatic effects, but we don't expect that to happen.

To be clear, we do recommend re-processing your data if you can, but if that is not an option, keep in mind how this affects the rate of false positive discovery in your data.

We apologize for this error (which has now been corrected in the documentation) and for the inconvenience it may cause you.

GATK workshop in Brussels, Belgium: June 24-26 (2014) Jun 11

Calling all Belgians! (and immediate neighbors)

In case you didn't hear of this through your local institutions, I'm excited to announce that we are doing a GATK workshop in Belgium in two weeks (June 24-26 to be precise). The workshop, which is open and free to the scientific community, will be held at the Royal Institute of Natural Sciences in Brussels.

This is SUPER EXCITING to me because as a small child I spent many hours drooling in front of the Institute Museum's stunningly beautiful Iguanodons, likely leaving grubby handprints all over the glass cases, to the shame and annoyance of my parents. I also happen to have attended the Lycee Emile Jacqmain which is located in the same park, right next to the Museum (also within a stone's throw of the more recently added European Parliament) so for me this is a real trip into the past. Complete with dinosaurs!

That said, I expect you may find this workshop exciting for very different reasons, such as learning how the GATK can empower your research and hearing about the latest cutting-edge developments that you can expect for version 3.2.

See this website or the attached flyer for practical details (but note that the exact daily program may be slightly different than announced due to the latest changes in GATK) and be sure to register (it's required for admission!) by emailing cvangestel at with your name and affiliation.

Please note that the hands-on sessions (to be held on the third day) are already filled to capacity. The tutorial materials will be available on our website in the days following the workshop.

Important notice for GATK versions 2.6 and older Jun 3

In a nutshell: if you're using a version of GATK older than 2.7, you need to request a key to disable Phone Home (if you don't already have one). See below for a full explanation.


Taking a break over Memorial Day weekend May 23

Since this coming Monday (May 26th) is a national holiday in the U.S., we get the day off from obsessing about GATK (some of us under pressure from our significant others). This does mean that starting later today, there won't be anyone from the team available to answer questions or comments on the forum until Tuesday.

If you're in the U.S., we hope you get to take the day off too, and have a great holiday weekend! Go hiking or something -- anything to get away from the keyboard and get some fresh air. And if you will, give a thought to the men and women that the holiday commemorates, who gave their lives for the freedoms we enjoy. In some cases, more literally than you may think. From the Wikipedia article on Memorial Day:

The first widely publicized observance of a Memorial Day-type observance after the Civil War was in Charleston, South Carolina, on May 1, 1865. During the war, Union soldiers who were prisoners of war had been held at the Charleston Race Course; at least 257 Union prisoners died there and were hastily buried in unmarked graves.[13] Together with teachers and missionaries, black residents of Charleston organized a May Day ceremony in 1865, which was covered by the New York Tribune and other national papers. The freedmen cleaned up and landscaped the burial ground, building an enclosure and an arch labeled, "Martyrs of the Race Course." Nearly ten thousand people, mostly freedmen, gathered on May 1 to commemorate the war dead. Involved were about 3,000 school children newly enrolled in freedmen's schools, mutual aid societies, Union troops, black ministers, and white northern missionaries. Most brought flowers to lay on the burial field. Today the site is used as Hampton Park.[14] Years later, the celebration would come to be called the "First Decoration Day" in the North.

Developer interview: Mauricio Carneiro on the raison d'être of the GATK Apr 16

Our very own Mauricio Carneiro (@Carneiro on the forum) recently gave an interview in which he explained the origins of the GATK and the motivations of the development team. You can watch it here:

New job opening: help us write papers about GATK! Apr 14

We have a problem: we have a truckload of material sitting around waiting to be published, but no time to actually write the papers. So we're looking for someone who will help us convert this computational biology goldmine into cold hard Nature Biotech/Methods papers.

This is a great opportunity for an early-career, postdoc-level scientist who has experience publishing papers, demonstrated writing ability, and is not afraid of wrangling complex technical material.

Make no mistake, we're not looking for a ghostwriter; this will involve intellectual contribution worth the authorship in high-profile publications. But the basic material is ready and waiting.

Here is the complete job description; feel free to ask questions in the comments or by private message to me (Geraldine):


