Mauricio Carneiro presented this slide deck at the workshop organized by Mount Sinai School of Medicine on December 10, 2013. The other presentations made at the workshop were posted here.
Please note that we cannot guarantee content hosted on other websites; if an outgoing link becomes outdated, please let us know.
For those of you who are in San Diego this week for ASHG 2014, here are a few pointers to GATK-related opportunities: two workshops, one open demo session, a poster on RNAseq analysis (today!) and of course the Appistry booth. Ami Levy-Moonshine (@ami on the forum), who is our resident RNAseq expert, will be representing the GATK team throughout the conference. Details below the fold in somewhat chronological order.
Ami is presenting poster #1054S today (Sunday) from 5-6 pm (Convention Center, Exhibit Hall, Ground Level). He will cover the GATK Best Practices for RNAseq analysis, as well as an application of the workflow to a study of expression levels in autistic brains done by Alal Eran and collaborators.
We are part of a network of developers of genomic analysis tools called iSeqTools sponsored by NHGRI. Representatives of each group in the network (including our own Ami Levy-Moonshine) will be teaching a workshop on Monday (12:30-2pm, room 24/upper level). The workshop will present the network and demonstrate how the tools can be leveraged to perform genome analysis on the cloud.
Registration is required and the workshop is sold out, but there will be a hands-on demo session in the evening in the same room from 6:30-8pm that will be free and open to all comers. Please feel free to come ask questions!
Our licensing partner, Appistry, will be present in the exhibits hall. Be sure to swing by if you have any questions about licenses for commercial use of GATK, commercial support and so on. Ami will be at the Appistry booth Monday 3-4pm to answer questions (open to all, not just commercial customers).
Ami will be giving a super-condensed 90-minute version of our popular workshop series on the GATK Best Practices on Tuesday (12:30-2pm, room 24/upper level). Again, registration is required and the workshop is sold out, but we'll post the slides online, and this can be an opportunity to meet up with Ami after the workshop.
This workshop was organized with the help of Ben Neale, who will be present to introduce the workshop. Ben just got back from the World Congress of Psychiatric Genetics conference in Copenhagen, where he received an award from the International Society of Psychiatric Genetics for his exceptional published work on psychiatric genetics, so we're very excited to have him in our corner.
The slides from the talk I gave at the Genome Science UK conference (Oxford, Sept 2) are now available here.
A few impressions below the fold.
As always it was fun meeting plenty of GATK users and other researchers in general, and very exciting to get to spread the word about our new workflow and the HaplotypeCaller's capabilities. Special thanks to Mick Watson from Edinburgh Genomics for inviting me, and his team for making me feel super welcome.
I really enjoyed getting to see quite a few microbial genomics talks, since I am originally a microbiologist by training. Too bad I had to miss the plant/forestry B session, as I'm very curious about the crazy-ploidy aspects of plant genomics, but that's the curse of parallel sessions I suppose. Very interesting conference overall for sure, and a nice group size -- lively but not overwhelming (I dislike mega-cons like ASHG). Though maybe next time Nick Loman should get the main lecture theater for his MinION talk instead of the little basement room -- and someone should make sure the wifi network can handle dozens of data-crazed scientists trying to download his MinION datasets at the same time.
A final note on the live-tweeting, i.e. people in the audience tweeting snippets during the talk. I was aware of this as a trend, and in fact I've followed tweet-streams of other people's talks, but had never experienced it myself as a speaker. It's a bit nerve-racking but very interesting as an insight into what (at least some of) the audience reacted most strongly to and took away from the talk. It also stimulated some interesting follow-up exchanges, so I'll tentatively classify it as a Good Thing for now.
Here's my abstract for the upcoming Genome Science UK meeting in Oxford, where I'll be talking about our hot new workflow for variant discovery. The slide deck will be posted in the Presentations section as usual after the conference.
Variant discovery is greatly empowered by the ability to analyse large cohorts of samples rather than single samples taken in isolation, but doing so presents considerable challenges. Variant callers that operate per-locus (such as Samtools and GATK’s UnifiedGenotyper) can handle fairly large cohorts (thousands of samples) and produce good results for SNPs, but they perform poorly on indels. More recently developed callers that operate using assembly graphs (such as Platypus and GATK’s HaplotypeCaller) perform much better on indels, but their runtime and computational requirements tend to increase exponentially with cohort size, limiting their application to cohorts of hundreds at most. In addition, traditional multisample calling workflows suffer from the so-called “N+1 problem”, where full cohort analysis must be repeated each time new samples are added.
To overcome these challenges, we developed an innovative workflow that decouples the two steps in the multisample variant discovery process: identifying evidence of variation in each sample, and interpreting that evidence in light of the evidence gathered for the entire cohort. Only the second step needs to be done jointly on all samples, while the first step can be done just as well (and much faster) on one sample at a time. This decoupling hinges on the use of a novel method for reference confidence estimation that produces a genomic VCF (gVCF) intermediate for each sample.
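To make the decoupling concrete, here is a toy sketch in Python. It is not GATK code and does not use the reference confidence model or the gVCF format: the function names, the 0/1 read encoding and the `min_alt_fraction` threshold are all made up for illustration. The only point is the shape of the workflow: the per-sample step runs once per sample and its output is kept, while the cheap joint step is rerun over the stored intermediates whenever a sample is added.

```python
from typing import Dict, List, Tuple

# Hypothetical toy input: for each site, a list of read observations
# (0 = supports the reference allele, 1 = supports an alternate allele).
Reads = Dict[str, List[int]]
# Hypothetical per-sample intermediate: site -> (ref_count, alt_count).
Summary = Dict[str, Tuple[int, int]]


def summarize_sample(reads: Reads) -> Summary:
    """Step 1 (per sample): reduce raw observations to per-site evidence."""
    return {site: (obs.count(0), obs.count(1)) for site, obs in reads.items()}


def joint_call(summaries: Dict[str, Summary], min_alt_fraction: float = 0.2) -> List[str]:
    """Step 2 (joint): pool the stored evidence across the cohort.

    The fixed-fraction threshold is a stand-in for a real genotyping model.
    """
    totals: Dict[str, Tuple[int, int]] = {}
    for summary in summaries.values():
        for site, (ref, alt) in summary.items():
            r, a = totals.get(site, (0, 0))
            totals[site] = (r + ref, a + alt)
    return sorted(
        site
        for site, (r, a) in totals.items()
        if r + a > 0 and a / (r + a) >= min_alt_fraction
    )


# Initial cohort: each sample is summarized exactly once.
intermediates: Dict[str, Summary] = {
    "s1": summarize_sample({"chr1:100": [0, 0, 1, 1], "chr1:200": [0, 0, 0, 0]}),
    "s2": summarize_sample({"chr1:100": [0, 1], "chr1:200": [0, 0, 0]}),
}
calls_before = joint_call(intermediates)

# "N+1": only the new sample goes through step 1; s1 and s2 are not reprocessed.
intermediates["s3"] = summarize_sample({"chr1:200": [1, 1, 1]})
calls_after = joint_call(intermediates)
```

In this toy run, the second site only crosses the threshold once the third sample's evidence is pooled in, which is the kind of cohort-level effect the joint step exists to capture.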
The new workflow enables fast, highly accurate and computationally cheap variant discovery in cohort sizes that were previously intractable: it has already been applied successfully to a cohort of nearly one hundred thousand samples. This replaces previous brute-force approaches and lowers the threshold of accessibility of sophisticated cohort analysis methods for all, including researchers who do not have access to large amounts of computing power.
Ladies, gentlemen and everyone else (this is a judgment-free zone), it's officially summertime in the northern hemisphere. Depending on who and where you are, this can mean no more classes, no more exams, and more quality time with your loved ones -- or extra expense getting someone to keep your offspring out of your way (hello summer camp). It is that hallowed time of year when academics put down the burdens of teaching and administrative duties and can finally get some science done. For many, it also means conference season, e.g. meeting up in Spain with a bunch of colleagues to argue about obscure methodological details over many a glass of tinto de verano. It's a hard, hard life.
A group of us just got back from sunny Belgium* where we held a GATK workshop at the invitation of the Royal Institute for Natural Sciences in Brussels. Now we're looking ahead to the next big dates on the horizon, and I thought I'd share them here in case some of you can join us. Or in case you would like to invite us over to give talks or workshops... (seriously, private-message me if you're interested in hosting a GATK workshop).
* This is not irony, it was really beautiful the whole time. Until the two non-Belgians left, and then boom! Downpour for three days. Typical.
As you can see below, our dance card is all clear for the summer itself but starting September it gets pretty busy.
I will be attending the Genome Science meeting in Oxford, UK, and giving a talk in the Bioinformatics Infrastructure session.
Mauricio Carneiro and David Roazen will be attending this C++ development conference, and Mauricio will be presenting a talk titled "Gamgee: A C++14 library for genomics data processing and analysis". What this means for GATK-based development, well... happy speculating :)
The Center for Genetics and Complex Traits (CGACT) and the Institute for Biomedical Informatics (IBI) of the University of Pennsylvania Perelman School of Medicine are hosting our workshop. The workshop crew will consist of Eric Banks, Sheila Chandran and myself. See this announcement for more details.
Ami Levy-Moonshine will be attending the ASHG meeting in San Diego, California, and giving a compressed version of our Best Practices workshop on Tuesday 10/21 (separate registration required). Ami will also represent us in the iSeqTools workshop on cloud-based analysis on Monday 10/20.
Due to scheduling constraints, we were unable to make this workshop happen, but we have tentative dates for March 2015 (3/19-3/20).
This list will be updated with any new events up to December 2014.
Alright, the next release is going to be version 3.0. So what's in it?
We'll have a full overview ready for you in the next few days.
In the meantime, if you're at AGBT-2014 working on your tan (lucky devil), one way to find out is to go see Mauricio Carneiro's poster during the Thursday afternoon poster session, and ask him all about it (you're welcome, MC -- we know you like the attention).
If you're not, here's a copy of Mauricio's poster, which features three of the top features in GATK 3.0. Because why should you miss out, on top of having to shovel snow all over again tomorrow (or whatever the applicable chore is in your neck of the woods) instead of drinking margaritas by the pool?
In case you're wondering, the owl is the mascot of the nightly builds. He/she (we're not sure; we respect its privacy) builds a fresh copy of the GATK every night with the day's new developments that made it into the master branch.