For those of you who are in San Diego this week for ASHG 2014, here are a few pointers to GATK-related opportunities: two workshops, one open demo session, a poster on RNAseq analysis (today!) and of course the Appistry booth. Ami Levy-Moonshine (@ami on the forum), who is our resident RNAseq expert, will be representing the GATK team throughout the conference. Details below the fold in somewhat chronological order.
Ami is presenting poster #1054S today (Sunday) from 5-6 pm (Convention Center, Exhibit Hall, Ground Level). He will cover the GATK Best Practices for RNAseq analysis, as well as an application of the workflow to a study of expression levels in autistic brains done by Alal Eran and collaborators.
We are part of a network of developers of genomic analysis tools called iSeqTools sponsored by NHGRI. Representatives of each group in the network (including our own Ami Levy-Moonshine) will be teaching a workshop on Monday (12:30-2pm, room 24/upper level). The workshop will present the network and demonstrate how the tools can be leveraged to perform genome analysis on the cloud.
Registration is required and the workshop is sold out, but there will be a hands-on demo session in the evening in the same room from 6:30-8pm that will be free and open to all comers. Please feel free to come ask questions!
Our licensing partner, Appistry, will be present in the exhibits hall. Be sure to swing by if you have any questions about licenses for commercial use of GATK, commercial support and so on. Ami will be at the Appistry booth Monday 3-4pm to answer questions (open to all, not just commercial customers).
Ami will be giving a super-condensed 90-minute version of our popular workshop series on the GATK Best Practices on Tuesday (12:30-2pm, room 24/upper level). Again, registration is required and the workshop is sold out, but we'll post the slides online, and this can be an opportunity to meet up with Ami after the workshop.
This workshop was organized with the help of Ben Neale, who will be present to introduce the workshop. Ben just got back from the World Congress of Psychiatric Genetics conference in Copenhagen, where he received an award from the International Society of Psychiatric Genetics for his exceptional published work on psychiatric genetics, so we're very excited to have him in our corner.
Here's my abstract for the upcoming Genome Science UK meeting in Oxford, where I'll be talking about our hot new workflow for variant discovery. The slide deck will be posted in the Presentations section as usual after the conference.
Variant discovery is greatly empowered by the ability to analyse large cohorts of samples rather than single samples taken in isolation, but doing so presents considerable challenges. Variant callers that operate per-locus (such as Samtools and GATK’s UnifiedGenotyper) can handle fairly large cohorts (thousands of samples) and produce good results for SNPs, but they perform poorly on indels. More recently developed callers that operate using assembly graphs (such as Platypus and GATK’s HaplotypeCaller) perform much better on indels, but their runtime and computational requirements tend to increase exponentially with cohort size, limiting their application to cohorts of hundreds at most. In addition, traditional multisample calling workflows suffer from the so-called “N+1 problem”, where full cohort analysis must be repeated each time new samples are added.
To overcome these challenges, we developed an innovative workflow that decouples the two steps in the multisample variant discovery process: identifying evidence of variation in each sample, and interpreting that evidence in light of the evidence gathered for the entire cohort. Only the second step needs to be done jointly on all samples, while the first step can be done just as well (and much faster) on one sample at a time. This decoupling hinges on the use of a novel method for reference confidence estimation that produces a genomic VCF (gVCF) intermediate for each sample.
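To make the decoupling concrete, here is a minimal toy sketch in Python. It is not GATK code and does not reproduce the actual reference confidence model: the names (`summarize_sample`, `joint_call`, `Cohort`) are hypothetical, the "evidence" is just a count of alt-supporting observations per site, and a simple threshold stands in for the real joint genotyping statistics. The only point it illustrates is that the expensive per-sample step runs once per sample and its result is stored (playing the role of the gVCF), so adding a new sample only requires re-running the cheap joint step.

```python
from typing import Dict, List

# Toy stand-in for a per-sample gVCF: site -> number of alt-supporting observations.
Evidence = Dict[str, int]


def summarize_sample(observations: List[str]) -> Evidence:
    """Step 1 (per sample): collect evidence of variation at each site."""
    evidence: Evidence = {}
    for site in observations:
        evidence[site] = evidence.get(site, 0) + 1
    return evidence


def joint_call(summaries: Dict[str, Evidence], min_total: int = 3) -> List[str]:
    """Step 2 (joint): interpret the stored evidence across the whole cohort.

    A fixed threshold is an assumption made for illustration only.
    """
    totals: Dict[str, int] = {}
    for evidence in summaries.values():
        for site, count in evidence.items():
            totals[site] = totals.get(site, 0) + count
    return sorted(site for site, count in totals.items() if count >= min_total)


class Cohort:
    """Keeps the per-sample summaries so step 1 is never repeated."""

    def __init__(self) -> None:
        self.summaries: Dict[str, Evidence] = {}
        self.samples_processed = 0  # counts how many times step 1 has run

    def add_sample(self, name: str, observations: List[str]) -> None:
        self.summaries[name] = summarize_sample(observations)
        self.samples_processed += 1

    def call(self, min_total: int = 3) -> List[str]:
        return joint_call(self.summaries, min_total)
```

In this sketch, adding an (N+1)th sample means one more call to `add_sample` followed by `call()`; the summaries already stored for the first N samples are reused as they are.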
The new workflow enables fast, highly accurate and computationally cheap variant discovery in cohort sizes that were previously intractable: it has already been applied successfully to a cohort of nearly one hundred thousand samples. This replaces previous brute-force approaches and lowers the threshold of accessibility of sophisticated cohort analysis methods for all, including researchers who do not have access to large amounts of computing power.
Ladies, gentlemen and everyone else (this is a judgment-free zone), it's officially summertime in the northern hemisphere. Depending on who and where you are, this can mean no more classes, no more exams, and more quality time with your loved ones -- or extra expense getting someone to keep your offspring out of your way (hello summer camp). It is that hallowed time of year when academics put down the burdens of teaching and administrative duties and can finally get some science done. For many, it also means conference season, e.g. meeting up in Spain with a bunch of colleagues to argue about obscure methodological details over many a glass of tinto de verano. It's a hard, hard life.
A group of us just got back from sunny Belgium* where we held a GATK workshop at the invitation of the Royal Institute for Natural Sciences in Brussels. Now we're looking ahead to the next big dates on the horizon, and I thought I'd share them here in case some of you can join us. Or in case you would like to invite us over to give talks or workshops... (seriously, private-message me if you're interested in hosting a GATK workshop).
* This is not irony, it was really beautiful the whole time. Until the two non-Belgians left, and then boom! Downpour for three days. Typical.
As you can see below, our dance card is all clear for the summer itself, but starting in September it gets pretty busy.
I will be attending the Genome Science meeting in Oxford, UK, and giving a talk in the Bioinformatics Infrastructure session.
Mauricio Carneiro and David Roazen will be attending this C++ development conference, and Mauricio will be presenting a talk titled "Gamgee: A C++14 library for genomics data processing and analysis". What this means for GATK-based development, well... happy speculating :)
The Center for Genetics and Complex Traits (CGACT) and the Institute for Biomedical Informatics (IBI) of the University of Pennsylvania Perelman School of Medicine are hosting our workshop. The workshop crew will consist of Eric Banks, Sheila Chandran and myself. See this announcement for more details.
Ami Levy-Moonshine will be attending the ASHG meeting in San Diego, California, and giving a compressed version of our Best Practices workshop on Tuesday 10/21 (separate registration required). Ami will also represent us in the iSeqTools workshop on cloud-based analysis on Monday 10/20.
Due to scheduling constraints, we were unable to make this workshop happen, but we have tentative dates for March 2015 (3/19-3/20).
This list will be updated with any new events up to December 2014.