Machine learning, deep learning, and AI: Oh my!
Artificial intelligence. Machine learning. Deep learning. Neural networks. As high-powered data science techniques become more deeply embedded in genomics and biology, researchers have had to start folding a whole new vocabulary into the language they use to describe their work.
On the GATK blog, Geraldine Van der Auwera, associate director for outreach and communications at the Broad Data Sciences Platform, provides a brief primer on the terminology data scientists use when they talk about training computers to carry out complex analytical tasks:
At a high level, data science is the overall discipline that deals among other things with building models in order to make statements and predictions about the data and what it represents. Within that context, machine learning and statistics can be seen as two subfields of data science, utilizing similar tools but with different goals and strategies….
So where do artificial intelligence and deep learning fit in? ...[A]rtificial intelligence, i.e. the ability of machines to make smart decisions without being given step-by-step instructions from a human, is the end goal of all machine learning. Meanwhile, deep learning is a subfield of machine learning that uses techniques based on "neural nets," a type of algorithm that mimics neural pathways in animal brains. Deep learning has been around for a long time, but until recently, neural nets were too computationally intensive to tackle anything more than toy problems. Now, thanks to recent technological developments they can tackle much bigger problems, and have become intensely popular as a way to pursue artificial intelligence.
Van der Auwera goes on to outline the GATK team's past and future efforts to integrate machine learning approaches into their flagship genome analysis product:
Classic GATK machine learning methods that have been around since the early days of GATK include base recalibration (BQSR) and variant recalibration (VQSR). … [A]fter exploring various alternatives over the years, we have finally nailed down a new approach based on deep learning that we expect will replace VQSR in our Best Practices pipeline within the coming months.
This new deep learning based approach uses two-dimensional convolutional neural nets (2D CNN) to classify variant candidates coming out of the variant calling pipeline, with the intent of making it a drop-in replacement for VQSR. The tools involved are still in beta-stage development (publicly accessible in GATK4 but not yet "blessed" for production use), but in our tests the new method outperforms VQSR significantly, delivering greater precision without reducing sensitivity.
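A toy example makes a "2D convolution" easier to picture. The sketch below is not GATK's model. It assumes a made-up encoding in which a read pileup is a grid of 0s and 1s (rows are reads, columns are reference positions, and 1 means the read disagrees with the reference). It also uses a single hand-picked 3x3 filter, where a real CNN would learn many filters from labeled training data. The function names (`conv2d_valid`, `toy_variant_score`), the pileups, and the weights are all hypothetical. The point is only to show how a filter sliding over an image-like input can respond more strongly to a consistent column of mismatches (variant-like) than to scattered mismatches (noise-like).

```python
import math


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding ("valid" mode).

    Like most CNN libraries, this computes a cross-correlation: the kernel
    is not flipped before being multiplied with each window.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out


def relu(grid):
    """Zero out negative responses, elementwise."""
    return [[max(0.0, v) for v in row] for row in grid]


def toy_variant_score(pileup, kernel, bias=0.0):
    """Hypothetical one-filter 'CNN': conv -> ReLU -> global max pool -> sigmoid.

    Returns a value in (0, 1); higher means "looks more like a real variant"
    under this toy filter. The weights are hand-picked, not learned.
    """
    features = relu(conv2d_valid(pileup, kernel))
    pooled = max(v for row in features for v in row)
    return 1.0 / (1.0 + math.exp(-(pooled + bias)))


# Hypothetical filter that rewards a vertical stripe of mismatches
# (many reads disagreeing with the reference at the same position).
line_kernel = [
    [-0.5, 1.0, -0.5],
    [-0.5, 1.0, -0.5],
    [-0.5, 1.0, -0.5],
]

# Every read shows a mismatch at position 2: variant-like.
variant_pileup = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Mismatches scattered across reads and positions: noise-like.
noise_pileup = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]

if __name__ == "__main__":
    print(round(toy_variant_score(variant_pileup, line_kernel, bias=-2.0), 3))
    print(round(toy_variant_score(noise_pileup, line_kernel, bias=-2.0), 3))
```

With these made-up inputs, the column of mismatches scores about 0.731 and the scattered pileup about 0.269. A trained network stacks many such learned filters and layers, but the basic operation of sliding small weight grids across a 2D input is the same.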
Visit the GATK blog for more.