2019 Workshops

ENCODE data utilization workshop
Welcome to ENCODE, the Encyclopedia of DNA Elements! The goal of this workshop is to teach you how to search, analyze, and visualize ENCODE data.

The ENCODE project is building a comprehensive parts list of functional elements in the human genome and mapping the regulatory mechanisms that control gene expression. ENCODE consists of ~15,000 curated datasets and continues to grow.

The workshop has three parts. First, leaders from the ENCODE Data Coordination Center (DCC) at Stanford and the Data Analysis Center (DAC) at UMass will teach you how to access and work with data via both the ENCODE portal and SCREEN. Second, Noam and Sushma from Broad Epigenomics and the DSP Terra team will teach you how to access and analyze ENCODE data in the cloud, via both the ENCODE DCC/AWS and Terra/Google Cloud. Finally, you’ll learn from Neva in the Aiden lab how to identify and visualize 3D genome interactions using ENCODE data in Juicebox. Representatives from the ENCODE DCC will also be on hand all day to answer your questions and help troubleshoot. This will be a hands-on workshop, but no prior experience is necessary: there will be demos, and experts will be available to teach at different levels. This is a whole-day workshop, and lunch will be provided.

May 14
Image analysis: How to make an image worth more than a thousand words
Image processing plays a vital role in modern biomedical research. As the capacity to acquire and analyze images continues to grow, so too does CellProfiler, a free, open-source software package designed for large-scale, automated phenotypic image analysis. Attendees will get hands-on practice with CellProfiler and a brief introduction to related image analysis software, followed by case studies on high-content screening (HCS) and cell-type classification. At the end of the workshop, there will be a breakout session where attendees will receive guidance on analyzing their own image data. If you are curious about automating the analysis of your microscopy data or want to become familiar with "what's possible", come to the workshop and see what's new in CellProfiler v3.
April 3

Best Practices for Variant Calling with the Genome Analysis Toolkit
The workshop focuses on the core steps involved in calling variants with the Broad’s Genome Analysis Toolkit (GATK), using the “Best Practices” developed by the GATK team. Participants will learn why each step is essential to the variant discovery process, what operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of their dataset.

This workshop focuses on calling germline short variants, somatic short variants, and copy number alterations with Broad's Genome Analysis Toolkit (GATK), using best practices developed by the DSP Methods development team, who develop GATK. The developers will give talks explaining the rationale, theory, and real-world applications of the GATK Best Practices. You'll learn why each step is essential to the variant-calling process, what key operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset. If you're an experienced GATK user, you'll gain a deeper understanding of how GATK works under the hood and how to improve your results further, especially with respect to the latest innovations. The hands-on GATK tutorials in this workshop will be conducted on Terra, a new platform developed at Broad in collaboration with Verily Life Sciences for accessing data, running analysis tools, and collaborating securely and seamlessly. (If you’ve heard of or used FireCloud, think of Terra as the new and improved user interface for FireCloud that makes doing research easier than before!)

All accepted registrants will attend the first day of the workshop. You may select subsequent days à la carte.

Day 1: Thursday, March 21, 2019 — Introductory topics and hands-on tutorials (Required).
We'll start off with introductory lectures on sequencing data, preprocessing, variant discovery, and pipelining. Then you'll get hands-on with a recreation of a real variant discovery analysis in Terra.

Day 2: Friday, March 22, 2019 — Germline short variant discovery (Optional).
Through a combination of lectures and hands-on tutorials, you'll learn about germline single nucleotide variants and indels, joint calling, variant filtering, genotype refinement, and callset evaluation.

Day 3: Tuesday, March 26, 2019 — Somatic variant discovery (Optional).
In a format similar to Day 2, you'll learn about somatic single nucleotide variants and indels, Mutect2, and somatic copy number alterations.

Day 4: Wednesday, March 27, 2019 — Pipelining and additional skills for working in Terra (Optional).
You'll learn how to write your own pipelining scripts in the Workflow Description Language (WDL) and execute them with the Cromwell workflow management system. We will also cover additional skills to help you do an end-to-end analysis in Terra.

March 21-22, 26-27
Scale with Hail 0.2: A hands-on tutorial for genomic analysis

This hands-on Hail 0.2 tutorial will be led by members of the Hail team, based in the Neale lab in the Stanley Center.  

Hail is an open-source, general-purpose, Python-based data analysis tool with additional data types and methods for working with genomic data. Similar to the R or Python scientific computing stacks, Hail supports data frame queries, statistics, linear algebra, and plotting, both interactively and with scripts. Unlike these stacks, Hail:

- scales from laptop to large compute cluster or cloud, with the same code
- is designed to work with datasets that do not fit in memory
- has first-class support for multi-dimensional structured data, like genomic data

At Broad, Hail is the analytical engine behind dozens of studies, the Genome Aggregation Database, and the Neale lab mega-GWAS. Beyond Broad, Hail is used in academia and industry on data ranging from mouse models to GTEx.

Target audience: Scientists analyzing genomic variant datasets and their relationship to phenotypes, gene expression, or other data. Participants may pre-install Hail or access it via the cloud in their browser. 


Jan. 25