Time to light the FireCloud

The Broad Cancer Genome Analysis group and Data Science Platform's suite of TCGA data and cancer genome analysis tools is open for evaluation.

For several years, the Broad's Cancer Genome Analysis (CGA) group and Data Science Platform (DSP) have been working to reimagine Firehose (a Broad-built, National Cancer Institute-funded cancer data warehouse and analysis pipeline), always with one eye looking upward. The fruit of their labors is FireCloud, a cloud-based, open-source, comprehensive suite of tumor data and analysis tools that offers a ready way for genomics researchers to:

1. Securely access curated public and controlled data sets from The Cancer Genome Atlas (TCGA)
2. Make use of baked-in best practice analysis pipelines and workflows, such as ContEst, MuTect, and Oncotator
3. Upload their own data to the cloud for analysis
4. Share workspaces with collaborators, with fine-grained control over who can see which data

More than 500 users have used Firehose since the CGA group launched it in 2009. By re-engineering Firehose from the ground up and plugging into Google's cloud platform, the CGA group aims to take local data storage and computing capacity limitations out of the equation for users.

"The National Cancer Institute has a vision to make all 2.5 petabytes of TCGA data available to researchers worldwide, but few centers or laboratories have the storage and resources to really dive in and make use of a data set this size," says Broad CGA director and institute member Gad Getz. "We think FireCloud will democratize access to these data and tools by leveraging a cloud-based, secure, collaborative, and scalable environment."

This spring has been a busy time for Broad and cloud computing. In April, the institute collaborated with Amazon Web Services, Cloudera, Google, IBM, Intel, and Microsoft on plans to enable cloud access to the Genome Analysis Toolkit (GATK) software package. GATK is a package of industry-standard tools for SNP and indel detection in DNA or RNA data from any organism. FireCloud, by contrast, is a platform that offers access to tools focused specifically on using genomic data to gain scientific insights into cancer.

Having passed initial internal beta testing, FireCloud is now in a five-month evaluation phase. Members of the Broad community can register for and access FireCloud using their Broad credentials. During this phase, the team is working with the National Institutes of Health to set the stage for opening FireCloud to the wider research community.

Keep up with FireCloud on Twitter at @BroadFireCloud.

UPDATE 5/31/16: FireCloud is now open to the general public! Register to use FireCloud by visiting firecloud.org and clicking "Use FireCloud."