FAQ

The following FAQs relate to GATK PathSeq.

How do I run GATK PathSeq?

New users should work through the basic tutorial, which covers how to get started running the pipeline. The tutorial also provides instructions for preparing custom host and microbe references for use with PathSeq.
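
As a rough illustration of what running the pipeline involves, the Python sketch below assembles a command line for the PathSeqPipelineSpark tool without executing it. The helper name build_pathseq_command, every file name, and the specific option names are assumptions made for illustration. Option names can differ between GATK releases and the list may be incomplete, so check it against the tool documentation or the output of `gatk PathSeqPipelineSpark --help` before use.

```python
import shlex


def build_pathseq_command(options, gatk="gatk"):
    """Build an argument list such as
    ["gatk", "PathSeqPipelineSpark", "--input", "sample.bam"]
    from a mapping of option names to values (hypothetical helper)."""
    command = [gatk, "PathSeqPipelineSpark"]
    for name, value in options.items():
        command += ["--" + name, str(value)]
    return command


# Hypothetical file names. The option names are assumptions and may be
# incomplete or differ in your GATK release; verify them with --help.
EXAMPLE_OPTIONS = {
    "input": "sample.bam",                   # reads to analyze
    "filter-bwa-image": "host.fasta.img",    # host reference index
    "kmer-file": "host.hss",                 # host k-mer file
    "microbe-bwa-image": "microbe.fasta.img",  # microbe reference index
    "taxonomy-file": "microbe_taxonomy.db",  # microbe taxonomy file
    "output": "sample.pathseq.bam",          # annotated reads
    "scores-output": "sample.pathseq.txt",   # per-taxon scores table
}

if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(build_pathseq_command(EXAMPLE_OPTIONS)))
```

Printing the command rather than launching it keeps the sketch safe to run on a machine without GATK installed; the resulting list could be passed to subprocess.run once the options have been verified.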

What platforms are supported?

PathSeq is built using GATK4 for macOS and Linux operating systems. GATK4 is integrated with the Apache Spark data processing engine for parallelization on multi-core machines and clusters. Click here for more information on Spark and the GATK.

Can PathSeq be run in the cloud?

Yes. Click here for information on running GATK tools on Google Compute Engine VMs.

PathSeq can also be run on FireCloud, a web portal for running production-scale analyses (see here). Registered users should import the latest "pathseq-pipeline" snapshot and configuration from the Method Repository into their workspace.

Is a Workflow Description Language (WDL) script available?

Yes, a WDL script can be found in /scripts/pathseq/wdl in the GATK source repo.

What hardware is needed to run PathSeq?

A multi-core machine with at least 200 GB of RAM is recommended for running PathSeq using the reference files in the GATK Resource Bundle (see Downloads section).
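
One way to check a machine against this recommendation before starting a run is sketched below in Python. The function names are hypothetical, the interpretation of 200 GB as 200 x 10^9 bytes is an assumption, and the memory query relies on POSIX sysconf values that are available on Linux but may not be on every platform.

```python
import os

RECOMMENDED_RAM_GB = 200  # figure taken from the recommendation above


def total_ram_bytes():
    """Return physical memory in bytes using POSIX sysconf values.
    Assumes SC_PAGE_SIZE and SC_PHYS_PAGES are supported (true on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def meets_recommendation(total_bytes, recommended_gb=RECOMMENDED_RAM_GB):
    """Compare a byte count with the recommended RAM.
    Assumption: 1 GB is treated as 10**9 bytes."""
    return total_bytes >= recommended_gb * 10**9


if __name__ == "__main__":
    ram = total_ram_bytes()
    status = "meets" if meets_recommendation(ram) else "is below"
    print(f"{ram / 10**9:.0f} GB of RAM {status} the recommendation")
```

The check is only a guide: the recommendation applies to the Resource Bundle reference files, and smaller custom references may need less memory.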

Is documentation available?

Detailed tool documentation can be found under the Metagenomics category in the GATK Documentation here.

I'm encountering an error or need assistance. Where can I go to get help?

Users should first check the GATK User Guide to see if a solution is already available. If not, please post questions to the GATK Support Forum for assistance. Please do not submit questions through the GitHub repository unless asked to do so by GATK support staff.