PathSeq is a computational tool for identifying and analyzing microbial sequences in high-throughput human sequencing data, designed to scale to large numbers of sequencing reads. The pipeline has two phases: a subtractive phase, in which input reads that align to human reference sequences are removed, and an analytic phase, in which the remaining reads are aligned to microbial reference sequences (viral, fungal, bacterial, and archaeal) and assembled de novo.
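The two-phase data flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PathSeq's implementation: real alignment against reference genomes is replaced by a toy exact k-mer lookup, the k-mer length `K` and the function names (`kmers`, `build_index`, `matches`, `subtract_and_classify`) are hypothetical, and the de novo assembly step is omitted. It only shows the order of operations: human-matching reads are discarded first, the survivors are checked against each microbial reference set, and reads that match nothing are kept as unidentified.

```python
from typing import Dict, Iterable, List, Set, Tuple

K = 8  # hypothetical k-mer length for this toy matcher (not a PathSeq parameter)


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Iterable[str], k: int = K) -> Set[str]:
    """Collect the k-mers of every reference sequence into one lookup set."""
    index: Set[str] = set()
    for ref in references:
        index |= kmers(ref, k)
    return index


def matches(read: str, index: Set[str], k: int = K) -> bool:
    """Toy stand-in for alignment: a read 'hits' if any of its k-mers is indexed."""
    return bool(kmers(read, k) & index)


def subtract_and_classify(
    reads: Iterable[str],
    human_refs: Iterable[str],
    microbial_refs: Dict[str, List[str]],
    k: int = K,
) -> Tuple[Dict[str, List[str]], List[str]]:
    # Subtractive phase: drop every read that matches the human references.
    human_index = build_index(human_refs, k)
    remaining = [r for r in reads if not matches(r, human_index, k)]

    # Analytic phase: compare the surviving reads with each microbial reference set.
    microbial_indexes = {name: build_index(seqs, k) for name, seqs in microbial_refs.items()}
    hits: Dict[str, List[str]] = {name: [] for name in microbial_refs}
    unidentified: List[str] = []
    for read in remaining:
        matched = False
        for name, index in microbial_indexes.items():
            if matches(read, index, k):
                hits[name].append(read)
                matched = True
        if not matched:
            # Neither human nor microbial; a real pipeline would pass these on for assembly.
            unidentified.append(read)
    return hits, unidentified
```

In this sketch, subtraction runs first, so each microbial comparison only sees reads that did not match the human references; a read that matches several microbial sets is recorded under each of them.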
PathSeq is currently available in a cloud computing environment through Amazon Web Services.
The following figure illustrates the typical approach to pathogen discovery with PathSeq. RNA or DNA is extracted from the tissue of interest, and sequencing libraries are constructed and run on the next-generation DNA sequencing platform of choice. The resulting sequence data are then processed by the PathSeq pipeline in a cloud computing environment. PathSeq reports the potential microbes found in the sequence data, together with the complete set of reads that could not be identified as either human or microbial.
The following figure shows a schematic of PathSeq on the Amazon Elastic Compute Cloud (EC2). Pre-processed sequence data are uploaded to a cluster of n nodes in the cloud, and once the PathSeq pipeline has finished, the output is reported to the user.
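How the reads are divided among the n nodes is not specified here. As one illustrative assumption, the sketch below splits a list of reads into n contiguous chunks whose sizes differ by at most one, so each node receives a roughly equal share of the work. The function name `partition` is hypothetical and is not part of PathSeq.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(reads: Sequence[T], n: int) -> List[List[T]]:
    """Split reads into n contiguous chunks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(reads), n)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n):
        # The first `extra` chunks take one additional read each.
        size = base + (1 if i < extra else 0)
        chunks.append(list(reads[start:start + size]))
        start += size
    return chunks


if __name__ == "__main__":
    print(partition(list(range(10)), 3))  # -> [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Contiguous chunks keep the split simple and let per-node outputs be concatenated back in the original read order; a real deployment might instead balance by total bases or use dynamic scheduling.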