User manual

Step 1: Configure the job.config in $LOCAL_DIRECTORY/PathSeq/
This file is configured for each job with job-related information such as the job name, the location of the Illumina sequencing reads in the cloud, and so on.

    ############################################################
    #JOB CONFIG
    ############################################################
    # Name of the cluster (Sample: test-cluster)
    NAMECLUSTER=xxxxxxxxxx
    # Cluster size (By default Amazon offers 19 nodes + 1 master node)(Sample: 19)
    CLUSSIZE=xxx
    # Name of the Job (Sample: testjob)
    NAMEJOB=xxxxxxxx
    # S3 Bucket name for Reference Genomes in the Cloud (Sample: ami-ref)
    REFBUCKET=xxxxxxxx
    # Local directory to download the reference genomes (Sample: test)
    LOCALREFDIR=xxxxxxx
    # S3 Bucket name for Illumina sequencing reads in the Cloud (Sample: ami-testreads)
    READSBUCKET=xxxxxxxx
    #PathSeq remote script in zip file (Don't Edit)
    PATHCLOUD=pathseq.zip
    # Type of data (RNASEQ / WGS)
    #RNASEQ includes mRNASeq, TotalRNASEQ
    DATATYPE=RNASEQ
    ############################################################
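
Because every later step reads job.config, it can help to check the file before running anything. The following is a minimal sketch, not part of the PathSeq distribution: `check_job_config` is a hypothetical helper that assumes the file uses plain KEY=value lines as in the template above, and that a value made up only of "x" characters is an unedited placeholder.

```shell
# Hypothetical helper (not shipped with PathSeq): report job.config keys that
# are missing, empty, or still set to the "xxx..." placeholder.
check_job_config() {
    config_file=${1:-job.config}
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file" >&2
        return 2
    fi
    status=0
    for key in NAMECLUSTER CLUSSIZE NAMEJOB REFBUCKET LOCALREFDIR READSBUCKET DATATYPE; do
        # Take the last uncommented KEY=value line for this key.
        value=$(sed -n "s/^[[:space:]]*$key=//p" "$config_file" | tail -n 1)
        case $value in
            '')     echo "$key is missing or empty" >&2; status=1 ;;
            *[!x]*) ;;  # contains something other than "x": treated as filled in
            *)      echo "$key still holds the placeholder" >&2; status=1 ;;
        esac
    done
    # DATATYPE accepts only the two values named in the template comments.
    datatype=$(sed -n 's/^[[:space:]]*DATATYPE=//p' "$config_file" | tail -n 1)
    case $datatype in
        RNASEQ|WGS|'') ;;
        *) echo "DATATYPE must be RNASEQ or WGS" >&2; status=1 ;;
    esac
    return "$status"
}
```

Calling `check_job_config "$LOCAL_DIRECTORY/PathSeq/job.config"` returns 0 when every key is filled in and prints one message per problem otherwise.
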
Step 2: Upload your reference genome(s) in fasta / bfa format into the cloud for permanent storage. You only need to do this step once per PathSeq setup, provided you do not delete the reference genomes from the cloud.

         - Go to ./$LOCAL_DIRECTORY/PathSeq/
         - Run Upload_Reference_Genome.com script

Note: The time taken to complete this Step depends on your internet speed.

Step 3: Filter the low-quality reads from the Illumina sequencing library (in FASTQ format)

         - Go to ./$LOCAL_DIRECTORY/PathSeq/
         - Run QualityFilter.com (ARGUMENT 1 is the Illumina sequencing library in FASTQ format, forward reads) (ARGUMENT 2 is the Illumina sequencing library in FASTQ format, reverse reads (optional))

Note:

1. The time taken to complete this step depends on the size of your library.

2. The parameters used in quality filtering are: quality threshold: 15, number of bases not meeting the quality threshold: 3, quality offset score: 33 (please check the offset with your sequencing center).

3. You can change these quality filtering parameters by editing the QualityFilter.com script in a text editor such as vi.
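
To make the filtering parameters concrete: the Phred score of a base is the ASCII code of its quality character minus the quality offset. The sketch below counts how many bases in one quality string fall below the threshold. `count_low_quality` is a hypothetical helper for illustration only, not the logic QualityFilter.com actually uses, and it assumes that "not meeting the threshold" means a score strictly below it.

```shell
# Hypothetical helper (not part of PathSeq): count the bases in one FASTQ
# quality string whose Phred score falls below a threshold.
# Assumption: "not meeting the threshold" means score < threshold.
count_low_quality() {
    quality_string=$1
    threshold=${2:-15}   # quality threshold from the notes above
    offset=${3:-33}      # quality offset score from the notes above
    # od prints each character as its decimal ASCII code; the Phred score
    # is that code minus the offset.
    printf '%s' "$quality_string" | od -A n -t u1 |
        awk -v thr="$threshold" -v off="$offset" '
            { for (i = 1; i <= NF; i++) if ($i - off < thr) n++ }
            END { print n + 0 }'
}
```

For example, `count_low_quality '!!II/0'` prints 3: the characters `!`, `!` and `/` decode to scores 0, 0 and 14, while `I` and `0` decode to 40 and 15. How this count is compared against the limit of 3 is decided by QualityFilter.com itself.
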

Step 4: Pre-process the input Illumina sequencing library (in FQ1 format) and upload the pre-processed reads to the S3 bucket (defined in the job.config file)

         - Go to ./$LOCAL_DIRECTORY/PathSeq/
         - Run Preprocessed_Reads.com (ARGUMENT is the Illumina sequencing library in FQ1 format)

Note: The time taken to complete this Step depends on your internet speed and number of reads.

Step 5: Deploy the job in the Cloud

         - Go to ./$LOCAL_DIRECTORY/PathSeq/
         - Run Pathseq_launch.com script

Note: The time taken to complete this Step depends on your internet speed and number of reads.

Step 6: After successful completion of the PathSeq run, manually download the output file as Output.tar from the s3://ami--foutput bucket using the s3cmd command
         s3cmd get -r s3://ami--foutput
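
Once the download finishes, the archive can be checked and unpacked locally. This is a minimal sketch under stated assumptions: `unpack_output` and the default destination directory `pathseq_output` are hypothetical names, and the archive is assumed to be a plain (uncompressed) tar file called Output.tar as described above.

```shell
# Hypothetical helper (not part of PathSeq): verify and extract Output.tar.
unpack_output() {
    archive=${1:-Output.tar}
    dest=${2:-pathseq_output}   # hypothetical destination directory
    if [ ! -s "$archive" ]; then
        echo "missing or empty: $archive" >&2
        return 1
    fi
    # Listing the contents first fails early if the file is not a readable tar archive.
    tar -tf "$archive" >/dev/null || return 1
    mkdir -p "$dest" && tar -xf "$archive" -C "$dest"
}
```

Running `unpack_output Output.tar results` extracts the archive into `results/` and returns a non-zero status if the file is absent, empty, or unreadable.
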

Step 7: After successful completion of the PathSeq run and download of the output file, terminate the cluster
         ./$LOCAL_DIRECTORY/PathSeq/Terminate_Pathseq.com
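
Since termination should only happen after the output has been downloaded, the two steps can be tied together with a small guard. The sketch below is an illustration, not part of PathSeq: `terminate_if_downloaded` is a hypothetical wrapper that assumes the downloaded archive is named Output.tar and sits in the current directory unless another path is given.

```shell
# Hypothetical wrapper (not part of PathSeq): run the terminate script only
# when a non-empty output archive is already present locally.
terminate_if_downloaded() {
    archive=${1:-Output.tar}
    terminate_script=${2:-"./$LOCAL_DIRECTORY/PathSeq/Terminate_Pathseq.com"}
    if [ ! -s "$archive" ]; then
        echo "refusing to terminate: $archive not found or empty" >&2
        return 1
    fi
    "$terminate_script"
}
```

Called with no arguments, it checks for `Output.tar` and then runs the terminate script from Step 7; both paths can be overridden as the first and second argument.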