User manual
Step 1: Configure the job.config in $LOCAL_DIRECTORY/PathSeq/
This file is configured for every job with job-related information such as the job name, the location of the Illumina sequencing reads in the cloud, and so on.
############################################################
#JOB CONFIG
############################################################
# Name of the cluster (Sample: test-cluster)
NAMECLUSTER=xxxxxxxxxx
# Cluster size (By default Amazon offers 19 nodes + 1 master node)(Sample: 19)
CLUSSIZE=xxx
# Name of the Job (Sample: testjob)
NAMEJOB=xxxxxxxx
# S3 Bucket name for Reference Genomes in the Cloud (Sample: ami-ref)
REFBUCKET=xxxxxxxx
# Local directory to download the reference genomes (Sample: test)
LOCALREFDIR=xxxxxxx
# S3 Bucket name for Illumina sequencing reads in the Cloud (Sample: ami-testreads)
READSBUCKET=xxxxxxxx
#PathSeq remote script in zip file (Don't Edit)
PATHCLOUD=pathseq.zip
# Type of data (RNASEQ / WGS)
#RNASEQ includes mRNASeq, TotalRNASEQ
DATATYPE=RNASEQ
############################################################
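Because every value in the template above starts out as a run of x characters, a job can be launched by mistake with a placeholder still in place. The following is a minimal sketch of a check for that, not part of PathSeq itself. The function name check_job_config is hypothetical, and it assumes that job.config contains plain KEY=value lines exactly as shown above.

```shell
# Hypothetical helper (not shipped with PathSeq): report job.config keys
# that are missing, empty, or still hold the xxx placeholder.
check_job_config() {
  config_file=$1
  status=0
  for key in NAMECLUSTER CLUSSIZE NAMEJOB REFBUCKET LOCALREFDIR READSBUCKET PATHCLOUD DATATYPE; do
    # Take the last KEY=value line for this key, if there is one.
    value=$(sed -n "s/^${key}=//p" "$config_file" | tail -n 1)
    case $value in
      '')     echo "$key is missing or empty" >&2; status=1 ;;
      *[!x]*) ;;  # contains a character other than x: treated as filled in
      *)      echo "$key still holds the placeholder" >&2; status=1 ;;
    esac
  done
  # The template lists RNASEQ and WGS as the two data types.
  datatype=$(sed -n 's/^DATATYPE=//p' "$config_file" | tail -n 1)
  case $datatype in
    RNASEQ|WGS) ;;
    *) echo "DATATYPE must be RNASEQ or WGS" >&2; status=1 ;;
  esac
  return "$status"
}
```

Calling check_job_config "$LOCAL_DIRECTORY/PathSeq/job.config" returns 0 when all keys are filled in and prints one message per problem otherwise.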
Step 2: Upload the reference genomes
- Go to ./$LOCAL_DIRECTORY/PathSeq/
- Run Upload_Reference_Genome.com script
Note: The time taken to complete this Step depends on your internet speed.
Step 3: Filter the low-quality reads from the Illumina sequencing library (in FASTQ format)
- Go to ./$LOCAL_DIRECTORY/PathSeq/
- Run QualityFilter.com (ARGUMENT 1 is the Illumina sequencing library in FASTQ format, forward reads) (ARGUMENT 2 is the Illumina sequencing library in FASTQ format, reverse reads (optional))
Note:
1. The time taken to complete this Step depends on the size of your library.
2. The parameters used in Quality filtering are: Quality threshold : 15, Number of bases not meeting Quality threshold: 3, Quality offset score: 33 (Please check with your sequencing center)
3. You can change these quality filtering parameters by editing the QualityFilter.com script in a text editor such as vi.
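To make the three parameters in note 2 concrete, here is a rough sketch of how they could apply to a single read. It is an illustration only and is not taken from QualityFilter.com. The function names are hypothetical, and it assumes that a read is kept when at most 3 of its bases score below 15; the actual script may draw that boundary differently.

```shell
# Hypothetical illustration, not the code inside QualityFilter.com.
# Count the bases in a FASTQ quality string whose score is below the threshold.
# $1 = quality string, $2 = quality threshold (default 15),
# $3 = quality offset (default 33)
count_low_quality() {
  printf '%s\n' "$1" | awk -v threshold="${2:-15}" -v offset="${3:-33}" '
    # Build a character-to-ASCII-code table for the printable range.
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
    {
      low = 0
      for (i = 1; i <= length($0); i++)
        if (ord[substr($0, i, 1)] - offset < threshold) low++
      print low
    }'
}

# Succeed when the read has no more than $4 (default 3) low-quality bases.
# Assumption: "more than 3" is the rejection rule; the real rule may differ.
passes_quality_filter() {
  low=$(count_low_quality "$1" "${2:-15}" "${3:-33}")
  [ "$low" -le "${4:-3}" ]
}
```

With an offset of 33, the character I encodes a score of 40 and the character ! encodes a score of 0, so a quality string such as III!!! has three bases below the threshold. If your sequencing center uses an offset of 64, pass 64 as the third argument.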
Step 4: Pre-process the input Illumina sequencing library (in FQ1 format) and upload the pre-processed reads into the S3 bucket (defined in job.config file)
- Go to ./$LOCAL_DIRECTORY/PathSeq/
- Run Preprocessed_Reads.com (ARGUMENT is the Illumina sequencing library in FQ1 format)
Note: The time taken to complete this Step depends on your internet speed and number of reads.
Step 5: Deploy the job in the Cloud
- Go to ./$LOCAL_DIRECTORY/PathSeq/
- Run Pathseq_launch.com script
Note: The time taken to complete this Step depends on your internet speed and number of reads.
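The scripts in the steps above are all run from the same PathSeq directory, so the "go to the directory, then run the script" pattern can be wrapped in a small helper. This is a minimal sketch under stated assumptions: the function name run_pathseq_script is hypothetical, the scripts are assumed to be executable, and the read file paths in the usage comments are placeholders.

```shell
# Hypothetical helper (not shipped with PathSeq): run one script from
# $LOCAL_DIRECTORY/PathSeq/ and pass any remaining arguments through to it.
run_pathseq_script() {
  if [ -z "${LOCAL_DIRECTORY:-}" ]; then
    echo "LOCAL_DIRECTORY is not set" >&2
    return 1
  fi
  script=$1
  shift
  dir="$LOCAL_DIRECTORY/PathSeq"
  if [ ! -x "$dir/$script" ]; then
    echo "not found or not executable: $dir/$script" >&2
    return 1
  fi
  # Use a subshell so the caller's working directory is left unchanged.
  ( cd "$dir" && "./$script" "$@" )
}

# Example usage (file paths are placeholders, not real files):
#   run_pathseq_script QualityFilter.com /data/reads_1.fastq /data/reads_2.fastq
#   run_pathseq_script Pathseq_launch.com
```

The helper returns the exit status of the script it ran, so whether a failed step can be detected this way depends on the scripts themselves reporting failure through their exit status.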
Step 6: After successful completion of PathSeq run, manually download the output file as Output.tar from s3://ami-
s3cmd get -r s3://ami-
Step 7: After successful completion of PathSeq run and downloading the output file, terminate the cluster
./$LOCAL_DIRECTORY/PathSeq/Terminate_Pathseq.com