User guide

Scripture provides the following main operations or tasks:

 

Make paired file task

Generates a paired end alignment file by using two sets of independently aligned left and right ends. 

Command: java -Xmx2000m -jar scripture.jar -task makePairedFile <Mandatory parameters> <Options>
Mandatory Parameters

 -pair1: Path to one set of independently aligned left and right ends.  The alignment must be in SAM format and, ideally, should be sorted by read name. 

 -pair2: Path to the other set of independently aligned left and right ends.  The alignment must be in SAM format and, ideally, should be sorted by read name. 

 -out: Path to a file in which Scripture writes the paired end alignment file.  The output is an alignment in SAM format. 

Options

-sorted: This flag must be included if the two sets of independently aligned left and right ends are already sorted by read name.  Ideally, they should be. 
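As an illustration, the invocation above can be assembled programmatically. This is a minimal sketch that uses only the flags documented in this section; the helper name make_paired_file_command and the file names are hypothetical, and the jar is assumed to be in the current directory.

```python
import shlex

def make_paired_file_command(pair1, pair2, out, inputs_sorted=False,
                             jar="scripture.jar", max_heap="2000m"):
    """Assemble the makePairedFile invocation as an argument list."""
    cmd = ["java", f"-Xmx{max_heap}", "-jar", jar,
           "-task", "makePairedFile",
           "-pair1", pair1,
           "-pair2", pair2,
           "-out", out]
    if inputs_sorted:
        # Pass -sorted only when both inputs are already sorted by read name.
        cmd.append("-sorted")
    return cmd

# Hypothetical file names, for illustration only.
cmd = make_paired_file_command("left.sam", "right.sam", "paired.sam",
                               inputs_sorted=True)
print(shlex.join(cmd))
```

The argument list can be handed to subprocess.run(cmd, check=True) to actually launch Scripture.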

Segmentation task

Scripture's main algorithm "segments" the genome into regions enriched in read coverage. It takes as input a read alignment file, genome information and filtering parameters, and produces a transcript graph. It requires about 2 GB of memory for the data in the paper.

Command: java -Xmx2000m -jar scripture.jar <Mandatory parameters> <Optional parameters>
Mandatory Parameters

-alignment: Path to a spliced read alignment file. In this first version only one alignment is supported, so multiple sequencing lanes must be combined before invoking Scripture. Alignments must be in BAM or SAM format and must be both sorted and indexed. To sort and index, we recommend igvtools (for SAM) and samtools (for BAM).

-out: Path to a file for Scripture to write its output. The output is a BED file containing all identified transcripts. Scripture also writes a full graph file containing all segments found in the data (significant or not) to a file named after the value of this parameter with an extra .dot extension. The graph file is in DOT format. It is self-contained and can be used to estimate expression or for further processing (e.g. to add paired data if none was used when this task was run). See extract task for information on how to view information in this file.

-sizeFile: A 2-column tab separated file containing the chromosome name and size for the organism.

-chr: Chromosome to segment (in this version Scripture calls transcripts one chromosome at a time).

-chrSequence: Full path to the chromosome sequence in fasta format for the chromosome to segment.
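The 2-column file expected by -sizeFile can be derived from the same FASTA sequences passed to -chrSequence. The following is a minimal sketch, not part of Scripture: the helper names are hypothetical, and it assumes that the chromosome name is the first token after ">" in each FASTA header.

```python
import io

def fasta_lengths(lines):
    """Return {sequence name: length} for FASTA text given as an iterable of lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the name is the first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

def write_size_file(lengths, handle):
    """Write one 'name<TAB>size' row per chromosome to an open text handle."""
    for name, size in lengths.items():
        handle.write(f"{name}\t{size}\n")

# Tiny made-up sequences, for illustration only.
example = [">chr1 demo", "ACGTAC", "GT", ">chr2", "ACG"]
sizes = fasta_lengths(example)
buffer = io.StringIO()
write_size_file(sizes, buffer)
print(buffer.getvalue(), end="")
```

For real data, pass an open FASTA file as lines and a file opened for writing as handle.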

Optional Parameters

-start: Start of region to segment if not segmenting the full chromosome.

-end: End of region to segment when not segmenting the full chromosome.

-windows: Comma-separated list of fixed-size windows to scan. By default Scripture identifies regions of uninterrupted coverage and uses these regions to segment the data. However, in some cases it is useful to specify alternative or multiple window sizes.

-alpha: Desired genome-wide significance level, the default is 0.05.

-pairedEnd: Paired end data. This file can be in either SAM or BAM format and should contain the full insert (from the start of the first pair to the end of the second pair) as it maps to the genome.

-upWeightSplices: Spliced reads are less common than reads that map without a splice. Because spliced reads must be flanked by splice sites, their random or background distribution is likely different from that of reads that map contiguously. When this flag is present, spliced reads are given more weight when computing coverage. Use this flag to increase sensitivity when discovering transcripts.
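Because Scripture calls transcripts one chromosome at a time, a full run needs one invocation per entry in the size file. A minimal sketch of building those invocations follows; the helper names, the per-chromosome FASTA layout (<fasta_dir>/<chr>.fa) and the output naming are assumptions made for illustration, while the flags are those listed above.

```python
def read_size_file(lines):
    """Parse 2-column, tab-separated (chromosome name, size) rows."""
    sizes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        sizes[fields[0]] = int(fields[1])
    return sizes

def segmentation_commands(alignment, size_file, sizes, fasta_dir, out_prefix,
                          alpha=None, jar="scripture.jar", max_heap="2000m"):
    """Build one segmentation command (argument list) per chromosome."""
    commands = []
    for chrom in sizes:
        cmd = ["java", f"-Xmx{max_heap}", "-jar", jar,
               "-alignment", alignment,
               "-out", f"{out_prefix}.{chrom}.bed",        # hypothetical naming
               "-sizeFile", size_file,
               "-chr", chrom,
               "-chrSequence", f"{fasta_dir}/{chrom}.fa"]  # hypothetical layout
        if alpha is not None:
            cmd += ["-alpha", str(alpha)]
        commands.append(cmd)
    return commands

# Made-up sizes and file names, for illustration only.
sizes = read_size_file(["chr1\t1000\n", "chr2\t800\n"])
commands = segmentation_commands("all_lanes.sorted.bam", "sizes.txt", sizes,
                                 "genome", "transcripts", alpha=0.05)
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list can then be run in turn, for example with subprocess.run(cmd, check=True).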

 

Add pairs task

Processes previously segmented data by joining or breaking connected graphs using paired end data information.

Command: java -Xmx2000m -jar scripture.jar -task addpairs <Mandatory parameters>
Mandatory Parameters

-in: A graph output from a regular transcript run (segmentation task).

-out: Path to a file for Scripture to write its output. The output is a BED file containing all identified transcripts. Scripture also writes a full graph file containing all segments found in the data (significant or not) to a file named after the value of this parameter with an extra .dot extension. The graph file is in DOT format. It is self-contained and can be used to estimate expression or for further processing (e.g. to add paired data if none was used when this task was run). See extract task for information on how to view information in this file.

-sizeFile: A 2-column tab separated file containing the chromosome name and size for the organism.

-pairedEnd: Paired end data. This file can be in either SAM or BAM format and should contain the full insert (from the start of the first pair to the end of the second pair) as it maps to the genome.
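To chain this task after a segmentation run, the -in graph is the DOT file that the segmentation task wrote next to its BED output. The sketch below assumes that "an extra .dot extension" means ".dot" is appended to the full -out value of the segmentation run; that reading, the helper names and the file names are all assumptions for illustration.

```python
def graph_path(segmentation_out):
    """Graph file for a segmentation run.

    Assumption: ".dot" is appended to the complete -out value,
    e.g. "transcripts.chr1.bed" -> "transcripts.chr1.bed.dot".
    """
    return segmentation_out + ".dot"

def add_pairs_command(segmentation_out, paired_end, size_file, out,
                      jar="scripture.jar", max_heap="2000m"):
    """Assemble the addpairs invocation as an argument list."""
    return ["java", f"-Xmx{max_heap}", "-jar", jar,
            "-task", "addpairs",
            "-in", graph_path(segmentation_out),
            "-out", out,
            "-sizeFile", size_file,
            "-pairedEnd", paired_end]

# Hypothetical file names, for illustration only.
cmd = add_pairs_command("transcripts.chr1.bed", "paired.sam",
                        "sizes.txt", "transcripts.chr1.paired.bed")
print(" ".join(cmd))
```

If the graph file on disk is named differently, adjust graph_path to match.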

Score task

Uses alignment data to compute expression statistics for a given set of transcripts.

Command: java -Xmx2000m -jar scripture.jar -task score <Mandatory parameters>
Mandatory Parameters

-in: A transcript file in BED format.

-alignment: Path to a spliced read alignment file. In this first version only one alignment is supported, so multiple sequencing lanes must be combined before invoking Scripture. Alignments must be in BAM or SAM format and must be both sorted and indexed. To sort and index, we recommend igvtools (for SAM) and samtools (for BAM).

-sizeFile: A 2-column tab separated file containing the chromosome name and size for the organism.

-out: Output file name. The output contains one row for each record in the input file, with 7 extra columns:

FWER corrected p-value for the observed read count across the transcript

Enrichment (i.e. observed number of reads / expected number of reads for the transcript length)

total reads across transcript

mean reads/base

RPKM: number of reads that map per kilobase of exon model per million mapped reads, for each gene, for each chromosome

local lambda, that is, reads/base across the transcript's genomic locus rather than the spliced transcript

transcript length.
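The RPKM and enrichment columns follow directly from their definitions. The sketch below shows only the arithmetic; how Scripture counts reads, derives the expected read count, or chooses the total used for normalization is not specified here, so the function names and inputs are assumptions.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    Equivalent to read_count / (length in kb) / (mapped reads in millions).
    """
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

def enrichment(observed_reads, expected_reads):
    """Observed number of reads divided by the number expected for the transcript length."""
    return observed_reads / expected_reads

# Made-up numbers, for illustration only.
example_rpkm = rpkm(read_count=500, transcript_length_bp=2000,
                    total_mapped_reads=10_000_000)
example_enrichment = enrichment(observed_reads=500, expected_reads=125)
print(example_rpkm, example_enrichment)
```

With these made-up inputs, 500 reads over 2 kb in a library of 10 million mapped reads gives an RPKM of 25, and 500 observed reads against 125 expected gives a 4-fold enrichment.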
