How do I run BreakPointer?

BreakPointer currently runs on Unix-based machines. It requires BWA, Samtools, and either Matlab or the MCR (Matlab Compiler Runtime) to be installed.

 

Usage

BreakPointer.sh  matlab_dir  sample_name  rearrangement_predictions  bam_file  lane_black_list  refdir  insertionsize  confidence_thres  low_confidence_sidewithread  low_confidence_sidewithoutread  high_confidence_sidewithread  high_confidence_sidewithoutread  max_mismatches  tipsize  min_mismatches_in_tip  max_N  expand_pairs_extraction  max_reads_fished  readlen  align_enough_reads  split_penalty  min_qual  libdir  first_rearrangement  last_rearrangement
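
As a rough illustration of how the 25 positional arguments line up, the sketch below fills them in order, using the suggested values from the argument table (BWA settings where the table gives two). Every path, the sample name, readlen (101), and last_rearrangement (100) are placeholder assumptions, not recommendations. The tenth argument is taken to be low_confidence_sidewithoutread, as the table lists it. The script only prints the resulting command so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch only: values marked "assumed" are placeholders, not recommendations.
set --                                   # start with an empty argument list
set -- "$@" /path/to/matlab_or_mcr       # matlab_dir (assumed path)
set -- "$@" sample01                     # sample_name (assumed)
set -- "$@" sample01.predictions.txt     # rearrangement_predictions (assumed name)
set -- "$@" sample01.bam                 # bam_file (assumed name)
set -- "$@" none                         # lane_black_list
set -- "$@" /path/to/refdir              # refdir (assumed path)
set -- "$@" 400                          # insertionsize
set -- "$@" 6                            # confidence_thres
set -- "$@" 80                           # low_confidence_sidewithread
set -- "$@" 200                          # low_confidence_sidewithoutread
set -- "$@" 40                           # high_confidence_sidewithread
set -- "$@" 50                           # high_confidence_sidewithoutread
set -- "$@" 80                           # max_mismatches (BWA)
set -- "$@" 15                           # tipsize (BWA)
set -- "$@" 5                            # min_mismatches_in_tip (BWA)
set -- "$@" 10                           # max_N
set -- "$@" 500                          # expand_pairs_extraction
set -- "$@" 100000                       # max_reads_fished
set -- "$@" 101                          # readlen (assumed)
set -- "$@" 20                           # align_enough_reads
set -- "$@" 8                            # split_penalty
set -- "$@" 0.75                         # min_qual
set -- "$@" /path/to/libdir              # libdir (assumed path)
set -- "$@" 1                            # first_rearrangement
set -- "$@" 100                          # last_rearrangement (assumed)

# The usage line has 25 positional arguments; catch a dropped one early.
if [ "$#" -ne 25 ]; then
  echo "expected 25 arguments, got $#" >&2
  exit 1
fi

# Print the command for review; remove 'echo' to actually run it.
echo BreakPointer.sh "$@"
```

Keeping one argument per line with its name in a comment makes it easier to spot a value that has slipped out of position, which is otherwise hard to see in a 25-argument command line.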

 

Some of the key parameters to the algorithm are described in detail in our publication.

Argument Description Suggested Value
matlab_dir The directory where Matlab or MCR is installed.  
sample_name The name of the sample, used for setting the output file names. Can be any string valid for a file name
rearrangement_predictions Input text file with the predicted rearrangements (estimated by PEM). It is a tab-delimited text file with a header row, containing 'chr1', 'chr2', 'pos1', 'pos2', 'str1', and 'str2' columns, which represent the predicted rearrangement (chromosome, position, and strand for the two breakpoints), and a 'tumreads' column, which gives the number of discordant read pairs supporting the prediction.
bam_file The BAM file of the sample. The folder containing it should also contain its index.
lane_black_list A text file listing lanes that you wish to omit. If you have none to omit, specify "none" (suggested). none
refdir A directory containing the genome as text files named chr1.txt, chr2.txt, chr3.txt, ..., chrX.txt, each holding the content of one chromosome as a single line.
insertionsize Average insertion size, or a range of likely insertion sizes (in bp). 400
confidence_thres The number of supporting discordant read pairs above which the high-confidence parameters (below) are used. 6
low_confidence_sidewithread The estimated error in the locus prediction before the breakpoint (that is, on the side the supporting reads are on) when the number of supporting discordant pairs is low. 80
low_confidence_sidewithoutread The estimated error in the locus prediction after the breakpoint when the number of supporting discordant pairs is low. 200
high_confidence_sidewithread The estimated error in the locus prediction before the breakpoint (that is, on the side the supporting reads are on) when the number of supporting discordant pairs is high. 40
high_confidence_sidewithoutread The estimated error in the locus prediction after the breakpoint when the number of supporting discordant pairs is high. 50
max_mismatches The maximal percentage of mismatches allowed for a read to be considered partly aligned (and hence possibly spanning the breakpoint). 30 (for MAQ), 80 (for BWA)
tipsize The size (in bp) of the tip of the read that is checked for partial alignment. 7 (for MAQ), 15 (for BWA)
min_mismatches_in_tip The minimal number of mismatches required in a tip for it to be considered a mismatched tip, which makes the read a candidate partly aligned read (and hence possibly spanning the breakpoint). 2 (for MAQ), 5 (for BWA)
max_N The maximal number of N allowed in a read for it to be considered partly aligned (and hence possibly spanning the breakpoint). 10
expand_pairs_extraction Window size around predicted breakpoint to look for split reads. 500
max_reads_fished Maximal number of reads to pull when looking for split reads. 100000
readlen The read length (in bp).  
align_enough_reads BreakPointer stops aligning once this number of split reads supporting the same breakpoint has been found. 20
split_penalty The penalty applied to the BreakPointer score for splitting a read. 8
min_qual The minimal score a split read needs to be considered (normalized by read length). 0.75
libdir The path to a directory containing GrabSplitReads.jar, align_each_bkpt5.bin, and links to BWA and Samtools.
first_rearrangement The first prediction to pinpoint. 1
last_rearrangement The last prediction to pinpoint.
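
Before a long run, it may help to check the input files against the formats described in the argument table above. The following is a minimal sketch, assuming a POSIX shell and awk; the function names check_predictions_header, has_bam_index, and check_single_line are hypothetical helpers and are not part of BreakPointer. The index check accepts either of the two common naming conventions (sample.bam.bai or sample.bai), since the text above does not say which one is expected.

```shell
#!/bin/sh

# Succeeds if the tab-delimited header row of file $1 contains every column
# listed for rearrangement_predictions; prints any column that is missing.
check_predictions_header() {
  awk -F'\t' '
    BEGIN { bad = 1 }                 # stays 1 if the file has no header row
    NR == 1 {
      bad = 0
      sub(/\r$/, "")                  # tolerate Windows line endings
      for (i = 1; i <= NF; i++) seen[$i] = 1
      n = split("chr1 chr2 pos1 pos2 str1 str2 tumreads", need, " ")
      for (j = 1; j <= n; j++)
        if (!(need[j] in seen)) { print "missing column: " need[j]; bad = 1 }
      exit                            # only the header row is inspected
    }
    END { exit bad }
  ' "$1"
}

# Succeeds if an index file sits next to BAM file $1, under either common
# naming convention (sample.bam.bai or sample.bai).
has_bam_index() {
  [ -e "$1.bai" ] || [ -e "${1%.bam}.bai" ]
}

# Succeeds if file $1 holds exactly one line, as described for the
# chromosome files in refdir. Reads the whole file, so it can be slow
# on large chromosomes.
check_single_line() {
  awk 'END { exit !(NR == 1) }' "$1"
}
```

For example, `check_predictions_header sample01.predictions.txt && has_bam_index sample01.bam && check_single_line /path/to/refdir/chr1.txt` succeeds only if all three checks pass (the file names here are placeholders).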