How do I run ABSOLUTE?

To run ABSOLUTE, you'll need R version 2.12.0 or higher with the numDeriv package installed. You can install this with the following command:

install.packages("numDeriv")
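
The prerequisites above can be verified before running anything else. The following is a minimal sketch in base R; CheckAbsolutePrereqs is a hypothetical helper name and is not part of the ABSOLUTE package.

```r
## Hypothetical pre-flight check (not part of ABSOLUTE): verifies the R
## version and reports which required packages are not yet installed.
CheckAbsolutePrereqs <- function(min.r.version="2.12.0", pkgs=c("numDeriv")) {
  r.ok <- getRversion() >= min.r.version
  ## installed.packages() lists every package visible in the library paths
  missing.pkgs <- setdiff(pkgs, rownames(installed.packages()))
  list(r.ok=r.ok, missing.pkgs=missing.pkgs)
}

status <- CheckAbsolutePrereqs()
if (!status$r.ok) {
  message("R 2.12.0 or higher is required")
}
if (length(status$missing.pkgs) > 0) {
  message("Missing packages: ", paste(status$missing.pkgs, collapse=", "))
}
```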


If you're using HAPSEG as your input, you will also need to install the HAPSEG package supplied in the release bundle. The subdirectory R_packages contains both source and Windows binary versions. Instructions on how to install these packages can be found at http://cran.r-project.org/doc/manuals/R-admin.html#Installing-packages.
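
As a sketch of a local source install, the snippet below looks for a source tarball in R_packages and shows the install.packages call that would install it. It assumes the tarball follows the usual R naming convention (package name, underscore, version, .tar.gz); the actual file name in the bundle may differ, and FindSourcePackage is a hypothetical helper.

```r
## Hypothetical helper: locate a source tarball such as HAPSEG_<version>.tar.gz
## inside the bundle's R_packages subdirectory (file naming is an assumption).
FindSourcePackage <- function(pkg, pkg.dir="R_packages") {
  pattern <- paste("^", pkg, "_.*\\.tar\\.gz$", sep="")
  hits <- list.files(pkg.dir, pattern=pattern, full.names=TRUE)
  if (length(hits) == 0) {
    stop("No source package found for ", pkg, " in ", pkg.dir)
  }
  hits[1]
}

## Not run: install from the local file rather than from a CRAN mirror
## install.packages(FindSourcePackage("HAPSEG"), repos=NULL, type="source")
```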

Alternatively, you can supply a tab-delimited segmentation file (e.g. from array CGH or massively parallel sequencing experiments). This file must contain the columns "Chromosome", "Start", "End", "Num_Probes" and "Segment_Mean". It may contain other columns as well, but these five are the required minimum. To run in this mode, you must also specify the copy_num_type argument as "total".
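
A segmentation file can be checked for the required columns before it is passed to ABSOLUTE. This is a minimal sketch; CheckSegFile is a hypothetical helper, and only the column names come from the description above.

```r
## Columns that a segmentation file must contain, as listed above
required.cols <- c("Chromosome", "Start", "End", "Num_Probes", "Segment_Mean")

## Hypothetical helper (not part of ABSOLUTE): reads a tab-delimited
## segmentation file and stops if any required column is absent.
CheckSegFile <- function(seg.fn) {
  seg <- read.delim(seg.fn, as.is=TRUE, check.names=FALSE)
  missing.cols <- setdiff(required.cols, colnames(seg))
  if (length(missing.cols) > 0) {
    stop("Segmentation file is missing column(s): ",
         paste(missing.cols, collapse=", "))
  }
  invisible(seg)
}
```

Extra columns are ignored by the check, matching the rule that only the five listed columns are mandatory.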

To run ABSOLUTE, first identify your input file (either a HAPSEG segdat file or your segmentation file) and issue the following command:

library(ABSOLUTE)
RunAbsolute(seg.dat.fn, sigma.p, max.sigma.h, min.ploidy, max.ploidy, primary.disease, platform, sample.name, results.dir, max.as.seg.count, max.non.clonal, max.neg.genome, copy_num_type, maf.fn=NULL, min.mut.af=NULL, output.fn.base=NULL, verbose=FALSE)


An explanation of the arguments follows:

 

| Argument | Description |
| --- | --- |
| seg.dat.fn | A filename pointing to the input - either a HAPSEG file or a segmentation file (see above for specification) |
| sigma.p | Provisional value of excess sample level variance used for mode search |
| max.sigma.h | Maximum value of excess sample level variance (Eq. 6) |
| min.ploidy | Minimum ploidy value to consider. Solutions implying lower ploidy values will be discarded |
| max.ploidy | Maximum ploidy value to consider. Solutions implying greater ploidy values will be discarded |
| primary.disease | Primary disease of the sample |
| platform | The chip type used, supported values are currently 'SNP_250K_STY', 'SNP_6.0' and 'Illumina_WES' |
| sample.name | The name of the sample, included in output plots |
| results.dir | A directory path to place results. If the directory doesn't already exist, it will be created. |
| max.as.seg.count | Maximum number of allelic segments. Samples with a higher segment count will be flagged as 'failed' |
| max.neg.genome | Maximum genome fraction that may be modeled as non-clonal with copy-ratio below that of clonal homozygous deletion. Solutions implying greater values will be discarded. |
| max.non.clonal | Maximum genome fraction that may be modeled as non-clonal (subclonal SCNA). Solutions implying greater values will be discarded. |
| copy_num_type | The copy number type to assess, can be one of 'allelic' or 'total'. Currently allelic must be used for HAPSEG based inputs and total for segmentation file based inputs. |
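
Some of the constraints in the table above can be checked before calling RunAbsolute. The sketch below is illustrative only: CheckRunArgs is a hypothetical helper, and the rule that min.ploidy should not exceed max.ploidy is an assumption rather than documented behaviour.

```r
## Supported platform values as listed in the argument table
supported.platforms <- c("SNP_250K_STY", "SNP_6.0", "Illumina_WES")

## Hypothetical helper (not part of ABSOLUTE). Returns a character vector
## describing any problems found; the vector is empty when all checks pass.
CheckRunArgs <- function(min.ploidy, max.ploidy, platform, copy_num_type) {
  problems <- character(0)
  if (min.ploidy > max.ploidy) {
    ## Assumption: the ploidy bounds should describe a non-empty range
    problems <- c(problems, "min.ploidy is greater than max.ploidy")
  }
  if (!(platform %in% supported.platforms)) {
    problems <- c(problems, paste("unsupported platform:", platform))
  }
  if (!(copy_num_type %in% c("allelic", "total"))) {
    problems <- c(problems, paste("unknown copy_num_type:", copy_num_type))
  }
  problems
}

problems <- CheckRunArgs(0.95, 10, "SNP_6.0", "total")
if (length(problems) > 0) {
  stop(paste(problems, collapse="; "))
}
```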

 

There are four optional, named arguments. To use them, specify them by name, e.g. verbose=TRUE.

| Argument | Description |
| --- | --- |
| maf.fn | If available, a filename pointing to a mutation annotation format (MAF) file. This specifies the data for somatic point mutations to be used by ABSOLUTE. |
| min.mut.af | If specified, a minimum mutation allelic fraction. Mutations with lower allelic fractions will be filtered out before analysis. Note that if maf.fn is specified, min.mut.af must also be specified. |
| output.fn.base | If specified, provides a base filename for all output files. The default value is the array.name field in the object pointed to by seg.dat.fn, or sample.name if this is not available. |
| verbose | Set this to TRUE for verbose output |
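
The dependency between maf.fn and min.mut.af can be enforced with a small guard before the call. This is a sketch only; CheckMafArgs is a hypothetical helper, and the file name and threshold in the usage lines are made-up values.

```r
## Hypothetical guard (not part of ABSOLUTE): min.mut.af is required
## whenever a MAF file is supplied.
CheckMafArgs <- function(maf.fn=NULL, min.mut.af=NULL) {
  if (!is.null(maf.fn) && is.null(min.mut.af)) {
    stop("min.mut.af must be specified when maf.fn is supplied")
  }
  invisible(TRUE)
}

CheckMafArgs()                                          # no MAF file: passes
CheckMafArgs(maf.fn="sample.maf.txt", min.mut.af=0.05)  # illustrative values
```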

 

RunAbsolute produces two output files. In both cases, OFB is output.fn.base if supplied, or the array name otherwise.

| File | Description |
| --- | --- |
| OFB.ABSOLUTE_plot.pdf | Plot showing the Purity/Ploidy values and the solutions |
| OFB.ABSOLUTE.RData | An R file containing an object seg.dat which provides all of the information used to generate the plot. |
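
The RData output can be inspected from a fresh R session. The sketch below loads the file into its own environment so that existing variables are not overwritten; LoadSegDat is a hypothetical helper, and only the object name seg.dat comes from the table above.

```r
## Hypothetical helper (not part of ABSOLUTE): load an OFB.ABSOLUTE.RData
## file into a private environment and return the seg.dat object.
LoadSegDat <- function(rdata.fn) {
  env <- new.env()
  loaded <- load(rdata.fn, envir=env)   # load() returns the object names
  if (!("seg.dat" %in% loaded)) {
    stop("No object named seg.dat found in ", rdata.fn)
  }
  get("seg.dat", envir=env)
}

## Not run: the path is illustrative
## seg.dat <- LoadSegDat(file.path("output", "my_sample.ABSOLUTE.RData"))
## str(seg.dat, max.level=1)
```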

 

To let an analyst perform the manual review step, multiple ABSOLUTE results can be summarized with the CreateReviewObject function, which takes the following arguments:

| Argument | Description |
| --- | --- |
| obj.name | A descriptive name for this collection of samples |
| absolute.files | A vector of filenames, each pointing to the RData output for that sample's ABSOLUTE run |
| indv.results.dir | A directory path to place the results. If this directory does not exist it will be created. |

 

There are four optional, named arguments. To use them, specify them by name, e.g. verbose=TRUE.

| Argument | Description |
| --- | --- |
| plot.modes | Set this to FALSE to disable plotting of the purity/ploidy modes |
| plot.called.only | Set this to TRUE to only plot the called purity/ploidy modes |
| pp.calls.fn | If plot.called.only is TRUE, this must be defined to point to a file containing ABSOLUTE calls |
| verbose | Set this to TRUE for verbose output |
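
The absolute.files vector can be assembled by searching a results tree for per-sample RData outputs. This is a sketch under the assumption that each run wrote a file ending in .ABSOLUTE.RData somewhere below a common root; FindAbsoluteFiles is a hypothetical helper. The commented call mirrors the example later in this document, which also passes the copy number type as a fourth positional argument; the collection name is made up.

```r
## Hypothetical helper (not part of ABSOLUTE): collect every per-sample
## RunAbsolute output below a root directory.
FindAbsoluteFiles <- function(results.root) {
  list.files(results.root, pattern="\\.ABSOLUTE\\.RData$",
             recursive=TRUE, full.names=TRUE)
}

## Not run: requires the ABSOLUTE package and completed RunAbsolute output
## library(ABSOLUTE)
## absolute.files <- FindAbsoluteFiles("output")
## CreateReviewObject("my_samples", absolute.files,
##                    file.path("output", "abs_summary"), "allelic",
##                    plot.modes=FALSE, verbose=TRUE)
```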

 

This call will produce multiple files as output, where obj is the obj.name argument passed in.

| File | Description |
| --- | --- |
| obj_summary.PP-calls_tab.txt | A tab-delimited table detailing the called results |
| obj_summary.PP-modes.data.RData | A saved object named segobj.list which contains all information used to generate the other output files |
| obj_summary.PP-modes.plots.pdf | A plot of all of the purity/ploidy modes |
| sampleName.ABSOLUTE_UNCALLED_PLOT.pdf | A plot for every uncalled result |

 

Using these files, the analyst can optionally override the solutions provided by CreateReviewObject. Tips for manually reviewing ABSOLUTE solutions can be found here.

To override the default solutions from ABSOLUTE, prepend a column to the left of the obj_summary.PP-calls_tab.txt file (it can be named anything). In any row where you want to override the default solution, enter the number of the solution you wish to use; rows that keep the default can be left blank or optionally set to 1. Once you've annotated the file to your satisfaction, you may trigger the final stage of ABSOLUTE with the ExtractReviewedResults function, which takes the following arguments:

| Argument | Description |
| --- | --- |
| reviewed.pp.calls.fn | Name of the reviewed calls file (the annotated obj_summary.PP-calls_tab.txt) |
| analyst.id | The user id of the analyst who called the solutions |
| modes.fn | The obj_summary.PP-modes.data.RData file from CreateReviewObject |
| out.dir.base | The root directory to write results to |
| obj.name | A descriptive name for this collection of samples; it should be the same as the one used with CreateReviewObject above |
| copy_num_type | The copy number type to assess, can be one of 'allelic' or 'total'. This should match the value used for RunAbsolute. |
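
The manual markup described earlier (prepending an override column to the PP-calls table) can also be scripted. The following sketch uses base R only; PrependOverrideColumn is a hypothetical helper, and the column name "override" is arbitrary since the prepended column can be named anything. It makes no assumption about the existing columns of the calls table.

```r
## Hypothetical helper (not part of ABSOLUTE): prepend an override column to
## a PP-calls table. 'rows' are row indices to override and 'solutions' the
## solution numbers to use for them; all other rows are left blank.
PrependOverrideColumn <- function(calls.fn, out.fn, rows=integer(0),
                                  solutions=integer(0), col.name="override") {
  stopifnot(length(rows) == length(solutions))
  calls <- read.delim(calls.fn, as.is=TRUE, check.names=FALSE)
  override <- rep("", nrow(calls))
  override[rows] <- as.character(solutions)
  out <- data.frame(override, calls, check.names=FALSE,
                    stringsAsFactors=FALSE)
  colnames(out)[1] <- col.name
  write.table(out, file=out.fn, sep="\t", quote=FALSE, row.names=FALSE)
  invisible(out)
}

## Not run: use solution 3 for the second sample, keep defaults elsewhere
## PrependOverrideColumn("obj_summary.PP-calls_tab.txt",
##                       "obj_summary.PP-calls_tab.reviewed.txt",
##                       rows=2, solutions=3)
```

Writing to a separate output file keeps the original table intact, so the markup can be redone if needed.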

 

There is one optional argument, verbose; to enable it, specify verbose=TRUE. Running this function will create a directory out.dir.base/reviewed, which will contain the following files:

| File | Description |
| --- | --- |
| obj.name.called.ABSOLUTE.table.txt | A final version of the purity/ploidy table reflecting the selected solution number |
| obj.name.called.ABSOLUTE.plots.pdf | A final version of the PDF from CreateReviewObject which reflects the selected solution number |
| samples/sample.name.ABSOLUTE.obj.name.called.RData | A final RData version of the ABSOLUTE output reflecting the selected solution number |
| SEG_MAF/plate.name.segtab.txt | A segmentation file giving the absolute copy-numbers of each input segment |
| SEG_MAF/plate.name_ABS_MAF.txt | (optional) If a MAF file was specified in the call to RunAbsolute, a new version is supplied with further annotations giving the multiplicity and clonality of the SSNVs |
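
Once ExtractReviewedResults has finished, the final purity/ploidy table can be read back into R. This is a sketch that assumes the table sits directly inside out.dir.base/reviewed and ends in .called.ABSOLUTE.table.txt, as the file list above suggests; ReadReviewedTable is a hypothetical helper.

```r
## Hypothetical helper (not part of ABSOLUTE): locate and read the final
## called table below out.dir.base/reviewed.
ReadReviewedTable <- function(out.dir.base) {
  reviewed.dir <- file.path(out.dir.base, "reviewed")
  hits <- list.files(reviewed.dir,
                     pattern="\\.called\\.ABSOLUTE\\.table\\.txt$",
                     full.names=TRUE)
  if (length(hits) != 1) {
    stop("Expected exactly one called table in ", reviewed.dir,
         ", found ", length(hits))
  }
  read.delim(hits, as.is=TRUE, check.names=FALSE)
}

## Not run: the path matches the example's output.path
## final.calls <- ReadReviewedTable(file.path("output", "abs_extract"))
```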

Example

Here is an example invocation of ABSOLUTE on the mixing experiment data in Figure 2d, using input data from a previous HAPSEG run and the bundled data. It is intended to be run in the bundle directory. It can be run on a multicore system by uncommenting the library(doMC) and registerDoMC calls and specifying the number of cores that you wish to use. Besides ABSOLUTE, this code also requires foreach and, optionally, doMC. The code creates a directory named output, which contains a per-sample output directory as well as a subdirectory named abs_logs holding a per-sample text file with the standard output emitted by R.

DoAbsolute <- function(scan, sif) {
  registerDoSEQ()
  library(ABSOLUTE)
  plate.name <- "DRAWS"
  genome <- "hg18"
  platform <- "SNP_250K_STY"
  primary.disease <- sif[scan, "PRIMARY_DISEASE"]
  sample.name <- sif[scan, "SAMPLE_NAME"]
  sigma.p <- 0
  max.sigma.h <- 0.02
  min.ploidy <- 0.95
  max.ploidy <- 10
  max.as.seg.count <- 1500
  max.non.clonal <- 0
  max.neg.genome <- 0
  copy_num_type <- "allelic"
  seg.dat.fn <- file.path("output", scan, "hapseg",
                          paste(plate.name, "_", scan, "_segdat.RData", sep=""))
  results.dir <- file.path(".", "output", scan, "absolute")
  print(paste("Starting scan", scan, "at", results.dir))
  log.dir <- file.path(".", "output", "abs_logs")
  if (!file.exists(log.dir)) {
     dir.create(log.dir, recursive=TRUE)
  }
  if (!file.exists(results.dir)) {
     dir.create(results.dir, recursive=TRUE)
  }
  sink(file=file.path(log.dir, paste(scan, ".abs.out.txt", sep="")))
  ## Restore console output when the function exits, even if RunAbsolute fails
  on.exit(sink(), add=TRUE)
  RunAbsolute(seg.dat.fn, sigma.p, max.sigma.h, min.ploidy, max.ploidy, primary.disease,
              platform, sample.name, results.dir, max.as.seg.count, max.non.clonal,
              max.neg.genome, copy_num_type, verbose=TRUE)
  invisible(NULL)
}
arrays.txt <- "./paper_example/mix250K_arrays.txt"
sif.txt <- "./paper_example/mix_250K_SIF.txt"
## read in array names
scans <- readLines(arrays.txt)[-1]
sif <- read.delim(sif.txt, as.is=TRUE)

library(foreach)
## library(doMC)
## registerDoMC(20)

foreach (scan=scans, .combine=c) %dopar% {
  DoAbsolute(scan, sif)
}

obj.name <- "DRAWS_summary"
results.dir <- file.path(".", "output", "abs_summary")
absolute.files <- file.path(".", "output",
                            scans, "absolute",
                            paste(scans, ".ABSOLUTE.RData", sep=""))
library(ABSOLUTE)
CreateReviewObject(obj.name, absolute.files, results.dir, "allelic", verbose=TRUE)

## At this point you'd perform your manual review and mark up the file 
## output/abs_summary/DRAWS_summary.PP-calls_tab.txt by prepending a column with
## your desired solution calls. After that (or w/o doing that if you choose to accept
## the defaults, which is what running this code will do) run the following command:

calls.path <- file.path("output", "abs_summary", "DRAWS_summary.PP-calls_tab.txt")
modes.path <- file.path("output", "abs_summary", "DRAWS_summary.PP-modes.data.RData")
output.path <- file.path("output", "abs_extract")
ExtractReviewedResults(calls.path, "test", modes.path, output.path, "absolute", "allelic")
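
In the per-sample loop of the example above, an error in one sample stops the whole foreach call, because foreach's default error handling is to stop. One way to keep the remaining samples running is to wrap each call in tryCatch. The sketch below is an optional variation, not part of the original example; SafeRun is a hypothetical helper.

```r
## Hypothetical wrapper (not part of ABSOLUTE): run fn(scan) and turn an
## error into a logged message plus an NA result instead of aborting.
SafeRun <- function(scan, fn) {
  tryCatch(fn(scan),
           error=function(e) {
             message("Sample ", scan, " failed: ", conditionMessage(e))
             NA
           })
}

## Not run: drop-in use inside the loop from the example
## foreach (scan=scans, .combine=c) %dopar% {
##   SafeRun(scan, function(s) DoAbsolute(s, sif))
## }
```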