How do I run ContEst?
ContEst is run with Java; the basic command looks like:
java -jar ContEst.jar -T Contamination
The required command-line arguments for the tool:
Common, optional parameters include:
ContEst is a Java tool based on the Genome Analysis Toolkit (GATK), and many of its inputs are processed through the GATK engine. For more information on how to run the tool, run the following command:
java -jar ContEst.jar -T Contamination -h
This produces the following output, along with general help on using the GATK:
Arguments for Contamination:
-o,--out <out> An output file presented to the walker. Will overwrite
contents if file exists.
what fraction of sites with highest and lowest likelihood
values to trim
set to META (default), SAMPLE or LANE to produce per-bam,
per-sample or per-lane estimates
The sample name; used to extract the correct genotypes
from multi-sample truth vcfs
the degree of precision to which the contamination tool
should estimate (e.g. the bin size)
-vs,--verify_sample should we verify that the sample name is in the genotypes
what minimum number of bases do we need to see to call
contamination in a lane / sample?
evaluate contamination for just a single contamination
Example ContEst Command
An example data package is available from the download page. You'll also need to download the 1000 Genomes B37 reference file and its associated .fai index file:
To run the example, you need to download the ContEst binary zip file and hg19_population_stratified_af_hapmap_3.3.vcf.gz to your system. The example data are based on two 1000 Genomes samples with low contamination levels, mixed together. The command to run is:
java -Xmx2g -jar <ContEst_JAR_Location>/ContEst.jar \
This example will produce an output file which should look like the following:
name population population_fit contamination confidence_interval_95_width confidence_interval_95_low confidence_interval_95_high sites
Here ContEst estimates that the file is approximately 8.2% contaminated, with a 95% confidence interval from 7.7% to 8.6%.
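If you run ContEst on many files, a small script can pull the key columns out of each results file. The sketch below is a minimal example, not part of ContEst. It assumes that the output is a table whose first line is the header shown above, that fields are separated by tabs or spaces, and (as the reading of the example output implies) that the contamination and confidence-interval values are already in percent. The function name summarize_contest and the file name contest_output.txt are hypothetical.

```shell
# Hypothetical helper: summarize a ContEst results table.
# Assumptions (not confirmed by the ContEst documentation):
#   - line 1 is the header row (name, population, ..., sites);
#   - fields are separated by tabs or runs of spaces (awk default splitting);
#   - contamination and the 95% CI bounds are reported in percent.
summarize_contest() {
  awk '
    NR == 1 {
      # Map each header name to its column index, so column order does not matter.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    NF == 0 { next }  # skip blank lines
    {
      # One summary line per row (e.g. one per sample or lane, if present).
      printf "%s: contamination %s%% (95%% CI %s-%s%%), %s sites\n",
        $(col["name"]), $(col["contamination"]),
        $(col["confidence_interval_95_low"]), $(col["confidence_interval_95_high"]),
        $(col["sites"])
    }
  ' "$1"
}
```

Usage: `summarize_contest contest_output.txt`. Looking columns up by header name rather than by position keeps the sketch working even if the column order differs from the header shown above, provided the column names themselves match.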