Tagged with #analyzecovariates
1 documentation article | 0 announcements | 9 forum discussions

Created 2014-06-12 22:52:09 | Updated 2015-04-26 00:23:41 | Tags: bqsr dependencies rscript analyzecovariates
Comments (17)

When you run AnalyzeCovariates to analyze your BQSR outputs, you may encounter an error starting with this line:

org.broadinstitute.sting.utils.R.RScriptExecutorException: RScript exited with 1. Run with -l DEBUG for more info.

The most common cause of this error is simple, and so is the solution. The script depends on some external R libraries, so if you don't have them installed, the script fails. To find out which libraries are necessary and how to install them, refer to this tutorial.
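As a quick first check, something like the following sketch reports whether each library can be loaded by the Rscript on your PATH. The package list here is an assumption, taken from the packages mentioned elsewhere on this page (ggplot2, gplots, reshape, gsalib); the tutorial remains the authoritative list. `check_r_packages` is a hypothetical helper name.

```shell
# Print TRUE/FALSE for each named R package, using whichever Rscript is on PATH.
# The package names passed below are an assumption, not an official list.
check_r_packages() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH"
    return 2
  fi
  # commandArgs(TRUE) returns only the arguments that follow the expression.
  Rscript -e 'for (p in commandArgs(TRUE)) cat(p, requireNamespace(p, quietly = TRUE), "\n")' "$@"
}

check_r_packages ggplot2 gplots reshape gsalib || echo "check could not be completed"
```

Any package reported as FALSE needs to be installed before AnalyzeCovariates can produce plots.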

One other common issue is that the version of ggplot2 you have installed is very recent and is not compatible with the BQSR script. If so, download this Rscript file and use it to generate the plots manually according to the instructions below.

If you have already checked that you have all the necessary libraries installed, you'll need to run the script manually in order to find out what is wrong. To new users, this can seem complicated, but it only takes these 3 simple steps to do it!

1. Re-run AnalyzeCovariates with these additional parameters:

  • -l DEBUG (that's a lowercase L, not an uppercase i, to be clear) and
  • -csv my-report.csv (where you can call the .csv file anything; this is so the intermediate csv file will be saved).

2. Identify the lines in the log output that say which parameters the Rscript is given.

The snippet below shows you the components of the R script command line that AnalyzeCovariates uses.

INFO  18:04:55,355 AnalyzeCovariates - Generating plots file 'RTest.pdf' 
DEBUG 18:04:55,672 RecalUtils - R command line: Rscript (resource)org/broadinstitute/gatk/utils/recalibration/BQSR.R /Users/schandra/BQSR_Testing/RTest.csv /Users/schandra/BQSR_Testing/RTest.recal /Users/schandra/BQSR_Testing/RTest.pdf 
DEBUG 18:04:55,687 RScriptExecutor - Executing: 
DEBUG 18:04:55,688 RScriptExecutor -   Rscript 
DEBUG 18:04:55,688 RScriptExecutor -   -e 
DEBUG 18:04:55,688 RScriptExecutor -   tempLibDir = '/var/folders/j9/5qgr3mvj0590pd2yb9hwc15454pxz0/T/Rlib.2085451458391709180';source('/var/folders/j9/5qgr3mvj0590pd2yb9hwc15454pxz0/T/BQSR.761775214345441497.R'); 
DEBUG 18:04:55,689 RScriptExecutor -   /Users/schandra/BQSR_Testing/RTest.csv 
DEBUG 18:04:55,689 RScriptExecutor -   /Users/schandra/BQSR_Testing/RTest.recal 
DEBUG 18:04:55,689 RScriptExecutor -   /Users/schandra/BQSR_Testing/RTest.pdf 

So, your full command line will be:

Rscript BQSR.R RTest.csv RTest.recal RTest.pdf
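If the log is long, the three file arguments can be pulled out of it mechanically. The sketch below writes the two relevant sample lines from the snippet above into a temporary file (standing in for your own log file) and extracts everything after the script name on the "R command line:" line.

```shell
# Temporary file standing in for your saved GATK log (hypothetical).
log=$(mktemp)
cat > "$log" <<'EOF'
INFO  18:04:55,355 AnalyzeCovariates - Generating plots file 'RTest.pdf'
DEBUG 18:04:55,672 RecalUtils - R command line: Rscript (resource)org/broadinstitute/gatk/utils/recalibration/BQSR.R /Users/schandra/BQSR_Testing/RTest.csv /Users/schandra/BQSR_Testing/RTest.recal /Users/schandra/BQSR_Testing/RTest.pdf
EOF

# Drop everything up to and including the script name, then trim trailing
# spaces, leaving the csv, recalibration table and pdf arguments in order.
r_args=$(sed -n 's/^.*R command line: Rscript [^ ]* //p' "$log" | sed 's/ *$//')
echo "$r_args"
rm -f "$log"
```

The printed paths are the arguments to pass to the script, in the same order, when you run it by hand.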

Please note:

3. Run the script manually with the above arguments.

For new users, the easiest way to do this is from within an IDE such as RStudio. Alternatively, you can start R at the command line and run the script from there; use whichever you are more comfortable with.
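From a terminal, the manual run can be wrapped in a small function that checks the inputs exist before calling Rscript, so that a missing file is not confused with an R error. This is only a sketch: `run_bqsr_plots` is a hypothetical helper name, and it assumes BQSR.R and the saved csv are in the current directory.

```shell
# Run the BQSR plotting script by hand so that R prints its own error message.
# Arguments: script, csv, recalibration table, output pdf (same order as above).
run_bqsr_plots() {
  script=$1; csv=$2; recal=$3; pdf=$4
  for f in "$script" "$csv" "$recal"; do
    [ -f "$f" ] || { echo "missing input: $f" >&2; return 1; }
  done
  Rscript "$script" "$csv" "$recal" "$pdf"
}

# Example call, using the file names from the example above:
# run_bqsr_plots BQSR.R RTest.csv RTest.recal RTest.pdf
```

Whatever R prints when the call fails is the message to look up or post on the forum.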


Created 2015-08-27 08:11:30 | Updated | Tags: analyzecovariates r warnings
Comments (10)


When running the BQSR script, I get the following warnings:

Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion

I have managed to track down exactly where they come from:

for(cov in levels(data$CovariateName)) {
  d = data[data$CovariateName==cov,]
  if( cov == "Context" ) {
    d$CovariateValue = as.character(d$CovariateValue)
    d$CovariateValue = substring(d$CovariateValue,nchar(d$CovariateValue)-2,nchar(d$CovariateValue))
  } else {
    d$CovariateValue = as.numeric(levels(d$CovariateValue))[as.integer(d$CovariateValue)]

Here the problem is that levels(d$CovariateValue) contains both integers and strings (short DNA sequences), and the latter cause as.numeric to introduce NAs.
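The behaviour can be reproduced on a toy factor. The sketch below uses made-up values (not data from the BQSR csv): the factor has two numeric levels and one sequence level, and only the two numeric rows are kept, as happens for a non-Context covariate. A subset of a factor keeps all of its levels, so as.numeric still sees "ACG" and warns, but the values selected for the kept rows are the numeric ones.

```shell
# Requires Rscript on PATH; otherwise the demonstration is skipped.
if command -v Rscript >/dev/null 2>&1; then
  # f has levels "1", "2", "ACG"; d keeps the first two rows but all three levels.
  Rscript -e 'f <- factor(c("1", "2", "ACG"))' \
          -e 'd <- f[1:2]' \
          -e 'print(levels(d))' \
          -e 'print(as.numeric(levels(d))[as.integer(d)])'
else
  echo "Rscript not found on PATH; skipping the demonstration"
fi
```

In this toy case the converted values are 1 and 2, and R emits the same "NAs introduced by coercion" warning for the unused "ACG" level.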

Is this something to be worried about? I am using GATK 3.4-46, but the warnings also occur in 3.3-0.

Thanks, Michael Knudsen

Created 2015-08-06 09:28:46 | Updated | Tags: analyzecovariates
Comments (6)


I have used AnalyzeCovariates to plot the before and after recalibration results. I noticed that the quality scores in the insertion and deletion panels are quite high; is this normal? For the covariate plots (cycle and context), why are there positive and negative values for quality score accuracy (y-axis)? Does each point represent a base? Is there any documentation with detailed explanations of how to interpret these plots?


Created 2014-05-17 00:05:22 | Updated 2014-05-17 00:12:43 | Tags: analyzecovariates
Comments (4)


I am trying to generate base recalibration plots using AnalyzeCovariates.

My command is:

java -jar GenomeAnalysisTK.jar \
-T AnalyzeCovariates -R GRCh37-lite.fa \
-before test_data/realigned/SA495-Tumor.sorted.realigned.grp \
-after test_data/realigned/SA495-Tumor.sorted.post_recal.grp2 \
-plots recal_plots.pdf

and this gives me an error:

INFO  17:01:06,050 HelpFormatter - Date/Time: 2014/05/16 17:01:06
INFO  17:01:06,050 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:01:06,050 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:01:06,962 GenomeAnalysisEngine - Strictness is SILENT
INFO  17:01:07,193 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO  17:01:07,317 GenomeAnalysisEngine - Preparing for traversal
INFO  17:01:07,339 GenomeAnalysisEngine - Done preparing for traversal
INFO  17:01:07,340 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining
INFO  17:01:08,293 ContextCovariate -       Context sizes: base substitution model 2, indel substitution model 3
INFO  17:01:08,537 ContextCovariate -       Context sizes: base substitution model 2, indel substitution model 3
INFO  17:01:08,592 AnalyzeCovariates - Generating csv file '/tmp/AnalyzeCovariates3565832248324656361.csv'
INFO  17:01:09,077 AnalyzeCovariates - Generating plots file 'recal_plots.pdf'
INFO  17:01:18,598 GATKRunReport - Uploaded run statistics report to AWS S3
 ERROR ------------------------------------------------------------------------------------------
 ERROR stack trace
org.broadinstitute.sting.utils.R.RScriptExecutorException: RScript exited with 1. Run with -l DEBUG for more info.
    at org.broadinstitute.sting.utils.R.RScriptExecutor.exec(RScriptExecutor.java:174)
    at org.broadinstitute.sting.utils.recalibration.RecalUtils.generatePlots(RecalUtils.java:548)
    at org.broadinstitute.sting.gatk.walkers.bqsr.AnalyzeCovariates.generatePlots(AnalyzeCovariates.java:380)
    at org.broadinstitute.sting.gatk.walkers.bqsr.AnalyzeCovariates.initialize(AnalyzeCovariates.java:394)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:107)
 ERROR ------------------------------------------------------------------------------------------
 ERROR A GATK RUNTIME ERROR has occurred (version 3.1-1-g07a4bf8):
 ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
 ERROR If not, please post the error message, with stack trace, to the GATK forum.
 ERROR Visit our website and forum for extensive documentation and answers to
 ERROR commonly asked questions http://www.broadinstitute.org/gatk
 ERROR MESSAGE: RScript exited with 1. Run with -l DEBUG for more info.
 ERROR ------------------------------------------------------------------------------------------

Any ideas? Thanks.

Created 2014-02-06 16:17:55 | Updated | Tags: pdf analyzecovariates
Comments (11)

I'm trying to run AnalyzeCovariates to produce calibration plots, but not getting a PDF, so I decided to upgrade my R installation and all the packages required (gsalib, ggplot2, etc). Now I'm getting the following error:

ERROR MESSAGE: Bad input: The GATK report has an unknown/unsupported version in the header: %PDF-1.4

I'm using GATK version 2.8-1-g932cd3a.

Here's the command I'm running:

java -jar GenomeAnalysisTK.jar -T AnalyzeCovariates \
    -R /path/genome.fa \
    -L /path/genome.interval_list \
    -before recal1.table \
    -after recal2.table \
    -plots recal.pdf \
    -csv recal.csv

I'm using the latest version of R and all the packages. Here's my R sessionInfo():

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

[1] C

attached base packages:
[1] grid      tools     stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] gsalib_2.0           reshape_0.8.4        plyr_1.8            
[4] gplots_2.12.1        ggplot2_0.9.3.1      BiocInstaller_1.12.0

loaded via a namespace (and not attached):
 [1] KernSmooth_2.23-10 MASS_7.3-29        RColorBrewer_1.0-5 bitops_1.0-6      
 [5] caTools_1.16       colorspace_1.2-4   dichromat_2.0-0    digest_0.6.4      
 [9] gdata_2.13.2       gtable_0.1.2       gtools_3.2.1       labeling_0.2      
[13] munsell_0.4.2      proto_0.3-10       reshape2_1.2.2     scales_0.2.3      
[17] stringr_0.6.2     

I've seen in many other posts suggestions to manually run the BQSR.R script on the data, but I don't have a CSV file yet, and there were no instructions on how to manually run BQSR.R, i.e., what arguments to specify to the Rscript command, and in what order.

Any help solving this problem would be greatly appreciated.

Created 2014-01-05 12:06:13 | Updated 2014-01-05 12:07:05 | Tags: analyzecovariates r
Comments (2)

I am running GATK on a cluster via PBS scheduling, and found that AnalyzeCovariates does not use my customized Rscript path.

More info:

All nodes run CentOS. R is already installed and can be found under "/usr/bin/R" with "which R". Unfortunately, the R version is not identical among nodes, i.e., some nodes have R 2.15 and some have R 3.0 installed.

I installed the latest R version under my home folder, and added the following commands to .bash_profile and .bash_rc:

if [ "$(lsb_release -i | cut -c17-20)" = 'Cent' ]; then
    alias R='/home/XXX/R-3.0.2/bin/R'
    alias Rscript='/home/XXX/R-3.0.2/bin/Rscript'
fi

If I log in to the cluster via qsub -I and type R in the console, the customized R is invoked, and this is also shown by "which R":

alias R='/home/XXX/R-3.0.2/bin/R'
~/R-3.0.2/bin/R

All GATK required packages have been installed.

However, when I run AnalyzeCovariates, it reports that some packages are missing, and it turns out that AnalyzeCovariates is using the R under "/usr/bin/R". So how can I make AnalyzeCovariates use the right R? Am I missing something in the bash configuration files?
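One likely explanation, offered as a sketch rather than a confirmed diagnosis: shell aliases exist only inside the shell that defines them, whereas a program that launches "Rscript" as a child process (as the DEBUG log earlier on this page suggests GATK does) finds it by searching PATH. The example below uses a throwaway directory and a fake Rscript as a stand-in for /home/XXX/R-3.0.2/bin, and shows that a child process picks up the custom copy once its directory is placed first on PATH.

```shell
# Throwaway directory with a fake Rscript, standing in for the custom R install.
custom_r_bin=$(mktemp -d)
printf '#!/bin/sh\necho "custom Rscript"\n' > "$custom_r_bin/Rscript"
chmod +x "$custom_r_bin/Rscript"

# Put the custom directory first on PATH and export it so child processes see it.
PATH="$custom_r_bin:$PATH"
export PATH

# A child process now resolves Rscript to the custom copy.
resolved=$(sh -c 'command -v Rscript')
echo "$resolved"
```

In a real job script the equivalent would be a line such as `export PATH=/home/XXX/R-3.0.2/bin:$PATH` placed before the GATK command, instead of the two aliases.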


Created 2013-12-03 07:06:47 | Updated | Tags: analyzecovariates
Comments (1)

I am using GATK 2.7.2 and working through the GATK Best Practices. I have followed all the steps as described there. I now want to generate the before/after plots, which is done with the following command:

-T AnalyzeCovariates -R ReferenceFiles\sequence.fasta -l DEBUG -before ReferenceFiles\recal_data.table -after ReferenceFiles\post_recal_data.table -plots ReferenceFiles\recalibration_plots.pdf

Running this command produces an error; please see the attachment “GATK_AnalyzeCovariant_Error.txt”.

After consulting the forums at http://www.broadinstitute.org:

- I have already installed Rscript and set R_HOME in my environment variables and also in the path.
- I have copied BQSR.R into the GATK tools folder.
- I have installed the gsalib package in R.
- I have installed the ggplot2 package in R.
- Since I thought it might be a network proxy issue, I have also registered on the http://www.broadinstitute.org forum and asked for the .key file, which is used to disable the "phone-home" feature that sends information about each GATK run via the Broad file system (within the Broad) and Amazon's S3 cloud storage service (outside the Broad). My request will be reviewed, and then I can get my key.

Please help me work out what exactly the issue is.


Created 2013-08-07 14:16:09 | Updated | Tags: baserecalibrator analyzecovariates
Comments (1)

I performed Phase 1 with GATK 2.5-2. Has 2.6-5 changed enough to warrant redoing it with GATK 2.6-5? In particular, I would like to use the new plotting features of AnalyzeCovariates. Do I need to redo Phase 1 in order to use them?

If I can use GATK 2.5-2 for Phase 1, can I move on with GATK 2.6-5?

Thank you.

Created 2013-07-11 23:08:11 | Updated 2013-07-11 23:08:34 | Tags: baserecalibrator documentation analyzecovariates
Comments (1)

In GATK 2.6, there have been some changes to BaseRecalibrator. Based on the AnalyzeCovariates page, it must now be run twice. To generate the first pass recalibration table file, it's the same command as before. To generate the second pass recalibration table file, you need to add the -BQSR argument. However, on the BaseRecalibrator page, there is no -BQSR documentation.
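The two passes described above can be sketched as follows. This is a dry run that only assembles and prints the commands. The file names reference.fa, input.bam and known_sites.vcf are hypothetical; the table and plot names follow another post on this page; and the use of -knownSites and -o is an assumption based on the usual BaseRecalibrator invocation in GATK 2.x/3.x, so check it against the documentation for your version.

```shell
# Dry run: the commands are only assembled and printed, not executed.
# reference.fa, input.bam and known_sites.vcf are hypothetical file names.
gatk="java -jar GenomeAnalysisTK.jar"
common="-T BaseRecalibrator -R reference.fa -I input.bam -knownSites known_sites.vcf"

# First pass: the same command as before GATK 2.6.
first_pass="$gatk $common -o recal_data.table"

# Second pass: feed the first table back in with -BQSR.
second_pass="$gatk $common -BQSR recal_data.table -o post_recal_data.table"

# Plot the before and after tables.
plots="$gatk -T AnalyzeCovariates -R reference.fa -before recal_data.table -after post_recal_data.table -plots recalibration_plots.pdf"

printf '%s\n' "$first_pass" "$second_pass" "$plots"
```

The only difference between the two BaseRecalibrator commands is the added -BQSR argument and the output table name.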

Created 2013-07-05 16:57:08 | Updated | Tags: tutorials baserecalibrator analyzecovariates
Comments (1)

In Step 3, the example code still uses the deprecated walker
-T AnalyzeCovariates
which, when used, generates this:
"ERROR MESSAGE: Walker AnalyzeCovariates is no longer available in the GATK; it has been deprecated since version 2.0 (use BaseRecalibrator instead; see documentation for usage)"