Tagged with #rscript


Comments (8)

When you are running AnalyzeCovariates to analyze your BQSR outputs, you may run into an error starting with this:

org.broadinstitute.sting.utils.R.RScriptExecutorException: RScript exited with 1. Run with -l DEBUG for more info.

The most common cause of this error is simple, and so is the solution. The script depends on some external R libraries, so if you don’t have them installed, the script fails. To find out what libraries are necessary and how to install them, you can refer to this FAQ article.

One other common issue is that the version of ggplot2 you have installed is very recent and is not compatible with the BQSR script. If so, download this file and use it to generate the plots manually according to the instructions below.

If you have already checked that you have all the necessary libraries installed, you’ll need to run the script manually in order to find out what is wrong. To new users, this can seem complicated, but it only takes these 3 simple steps to do it!

1. Re-run AnalyzeCovariates with these additional parameters:

  • -l DEBUG (that's a lowercase L, not an uppercase i, to be clear) and
  • -csv my-report.csv (where you can call the .csv file anything; this is so the intermediate csv file will be saved).

2. Identify the lines in the log output that say what parameters the RScript is given.

The snippet below shows you the components of the R script command line that AnalyzeCovariates uses.

INFO  18:04:55,355 AnalyzeCovariates - Generating plots file 'RTest.pdf' 
DEBUG 18:04:55,672 RecalUtils - R command line: Rscript (resource)org/broadinstitute/gatk/utils/recalibration/BQSR.R /Users/schandra/BQSR_Testing/RTest.csv /Users/schandra/BQSR_Testing/RTest.recal /Users/schandra/BQSR_Testing/RTest.pdf 
DEBUG 18:04:55,687 RScriptExecutor - Executing: 
DEBUG 18:04:55,688 RScriptExecutor -   Rscript 
DEBUG 18:04:55,688 RScriptExecutor -   -e 
DEBUG 18:04:55,688 RScriptExecutor -   tempLibDir = '/var/folders/j9/5qgr3mvj0590pd2yb9hwc15454pxz0/T/Rlib.2085451458391709180';source('/var/folders/j9/5qgr3mvj0590pd2yb9hwc15454pxz0/T/BQSR.761775214345441497.R'); 
DEBUG 18:04:55,689 RScriptExecutor -   /Users/schandra/BQSR_Testing/RTest.csv 
DEBUG 18:04:55,689 RScriptExecutor -   /Users/schandra/BQSR_Testing/RTest.recal 
DEBUG 18:04:55,689 RScriptExecutor -   /Users/schandra/BQSR_Testing/RTest.pdf 

So, your full command line will be:

Rscript BQSR.R RTest.csv RTest.recal RTest.pdf

Please note:

  • BQSR.R is the name of the script you want to run. It can be found here
  • RTest.csv is the name of the original csv file output from AnalyzeCovariates.
  • RTest.recal is your original recalibration file.
  • RTest.pdf is the output pdf file; you can name it whatever you want.
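If the debug output was saved to a file, the R command line from step 2 can be pulled out with a short shell helper. This is only a sketch: the function name and the log file name are hypothetical, and it assumes the DEBUG line contains the text "R command line:" exactly as in the snippet shown earlier.

```shell
# Print whatever follows "R command line:" in a saved GATK log.
# Assumes the DEBUG line has the same form as in the log snippet above.
extract_r_cmdline() {
    # $1: path to the log file saved from the -l DEBUG run
    sed -n '/R command line: /{s/^.*R command line: *//;s/ *$//;p;}' "$1"
}

# Hypothetical usage, assuming the log was saved as analyze.log:
# extract_r_cmdline analyze.log
```

The printed line still contains the (resource) prefix and the full paths, so the script name needs to be replaced by the location of your own copy of BQSR.R before running it.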

3. Run the script manually with the above arguments.

For new users, the easiest way to do this is from within an IDE such as RStudio. Alternatively, you can start R at the command line and run the script there; use whichever you are more comfortable with.
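For those who prefer the shell, a minimal sketch of step 3 could look like the following. The function name is hypothetical; it simply checks that the script and the two input files are readable before calling Rscript, so that a missing file is reported clearly rather than as an R connection error.

```shell
# Check that the script and its two inputs exist, then run Rscript so that
# any R error message is printed directly to the terminal.
run_bqsr_plots() {
    # $1: BQSR.R  $2: csv saved with -csv  $3: recalibration file  $4: output pdf
    for f in "$1" "$2" "$3"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable input: $f" >&2
            return 1
        fi
    done
    Rscript "$1" "$2" "$3" "$4"
}

# Usage with the file names from the example above:
# run_bqsr_plots BQSR.R RTest.csv RTest.recal RTest.pdf
```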

Comments (43)

Objective

Install all software packages required to follow the GATK Best Practices.

Prerequisites

To follow these instructions, you will need to have a basic understanding of the meaning of the following words and command-line operations. If you are unfamiliar with any of the following, you should consult a more experienced colleague or your systems administrator if you have one. There are also many good online tutorials you can use to learn the necessary notions.

  • Basic Unix environment commands
  • Binary / Executable
  • Compiling a binary
  • Adding a binary to your path
  • Command-line shell, terminal or console
  • Software library

You will also need to have access to an ANSI compliant C++ compiler and the tools needed for normal compilations (make, shell, the standard library, tar, gunzip). These tools are usually pre-installed on Linux/Unix systems. On MacOS X, you may need to install the MacOS Xcode tools. See https://developer.apple.com/xcode/ for relevant information and software downloads.

Starting with version 2.6, the GATK requires Java Runtime Environment version 1.7. All Linux/Unix and MacOS X systems should have a JRE pre-installed, but the version may vary. To test your Java version, run the following command in the shell:

java -version 

This should return a message along the lines of java version "1.7.0_25", as well as some details on the Runtime Environment (JRE) and Virtual Machine (VM). If you have a version other than 1.7.x, be aware that you may run into trouble with some of the more advanced features of the Picard and GATK tools. The simplest solution is to install an additional JRE and specify which you want to use at the command-line. To find out how to do so, you should seek help from your systems administrator.
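To check the version from a script rather than by eye, something like the sketch below could be used. The function name is hypothetical. Note that java -version writes its message to standard error, which is why the usage line redirects it.

```shell
# Extract "major.minor" (for example 1.7) from the first line of the
# `java -version` message.
java_major_minor() {
    # $1: a version line such as: java version "1.7.0_25"
    printf '%s\n' "$1" | sed -n 's/^.*version "\([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p'
}

# Usage (only meaningful where java is installed):
# java_major_minor "$(java -version 2>&1 | head -n 1)"
```

For the example message above, the function prints 1.7.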

Software packages

  1. BWA
  2. SAMtools
  3. HTSlib (optional)
  4. Picard
  5. Genome Analysis Toolkit (GATK)
  6. IGV
  7. RStudio IDE and R libraries ggplot2 and gsalib

1. BWA

Read the overview of the BWA software on the BWA project homepage, then download the latest version of the software package.

  • Installation

Unpack the tar file using:

tar xvjf bwa-0.7.5a.tar.bz2 

This will produce a directory called bwa-0.7.5a containing the files necessary to compile the BWA binary. Move to this directory and compile using:

cd bwa-0.7.5a
make

The compiled binary is called bwa. You should find it within the same folder (bwa-0.7.5a in this example). You may also find other compiled binaries; at time of writing, a second binary called bwamem-lite is also included. You can disregard this file for now. Finally, just add the BWA binary to your path to make it available on the command line. This completes the installation process.
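As an illustration of adding a binary's directory to your path, here is a minimal sketch for a Bourne-style shell. The function name and the directory /opt/tools/bwa-0.7.5a are hypothetical; substitute the location where you unpacked and compiled BWA.

```shell
# Append a directory to PATH for the current shell session only.
# /opt/tools/bwa-0.7.5a below is a hypothetical location.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already present, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;
    esac
}

add_to_path /opt/tools/bwa-0.7.5a
```

This change lasts only for the current session. To make it permanent, the same export is usually placed in a shell startup file; which file that is depends on your shell and system.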

  • Testing

Open a shell and run:

bwa 

This should print out some version and author information as well as a list of commands. As the Usage line states, to use BWA you will always build your command lines like this:

bwa <command> [options] 

This means you first make the call to the binary (bwa), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command.


2. SAMtools

Read the overview of the SAMtools software on the SAMtools project homepage, then download the latest version of the software package.

  • Installation

Unpack the tar file using:

tar xvjf samtools-0.1.19.tar.bz2 

This will produce a directory called samtools-0.1.19 containing the files necessary to compile the SAMtools binary. Move to this directory and compile using:

cd samtools-0.1.19 
make 

The compiled binary is called samtools. You should find it within the same folder (samtools-0.1.19 in this example). Finally, add the SAMtools binary to your path to make it available on the command line. This completes the installation process.

  • Testing

Open a shell and run:

samtools 

This should print out some version information as well as a list of commands. As the Usage line states, to use SAMtools you will always build your command lines like this:

samtools <command> [options] 

This means you first make the call to the binary (samtools), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command. This is the same convention as used by BWA.


3. HTSlib (optional)

Read the overview of the HTSlib software on the HTSlib project homepage, then download the latest version of the software package.

  • Installation

Unpack the zip file using:

unzip htslib-master.zip 

This will produce a directory called htslib-master containing the files necessary to compile the HTSlib binary. Move to this directory and compile using:

cd htslib-master 
make 

The compiled binary is called htscmd. You should find it within the same folder (htslib-master in this example). Finally, add the HTSlib binary to your path to make it available on the command line. This completes the installation process.

  • Testing

Open a shell and run:

htscmd 

This should print out some version information as well as a list of commands. As the Usage line states, to use HTSlib you will always build your command lines like this:

htscmd <command> [options] 

This means you first make the call to the binary (htscmd), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command. This is the same convention as used by BWA and SAMtools.


4. Picard

Read the overview of the Picard software on the Picard project homepage, then download the latest version of the software package.

  • Installation

Unpack the zip file using:

unzip picard-tools-1.94.zip 

This will produce a directory called picard-tools-1.94 containing the Picard jar files. Picard tools are distributed as pre-compiled Java executables (jar files) so there is no need to compile them. Note that java -jar does not search your path for jar files, so either run the tools from within the Picard directory or give the full path to the jar file on the command line. This completes the installation process.

  • Testing

Open a shell and run:

java -jar AddOrReplaceReadGroups.jar -h 

This should print out some version and usage information about the AddOrReplaceReadGroups.jar tool. At this point you will have noticed an important difference between BWA and Picard tools. To use BWA, we called on the BWA program and specified which of its internal tools we wanted to apply. To use Picard, we called on Java itself as the main program, then specified which jar file to use, knowing that one jar file = one tool. This applies to all Picard tools; to use them you will always build your command lines like this:

java -jar <ToolName.jar> [options] 

Next we will see that GATK tools are called in yet another way. The reasons for how tools in a given software package are organized and invoked are largely due to the preferences of the software developers. They generally do not reflect strict technical requirements, although they can have an effect on speed and efficiency.


5. Genome Analysis Toolkit (GATK)

Hopefully if you're reading this, you're already acquainted with the purpose of the GATK, so go ahead and download the latest version of the software package.

In order to access the downloads, you need to register for a free account on the GATK support forum. You will also need to read and accept the license agreement before downloading the GATK software package. Note that if you intend to use the GATK for commercial purposes, you will need to purchase a license from our commercial partner, Appistry. See Appistry's GATK FAQ page for an overview of the commercial licensing conditions.

  • Installation

Unpack the tar file using:

tar xjf GenomeAnalysisTK-2.6-4.tar.bz2 

This will produce a directory called GenomeAnalysisTK-2.6-4-g3e5ff60 containing the GATK jar file, which is called GenomeAnalysisTK.jar, as well as a directory of example files called resources. GATK tools are distributed as a single pre-compiled Java executable so there is no need to compile them. Note that java -jar does not search your path for jar files, so either run GATK from within this directory or give the full path to GenomeAnalysisTK.jar on the command line. This completes the installation process.

  • Testing

Open a shell and run:

java -jar GenomeAnalysisTK.jar -h 

This should print out some version and usage information, as well as a list of the tools included in the GATK. As the Usage line states, to use GATK you will always build your command lines like this:

java -jar GenomeAnalysisTK.jar -T <ToolName> [arguments] 

This means you first make the call to Java itself as the main program, then specify the GenomeAnalysisTK.jar file, then specify which tool you want, and finally you pass whatever other arguments (input files, parameters etc.) are needed for the analysis.

So this way of calling the program and selecting which tool to run is a little like a hybrid of how we called BWA and how we called Picard tools. To put it another way, if BWA is a standalone game device that comes preloaded with several games, Picard tools are individual game cartridges that plug into the Java console, and GATK is a single cartridge that also plugs into the Java console but contains many games.
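To make the GATK convention concrete, the sketch below assembles a command line from a jar location, a tool name and the remaining arguments, and prints it without running anything. The function name, the jar path and the reference file name are hypothetical placeholders.

```shell
# Print (but do not run) a GATK command line built from its three parts:
# the jar file, the tool selected with -T, and the tool's arguments.
gatk_cmd() {
    jar=$1
    tool=$2
    shift 2
    echo "java -jar $jar -T $tool $*"
}

gatk_cmd /path/to/GenomeAnalysisTK.jar VariantRecalibrator -R ref.fasta
```

Printing the command first is a cheap way to check that the tool name and arguments end up in the order GATK expects before launching a long run.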


6. IGV

The Integrative Genomics Viewer (IGV) is a genome browser that allows you to view BAM, VCF and other genomic file information in context. It has a graphical user interface that is very easy to use, and can be downloaded for free (though registration is required) from this website.


7. RStudio IDE and R libraries ggplot2 and gsalib

Download the latest version of RStudio IDE. The webpage should automatically detect what platform you are running on and recommend the version most suitable for your system.

  • Installation

Follow the installation instructions provided. Binaries are provided for all major platforms; typically they just need to be placed in your Applications (or Programs) directory. Open RStudio and type the following command in the console window:

install.packages("ggplot2") 

This will download and install the ggplot2 library as well as any other library packages that ggplot2 depends on for its operation. Note that some users have reported having to install one additional package themselves, called reshape, which you can do as follows:

install.packages("reshape")

Finally, do the same thing to install the gsalib library:

install.packages("gsalib")

This will download and install the gsalib library.
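To confirm from the shell that the libraries are visible to R, a check along these lines could be used. It is a sketch only: the function name is hypothetical, and the package list (ggplot2, reshape, gsalib) is taken from the instructions above, so other packages may also be needed on your system. The check itself runs only if Rscript is found on the path.

```shell
# Build an R expression that reports which of the given packages are missing.
r_missing_pkgs_expr() {
    pkgs=""
    for p in "$@"; do
        pkgs="${pkgs:+$pkgs, }'$p'"
    done
    printf '%s\n' "for (p in c($pkgs)) if (!requireNamespace(p, quietly = TRUE)) cat('missing:', p, '\\n')"
}

# Run the check only where Rscript is available.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e "$(r_missing_pkgs_expr ggplot2 reshape gsalib)"
fi
```

Any package reported as missing can then be installed with install.packages as described above.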

Important note

If you are using a recent version of ggplot2 and a version of GATK older than 3.2, you may encounter an error when trying to generate the BQSR or VQSR recalibration plots. This is because until recently our scripts were still using an older version of certain ggplot2 functions. This has been fixed in GATK 3.2, so you should either upgrade your version of GATK (recommended) or downgrade your version of ggplot2. If you experience further issues generating the BQSR recalibration plots, please see this tutorial.

Comments (2)

Hi,

I found that if I add library(grid) to gatk-protected/public/R/scripts/org/broadinstitute/sting/queue/util/queueJobReport.R the error goes away.

INFO 15:49:53,263 QCommandLine - Writing final jobs report...
INFO 15:49:53,263 QJobsReporter - Writing JobLogging GATKReport to file /Users/cborroto/workspace/sciencemodule/data_processing/GenePeeksPipeline.jobreport.txt
INFO 15:49:53,270 QJobsReporter - Plotting JobLogging GATKReport to file /Users/cborroto/workspace/sciencemodule/data_processing/GenePeeksPipeline.jobreport.pdf
DEBUG 15:49:53,278 RScriptExecutor - Executing:
DEBUG 15:49:53,278 RScriptExecutor - Rscript
DEBUG 15:49:53,278 RScriptExecutor - -e
DEBUG 15:49:53,278 RScriptExecutor - tempLibDir = '/Users/cborroto/workspace/sciencemodule/data_processing/tmp/Rlib.5689446532231761075';install.packages(pkgs=c('/Users/cborroto/workspace/sciencemodule/data_processing/tmp/RlibSources.4324191911112824298/gsalib'), lib=tempLibDir, repos=NULL, type='source', INSTALL_opts=c('--no-libs', '--no-data', '--no-help', '--no-demo', '--no-exec'));library('gsalib', lib.loc=tempLibDir);source('/Users/cborroto/workspace/sciencemodule/data_processing/tmp/queueJobReport.6856864909021911181.R');
DEBUG 15:49:53,278 RScriptExecutor - /Users/cborroto/workspace/sciencemodule/data_processing/GenePeeksPipeline.jobreport.txt
DEBUG 15:49:53,278 RScriptExecutor - /Users/cborroto/workspace/sciencemodule/data_processing/GenePeeksPipeline.jobreport.pdf
* installing *source* package ‘gsalib’ ...
** R
** preparing package for lazy loading
** building package indices
** testing if installed package can be loaded
* DONE (gsalib)
Loading required package: methods
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
Attaching package: ‘gplots’
The following object is masked from ‘package:stats’:
lowess
Loading required package: plyr
Attaching package: ‘reshape’
The following objects are masked from ‘package:plyr’:
rename, round_any
[1] "Report"
[1] "Project : /Users/cborroto/workspace/sciencemodule/data_processing/GenePeeksPipeline.jobreport.txt"
Error in do.call("layer", list(mapping = mapping, data = data, stat = stat, : could not find function "arrow"
Calls: source ... geom_segment -> <Anonymous> -> <Anonymous> -> do.call
Execution halted
DEBUG 15:49:55,207 RScriptExecutor - Result: 1
WARN 15:49:55,207 RScriptExecutor - RScript exited with 1

Thanks, Carlos
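The workaround described in this post can be tried on a copy of the script without editing the original. The sketch below is one way to do that: the function name is hypothetical, and it relies only on the fact that arrow() is provided by R's grid package, which is why loading grid resolves the "could not find function" error reported above.

```shell
# Write a copy of an R script with library(grid) prepended, unless the
# script already loads grid. The original file ($1) is left untouched;
# the patched copy is written to $2.
add_grid_library() {
    if grep -q '^[[:space:]]*library(grid)' "$1"; then
        cp "$1" "$2"
    else
        { echo 'library(grid)'; cat "$1"; } > "$2"
    fi
}

# Hypothetical usage:
# add_grid_library queueJobReport.R queueJobReport.patched.R
```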

Comments (3)

I ran VariantRecalibrator on a smaller VCF file (made with UnifiedGenotyper -L chr1) and it finished with no errors. Then I ran it on the full VCF file (made with UnifiedGenotyper without the -L parameter) and it crashed, also without printing any error. The log file of the first, successful run looks like this:

...
INFO 13:26:21,377 VariantRecalibrator - Building FS x DP plot...
INFO 13:26:21,379 VariantRecalibratorEngine - Evaluating full set of 18354 variants...
INFO 13:26:23,556 VariantRecalibratorEngine - Evaluating full set of 18354 variants...
INFO 13:26:25,378 VariantRecalibrator - Building QD x DP plot...
INFO 13:26:25,379 VariantRecalibratorEngine - Evaluating full set of 6384 variants...
INFO 13:26:26,140 VariantRecalibratorEngine - Evaluating full set of 6384 variants...
INFO 13:26:26,832 VariantRecalibrator - Executing: Rscript /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_L_VariantRecalibrator/13_all_snp.plots.R
INFO 13:26:28,400 ProgressMeter - chrY:59358202 5.66e+07 14.0 m 14.0 s 98.7% 14.2 m 11.0 s
INFO 13:26:58,410 ProgressMeter - chrY:59358202 5.66e+07 14.5 m 15.0 s 98.7% 14.7 m 11.0 s
INFO 13:27:28,420 ProgressMeter - chrY:59358202 5.66e+07 15.0 m 15.0 s 98.7% 15.2 m 12.0 s
INFO 13:27:58,429 ProgressMeter - chrY:59358202 5.66e+07 15.5 m 16.0 s 98.7% 15.7 m 12.0 s
INFO 13:28:28,439 ProgressMeter - chrY:59358202 5.66e+07 16.0 m 16.0 s 98.7% 16.2 m 12.0 s
INFO 13:28:58,449 ProgressMeter - chrY:59358202 5.66e+07 16.5 m 17.0 s 98.7% 16.7 m 13.0 s
INFO 13:29:28,460 ProgressMeter - chrY:59358202 5.66e+07 17.0 m 18.0 s 98.7% 17.2 m 13.0 s
INFO 13:29:58,469 ProgressMeter - chrY:59358202 5.66e+07 17.5 m 18.0 s 98.7% 17.7 m 14.0 s
INFO 13:30:28,479 ProgressMeter - chrY:59358202 5.66e+07 18.0 m 19.0 s 98.7% 18.2 m 14.0 s
INFO 13:30:58,489 ProgressMeter - chrY:59358202 5.66e+07 18.5 m 19.0 s 98.7% 18.7 m 14.0 s
INFO 13:31:28,155 VariantRecalibrator - Executing: Rscript (resource)org/broadinstitute/sting/gatk/walkers/variantrecalibration/plot_Tranches.R /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_L_VariantRecalibrator/13_all_snp.tranches 2.15
INFO 13:31:28,499 ProgressMeter - chrY:59358202 5.66e+07 19.0 m 20.0 s 98.7% 19.3 m 15.0 s
INFO 13:31:28,847 ProgressMeter - done 5.66e+07 19.0 m 20.0 s 98.7% 19.3 m 15.0 s
INFO 13:31:28,847 ProgressMeter - Total runtime 1140.77 secs, 19.01 min, 0.32 hours
...

The log file of the crashed run ENDS like this:

...
INFO 19:09:28,987 VariantRecalibrator - Building FS x QD plot...
INFO 19:09:28,988 VariantRecalibratorEngine - Evaluating full set of 7300 variants...
INFO 19:09:30,191 VariantRecalibratorEngine - Evaluating full set of 7300 variants...
INFO 19:09:31,214 VariantRecalibrator - Building FS x DP plot...
INFO 19:09:31,217 VariantRecalibratorEngine - Evaluating full set of 21170 variants...
INFO 19:09:34,703 VariantRecalibratorEngine - Evaluating full set of 21170 variants...
INFO 19:09:37,357 VariantRecalibrator - Building QD x DP plot...
INFO 19:09:37,358 VariantRecalibratorEngine - Evaluating full set of 7250 variants...
INFO 19:09:38,552 VariantRecalibratorEngine - Evaluating full set of 7250 variants...
INFO 19:09:39,550 VariantRecalibrator - Executing: Rscript /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_VariantRecalibrator/13_all_snp.plots.R
INFO 19:09:51,101 ProgressMeter - chrY:59358159 5.68e+07 16.5 m 17.0 s 98.7% 16.7 m 13.0 s
INFO 19:10:21,111 ProgressMeter - chrY:59358159 5.68e+07 17.0 m 17.0 s 98.7% 17.2 m 13.0 s
INFO 19:10:51,119 ProgressMeter - chrY:59358159 5.68e+07 17.5 m 18.0 s 98.7% 17.7 m 14.0 s
INFO 19:11:21,129 ProgressMeter - chrY:59358159 5.68e+07 18.0 m 19.0 s 98.7% 18.2 m 14.0 s
INFO 19:11:51,139 ProgressMeter - chrY:59358159 5.68e+07 18.5 m 19.0 s 98.7% 18.7 m 14.0 s
INFO 19:12:21,148 ProgressMeter - chrY:59358159 5.68e+07 19.0 m 20.0 s 98.7% 19.3 m 15.0 s
INFO 19:12:51,157 ProgressMeter - chrY:59358159 5.68e+07 19.5 m 20.0 s 98.7% 19.8 m 15.0 s
INFO 19:13:21,167 ProgressMeter - chrY:59358159 5.68e+07 20.0 m 21.0 s 98.7% 20.3 m 16.0 s
INFO 19:13:51,176 ProgressMeter - chrY:59358159 5.68e+07 20.5 m 21.0 s 98.7% 20.8 m 16.0 s

I am suspecting that the execution of the tranches Rscript crashed it? Like I said, no error showed up when it crashed. Any suggestions how to make it work?

My command line: INFO 18:53:18,124 HelpFormatter - Program Args: -T VariantRecalibrator -R /cluster8/podlaha/HumanGenome/ucsc.hg19.fasta -mode SNP -input /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/12_UnifiedGenotyper/12_UG_all.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /cluster11/podlaha/Software/GATK/Resources/hapmap_3.3.hg19.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 /cluster11/podlaha/Software/GATK/Resources/1000G_omni2.5.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 /cluster11/podlaha/Software/GATK/Resources/dbsnp_137.hg19.vcf -resource:1000G,known=false,training=true,truth=false,prior=10.0 /cluster11/podlaha/Software/GATK/Resources/1000G_phase1.snps.high_confidence.hg19.vcf -an MQ -an MQ0 -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an DP -an BaseQRankSum -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -numBad 1000 --maxGaussians 4 -recalFile /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_VariantRecalibrator/13_all_snp.recal -tranchesFile /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_VariantRecalibrator/13_all_snp.tranches -rscriptFile /cluster11/podlaha/AllelicImbalance/Data/GATK_VCF_Output/13_VariantRecalibrator/13_all_snp.plots.R

Running GATK 2.7.4. and R version 3.0.2 (2013-09-25) -- "Frisbee Sailing" and java version "1.7.0_40"
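One way to tell the two situations apart from a script is to look for the final ProgressMeter line that the successful run printed. The sketch below assumes the run's output was saved to a file; the function name and the log file name are hypothetical. If the line is absent, running the generated plots script directly with Rscript should show any R error on the terminal.

```shell
# Succeeds if the log contains the ProgressMeter "done" line that the
# successful run above printed just before its total runtime.
gatk_run_finished() {
    grep -q 'ProgressMeter - done' "$1"
}

# Hypothetical usage, assuming the output was saved as recal.log:
# gatk_run_finished recal.log || Rscript 13_all_snp.plots.R
```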

Comments (6)

Hi there, I was trying to debug an error in the RScript generated after base recalibration, while running the DataProcessingPipeline.scala (run as it is). I get the following debug output

 [...]
 Error in file(filename, "r", blocking = TRUE) : 
   cannot open the connection
 Calls: source ... eval.with.vis -> eval.with.vis -> gsa.read.gatkreport -> file
 In addition: Warning messages:
 1: In file(filename, "r", blocking = TRUE) :
   cannot open file '/SAN/scratch3/sample378_TTAGGC_L004_R1_001.fastq.pre_recal.table.recal': No such file or directory
  Execution halted

no file ending with "recal.table.recal" exists, but the file "recal.table" does exist. I couldn't find any step in the scala script where a ".recal" is added to "recal.table", nor a specific trait or class referring to the RScript itself, as I understand it's part of the walker BaseRecalibrator.

is this a small bug in the name handling, or am I doing something wrong somewhere?

thanks, Francesco