# Tagged with #r 1 documentation article | 0 announcements | 8 forum discussions

Created 2013-07-02 00:16:14 | Updated 2015-09-24 12:12:04 | Tags: install rscript igv picard gsalib samtools r ggplot2 rstudio

#### Objective

Install all software packages required to follow the GATK Best Practices.

#### Prerequisites

To follow these instructions, you will need to have a basic understanding of the meaning of the following words and command-line operations. If you are unfamiliar with any of the following, you should consult a more experienced colleague or your systems administrator if you have one. There are also many good online tutorials you can use to learn the necessary notions.

• Basic Unix environment commands
• Binary / Executable
• Compiling a binary
• Command-line shell, terminal or console
• Software library

You will also need to have access to an ANSI compliant C++ compiler and the tools needed for normal compilations (make, shell, the standard library, tar, gunzip). These tools are usually pre-installed on Linux/Unix systems. On MacOS X, you may need to install the MacOS Xcode tools. See https://developer.apple.com/xcode/ for relevant information and software downloads. The XCode tools are free but an AppleID may be required to download them.

Starting with version 2.6, the GATK requires Java Runtime Environment version 1.7. All Linux/Unix and MacOS X systems should have a JRE pre-installed, but the version may vary. To test your Java version, run the following command in the shell:

java -version 

This should return a message along the lines of "java version 1.7.0_25" as well as some details on the Runtime Environment (JRE) and Virtual Machine (VM). If you have a version other than 1.7.x, be aware that you may run into trouble with some of the more advanced features of the Picard and GATK tools. The simplest solution is to install an additional JRE and specify which one you want to use at the command line. To find out how to do so, you should seek help from your systems administrator.
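If you want to automate this check, the following is a minimal sketch, assuming the first line of the java -version output has the form quoted above. The helper name java_major_minor is hypothetical and introduced only for illustration.

```shell
# Hypothetical helper: extract the "major.minor" part (e.g. 1.7) from a
# version line such as: java version "1.7.0_25"
java_major_minor() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([0-9]*\.[0-9]*\).*/\1/p'
}

# Example with a literal version line.
java_major_minor 'java version "1.7.0_25"'

# On a real system, java -version writes to stderr, so redirect it first:
#   java_major_minor "$(java -version 2>&1 | head -n 1)"
```

The result can then be compared against the version you need before running any Picard or GATK command.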

#### Software packages

1. BWA
2. SAMtools
3. Picard
4. Genome Analysis Toolkit (GATK)
5. IGV
6. RStudio IDE and R libraries ggplot2 and gsalib

Note that the version numbers of packages you download may be different than shown in the instructions below. If so, please adapt the number accordingly in the commands.

### 1. BWA

• Installation

Unpack the tar file using:

tar xvjf bwa-0.7.12.tar.bz2 

This will produce a directory called bwa-0.7.12 containing the files necessary to compile the BWA binary. Move to this directory and compile using:

cd bwa-0.7.12
make

The compiled binary is called bwa. You should find it within the same folder (bwa-0.7.12 in this example). You may also find other compiled binaries; at time of writing, a second binary called bwamem-lite is also included. You can disregard this file for now. Finally, just add the BWA binary to your path to make it available on the command line. This completes the installation process.
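Adding the binary to your path can be sketched as follows. This is only an illustration: the helper name add_to_path is hypothetical, and the directory $HOME/bwa-0.7.12 is an assumed location that you should replace with wherever you unpacked and compiled BWA.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already there.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                         # already present: nothing to do
    *) PATH="$1:$PATH"; export PATH ;;   # otherwise put it at the front
  esac
}

# Assumed location of the compiled binary; adjust to your own setup.
add_to_path "$HOME/bwa-0.7.12"
```

To make the change persist across sessions, lines like these would typically go in your shell profile; which file that is depends on your shell and configuration.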

• Testing

Open a shell and run:

bwa 

This should print out some version and author information as well as a list of commands. As the Usage line states, to use BWA you will always build your command lines like this:

bwa <command> [options] 

This means you first make the call to the binary (bwa), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command.

### 2. SAMtools

• Installation

Unpack the tar file using:

tar xvjf samtools-0.1.2.tar.bz2 

This will produce a directory called samtools-0.1.2 containing the files necessary to compile the SAMtools binary. Move to this directory and compile using:

cd samtools-0.1.2
make 

The compiled binary is called samtools. You should find it within the same folder (samtools-0.1.2 in this example). Finally, add the SAMtools binary to your path to make it available on the command line. This completes the installation process.

• Testing

Open a shell and run:

samtools 

This should print out some version information as well as a list of commands. As the Usage line states, to use SAMtools you will always build your command lines like this:

samtools <command> [options] 

This means you first make the call to the binary (samtools), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command. This is the same convention as used by BWA.

### 3. Picard

• Installation

Unpack the zip file using:

unzip picard-tools-1.139.zip 

This will produce a directory called picard-tools-1.139 containing the Picard jar files. Picard tools are distributed as a pre-compiled Java executable (jar file) so there is no need to compile them.

Note that it is not possible to add jar files to your path to make the tools available on the command line; you have to specify the full path to the jar file in your java command, which would look like this:

java -jar ~/my_tools/jars/picard.jar <Toolname> [options]

This syntax will be explained in a little more detail further below.

However, you can set up a shortcut called an "environment variable" in your shell profile configuration to make this easier. The idea is that you create a variable that tells your system where to find a given jar, like this:

export PICARD="$HOME/my_tools/jars/picard.jar"

So then when you want to run a Picard tool, you just need to call the jar by its shortcut, like this:

java -jar $PICARD <Toolname> [options] 

The exact way to set this up depends on what shell you're using and how your environment is configured. We like this overview and tutorial which explains how it all works; but if you are new to the command line environment and you find this too much to deal with, we recommend asking for help from your institution's IT support group.

This completes the installation process.

• Testing

Open a shell and run:

java -jar picard.jar -h 

This should print out some usage information, including a list of the available Picard tools.

At this point you will have noticed an important difference between BWA and Picard tools. To use BWA, we called on the BWA program and specified which of its internal tools we wanted to apply. To use Picard, we called on Java itself as the main program, then specified which jar file to use and which tool within it to run. This applies to all Picard tools; to use them you will always build your command lines like this:

java -jar picard.jar <ToolName> [options] 

This means you first make the call to Java itself as the main program, then specify the picard.jar file, then specify which tool you want, and finally you pass whatever other arguments (input files, parameters etc.) are needed for the analysis.

Note that the command-line syntax of Picard tools has recently changed from java -jar <ToolName>.jar to java -jar picard.jar <ToolName>. We are using the newer syntax in this document, but some of our other documents may not have been updated yet. If you encounter any documents using the old syntax, let us know and we'll update them accordingly. If you are already using an older version of Picard, either adapt the commands or, better, upgrade your version!

Next we will see that GATK tools are called in essentially the same way, although the way the options are specified is a little different. The reasons for how tools in a given software package are organized and invoked are largely due to the preferences of the software developers. They generally do not reflect strict technical requirements, although they can have an effect on speed and efficiency.

### 4. Genome Analysis Toolkit (GATK)

Hopefully if you're reading this, you're already acquainted with the purpose of the GATK, so go ahead and download the latest version of the software package.

In order to access the downloads, you need to register for a free account on the GATK support forum. You will also need to read and accept the license agreement before downloading the GATK software package. Note that if you intend to use the GATK for commercial purposes, you will need to purchase a license. See the licensing page for an overview of the commercial licensing conditions.

• Installation

Unpack the tar file using:

tar xjf GenomeAnalysisTK-3.3-0.tar.bz2 

This will produce a directory called GenomeAnalysisTK-3.3-0 containing the GATK jar file, which is called GenomeAnalysisTK.jar, as well as a directory of example files called resources. GATK tools are distributed as a single pre-compiled Java executable so there is no need to compile them.

Just like we discussed for Picard, it's not possible to add the GATK to your path, but you can set up a shortcut to the jar file using environment variables as described above.

This completes the installation process.

• Testing

Open a shell and run:

java -jar GenomeAnalysisTK.jar -h 

This should print out some version and usage information, as well as a list of the tools included in the GATK. As the Usage line states, to use GATK you will always build your command lines like this:

java -jar GenomeAnalysisTK.jar -T <ToolName> [arguments] 

This means that just like for Picard, you first make the call to Java itself as the main program, then specify the GenomeAnalysisTK.jar file, then specify which tool you want, and finally you pass whatever other arguments (input files, parameters etc.) are needed for the analysis.

### 5. IGV

The Integrative Genomics Viewer is a genome browser that allows you to view BAM, VCF and other genomic file information in context. It has a graphical user interface that is very easy to use, and can be downloaded for free (though registration is required) from this website. We encourage you to read through IGV's very helpful user guide, which includes many detailed tutorials that will help you use the program most effectively.

### 6. RStudio IDE and R libraries ggplot2 and gsalib

Download the latest version of RStudio IDE. The webpage should automatically detect what platform you are running on and recommend the version most suitable for your system.

• Installation

Follow the installation instructions provided. Binaries are provided for all major platforms; typically they just need to be placed in your Applications (or Programs) directory. Open RStudio and type the following command in the console window:

install.packages("ggplot2") 

This will download and install the ggplot2 library as well as any other library packages that ggplot2 depends on for its operation. Note that some users have reported having to install two additional packages themselves, called reshape and gplots, which you can do as follows:

install.packages("reshape")
install.packages("gplots")

Finally, do the same thing to install the gsalib library:

install.packages("gsalib")

This will download and install the gsalib library.
Important note

If you are using a recent version of ggplot2 and a version of GATK older than 3.2, you may encounter an error when trying to generate the BQSR or VQSR recalibration plots. This is because until recently our scripts were still using an older version of certain ggplot2 functions. This has been fixed in GATK 3.2, so you should either upgrade your version of GATK (recommended) or downgrade your version of ggplot2. If you experience further issues generating the BQSR recalibration plots, please see this tutorial.

Created 2016-01-13 17:18:14 | Updated | Tags: picard r collectmultiplemetrics

Hello! I run CollectMultipleMetrics in an RNAseq pipeline, and I encounter this error:

[Wed Jan 13 11:50:10 EST 2016] picard.analysis.CollectMultipleMetrics INPUT=sorted.mdup.bam REFERENCE_SEQUENCE=Mus_musculus.GRCm38.fa OUTPUT=metrics PROGRAM=[CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle, CollectBaseDistributionByCycle, CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics] TMP_DIR=[/scratch] VALIDATION_STRINGENCY=SILENT MAX_RECORDS_IN_RAM=2000000 ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Jan 13 11:50:10 EST 2016] Executing as me@machine on Linux 2.6.32-573.12.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-ea-b07; Picard version: 1.123(286a232caea2fdc8fdd88574c09c460b46386fff_1413818736) IntelDeflater
INFO 2016-01-13 11:50:19 SinglePassSamProgram Processed 1 000 000 records. Elapsed time: 00:00:08s. Time for last 1 000 000: 7s. Last read position: 1:24 615 160 INFO 2016-01-13 11:50:26 SinglePassSamProgram Processed 2 000 000 records. Elapsed time: 00:00:15s. Time for last 1 000 000: 6s. Last read position: 1:58 405 526 INFO 2016-01-13 11:50:33 SinglePassSamProgram Processed 3 000 000 records. Elapsed time: 00:00:22s. Time for last 1 000 000: 7s. 
Last read position: 1:84 726 212 INFO 2016-01-13 11:50:39 SinglePassSamProgram Processed 4 000 000 records. Elapsed time: 00:00:28s. Time for last 1 000 000: 6s. Last read position: 1:131 258 270 INFO 2016-01-13 11:50:46 SinglePassSamProgram Processed 5 000 000 records. Elapsed time: 00:00:35s. Time for last 1 000 000: 6s. Last read position: 1:162 699 014 INFO 2016-01-13 11:50:53 SinglePassSamProgram Processed 6 000 000 records. Elapsed time: 00:00:42s. Time for last 1 000 000: 6s. Last read position: 1:180 423 776 INFO 2016-01-13 11:51:00 SinglePassSamProgram Processed 7 000 000 records. Elapsed time: 00:00:49s. Time for last 1 000 000: 7s. Last read position: 10:40 288 148 INFO 2016-01-13 11:51:07 SinglePassSamProgram Processed 8 000 000 records. Elapsed time: 00:00:56s. Time for last 1 000 000: 6s. Last read position: 10:60 300 791 INFO 2016-01-13 11:51:14 SinglePassSamProgram Processed 9 000 000 records. Elapsed time: 00:01:03s. Time for last 1 000 000: 6s. Last read position: 10:78 194 008 INFO 2016-01-13 11:51:21 SinglePassSamProgram Processed 10 000 000 records. Elapsed time: 00:01:10s. Time for last 1 000 000: 7s. Last read position: 10:81 369 411 INFO 2016-01-13 11:51:27 SinglePassSamProgram Processed 11 000 000 records. Elapsed time: 00:01:17s. Time for last 1 000 000: 6s. Last read position: 10:117 277 576 INFO 2016-01-13 11:51:33 SinglePassSamProgram Processed 12 000 000 records. Elapsed time: 00:01:23s. Time for last 1 000 000: 6s. Last read position: 10:117 282 106 INFO 2016-01-13 11:51:40 SinglePassSamProgram Processed 13 000 000 records. Elapsed time: 00:01:30s. Time for last 1 000 000: 6s. Last read position: 10:128 492 616 INFO 2016-01-13 11:51:47 SinglePassSamProgram Processed 14 000 000 records. Elapsed time: 00:01:37s. Time for last 1 000 000: 7s. Last read position: 11:20 077 246 INFO 2016-01-13 11:51:54 SinglePassSamProgram Processed 15 000 000 records. Elapsed time: 00:01:43s. Time for last 1 000 000: 6s. 
Last read position: 11:50 284 406 INFO 2016-01-13 11:52:00 SinglePassSamProgram Processed 16 000 000 records. Elapsed time: 00:01:49s. Time for last 1 000 000: 6s. Last read position: 11:62 594 925 INFO 2016-01-13 11:52:07 SinglePassSamProgram Processed 17 000 000 records. Elapsed time: 00:01:56s. Time for last 1 000 000: 7s. Last read position: 11:75 576 108 INFO 2016-01-13 11:52:14 SinglePassSamProgram Processed 18 000 000 records. Elapsed time: 00:02:04s. Time for last 1 000 000: 7s. Last read position: 11:86 815 079 INFO 2016-01-13 11:52:21 SinglePassSamProgram Processed 19 000 000 records. Elapsed time: 00:02:10s. Time for last 1 000 000: 6s. Last read position: 11:100 781 869 INFO 2016-01-13 11:52:28 SinglePassSamProgram Processed 20 000 000 records. Elapsed time: 00:02:17s. Time for last 1 000 000: 7s. Last read position: 11:109 668 732 INFO 2016-01-13 11:52:35 SinglePassSamProgram Processed 21 000 000 records. Elapsed time: 00:02:24s. Time for last 1 000 000: 6s. Last read position: 11:120 347 082 INFO 2016-01-13 11:52:42 SinglePassSamProgram Processed 22 000 000 records. Elapsed time: 00:02:31s. Time for last 1 000 000: 6s. Last read position: 12:32 851 067 INFO 2016-01-13 11:52:48 SinglePassSamProgram Processed 23 000 000 records. Elapsed time: 00:02:38s. Time for last 1 000 000: 6s. Last read position: 12:86 883 769 INFO 2016-01-13 11:52:55 SinglePassSamProgram Processed 24 000 000 records. Elapsed time: 00:02:45s. Time for last 1 000 000: 6s. Last read position: 12:113 153 306 INFO 2016-01-13 11:53:03 SinglePassSamProgram Processed 25 000 000 records. Elapsed time: 00:02:52s. Time for last 1 000 000: 7s. Last read position: 13:52 611 052 INFO 2016-01-13 11:53:09 SinglePassSamProgram Processed 26 000 000 records. Elapsed time: 00:02:58s. Time for last 1 000 000: 6s. Last read position: 13:74 720 979 INFO 2016-01-13 11:53:16 SinglePassSamProgram Processed 27 000 000 records. Elapsed time: 00:03:05s. Time for last 1 000 000: 6s. 
Last read position: 14:14 116 451 INFO 2016-01-13 11:53:23 SinglePassSamProgram Processed 28 000 000 records. Elapsed time: 00:03:12s. Time for last 1 000 000: 6s. Last read position: 14:45 597 948 INFO 2016-01-13 11:53:29 SinglePassSamProgram Processed 29 000 000 records. Elapsed time: 00:03:18s. Time for last 1 000 000: 6s. Last read position: 14:63 521 977 INFO 2016-01-13 11:53:36 SinglePassSamProgram Processed 30 000 000 records. Elapsed time: 00:03:25s. Time for last 1 000 000: 6s. Last read position: 15:5 116 661 INFO 2016-01-13 11:53:43 SinglePassSamProgram Processed 31 000 000 records. Elapsed time: 00:03:32s. Time for last 1 000 000: 6s. Last read position: 15:73 809 694 INFO 2016-01-13 11:53:50 SinglePassSamProgram Processed 32 000 000 records. Elapsed time: 00:03:39s. Time for last 1 000 000: 6s. Last read position: 15:79 534 754 INFO 2016-01-13 11:53:57 SinglePassSamProgram Processed 33 000 000 records. Elapsed time: 00:03:46s. Time for last 1 000 000: 7s. Last read position: 15:98 931 497 INFO 2016-01-13 11:54:04 SinglePassSamProgram Processed 34 000 000 records. Elapsed time: 00:03:53s. Time for last 1 000 000: 6s. Last read position: 16:5 008 851 INFO 2016-01-13 11:54:11 SinglePassSamProgram Processed 35 000 000 records. Elapsed time: 00:04:00s. Time for last 1 000 000: 6s. Last read position: 16:35 022 587 INFO 2016-01-13 11:54:18 SinglePassSamProgram Processed 36 000 000 records. Elapsed time: 00:04:08s. Time for last 1 000 000: 7s. Last read position: 17:6 132 340 INFO 2016-01-13 11:54:25 SinglePassSamProgram Processed 37 000 000 records. Elapsed time: 00:04:14s. Time for last 1 000 000: 6s. Last read position: 17:26 646 052 INFO 2016-01-13 11:54:32 SinglePassSamProgram Processed 38 000 000 records. Elapsed time: 00:04:21s. Time for last 1 000 000: 6s. Last read position: 17:33 952 419 INFO 2016-01-13 11:54:38 SinglePassSamProgram Processed 39 000 000 records. Elapsed time: 00:04:28s. Time for last 1 000 000: 6s. 
Last read position: 17:39 847 693 INFO 2016-01-13 11:54:46 SinglePassSamProgram Processed 40 000 000 records. Elapsed time: 00:04:35s. Time for last 1 000 000: 7s. Last read position: 17:70 974 194 INFO 2016-01-13 11:54:53 SinglePassSamProgram Processed 41 000 000 records. Elapsed time: 00:04:42s. Time for last 1 000 000: 7s. Last read position: 18:34 349 659 INFO 2016-01-13 11:54:59 SinglePassSamProgram Processed 42 000 000 records. Elapsed time: 00:04:49s. Time for last 1 000 000: 6s. Last read position: 18:65 809 480 INFO 2016-01-13 11:55:06 SinglePassSamProgram Processed 43 000 000 records. Elapsed time: 00:04:56s. Time for last 1 000 000: 6s. Last read position: 19:4 316 395 INFO 2016-01-13 11:55:14 SinglePassSamProgram Processed 44 000 000 records. Elapsed time: 00:05:03s. Time for last 1 000 000: 7s. Last read position: 19:7 215 445 INFO 2016-01-13 11:55:20 SinglePassSamProgram Processed 45 000 000 records. Elapsed time: 00:05:09s. Time for last 1 000 000: 6s. Last read position: 19:21 276 603 INFO 2016-01-13 11:55:27 SinglePassSamProgram Processed 46 000 000 records. Elapsed time: 00:05:16s. Time for last 1 000 000: 6s. Last read position: 19:56 439 778 INFO 2016-01-13 11:55:34 SinglePassSamProgram Processed 47 000 000 records. Elapsed time: 00:05:23s. Time for last 1 000 000: 6s. Last read position: 2:24 405 695 INFO 2016-01-13 11:55:41 SinglePassSamProgram Processed 48 000 000 records. Elapsed time: 00:05:30s. Time for last 1 000 000: 7s. Last read position: 2:34 810 968 INFO 2016-01-13 11:55:48 SinglePassSamProgram Processed 49 000 000 records. Elapsed time: 00:05:37s. Time for last 1 000 000: 6s. Last read position: 2:79 662 361 INFO 2016-01-13 11:55:54 SinglePassSamProgram Processed 50 000 000 records. Elapsed time: 00:05:43s. Time for last 1 000 000: 6s. Last read position: 2:120 721 272 INFO 2016-01-13 11:56:01 SinglePassSamProgram Processed 51 000 000 records. Elapsed time: 00:05:50s. Time for last 1 000 000: 6s. 
Last read position: 2:148 872 896 INFO 2016-01-13 11:56:08 SinglePassSamProgram Processed 52 000 000 records. Elapsed time: 00:05:57s. Time for last 1 000 000: 6s. Last read position: 2:162 934 652 INFO 2016-01-13 11:56:15 SinglePassSamProgram Processed 53 000 000 records. Elapsed time: 00:06:04s. Time for last 1 000 000: 6s. Last read position: 2:178 035 820 INFO 2016-01-13 11:56:21 SinglePassSamProgram Processed 54 000 000 records. Elapsed time: 00:06:11s. Time for last 1 000 000: 6s. Last read position: 3:65 380 183 INFO 2016-01-13 11:56:28 SinglePassSamProgram Processed 55 000 000 records. Elapsed time: 00:06:18s. Time for last 1 000 000: 6s. Last read position: 3:90 605 083 INFO 2016-01-13 11:56:35 SinglePassSamProgram Processed 56 000 000 records. Elapsed time: 00:06:25s. Time for last 1 000 000: 6s. Last read position: 3:103 044 640 INFO 2016-01-13 11:56:42 SinglePassSamProgram Processed 57 000 000 records. Elapsed time: 00:06:31s. Time for last 1 000 000: 6s. Last read position: 3:142 567 677 INFO 2016-01-13 11:56:49 SinglePassSamProgram Processed 58 000 000 records. Elapsed time: 00:06:38s. Time for last 1 000 000: 7s. Last read position: 4:43 549 937 INFO 2016-01-13 11:56:56 SinglePassSamProgram Processed 59 000 000 records. Elapsed time: 00:06:45s. Time for last 1 000 000: 6s. Last read position: 4:103 564 637 INFO 2016-01-13 11:57:02 SinglePassSamProgram Processed 60 000 000 records. Elapsed time: 00:06:51s. Time for last 1 000 000: 6s. Last read position: 4:126 232 636 INFO 2016-01-13 11:57:09 SinglePassSamProgram Processed 61 000 000 records. Elapsed time: 00:06:58s. Time for last 1 000 000: 6s. Last read position: 4:134 161 336 INFO 2016-01-13 11:57:16 SinglePassSamProgram Processed 62 000 000 records. Elapsed time: 00:07:05s. Time for last 1 000 000: 6s. Last read position: 4:145 213 984 INFO 2016-01-13 11:57:22 SinglePassSamProgram Processed 63 000 000 records. Elapsed time: 00:07:12s. Time for last 1 000 000: 6s. 
Last read position: 5:5 783 143 INFO 2016-01-13 11:57:29 SinglePassSamProgram Processed 64 000 000 records. Elapsed time: 00:07:18s. Time for last 1 000 000: 6s. Last read position: 5:44 176 835 INFO 2016-01-13 11:57:36 SinglePassSamProgram Processed 65 000 000 records. Elapsed time: 00:07:25s. Time for last 1 000 000: 6s. Last read position: 5:107 903 709 INFO 2016-01-13 11:57:43 SinglePassSamProgram Processed 66 000 000 records. Elapsed time: 00:07:32s. Time for last 1 000 000: 6s. Last read position: 5:121 372 232 INFO 2016-01-13 11:57:49 SinglePassSamProgram Processed 67 000 000 records. Elapsed time: 00:07:38s. Time for last 1 000 000: 5s. Last read position: 5:134 620 184 INFO 2016-01-13 11:57:56 SinglePassSamProgram Processed 68 000 000 records. Elapsed time: 00:07:45s. Time for last 1 000 000: 7s. Last read position: 5:142 903 802 INFO 2016-01-13 11:58:03 SinglePassSamProgram Processed 69 000 000 records. Elapsed time: 00:07:52s. Time for last 1 000 000: 7s. Last read position: 5:151 643 744 INFO 2016-01-13 11:58:10 SinglePassSamProgram Processed 70 000 000 records. Elapsed time: 00:07:59s. Time for last 1 000 000: 7s. Last read position: 6:55 060 998 INFO 2016-01-13 11:58:17 SinglePassSamProgram Processed 71 000 000 records. Elapsed time: 00:08:06s. Time for last 1 000 000: 6s. Last read position: 6:97 221 731 INFO 2016-01-13 11:58:24 SinglePassSamProgram Processed 72 000 000 records. Elapsed time: 00:08:13s. Time for last 1 000 000: 6s. Last read position: 6:124 728 807 INFO 2016-01-13 11:58:31 SinglePassSamProgram Processed 73 000 000 records. Elapsed time: 00:08:20s. Time for last 1 000 000: 7s. Last read position: 7:3 650 845 INFO 2016-01-13 11:58:38 SinglePassSamProgram Processed 74 000 000 records. Elapsed time: 00:08:27s. Time for last 1 000 000: 6s. Last read position: 7:19 572 846 INFO 2016-01-13 11:58:44 SinglePassSamProgram Processed 75 000 000 records. Elapsed time: 00:08:34s. Time for last 1 000 000: 6s. 
Last read position: 7:29 149 560 INFO 2016-01-13 11:58:51 SinglePassSamProgram Processed 76 000 000 records. Elapsed time: 00:08:41s. Time for last 1 000 000: 6s. Last read position: 7:45 458 060 INFO 2016-01-13 11:58:58 SinglePassSamProgram Processed 77 000 000 records. Elapsed time: 00:08:47s. Time for last 1 000 000: 6s. Last read position: 7:80 735 551 INFO 2016-01-13 11:59:05 SinglePassSamProgram Processed 78 000 000 records. Elapsed time: 00:08:54s. Time for last 1 000 000: 6s. Last read position: 7:104 465 010 INFO 2016-01-13 11:59:12 SinglePassSamProgram Processed 79 000 000 records. Elapsed time: 00:09:01s. Time for last 1 000 000: 6s. Last read position: 7:126 699 913 INFO 2016-01-13 11:59:19 SinglePassSamProgram Processed 80 000 000 records. Elapsed time: 00:09:08s. Time for last 1 000 000: 7s. Last read position: 7:135 684 619 INFO 2016-01-13 11:59:26 SinglePassSamProgram Processed 81 000 000 records. Elapsed time: 00:09:15s. Time for last 1 000 000: 7s. Last read position: 8:13 587 009 INFO 2016-01-13 11:59:32 SinglePassSamProgram Processed 82 000 000 records. Elapsed time: 00:09:22s. Time for last 1 000 000: 6s. Last read position: 8:70 029 074 INFO 2016-01-13 11:59:39 SinglePassSamProgram Processed 83 000 000 records. Elapsed time: 00:09:29s. Time for last 1 000 000: 6s. Last read position: 8:83 725 864 INFO 2016-01-13 11:59:46 SinglePassSamProgram Processed 84 000 000 records. Elapsed time: 00:09:35s. Time for last 1 000 000: 6s. Last read position: 8:105 525 578 INFO 2016-01-13 11:59:53 SinglePassSamProgram Processed 85 000 000 records. Elapsed time: 00:09:43s. Time for last 1 000 000: 7s. Last read position: 8:122 699 229 INFO 2016-01-13 12:00:00 SinglePassSamProgram Processed 86 000 000 records. Elapsed time: 00:09:50s. Time for last 1 000 000: 7s. Last read position: 9:35 226 343 INFO 2016-01-13 12:00:07 SinglePassSamProgram Processed 87 000 000 records. Elapsed time: 00:09:56s. Time for last 1 000 000: 6s. 
Last read position: 9:57 627 423 INFO 2016-01-13 12:00:14 SinglePassSamProgram Processed 88 000 000 records. Elapsed time: 00:10:03s. Time for last 1 000 000: 6s. Last read position: 9:70 003 456 INFO 2016-01-13 12:00:20 SinglePassSamProgram Processed 89 000 000 records. Elapsed time: 00:10:10s. Time for last 1 000 000: 6s. Last read position: 9:96 896 206 INFO 2016-01-13 12:00:27 SinglePassSamProgram Processed 90 000 000 records. Elapsed time: 00:10:17s. Time for last 1 000 000: 7s. Last read position: 9:114 747 811 INFO 2016-01-13 12:00:34 SinglePassSamProgram Processed 91 000 000 records. Elapsed time: 00:10:24s. Time for last 1 000 000: 6s. Last read position: MT:5 758 INFO 2016-01-13 12:00:38 SinglePassSamProgram Processed 92 000 000 records. Elapsed time: 00:10:27s. Time for last 1 000 000: 3s. Last read position: MT:8 095 INFO 2016-01-13 12:00:43 SinglePassSamProgram Processed 93 000 000 records. Elapsed time: 00:10:32s. Time for last 1 000 000: 5s. Last read position: X:7 634 529 INFO 2016-01-13 12:00:50 SinglePassSamProgram Processed 94 000 000 records. Elapsed time: 00:10:39s. Time for last 1 000 000: 6s. Last read position: X:73 896 041 INFO 2016-01-13 12:00:57 SinglePassSamProgram Processed 95 000 000 records. Elapsed time: 00:10:46s. Time for last 1 000 000: 6s. Last read position: X:101 683 503 INFO 2016-01-13 12:01:04 SinglePassSamProgram Processed 96 000 000 records. Elapsed time: 00:10:53s. Time for last 1 000 000: 6s. Last read position: X:167 207 520 INFO 2016-01-13 12:01:13 SinglePassSamProgram Processed 97 000 000 records. Elapsed time: 00:11:02s. Time for last 1 000 000: 9s. Last read position: */* INFO 2016-01-13 12:01:24 SinglePassSamProgram Processed 98 000 000 records. Elapsed time: 00:11:13s. Time for last 1 000 000: 11s. Last read position: */* INFO 2016-01-13 12:01:35 SinglePassSamProgram Processed 99 000 000 records. Elapsed time: 00:11:24s. Time for last 1 000 000: 11s. 
Last read position: */*
INFO 2016-01-13 12:01:37 RExecutor Executing R script via command: Rscript /scratch/script9078635170208800759.R metrics/ID.base_distribution_by_cycle_metrics metrics/ID.base_distribution_by_cycle.pdf sorted.mdup.bam
ERROR 2016-01-13 12:01:39 ProcessExecutor Error in FUN(X[[i]], ...) :
ERROR 2016-01-13 12:01:39 ProcessExecutor only defined on a data frame with all numeric variables
ERROR 2016-01-13 12:01:39 ProcessExecutor Calls: plot ... plot.default -> xy.coords -> Summary.data.frame -> lapply -> FUN
ERROR 2016-01-13 12:01:39 ProcessExecutor Exécution arrêtée
[Wed Jan 13 12:01:39 EST 2016] picard.analysis.CollectMultipleMetrics done. Elapsed time: 11,48 minutes.
Runtime.totalMemory()=761266176
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: R script nucleotideDistributionByCycle.R failed with return code 1
at picard.analysis.CollectBaseDistributionByCycle.finish(CollectBaseDistributionByCycle.java:80)
at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:136)
at picard.analysis.CollectMultipleMetrics.doWork(CollectMultipleMetrics.java:144)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:185)
at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:125)
at picard.analysis.CollectMultipleMetrics.main(CollectMultipleMetrics.java:108)

Created 2016-01-11 14:47:05 | Updated | Tags: vcf r vcf-filtering

Hi, I'm trying to finalise good hard-filtering parameters. Does anyone know why the quality-by-depth distribution has two peaks (see attached graphs, last column "QD")? This happens even after very strict filtering by basic metrics. There seem to be a lot of variants contributing to the two peaks, so I'm guessing it's not due to a particular genomic region. (The graph lines are Drosophila chromosomes. Chr4 in blue is clearly poor. The inbreeding coefficient and allele frequency are expected to be weird-looking due to our breeding design.)

Filter parameters are (MQ > 61, MQ < 68, FS < 5, AN > 420, InbreedingCoeff > -1, DP < 10000, DP > 1000, ReadPosRankSum > -1, ReadPosRankSum < 1, ClippingRankSum > -.5, ClippingRankSum < .5, BaseQRankSum > -1, BaseQRankSum < 1, MQRankSum > -.5, MQRankSum < .5, EVENTLENGTH < 1, EVENTLENGTH > -1).

Cheers, Will

Created 2015-08-27 08:11:30 | Updated | Tags: analyzecovariates r warnings

Hi,

When running the BQSR script, I get the following warnings:

Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion

I have managed to track down exactly where they come from:

for(cov in levels(data$CovariateName)) {
  d = data[data$CovariateName==cov,]
  if( cov == "Context" ) {
    d$CovariateValue = as.character(d$CovariateValue)
    d$CovariateValue = substring(d$CovariateValue,nchar(d$CovariateValue)-2,nchar(d$CovariateValue))
  } else {
    d$CovariateValue = as.numeric(levels(d$CovariateValue))[as.integer(d$CovariateValue)]
  }

Here the problem is that levels(d$CovariateValue) contains both integers and strings (short DNA sequences), and the latter causes as.numeric to introduce NAs.

Is this something to be worried about? I am using GATK 3.4-46, but the error also occurs in 3.3-0.

Thanks, Michael Knudsen

Created 2015-07-01 06:50:57 | Updated | Tags: r

I am trying to implement ABC using R. The ABCoptim package is available for R; can we use this for clustering? How? Please show me using an example.

Created 2015-01-27 20:07:05 | Updated 2015-01-27 20:09:02 | Tags: varianteval r

VariantEval follows a strict format that is human readable but also the top few lines of each table more than hint that it's ready to be parsed by something else. Been wondering what that is, looks like Python... Is there a nice quick way to load all of these data into something like R or Python? I can imagine using grep to put each table into a file and load that into R but if the output was designed for a certain tool, it would be great to use that.

Thanks as always for the wonderful tools!

Example, what language or tool is this stuff for?

#:GATKTable:11:12:%s:%s:%s:%s:%s:%d:%d:%d:%.2f:%d:%.2f:;

#:GATKTable:CompOverlap:The overlap between eval and comp sites
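The grep-style approach mentioned in the question can be sketched in shell with awk. This is only a sketch under assumptions: it assumes each table starts with a #:GATKTable:<columns>:<rows>:... format line followed by a #:GATKTable:<name>:<description> line, as in the two header lines quoted above, and that tables are separated by blank lines. The function name split_gatk_tables and the one-file-per-table output layout are hypothetical.

```shell
# Hypothetical helper: write each table of a GATK report to <outdir>/<TableName>.txt
# $1 = report file, $2 = output directory
split_gatk_tables() {
  mkdir -p "$2"
  awk -v outdir="$2" '
    /^#:GATKTable:[0-9]+:/ { next }        # format line (column count, printf formats): skip
    /^#:GATKTable:/ {                      # name line: third ":"-separated field is the table name
      split($0, f, ":")
      out = outdir "/" f[3] ".txt"
      next
    }
    out != "" && NF > 0 { print > out }    # column header and data rows of the current table
  ' "$1"
}

# Usage (file names are placeholders):
#   split_gatk_tables report.eval tables
```

Each resulting file holds a whitespace-delimited header row followed by data rows, which should be straightforward to load with a generic table reader in R or Python.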

Created 2014-09-25 09:47:26 | Updated | Tags: info r leftalignandtrimvariants vcf4-2 number

I have attached two VCF files generated with samtools (pass.vcf and fail.vcf). One of them (fail.vcf) contains this line:

##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for calling">

When I run LeftAlignAndTrimVariants3.2 on the v4.2 VCF file containing the INFO line above, then I get this error:

##### ERROR MESSAGE: For input string: "R"

The line is perfectly valid according to the VCF4.2 (and 4.3) specifications:

"The Number entry is an Integer that describes the number of values that can be included with the INFO field." "If the field has one value for each possible allele (including the reference), then this value should be ‘R’."

It's an easy issue to handle, but it would be great, if you could eventually fix this low priority bug. Thanks!

I haven't attached the two small vcf files. "Uploaded file type is not allowed." But zip files are. Files attached.
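To find out which files are affected, one option is to list the header lines that declare per-allele fields. A minimal sketch, assuming an uncompressed VCF whose header lines follow the ##INFO=<ID=...,Number=...,...> layout quoted above; the function name list_number_r_fields is hypothetical.

```shell
# Hypothetical helper: print INFO/FORMAT header lines that declare Number=R.
# $1 = path to an uncompressed VCF file
list_number_r_fields() {
  grep -E '^##(INFO|FORMAT)=<ID=[^,]+,Number=R,' "$1"
}

# Usage (file name taken from the post above):
#   list_number_r_fields fail.vcf
```

grep exits with a non-zero status when nothing matches, so the function can also be used directly in an if test to flag files that contain such a header.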

Created 2014-07-03 14:29:27 | Updated | Tags: vqsr r

java -jar -Djava.io.tmpdir=temp/ -Xmx4g GenomeAnalysisTK-2.8-1-g932cd3a/GenomeAnalysisTK.jar -T VariantRecalibrator -R hg19.fa -input NA19240.raw.SNPs.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.refmt.vcf -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_138.b37.refmt.vcf -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an DP -mode SNP -recalFile NA19240.raw.SNPs.recal -tranchesFile NA19240.raw.SNPs.tranches -rscriptFile NA19240.snp.plots.R

However, there is no NA19240.snp.plots.R.pdf generated. And I didn't find any error. When I try to run NA19240.snp.plots.R in R, source('NA19240.snp.plots.R'), there is an error: Error: Use 'theme' instead. (Defunct; last used in version 0.9.1)

How can I fix it? Thanks!!

Created 2014-01-05 12:06:13 | Updated 2014-01-05 12:07:05 | Tags: analyzecovariates r

I am running GATK in clusters via pbs scheduling, and found "AnalyzeCovariates" could not use customized Rscript path.

All nodes have CentOS installed, R is already installed and could be found under "/usr/bin/R" from "which R". Unfortunately, R version is not identical among nodes, i.e., some nodes have R 2.15, and some have R 3.0 installed.

I installed the latest R version under my home folder, and add following commands to .bash_profile and .bash_rc:

if [ "$(lsb_release -i | cut -c17-20)" == 'Cent' ]; then
  alias R='/home/XXX/R-3.0.2/bin/R'
  alias Rscript='/home/XXX/R-3.0.2/bin/Rscript'
fi

If I login to the cluster via qsub -I, and type R in the console, customized R will be invoked, and this is also shown in "which R" :

alias R='/home/XXX/R-3.0.2/bin/R'
~/R-3.0.2/bin/R

All GATK required packages have been installed.

However, when I run AnalyzeCovariates, it reported that some packages are missing, and it turns out that AnalyzeCovariates is using the R under "/usr/bin/R". So how to make AnalyzeCovariates use the right R? Do I miss something in the bash configure files?

Thanks.
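One point relevant to this question: a shell alias is expanded only by the shell that defines it, so a program that launches Rscript as a child process does not see the alias and resolves the name through PATH instead. The sketch below illustrates that lookup with a stand-in script. The temporary directory and the stub Rscript are purely illustrative, and it is an assumption that AnalyzeCovariates locates Rscript through PATH in this way.

```shell
# Create a stand-in "Rscript" in a temporary directory (illustrative only).
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "custom Rscript"\n' > "$demo_dir/Rscript"
chmod +x "$demo_dir/Rscript"

# A child process started with that directory at the front of PATH
# resolves "Rscript" to the stand-in rather than any system-wide copy.
resolved=$(PATH="$demo_dir:$PATH" sh -c 'command -v Rscript')
echo "$resolved"
```

Following the same idea, prepending /home/XXX/R-3.0.2/bin to PATH in the job script or profile, instead of defining aliases, may be enough for the custom R to be picked up.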