Tagged with #picard
5 documentation articles | 2 announcements | 16 forum discussions



Objective

Install all software packages required to follow the GATK Best Practices.

Prerequisites

To follow these instructions, you will need to have a basic understanding of the meaning of the following words and command-line operations. If you are unfamiliar with any of the following, you should consult a more experienced colleague or your systems administrator if you have one. There are also many good online tutorials you can use to learn the necessary notions.

  • Basic Unix environment commands
  • Binary / Executable
  • Compiling a binary
  • Adding a binary to your path
  • Command-line shell, terminal or console
  • Software library

You will also need to have access to an ANSI compliant C++ compiler and the tools needed for normal compilations (make, shell, the standard library, tar, gunzip). These tools are usually pre-installed on Linux/Unix systems. On MacOS X, you may need to install the Xcode tools. See https://developer.apple.com/xcode/ for relevant information and software downloads. The Xcode tools are free, but an Apple ID may be required to download them.

Starting with version 2.6, the GATK requires Java Runtime Environment version 1.7. All Linux/Unix and MacOS X systems should have a JRE pre-installed, but the version may vary. To test your Java version, run the following command in the shell:

java -version 

This should return a message along the lines of "java version 1.7.0_25" as well as some details on the Runtime Environment (JRE) and Virtual Machine (VM). If you have a version other than 1.7.x, be aware that you may run into trouble with some of the more advanced features of the Picard and GATK tools. The simplest solution is to install an additional JRE and specify which one you want to use at the command line. To find out how to do so, you should seek help from your systems administrator.
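If you want to script this check, a minimal sketch follows; check_java_version is a hypothetical helper (not part of any toolkit) that inspects the first line of the java -version output:

```shell
# check_java_version is a hypothetical helper: it extracts the
# major.minor version from the first line of `java -version` output
# and warns when it is not the 1.7 series required by GATK 2.6+.
check_java_version() {
    # $1 is a line such as: java version "1.7.0_25"
    version=$(printf '%s\n' "$1" | sed 's/.*"\([0-9]*\.[0-9]*\).*/\1/')
    if [ "$version" = "1.7" ]; then
        echo "OK: Java $version"
    else
        echo "WARNING: Java $version found; 1.7.x is recommended"
    fi
}

# In practice you would feed it the real output, e.g.:
# check_java_version "$(java -version 2>&1 | head -n 1)"
```

Note that `java -version` writes to standard error, hence the `2>&1` redirection in the commented example.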

Software packages

  1. BWA
  2. SAMtools
  3. Picard
  4. Genome Analysis Toolkit (GATK)
  5. IGV
  6. RStudio IDE and R libraries ggplot2 and gsalib

Note that the version numbers of the packages you download may differ from those shown in the instructions below. If so, please adapt the numbers accordingly in the commands.


1. BWA

Read the overview of the BWA software on the BWA project homepage, then download the latest version of the software package.

  • Installation

Unpack the tar file using:

tar xjf bwa-0.7.12.tar.bz2 

This will produce a directory called bwa-0.7.12 containing the files necessary to compile the BWA binary. Move to this directory and compile using:

cd bwa-0.7.12
make

The compiled binary is called bwa. You should find it within the same folder (bwa-0.7.12 in this example). You may also find other compiled binaries; at the time of writing, a second binary called bwamem-lite is also included. You can disregard this file for now. Finally, add the BWA binary to your path to make it available on the command line. This completes the installation process.
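For example, making the binary available for the current session could look like this (a sketch assuming you unpacked and compiled BWA under your home directory; adapt the path to your own layout):

```shell
# Make the bwa binary visible to the shell for the current session.
# The directory below is an example -- use wherever you compiled BWA.
export PATH="$HOME/bwa-0.7.12:$PATH"

# To make the change permanent, append the same export line to your
# shell profile (e.g. ~/.bashrc or ~/.bash_profile).
```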

  • Testing

Open a shell and run:

bwa 

This should print out some version and author information as well as a list of commands. As the Usage line states, to use BWA you will always build your command lines like this:

bwa <command> [options] 

This means you first make the call to the binary (bwa), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command.


2. SAMtools

Read the overview of the SAMtools software on the SAMtools project homepage, then download the latest version of the software package.

  • Installation

Unpack the tar file using:

tar xjf samtools-0.1.2.tar.bz2 

This will produce a directory called samtools-0.1.2 containing the files necessary to compile the SAMtools binary. Move to this directory and compile using:

cd samtools-0.1.2 
make 

The compiled binary is called samtools. You should find it within the same folder (samtools-0.1.2 in this example). Finally, add the SAMtools binary to your path to make it available on the command line. This completes the installation process.

  • Testing

Open a shell and run:

samtools 

This should print out some version information as well as a list of commands. As the Usage line states, to use SAMtools you will always build your command lines like this:

samtools <command> [options] 

This means you first make the call to the binary (samtools), then you specify which command (method) you wish to use (e.g. index), then any options (i.e. arguments such as input files or parameters) used by the program to perform that command. This is similar to the convention used by BWA.


3. Picard

Read the overview of the Picard software on the Picard project homepage, then download the latest version of the software package.

  • Installation

Unpack the zip file using:

unzip picard-tools-1.130.zip 

This will produce a directory called picard-tools-1.130 containing the Picard jar files. Picard tools are distributed as a pre-compiled Java executable (jar file) so there is no need to compile them.

Note that it is not possible to add jar files to your path to make the tools available on the command line; you have to specify the full path to the jar file in your java command, which would look like this:

java -jar ~/my_tools/jars/picard.jar <ToolName> [options]

This syntax will be explained in a little more detail further below.

However, you can set up a shortcut called an "environment variable" in your shell profile configuration to make this easier. The idea is that you create a variable that tells your system where to find a given jar, like this:

export PICARD="$HOME/my_tools/jars/picard.jar"

So then when you want to run a Picard tool, you just need to call the jar by its shortcut, like this:

java -jar "$PICARD" <ToolName> [options]

The exact way to set this up depends on what shell you're using and how your environment is configured. We like this overview and tutorial, which explains how it all works; but if you are new to the command-line environment and find this too much to deal with, we recommend asking for help from your institution's IT support group.
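For example, in a bash-style shell the shortcut could be set up like this (a sketch; the jar path is an assumption -- point it at wherever your picard.jar actually lives):

```shell
# Define the shortcut for the current session.  The path below is an
# example -- substitute the location of your actual picard.jar.
export PICARD="$HOME/my_tools/jars/picard.jar"

# Any Picard invocation then becomes:
#   java -jar "$PICARD" <ToolName> [options]
# To make the shortcut permanent, put the export line in your shell
# profile (e.g. ~/.bashrc).
```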

This completes the installation process.

  • Testing

Open a shell and run:

java -jar picard.jar -h 

This should print out some version and usage information, as well as a list of the tools included in Picard. At this point you will have noticed an important difference between BWA and Picard tools. To use BWA, we called on the BWA program and specified which of its internal tools we wanted to apply. To use Picard, we called on Java itself as the main program, then specified which jar file to use, then which tool we wanted to run. This applies to all Picard tools; to use them you will always build your command lines like this:

java -jar picard.jar <ToolName> [options] 

This means you first make the call to Java itself as the main program, then specify the picard.jar file, then specify which tool you want, and finally you pass whatever other arguments (input files, parameters etc.) are needed for the analysis.

Note that the command-line syntax of Picard tools has recently changed from java -jar <ToolName>.jar to java -jar picard.jar <ToolName>. We are using the newer syntax in this document, but some of our other documents may not have been updated yet. If you encounter any documents using the old syntax, let us know and we'll update them accordingly. If you are already using an older version of Picard, either adapt the commands or better, upgrade your version!

Next we will see that GATK tools are called in essentially the same way, although the way the options are specified is a little different. The reasons for how tools in a given software package are organized and invoked are largely due to the preferences of the software developers. They generally do not reflect strict technical requirements, although they can have an effect on speed and efficiency.


4. Genome Analysis Toolkit (GATK)

Hopefully if you're reading this, you're already acquainted with the purpose of the GATK, so go ahead and download the latest version of the software package.

In order to access the downloads, you need to register for a free account on the GATK support forum. You will also need to read and accept the license agreement before downloading the GATK software package. Note that if you intend to use the GATK for commercial purposes, you will need to purchase a license. See the licensing page for an overview of the commercial licensing conditions.

  • Installation

Unpack the tar file using:

tar xjf GenomeAnalysisTK-3.3-0.tar.bz2 

This will produce a directory called GenomeAnalysisTK-3.3-0 containing the GATK jar file, which is called GenomeAnalysisTK.jar, as well as a directory of example files called resources. GATK tools are distributed as a single pre-compiled Java executable so there is no need to compile them. Just like we discussed for Picard, it's not possible to add the GATK to your path, but you can set up a shortcut to the jar file using environment variables as described above.

This completes the installation process.

  • Testing

Open a shell and run:

java -jar GenomeAnalysisTK.jar -h 

This should print out some version and usage information, as well as a list of the tools included in the GATK. As the Usage line states, to use GATK you will always build your command lines like this:

java -jar GenomeAnalysisTK.jar -T <ToolName> [arguments] 

This means that just like for Picard, you first make the call to Java itself as the main program, then specify the GenomeAnalysisTK.jar file, then specify which tool you want, and finally you pass whatever other arguments (input files, parameters etc.) are needed for the analysis.
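As a concrete sketch, a minimal GATK command line can be assembled as follows; the jar path and the input file names are placeholders, and CountReads is used here only as an example tool:

```shell
# All paths and file names below are placeholders -- substitute your
# own jar location, reference, and BAM file.
GATK_JAR="$HOME/my_tools/jars/GenomeAnalysisTK.jar"
REFERENCE=reference.fa
INPUT_BAM=sample.bam

# Assemble the command following the java -jar <jar> -T <ToolName>
# pattern described above, then show it before running it.
cmd="java -jar $GATK_JAR -T CountReads -R $REFERENCE -I $INPUT_BAM"
echo "$cmd"
```

Echoing the assembled command before executing it is a handy habit when scripting pipelines, since it leaves a record of exactly what was run.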


5. IGV

The Integrated Genomics Viewer is a genome browser that allows you to view BAM, VCF and other genomic file information in context. It has a graphical user interface that is very easy to use, and can be downloaded for free (though registration is required) from this website. We encourage you to read through IGV's very helpful user guide, which includes many detailed tutorials that will help you use the program most effectively.


6. RStudio IDE and R libraries ggplot2 and gsalib

Download the latest version of RStudio IDE. The webpage should automatically detect what platform you are running on and recommend the version most suitable for your system.

  • Installation

Follow the installation instructions provided. Binaries are provided for all major platforms; typically they just need to be placed in your Applications (or Programs) directory. Open RStudio and type the following command in the console window:

install.packages("ggplot2") 

This will download and install the ggplot2 library, as well as any other packages that ggplot2 depends on for its operation. Note that some users have reported having to install two additional packages themselves, called reshape and gplots, which you can do as follows:

install.packages("reshape")
install.packages("gplots")

Finally, do the same thing to install the gsalib library:

install.packages("gsalib")

This will download and install the gsalib library.

Important note

If you are using a recent version of ggplot2 and a version of GATK older than 3.2, you may encounter an error when trying to generate the BQSR or VQSR recalibration plots. This is because until recently our scripts were still using an older version of certain ggplot2 functions. This has been fixed in GATK 3.2, so you should either upgrade your version of GATK (recommended) or downgrade your version of ggplot2. If you experience further issues generating the BQSR recalibration plots, please see this tutorial.


Objective

Prepare a reference sequence so that it is suitable for use with BWA and GATK.

Prerequisites

  • Installed BWA
  • Installed SAMTools
  • Installed Picard

Steps

  1. Generate the BWA index
  2. Generate the Fasta file index
  3. Generate the sequence dictionary

1. Generate the BWA index

Action

Run the following BWA command:

bwa index -a bwtsw reference.fa 

where -a bwtsw specifies that we want to use the indexing algorithm that is capable of handling the whole human genome.

Expected Result

This creates a collection of files used by BWA to perform the alignment.


2. Generate the fasta file index

Action

Run the following SAMtools command:

samtools faidx reference.fa 

Expected Result

This creates a file called reference.fa.fai, with one record per line for each of the contigs in the FASTA reference file. Each record is composed of the contig name, size, byte offset of the first base, bases per line, and bytes per line.
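To make the record layout concrete, here is a hand-written sketch of a single .fai record; the contig name and numbers are illustrative, not taken from a real index:

```shell
# A single .fai record, tab-separated: contig name, contig length,
# byte offset of the first base, bases per line, bytes per line.
# These values are illustrative, not generated from a real FASTA file.
printf 'chr1\t249250621\t52\t60\t61\n' > example.fa.fai

# The fields are plain tab-separated text, so standard tools can read them:
awk -F'\t' '{ print "contig:", $1, "length:", $2 }' example.fa.fai
```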


3. Generate the sequence dictionary

Action

Run the following Picard command:

java -jar picard.jar CreateSequenceDictionary \
    REFERENCE=reference.fa \ 
    OUTPUT=reference.dict 

Note that this is the new syntax for use with the latest version of Picard. Older versions used a slightly different syntax because all the tools were in separate jars, so you'd call e.g. java -jar CreateSequenceDictionary.jar directly.

Expected Result

This creates a file called reference.dict formatted like a SAM header, describing the contents of your reference FASTA file.
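For illustration, here is a sketch of the kind of SAM-style header lines reference.dict contains; the sequence name, length, and file path are placeholders:

```shell
# A sketch of what a sequence dictionary contains: an @HD header line
# plus one @SQ line per contig (tab-separated fields).  The sequence
# name, length, and path below are placeholders.
printf '@HD\tVN:1.4\n@SQ\tSN:chr1\tLN:249250621\tUR:file:/path/to/reference.fa\n' > example.dict
cat example.dict
```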


The picard repository on GitHub contains all of Picard's public tools. Supporting libraries live in htsjdk, which includes the samtools, tribble, and variant packages (the variant package includes VariantContext and associated classes as well as the VCF/BCF codecs).

If you just need to check out the sources and don't need to make any commits into the picard repository, the command is:

git clone https://github.com/broadinstitute/picard.git

Then within the picard directory, clone the htsjdk.

cd picard
git clone https://github.com/samtools/htsjdk.git

Then you can attach the picard/src/java and picard/htsjdk/src/java directories in IntelliJ as source directories (File -> Project Structure -> Libraries -> click the plus sign -> "Attach Files or Directories" in the latest IntelliJ).

To build Picard and htsjdk all at once, run ant from within the picard directory. To run the tests, run ant test.

If you do need to make commits into the picard repository, first you'll need to create a github account, fork picard or htsjdk, make your changes, and then issue a pull request. For more info on pull requests, see: https://help.github.com/articles/using-pull-requests


1. Overview

The Tribble project was started as an effort to overhaul our reference-ordered data system; we had many different formats that were shoehorned into a common framework that didn't really work as intended. What we wanted was a common framework that allowed searching of reference-ordered data regardless of the underlying type. Jim Robinson had developed indexing schemes for text-based files, which were incorporated into the Tribble library.

2. Architecture Overview

Tribble provides a lightweight interface and API for querying features and creating indexes from feature files, while allowing iteration over known feature files that we're unable to create indexes for. The main entry point for external users is the BasicFeatureReader class. It takes in a codec, an index file, and a file containing the features to be processed. With an instance of a BasicFeatureReader, you can query for features that span a specific location, or get an iterator over all the records in the file.

3. Developer Overview

For developers, there are two important classes to implement: the FeatureCodec, which decodes lines of text and produces features, and the Feature class, which is your underlying record type:

  • Feature

    This is the genomically oriented feature that represents the underlying data in the input file. For instance, in the VCF format this is the variant call, including quality information, the reference base, and the alternate base. The information required to implement a feature is the chromosome name, the start position (1-based), and the stop position. The start and stop positions represent a closed, 1-based interval; for example, the first base in chromosome 1 would be chr1:1-1.

  • FeatureCodec

    This class takes in a line of text (from an input source, whether it's a file, a compressed file, or an HTTP link) and produces the above feature.
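The closed-interval convention described above is worth spelling out with a quick arithmetic check: the length of a feature is end - start + 1, so a feature whose start equals its end covers exactly one base.

```shell
# Length of a 1-based, fully-closed interval: end - start + 1.
# chr1:1-1 is a single base; chr1:100-200 covers 101 bases, not 100.
start=100
end=200
echo $(( end - start + 1 ))
```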

To implement your new format into Tribble, you need to implement the two above classes (in an appropriately named subfolder in the Tribble check-out). The Feature object should know nothing about the file representation; it should represent the data as an in-memory object. The interface for a feature looks like:

public interface Feature {

    /**
     * Return the features reference sequence name, e.g chromosome or contig
     */
    public String getChr();

    /**
     * Return the start position in 1-based coordinates (first base is 1)
     */
    public int getStart();

    /**
     * Return the end position following 1-based fully closed conventions.  The length of a feature is
     * end - start + 1;
     */
    public int getEnd();
}

And the interface for FeatureCodec:

/**
 * the base interface for classes that read in features.
 * @param <T> The feature type this codec reads
 */
public interface FeatureCodec<T extends Feature> {
    /**
     * Decode a line to obtain just its FeatureLoc for indexing -- contig, start, and stop.
     *
     * @param line the input line to decode
     * @return  Return the FeatureLoc encoded by the line, or null if the line does not represent a feature (e.g. is
     * a comment)
     */
    public Feature decodeLoc(String line);

    /**
     * Decode a line as a Feature.
     *
     * @param line the input line to decode
     * @return  Return the Feature encoded by the line,  or null if the line does not represent a feature (e.g. is
     * a comment)
     */
    public T decode(String line);

    /**
     * This function returns the object the codec generates.  This is allowed to be Feature in the case where
     * conditionally different types are generated.  Be as specific as you can though.
     *
     * This function is used by reflections based tools, so we can know the underlying type
     *
     * @return the feature type this codec generates.
     */
    public Class<T> getFeatureType();


    /**  Read and return the header, or null if there is no header.
     *
     * @return header object
     */
    public Object readHeader(LineReader reader);
}

4. Supported Formats

The following formats are supported in Tribble:

  • VCF Format
  • DbSNP Format
  • BED Format
  • GATK Interval Format

5. Updating the Tribble, htsjdk, and/or Picard library

Updating the revision of Tribble on the system is a relatively straightforward task if the following steps are taken.

NOTE: Any directory starting with ~ may be different on your machine, depending on where you cloned the various repositories for gsa-unstable, picard, and htsjdk.

A Maven script to install picard into the local repository is located under gsa-unstable/private/picard-maven. To operate, it requires a symbolic link named picard pointing to a working checkout of the picard github repository. NOTE: compiling picard requires an htsjdk github repository checkout available at picard/htsjdk, either as a subdirectory or another symbolic link. The final full path should be gsa-unstable/private/picard-maven/picard/htsjdk.

cd ~/src/gsa-unstable
cd private/picard-maven
ln -s ~/src/picard picard

Create a git branch of Picard and/or htsjdk and make your changes. To install your changes into the GATK you must run mvn install in the private/picard-maven directory. This will compile and copy the jars into gsa-unstable/public/repo, and update gsa-unstable/gatk-root/pom.xml with the corresponding version. While you are making changes, your revisions of picard and htsjdk will be labeled with -SNAPSHOT.

cd ~/src/gsa-unstable
cd private/picard-maven
mvn install

Continue testing in the GATK. Once your changes and updated tests for picard/htsjdk are complete, push your branch and submit your pull request to the Picard and/or htsjdk github. After your Picard/htsjdk patches are accepted, switch your Picard/htsjdk branches back to the master branch. NOTE: Leave your gsa-unstable branch on your development branch!

cd ~/src/picard
ant clean
git checkout master
git fetch
git rebase
cd htsjdk
git checkout master
git fetch
git rebase

NOTE: The version numbers of the old and new Picard/htsjdk will vary, and during active development will end with -SNAPSHOT. If needed, you may push a -SNAPSHOT version for testing on Bamboo, but you should NOT submit a pull request with a -SNAPSHOT version: -SNAPSHOT indicates that your local changes are not reproducible from source control.

When ready, run mvn install once more to create the non -SNAPSHOT versions under gsa-unstable/public/repo. In that directory, git add the new version, and git rm the old versions.

cd ~/src/gsa-unstable
cd public/repo
git add picard/picard/1.115.1499/
git add samtools/htsjdk/1.115.1509/
git rm -r picard/picard/1.112.1452/
git rm -r samtools/htsjdk/1.112.1452/

Commit and then push your gsa-unstable branch, then issue a pull request for review.


ReorderSam

The GATK can be particular about the ordering of a BAM file. If you find yourself in the not uncommon situation of having created or received BAM files sorted in a bad order, you can use the tool ReorderSam to generate a new BAM file where the reads have been reordered to match a well-ordered reference file.

java -jar picard/ReorderSam.jar I=lexicographic.bam O=karyotypic.bam REFERENCE=Homo_sapiens_assembly18.karyotypic.fasta

This tool requires that you have a correctly sorted version of the reference sequence you used to align your reads. It will drop reads that don't have equivalent contigs in the new reference (potentially bad, but maybe not). If contigs have the same name in the BAM and the new reference, this tool assumes that the alignment of the read in the new BAM is the same. Note that this is not a liftover tool!

The tool, though once in the GATK, is now part of the Picard package.


Here's some good news for anyone who has been using both GATK and the Picard tools in their work -- which means all of you, since you all follow our Best Practices to a tee, right?

As you may know, both toolkits are developed here at the Broad Institute, and are deployed together in the Broad's analysis pipelines. The fact that they have been developed, released and supported separately so far is more an accident of history and internal organization than anything else (and we know it's inconvenient to y'all).

The good news is that we're taking steps to consolidate these efforts, which we believe will benefit everyone. In that spirit, we have been working closely with the Picard tools development team, and we're now ready to take the first step of consolidating support for the tools. From now on, you will be able to ask us questions about the Picard tools, and report bugs, in the GATK forum. And developers will be happy to hear that we are also committed to supporting HTSJDK for developers through the Github repo’s Issues tracker.

In the near future, we will also start hosting downloads and documentation for the Picard tools on the GATK website. And before you ask, yes, the Picard tools will continue to be completely open-source and available to all free of charge.

To recap, we have brought the GATK and Picard teams together, and we are working on bringing together in the same place all the methods and tools to perform genome analysis. Our goal is to make a world where you can run our complete Best Practices pipeline end-to-end with a single Broad toolkit. We think it’ll make your life easier, because it sure is making ours easier.


Consider this a public service announcement, since most GATK users probably also use Picard tools routinely. The recently released version 1.124 of the Picard tools includes many lovely improvements, bug fixes and even a few new tools (see release notes for full details) -- but you'll want to pay attention to one major change in particular.

From this version onward, the Picard release will contain a single JAR file containing all the tools, instead of one JAR file per tool as it was before. This means that when you invoke a Picard tool, you'll invoke a single JAR, then specify which tool (which they call CLP, for Command Line Program) you want to run. This should feel completely familiar if you already use GATK regularly, but it does mean you'll need to update any scripts that use Picard tools to the new paradigm. Other than that, there's no change of syntax; Picard will still use e.g. I=input.bam where GATK would use -I input.bam.

We will need to update some of our own documentation accordingly over the near future; please bear with us as we go through this process, and let us know by commenting in this thread if you find any docs that have yet to be updated.


I have 16 libraries that were pooled and sequenced on the same lane. When using AddOrReplaceReadGroups.jar, should I create a unique RGLB name for each library (e.g., lib1, lib2, lib3, etc.)? Apologies if this sounds like a silly question, but I have seen folks on seq-answers use the same name (e.g., lib1) for the RGLB flag when libraries were sequenced on the same lane.

Thanks!


Hello,

I was asked to re-post this question here. It was originally posted in the Picard forum at GitHub at https://github.com/broadinstitute/picard/issues/161.

Regards,

Bernt


ORIGINAL POST (edited)

There seems to be a problem with FixMateInformation crashing with:

Exception in thread "main" java.lang.NullPointerException
at htsjdk.samtools.SamPairUtil.setMateInformationOnSupplementalAlignment(SamPairUtil.java:300)
at htsjdk.samtools.SamPairUtil$SetMateInfoIterator.advance(SamPairUtil.java:442)
at htsjdk.samtools.SamPairUtil$SetMateInfoIterator.next(SamPairUtil.java:454)
at htsjdk.samtools.SamPairUtil$SetMateInfoIterator.next(SamPairUtil.java:360)
at picard.sam.FixMateInformation.doWork(FixMateInformation.java:194)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:185)
at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:125)
at picard.sam.FixMateInformation.main(FixMateInformation.java:93)

The problem first appeared in version 1.121 and is still present in version 1.128. Versions up to and including 1.120 worked and continue to work fine. I am currently using Java 1.7.0_75, but I observed the same problem with earlier versions of Java. The problem occurs under several different versions of Fedora.

The command lines I am using are:

java -jar picard-1.128/picard.jar FixMateInformation INPUT=test.bam OUTPUT=fixed.bam (fails)

java -jar picard-1.121/FixMateInformation.jar INPUT=test.bam OUTPUT=fixed.bam (fails)

java -jar picard-1.120/FixMateInformation.jar INPUT=test.bam OUTPUT=fixed.bam (succeeds)

I have observed the problem with various BAM files. This one is (a small subset of) the output of an indel realignment with GATK.


Later in the same thread:

ValidateSamFile produces:

java -jar /opt/ghpc/picard-1.121/ValidateSamFile.jar INPUT=test.bam OUTPUT=out.bam

[Wed Feb 18 08:48:40 CET 2015] picard.sam.ValidateSamFile INPUT=test.bam OUTPUT=out.bam MODE=VERBOSE MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Feb 18 08:48:40 CET 2015] Executing as bernt@interactive.ghpc.dk on Linux 2.6.35.14-106.fc14.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_75-b13; Picard version: 1.121(da291b4d265f877808b216dce96eaeffd2f30bd3_1411396652) IntelDeflater
[Wed Feb 18 08:48:41 CET 2015] picard.sam.ValidateSamFile done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=505937920
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp


And later:

The problem also exists in the new version, 1.129.


Re-posted in GATK forum:

Picard (1.129)'s ValidateSamFile complains about Mate not found - which was the reason for running FixMateInformation in the first place.

The output is:

java -jar /opt/ghpc/picard-current/picard.jar ValidateSamFile I=test.bam

[Thu Mar 26 16:44:49 CET 2015] picard.sam.ValidateSamFile INPUT=test.bam    MODE=VERBOSE MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Thu Mar 26 16:44:49 CET 2015] Executing as bernt@interactive.ghpc.dk on Linux 2.6.35.14-106.fc14.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_75-b13; Picard version: 1.129(b508b2885562a4e932d3a3a60b8ea283b7ec78e2_1424706677) IntelDeflater
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2215:5439:78978, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1216:7853:25411, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2301:9078:52020, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2104:18417:29553, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2310:18752:24451, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2310:6551:24766, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1105:9672:78339, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2112:20003:44801, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2213:8473:74864, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2306:11852:94726, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1110:11106:17369, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2215:12401:47072, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2312:13964:14859, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1312:3886:41184, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1206:12827:34659, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1107:18908:98983, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1313:7640:45146, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1306:1595:15034, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2209:2036:47281, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1201:6826:100382, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2213:4861:63517, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2204:10202:63100, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1207:7125:93640, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1101:9691:36089, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2211:1839:100174, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2312:7331:16518, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2303:13396:44533, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1103:15274:86897, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2110:1541:39614, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1206:10320:20874, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2104:12084:25830, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2115:6231:35664, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1106:5365:6728, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1201:5887:87680, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1204:9449:99890, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2207:6920:91927, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1113:17505:78862, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2311:19423:17546, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2303:6787:39570, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1116:6350:25293, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1305:15016:58323, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1116:10894:97830, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2306:13179:38191, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1301:11303:99731, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2102:13726:37821, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2312:11652:76919, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1208:4895:32748, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1106:9371:79983, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2111:1798:22917, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1107:1267:20231, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1109:15189:92031, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2302:9045:63944, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1102:14247:57062, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2305:7407:36655, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2204:12584:72228, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1111:18302:40904, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2316:8382:94789, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2109:12845:82338, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1206:10557:31568, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2210:14790:11210, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1303:7824:5423, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2111:9909:100689, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2202:16293:94205, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1102:16519:74708, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1305:10365:69588, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2102:8288:100810, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1311:17645:65928, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1109:17819:68329, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2206:3160:52730, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1112:18820:52584, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1108:4475:4687, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2205:7334:35631, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2106:9384:64665, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2316:12960:78271, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1104:3451:71528, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2211:21055:28695, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2202:13814:96357, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2111:17147:10853, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2106:20520:88043, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1214:2637:77724, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1109:9367:35640, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1215:11379:23758, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1304:17507:91188, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2204:12459:100042, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2216:8585:77239, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1313:12667:24591, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1316:10367:5281, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1315:15333:2359, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1206:5534:7650, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2102:4820:93659, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2104:6528:72676, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2305:7297:76200, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1315:5361:88165, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2305:17200:26640, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1302:2356:100479, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1101:3217:24975, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2314:1898:42432, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:1316:5424:4897, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2104:16620:81246, Mate not found for paired read
ERROR: Read name HWI-D00474:91:C62ARANXX:8:2102:15822:17446, Mate not found for paired read
Maximum output of [100] errors reached.
[Thu Mar 26 16:44:50 CET 2015] picard.sam.ValidateSamFile done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=505937920
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

Please let me know where to send the test BAM file.

Regards,

Bernt

Comments (5)

Hi,

I am attempting to merge the output of TopHat in order to run some RNA-Seq QC metrics. This is single-read 50 bp data from a HiSeq. To get past the fact that TopHat gives a MAPQ of 255 to unmapped reads (not 0, as expected by Picard), I used the following tool (https://github.com/cbrueffer/tophat-recondition) to change it.
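For illustration, the MAPQ change described above can be sketched in a few lines of Python over SAM text (an editorial sketch assuming the file is streamed as text, e.g. via `samtools view -h`; this is not the actual tophat-recondition code):

```python
# Sketch: reset MAPQ from 255 to 0 on unmapped reads in SAM text.
# Flag bit 0x4 marks an unmapped read; column 5 (index 4) is MAPQ.
def fix_mapq(sam_lines):
    fixed = []
    for line in sam_lines:
        if line.startswith("@"):          # header lines pass through
            fixed.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4 and fields[4] == "255":
            fields[4] = "0"               # unmapped: MAPQ 255 -> 0
        fixed.append("\t".join(fields))
    return fixed
```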

Once completed, I added read groups using picard and then sorted the accepted_hits.bam by coordinate and sorted the unmapped reads by queryname.

tophat-recondition.py /home/ryan/NGS_Data/No_Dox

java -Xmx2g -jar /home/ryan/jar/picard-tools-1.129/picard.jar AddOrReplaceReadGroups \
    I=/home/ryan/NGS_Data/unmapped_fixup.bam \
    O=/home/ryan/NGS_Data/unmapped_fixup-RG.bam \
    RGID=No_Dox RGLB=No_Dox RGPL=illumina RGPU=GCCAAT RGSM=No_Dox

java -Xmx2g -jar /home/ryan/jar/picard-tools-1.129/picard.jar AddOrReplaceReadGroups \
    I=/home/ryan/NGS_Data/accepted_hits.bam \
    O=/home/ryan/NGS_Data/accepted_hits-RG.bam \
    SORT_ORDER=coordinate \
    RGID=No_Dox RGLB=No_Dox RGPL=illumina RGPU=GCCAAT RGSM=No_Dox \
    CREATE_INDEX=true

java -Xmx2g -jar /home/ryan/jar/picard-tools-1.129/picard.jar SortSam \
    I=/home/ryan/NGS_Data/unmapped_fixup-RG.bam \
    O=/home/ryan/NGS_Data/unmapped_fixup-RG-sorted.bam \
    SORT_ORDER=queryname

java -Xmx2g -jar /home/ryan/jar/picard-tools-1.129/picard.jar SortSam \
    I=/home/ryan/NGS_Data/accepted_hits-RG.bam \
    O=/home/ryan/NGS_Data/accepted_hits-RG-sorted.bam \
    SORT_ORDER=coordinate \
    CREATE_INDEX=true

java -Xmx2g -jar /home/ryan/jar/picard-tools-1.129/picard.jar MergeBamAlignment \
    UNMAPPED_BAM=/home/ryan/NGS_Data/unmapped_fixup-RG-sorted.bam \
    ALIGNED_BAM=/home/ryan/NGS_Data/accepted_hits-RG-sorted.bam \
    O=/home/ryan/NGS_Data/merge_unmapped_accepted_hits_No_Dox.bam \
    SORT_ORDER=coordinate \
    REFERENCE_SEQUENCE=/home/ryan/Reference/human_spikein/hg19_spikein.fa \
    PROGRAM_RECORD_ID=Tophat \
    PROGRAM_GROUP_VERSION=0.1 \
    PROGRAM_GROUP_COMMAND_LINE=tophat/dummy \
    PAIRED_RUN=false \
    VALIDATION_STRINGENCY=LENIENT \
    CREATE_INDEX=true

I then get the following warning followed by the error:

WARNING 2015-03-17 09:44:22 SamAlignmentMerger Exception merging bam alignment - attempting to sort aligned reads and try again: Aligned record iterator (HS3:608:C6LNLACXX:7:2101:9946:63417) is behind the unmapped reads (HS3:608:C6LNLACXX:7:2101:9947:11009)
INFO 2015-03-17 09:44:33 SamAlignmentMerger Read 1000000 records from alignment SAM/BAM.
INFO 2015-03-17 09:44:43 SamAlignmentMerger Read 2000000 records from alignment SAM/BAM.
....
INFO 2015-03-17 09:58:01 SamAlignmentMerger Read 96000000 records from alignment SAM/BAM.
INFO 2015-03-17 09:58:09 SamAlignmentMerger Read 97000000 records from alignment SAM/BAM.
INFO 2015-03-17 09:58:15 SamAlignmentMerger Finished reading 97571897 total records from alignment SAM/BAM.
[Tue Mar 17 09:58:16 PDT 2015] picard.sam.MergeBamAlignment done. Elapsed time: 14.32 minutes.
Runtime.totalMemory()=1908932608
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Aligned record iterator (HS3:608:C6LNLACXX:7:1101:10000:11036) is behind the unmapped reads (HS3:608:C6LNLACXX:7:1101:10000:48402)
    at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:383)
    at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:153)
    at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:248)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
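An editorial aside: one common way the mismatch above arises is a disagreement between sort orders. Under the assumption that a queryname sort compares read names as plain strings, lexicographic and numeric orderings of Illumina-style names can differ, as this small Python sketch shows:

```python
# Illustration (assumption: queryname sorts compare names as plain
# strings): lexicographic and numeric orderings of colon-separated
# read names can disagree.
def natural_key(name):
    # Compare colon-separated fields numerically where possible.
    return [int(f) if f.isdigit() else f for f in name.split(":")]

names = [
    "HS3:608:C6LNLACXX:7:1101:10000:9999",
    "HS3:608:C6LNLACXX:7:1101:10000:11036",
    "HS3:608:C6LNLACXX:7:1101:10000:48402",
]
lexicographic = sorted(names)             # string order: 11036 < 48402 < 9999
numeric = sorted(names, key=natural_key)  # field order: 9999 < 11036 < 48402
```

If the unmapped BAM and the aligned BAM were sorted under different conventions, a merge that expects one convention can see records "behind" the iterator.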

I have searched and have been unsuccessful at resolving this problem. Any ideas?

I am running picard 1.129 using java version "1.7.0_60"

ValidateSamFile on both inputs into MergeBamAlignment returns no errors in those files. I am at a loss and would appreciate any help!

Thanks,

Ryan

Comments (2)

Hey guys,

I was running MarkDuplicates on BWA output, and I got an error: "java.lang.NoClassDefFoundError: java/lang/ref/Finalizer$2". To get around it, I tried the following:

1) Used different versions of Picard. I tried 1.90, 1.119, 1.228. None of them worked.
2) I built picard.jar from source code. Still didn't work.
3) Used -M in BWA mem, no luck. When I used aln and sampe, I still got the same error.
4) So I tried MarkDuplicates on RNA-Seq data, and it worked perfectly on TopHat output!

So the conclusion I've reached is that MarkDuplicates doesn't work well with BWA output. Does anyone know how to deal with this situation? Thank you in advance!

More information: I'm using openjdk version "1.6.0-internal", Linux system.

The error message:

Exception in thread "main" java.lang.NoClassDefFoundError: java/lang/ref/Finalizer$2
    at java.lang.ref.Finalizer.runFinalization(Finalizer.java:144)
    at java.lang.Runtime.runFinalization0(Native Method)
    at java.lang.Runtime.runFinalization(Runtime.java:705)
    at java.lang.System.runFinalization(System.java:967)
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:58)
    at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
    at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
    at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
    at htsjdk.samtools.CoordinateSortedPairInfoMap.put(CoordinateSortedPairInfoMap.java:164)
    at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.put(DiskBasedReadEndsForMarkDuplicatesMap.java:65)
    at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:290)
    at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:114)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

Comments (2)

Greeting all.

Currently, I have been using Picard's built-in tool BuildBamIndex to index my BAM files.

I have followed the manual described on the Picard site, but I got an error message.

Here is my command line:

java -Xmx8g -XX:ParallelGCThreads=8 -jar $picard/BuildBamIndex.jar I=$RealignedBamDir/$output6

To avoid this error message I tried a different approach and used "samtools index", which I think serves the same function as Picard's BuildBamIndex.

After using samtools, I successfully got my bam index files.

I suppose there are no major differences between the Picard BAM index and the samtools BAM index.

So I am confused as to why only the samtools index procedure works fine.

Below is my error message when run "BuildBamIndex" from Picard.

[Sun Jan 18 22:15:42 KST 2015] picard.sam.BuildBamIndex INPUT=/DATA1/sclee1/data/URC_WES/U01/01U_N_Filtered_Sorted_Markdup_readgroup.bam VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Sun Jan 18 22:15:42 KST 2015] picard.sam.BuildBamIndex done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Exception creating BAM index for record HSQ-2K:530:C5PJAACXX:6:2109:18806:13902 1/2 101b aligned read.
    at htsjdk.samtools.BAMIndexer.processAlignment(BAMIndexer.java:92)
    at htsjdk.samtools.BAMIndexer.createIndex(BAMIndexer.java:291)
    at htsjdk.samtools.BAMIndexer.createIndex(BAMIndexer.java:271)
    at picard.sam.BuildBamIndex.doWork(BuildBamIndex.java:133)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Caused by: htsjdk.samtools.SAMException: BAM cannot be indexed without setting a fileSource for record HSQ-2K:530:C5PJAACXX:6:2109:18806:13902 1/2 101b aligned read.
    at htsjdk.samtools.BAMIndexMetaData.recordMetaData(BAMIndexMetaData.java:130)
    at htsjdk.samtools.BAMIndexer$BAMIndexBuilder.processAlignment(BAMIndexer.java:182)
    at htsjdk.samtools.BAMIndexer.processAlignment(BAMIndexer.java:90)
    ... 6 more

I look forward to hearing positive answers from you soon.

Bye!

Comments (2)

Hi,

I'm having trouble removing duplicates using Picard tools on SOLiD data. I get a regex not matching error.

The reads have the following names:

22_758_632_F3

604_1497_576

124_1189_1519_F5

358_1875_702_F5-DNA

And I don't think Picard tools is able to match these read names with its default regex.

I tried to change the default regex. This time it does not throw an error, but it takes too long and eventually fails with an out-of-memory error. I suspect I'm not giving the right regex. Here is my command:

java -jar $PICARD_TOOLS_HOME/MarkDuplicates.jar \
    I=$FILE \
    O=$BAMs/MarkDuplicates/$SAMPLE.MD.bam \
    M=$BAMs/MarkDuplicates/$SAMPLE.metrics \
    READ_NAME_REGEX="([0-9]+)([0-9]+)([0-9]+).*"
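As a quick sanity check (an editorial sketch; the underscore-separated pattern is an assumption based on the read names above, not a verified fix), Python's `re` module shows which candidate regexes produce the three integer groups Picard expects:

```python
import re

# Picard's READ_NAME_REGEX wants three integer capture groups (tile, x, y).
# "original" is the pattern from the command above; "candidate" adds
# underscore separators and is an assumption, not a verified fix.
names = ["22_758_632_F3", "604_1497_576", "124_1189_1519_F5", "358_1875_702_F5-DNA"]
original = re.compile(r"([0-9]+)([0-9]+)([0-9]+).*")
candidate = re.compile(r"([0-9]+)_([0-9]+)_([0-9]+).*")

def groups_for(pattern, name):
    """Return the captured groups, or None when the pattern fails."""
    m = pattern.match(name)
    return m.groups() if m else None
```

Without separators the match either fails outright or splits a single number into nonsense coordinates; with underscores each name yields a sensible (tile, x, y) triple.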

Any help is appreciated. Thanks!

Comments (2)

Hi,

We ran HaplotypeCaller on some BAMs and got the same error on all of them: "Rod span is not contained within the data shard, meaning we wouldn't get all of the data we need."

Here is my command:

java -Xmx32g -jar ~/GATK-3.2.2/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R /hg19/hg19.fasta \
    -I $1 \
    -ERC GVCF \
    -nct 16 \
    --variant_index_type LINEAR \
    --variant_index_parameter 128000 \
    --pair_hmm_implementation VECTOR_LOGLESS_CACHING \
    -o ${TEMPORARY_DIR}/$2

Can you help me figure out what's wrong? The error said it may be a bug, and I cannot find where this issue was previously addressed. I wonder if something went wrong with the BAM processing.

We received these BAMs from the sequencing institution already aligned. Individual reads were missing the @RG tag, so we used bamaddrg to add the @RG tags. That caused issues with the BAM 'bins', so we had to run htsjdk.samtools.FixBAMFile. We then ran MarkDuplicates and attempted the above HaplotypeCaller command.

Thanks!

Mark

Comments (4)

Hi fellow htsjdk/picard/gatk developers!

I've been thinking about this for quite some time now, so I thought I should write up a quick post about it here.

I've been writing custom tools for our group using both picard and GATK for some time now. It's been working nicely, but I have been missing a set of basic tutorials and examples, for users to quickly get started writing walkers. My most commonly used reference has been the 20-line life savers (http://www.slideshare.net/danbolser/20line-lifesavers-coding-simple-solutions-in-the-gatk) which is getting a bit dated.

What I would like to see is something like for following:

  • What's in htsjsk? What's not in htsjdk? (from a dev's perspective - in terms of frameworks)
  • What's in picard? What's not in picard? (from a dev's perspective - in terms of frameworks)
  • What's in gatk? What's not in gatk? (from a dev's perspective - in terms of frameworks)
  • When to use htsjdk, Picard, and GATK. What are the strengths and weaknesses of the three? (possibly more that I've missed)
  • Your first htsjdk walker
  • Your first picard walker
  • Your first gatk walker
  • Traversing a BAM in htsjdk vs gatk - what are the differences

There might be more stuff that could go in here as well. The driving force behind this is that I'm myself a bit confused by the overlap of these three packages/frameworks. I do understand that picard uses htsjdk, and that GATK uses both as dependencies, but it's not super clear what extra functionality (for a developer) is added from htsjdk -> picard -> gatk.

Could we assemble a small group of interested developers to contribute to this? We could set up a git repo with the examples and tutorials for easy collaboration and sharing online.

Anyone interested? I'll count myself as the first member :)

Comments (9)

Hi,

So I've finally taken the plunge and migrated our analysis pipeline to Queue. With some great feedback from @johandahlberg, I have gotten to a state where most of the stuff is running smoothly on the cluster.

I'm trying to add Picard's CalculateHSMetrics to the pipeline, but am having some issues. This code:

case class hsmetrics(inBam: File, baitIntervals: File, targetIntervals: File, outMetrics: File) extends CalculateHsMetrics with ExternalCommonArgs with SingleCoreJob with OneDayJob {
    @Input(doc="Input BAM file") val bam: File = inBam
    @Output(doc="Metrics file") val metrics: File = outMetrics
    this.input :+= bam
    this.targets = targetIntervals
    this.baits = baitIntervals
    this.output = metrics
    this.reference =  refGenome
    this.isIntermediate = false
}

Gives the following error message:

ERROR 06:56:25,047 QGraph - Missing 2 values for function:  'java'  '-Xmx2048m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/Users/dankle/IdeaProjects/eclipse/AutoSeq/.queue/tmp' null 'INPUT=/Users/dankle/tmp/autoseqscala/exampleIND2/exampleIND2.panel.bam'  'TMP_DIR=/Users/dankle/IdeaProjects/eclipse/AutoSeq/.queue/tmp'  'VALIDATION_STRINGENCY=SILENT'  'OUTPUT=/Users/dankle/tmp/autoseqscala/exampleIND2/exampleIND2.panel.preMarkDupsHsMetrics.metrics'  'BAIT_INTERVALS=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/exampleINTERVAL.intervals'  'TARGET_INTERVALS=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/exampleINTERVAL.intervals'  'REFERENCE_SEQUENCE=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/bwaindex0.6/exampleFASTA.fasta'  'METRIC_ACCUMULATION_LEVEL=SAMPLE'  
ERROR 06:56:25,048 QGraph -   @Argument: jarFile - jar 
ERROR 06:56:25,049 QGraph -   @Argument: javaMainClass - Main class to run from javaClasspath 

And yeah, it seems that the jar file is currently set to null in the command line. However, MarkDuplicates runs fine without setting the jar:

case class dedup(inBam: File, outBam: File, metricsFile: File) extends MarkDuplicates with ExternalCommonArgs with SingleCoreJob with OneDayJob {
    @Input(doc = "Input bam file") var inbam = inBam
    @Output(doc = "Output BAM file with dups removed") var outbam = outBam
    this.REMOVE_DUPLICATES = true
    this.input :+= inBam
    this.output = outBam
    this.metrics = metricsFile
    this.memoryLimit = 3
    this.isIntermediate = false
}

Why does CalculateHSMetrics need the jar, but not MarkDuplicates? Both are imported with import org.broadinstitute.sting.queue.extensions.picard._.

Comments (8)

Hi guys, I've seen this error reported other times, for different reasons. The thing is that the BAM file I'm using to reduce the reads has been processed through the GATK pipeline without problems, realignment and recalibration included. Therefore, I assumed the BAM file generated after BQSR would be GATK-compliant. I was running with Queue, so I ran the exact command passed to the job in interactive mode, to see what happens.

Below is the full command and error message (apologies for the lengthy output); note that there is no stack trace after the error.

        [fles@login07 reduced]$ 'java'  '-Xmx12288m'  '-Djava.io.tmpdir=/scratch/scratch/fles/project_analysis/reduced/tmp'  '-cp' '/home/fles/applications/Queue-2.7-4-g6f46d11/Queue.jar'  'org.broadinstitute.sting.gatk.CommandLineGATK'  '-T' 'ReduceReads'  '-I' '/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam'  '-R' '/home/fles/Scratch/gatkbundle_2.5/human_g1k_v37.fasta'  '-o' '/scratch/scratch/fles/project_analysis/reduced/projectTrios.U1_PJ5208467.recal.reduced.bam'
        INFO  09:27:21,728 HelpFormatter - -------------------------------------------------------------------------------- 
        INFO  09:27:21,730 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-4-g6f46d11, Compiled 2013/10/10 17:29:52 
        INFO  09:27:21,731 HelpFormatter - Copyright (c) 2010 The Broad Institute 
        INFO  09:27:21,731 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
        INFO  09:27:21,735 HelpFormatter - Program Args: -T ReduceReads -I /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam -R /home/fles/Scratch/gatkbundle_2.5/human_g1k_v37.fasta -o /scratch/scratch/fles/project_analysis/reduced/projectTrios.U1_PJ5208467.recal.reduced.bam 
        INFO  09:27:21,735 HelpFormatter - Date/Time: 2013/11/08 09:27:21 
        INFO  09:27:21,735 HelpFormatter - -------------------------------------------------------------------------------- 
        INFO  09:27:21,735 HelpFormatter - -------------------------------------------------------------------------------- 
        INFO  09:27:34,156 GenomeAnalysisEngine - Strictness is SILENT 
        INFO  09:27:34,491 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 40 
        INFO  09:27:34,503 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
        INFO  09:27:34,627 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.12 
        INFO  09:27:35,039 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
        INFO  09:27:35,045 GenomeAnalysisEngine - Done preparing for traversal 
        INFO  09:27:35,045 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
        INFO  09:27:35,046 ProgressMeter -        Location processed.reads  runtime per.1M.reads completed total.runtime remaining 
        INFO  09:27:35,080 ReadShardBalancer$1 - Loading BAM index data 
        INFO  09:27:35,081 ReadShardBalancer$1 - Done loading BAM index data 
        INFO  09:28:05,059 ProgressMeter -      1:18958138        1.00e+06   30.0 s       30.0 s      0.6%        81.8 m    81.3 m 
        INFO  09:28:35,069 ProgressMeter -      1:46733396        2.30e+06   60.0 s       26.0 s      1.5%        66.4 m    65.4 m 
        INFO  09:29:05,079 ProgressMeter -      1:92187730        3.50e+06   90.0 s       25.0 s      3.0%        50.5 m    49.0 m 
        INFO  09:29:35,088 ProgressMeter -     1:145281942        4.90e+06  120.0 s       24.0 s      4.7%        42.7 m    40.7 m 
        INFO  09:30:05,098 ProgressMeter -     1:152323864        6.40e+06    2.5 m       23.0 s      4.9%        50.9 m    48.4 m 
        INFO  09:30:35,893 ProgressMeter -     1:181206886        7.70e+06    3.0 m       23.0 s      5.8%        51.4 m    48.4 m 
        INFO  09:31:05,902 ProgressMeter -     1:217604563        8.90e+06    3.5 m       23.0 s      7.0%        49.9 m    46.4 m 
        INFO  09:31:35,913 ProgressMeter -      2:14782401        1.02e+07    4.0 m       23.0 s      8.5%        47.0 m    43.0 m 
        INFO  09:32:05,922 ProgressMeter -      2:62429207        1.15e+07    4.5 m       23.0 s     10.0%        44.8 m    40.3 m 
        INFO  09:32:35,931 ProgressMeter -      2:97877374        1.28e+07    5.0 m       23.0 s     11.2%        44.7 m    39.7 m 
        INFO  09:33:06,218 ProgressMeter -     2:135574018        1.42e+07    5.5 m       23.0 s     12.4%        44.5 m    38.9 m 
        INFO  09:33:36,227 ProgressMeter -     2:179431307        1.56e+07    6.0 m       23.0 s     13.8%        43.5 m    37.5 m 
        INFO  09:34:06,237 ProgressMeter -     2:216279690        1.69e+07    6.5 m       23.0 s     15.0%        43.4 m    36.9 m 
        INFO  09:34:36,248 ProgressMeter -      3:14974731        1.81e+07    7.0 m       23.0 s     16.4%        42.9 m    35.9 m 
        INFO  09:35:07,073 ProgressMeter -      3:52443620        1.94e+07    7.5 m       23.0 s     17.6%        42.9 m    35.4 m 
        INFO  09:35:37,084 ProgressMeter -     3:111366536        2.05e+07    8.0 m       23.0 s     19.5%        41.3 m    33.2 m 
        INFO  09:36:07,094 ProgressMeter -     3:155571144        2.18e+07    8.5 m       23.0 s     20.9%        40.8 m    32.3 m 
        INFO  09:36:37,103 ProgressMeter -       4:3495327        2.31e+07    9.0 m       23.0 s     22.4%        40.4 m    31.3 m 
        INFO  09:37:07,114 ProgressMeter -      4:48178306        2.43e+07    9.5 m       23.0 s     23.8%        40.0 m    30.5 m 
        INFO  09:37:37,270 ProgressMeter -     4:106747046        2.56e+07   10.0 m       23.0 s     25.7%        39.0 m    29.0 m 
        INFO  09:38:07,483 ProgressMeter -     4:181303657        2.69e+07   10.5 m       23.0 s     28.1%        37.5 m    26.9 m 
        INFO  09:38:37,493 ProgressMeter -      5:41149454        2.81e+07   11.0 m       23.0 s     29.7%        37.1 m    26.1 m 
        INFO  09:38:51,094 GATKRunReport - Uploaded run statistics report to AWS S3 
        ##### ERROR ------------------------------------------------------------------------------------------
        ##### ERROR A USER ERROR has occurred (version 2.7-4-g6f46d11): 
        ##### ERROR
        ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
        ##### ERROR The error message below tells you what is the problem.
        ##### ERROR
        ##### ERROR If the problem is an invalid argument, please check the online documentation guide
        ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
        ##### ERROR
        ##### ERROR Visit our website and forum for extensive documentation and answers to 
        ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
        ##### ERROR
        ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
        ##### ERROR
        ##### ERROR MESSAGE: SAM/BAM file /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam is malformed: Read error; BinaryCodec in readmode; file: /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam
        ##### ERROR ------------------------------------------------------------------------------------------

Following your usual advice, I validated the BAM file produced by BQSR with Picard, and I get the exact same error, with no more specific indication:

        [fles@login07 reduced]$ java -jar ~/applications/picard-tools-1.102/ValidateSamFile.jar \
        > INPUT=/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam \
        > IGNORE_WARNINGS=TRUE
        [Fri Nov 08 09:59:42 GMT 2013] net.sf.picard.sam.ValidateSamFile INPUT=/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam IGNORE_WARNINGS=true    MAX_OUTPUT=100 VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
        [Fri Nov 08 09:59:42 GMT 2013] Executing as fles@login07 on Linux 2.6.18-194.11.4.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18; Picard version: 1.102(1591)
        INFO    2013-11-08 10:01:01 SamFileValidator    Validated Read    10,000,000 records.  Elapsed time: 00:01:18s.  Time for last 10,000,000:   78s.  Last read position: 1:204,966,172
        INFO    2013-11-08 10:02:19 SamFileValidator    Validated Read    20,000,000 records.  Elapsed time: 00:02:36s.  Time for last 10,000,000:   78s.  Last read position: 2:232,121,396
        INFO    2013-11-08 10:03:36 SamFileValidator    Validated Read    30,000,000 records.  Elapsed time: 00:03:54s.  Time for last 10,000,000:   77s.  Last read position: 4:123,140,629
        [Fri Nov 08 10:04:00 GMT 2013] net.sf.picard.sam.ValidateSamFile done. Elapsed time: 4.30 minutes.
        Runtime.totalMemory()=300941312
        To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
        Exception in thread "main" net.sf.samtools.util.RuntimeIOException: Read error; BinaryCodec in readmode; file: /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam
            at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:397)
            at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:371)
            at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:357)
            at net.sf.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:200)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:558)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:532)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
            at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
            at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:687)
            at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:665)
            at net.sf.picard.sam.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:241)
            at net.sf.picard.sam.SamFileValidator.validateSamFile(SamFileValidator.java:177)
            at net.sf.picard.sam.SamFileValidator.validateSamFileSummary(SamFileValidator.java:104)
            at net.sf.picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:164)
            at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
            at net.sf.picard.sam.ValidateSamFile.main(ValidateSamFile.java:100)
        Caused by: java.io.IOException: Unexpected compressed block length: 1
            at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:358)
            at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:113)
            at net.sf.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:238)
            at java.io.DataInputStream.read(DataInputStream.java:149)
            at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:395)

Any suggestions on what I might be doing wrong?

Comments (5)

I finally got the filtered VCF files from my BWA + Picard + GATK pipeline. I have 11 exome-seq data files, which were processed as a single input list to GATK. In the process of generating the VCFs, I did not see an option for separating the 11 samples, so I now have two VCF files (one for SNPs, the other for indels) that each contain all 11 samples. My question is how to proceed from here.

Should I separate the 11 samples before annotation, or annotate first and then split the samples into individual files? The big question is how to split the samples out of the VCF files. Thanks.
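For reference, sample subsetting is usually done with an existing tool (e.g. GATK's SelectVariants walker, which takes sample names to keep); nothing in this thread confirms a specific recipe, so treat tool names as pointers. Purely as an illustration of what the split involves at the text level, here is a hedged Python sketch that keeps the nine fixed VCF columns (CHROM through FORMAT) plus one sample's genotype column per output:

```python
def split_vcf_by_sample(vcf_lines):
    """Split a multi-sample VCF (given as text lines) into one list of
    lines per sample. Pure-text sketch: it does not re-annotate INFO
    fields or drop sites that become homozygous-reference."""
    header, samples, per_sample = [], [], {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            header.append(line)                      # meta-information lines
        elif line.startswith("#CHROM"):
            cols = line.split("\t")
            samples = cols[9:]                       # sample names follow FORMAT
            for s in samples:
                per_sample[s] = header + ["\t".join(cols[:9] + [s])]
        else:
            cols = line.split("\t")
            for i, s in enumerate(samples):
                per_sample[s].append("\t".join(cols[:9] + [cols[9 + i]]))
    return per_sample
```

In practice a dedicated tool is preferable, since it also updates the INFO annotations (AC, AN, etc.) that this sketch leaves untouched.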

Comments (1)

I happened to be preparing a new reference genome and was using FastaStats to force the creation of the .dict and .fai files. The automatic .dict creation first creates a temporary file, builds the .dict using Picard, and then copies it into place. However, newer versions of Picard won't overwrite the .dict file, so using the Java-created temporary file as an output produces an error. The problematic section seems to be ReferenceDataSource.java. The error manifests as:

Index file /gs01/projects/ngs/resources/gatk/2.3/human_g1k_v37.dict does not exist but could not be created because: .   /gs01/projects/ngs/resources/gatk/2.3/dict8545428789729910550.tmp already exists.  Delete this file and try again, or specify a different output file.
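A workaround consistent with the error message's own advice is simply to delete the leftover `dict*.tmp` file before rerunning. The helper below is an illustrative sketch, not a GATK or Picard feature:

```python
import glob
import os

def remove_stale_dict_tmp(ref_dir):
    """Delete leftover 'dict*.tmp' files in a reference directory so the
    sequence dictionary can be recreated on the next run.
    Returns the list of paths removed."""
    removed = []
    for path in glob.glob(os.path.join(ref_dir, "dict*.tmp")):
        os.remove(path)
        removed.append(path)
    return removed
```

The real `.dict` file itself is left alone; only the abandoned temporary files named in the error are cleared.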
Comments (9)

Hi there,

I get an error when I try to run GATK with the following command:

java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fa -I  merged_bam_files_indexed_markduplicate.bam -o reads.intervals

However I get this error:

SAM/BAM file SAMFileReader{/merged_bam_files_indexed_markduplicate.bam} is malformed: Read HWI-ST303_0093:5:5:13416:34802#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK.  Please use http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem

This suggests a header issue; however, my BAM file does have read groups in its header:

samtools view -h merged_bam_files_indexed_markduplicate.bam | grep ^@RG
@RG     ID:test1      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test   CN:japan
@RG     ID:test2      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test    CN:japan

When I grep for the read named in the error:

HWI-ST303_0093:5:5:13416:34802#0        99      1       1090    29      23S60M17S       =       1150    160     TGTTTGGGTTGAAGATTGATACTGGAAGAAGATTAGAATTGTAGAAAGGGGAAAACGATGTTAGAAAGTTAATACGGCTTACTCCAGATCCTTGGATCTC        GGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGFGGGGGGGGGDGFGFGGGGGFEDFGEGGGDGEG?FGGDDGFFDGGEDDFFFFEDG?E        MD:Z:60 PG:Z:MarkDuplicates     RG:Z:test1      XG:i:0  AM:i:29 NM:i:0  SM:i:29 XM:i:0  XO:i:0  XT:A:M
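GATK's complaint means that every read's `RG:Z:` tag must name an ID declared in an `@RG` header line of the same file, not merely that some `@RG` lines exist. Assuming plain SAM text (e.g. piped from `samtools view -h`), a quick consistency check might look like this sketch:

```python
def unmatched_read_groups(sam_lines):
    """Return the set of RG IDs used on read records but missing from
    the @RG header lines of a SAM file given as text lines."""
    header_ids, used = set(), set()
    for line in sam_lines:
        if line.startswith("@RG"):
            # Header fields after the @RG tag, e.g. "ID:test1"
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("ID:"):
                    header_ids.add(field[3:])
        elif not line.startswith("@"):
            # Optional tags start after the 11 mandatory SAM columns
            for field in line.rstrip("\n").split("\t")[11:]:
                if field.startswith("RG:Z:"):
                    used.add(field[5:])
    return used - header_ids
```

An empty result means the header and the reads agree; any IDs returned are exactly the ones GATK would reject.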

Following Picard solution:

java -XX:MaxDirectMemorySize=4G -jar picard-tools-1.85/AddOrReplaceReadGroups.jar I= test.bam O= test.header.bam SORT_ORDER=coordinate RGID=test RGLB=test  RGPL=Illumina RGSM=test/ RGPU=HWI-ST303  RGCN=japan CREATE_INDEX=True 

I get this error after about 2 minutes:

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 12247781, Read name HWI-ST303_0093:5:26:10129:50409#0, MAPQ should be 0 for unmapped read.

Any recommendations on how to solve this issue?

My plan is to do the following to resolve the issue:

picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true VALIDATION_STRINGENCY=LENIENT
samtools  index test_markduplicate.bam

I see a lot of messages like the one below, but the command keeps running:

Ignoring SAM validation error: ERROR: Record (number), Read name HWI-ST303_0093:5:5:13416:34802#0, RG ID on SAMRecord not found in header: test1

After that completes, I will try GATK RealignerTargetCreator again.

I already tried to do the following

java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fa -I  merged_bam_files_indexed_markduplicate.bam -o reads.intervals --validation_strictness LENIENT

But I still got the same error

N.B.: the same command ran without issues under GATK version 1.2.

My pipeline in short: I map the paired-end reads with

bwa aln -q 20 ref.fa read > files.sai
bwa sampe ref.fa file1.sai file2.sai read1 read2 > test1.sam
samtools view -bS test1.sam | samtools sort - test
samtools  index test1.bam
samtools merge -rh RG.txt test test1.bam test2.bam

RG.txt

@RG     ID:test1      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test   CN:japan
@RG     ID:test2      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test    CN:japan

samtools  index test.bam
picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true VALIDATION_STRINGENCY=SILENT
samtools  index test_markduplicate.bam
Comments (2)

Hi all,

I am doing an exome analysis with BWA 0.6.1-r104, Picard 1.79 and GATK v2.2-8-gec077cd. I have paired-end reads; my protocol so far is (in brief, omitting options etc.):

bwa aln R1.fastq
bwa aln R2.fastq
bwa sampe R1.sai R2.sai
picard/CleanSam.jar
picard/SortSam.jar
picard/MarkDuplicates.jar
picard/AddOrReplaceReadGroups.jar
picard/BuildBamIndex.jar
GATK -T RealignerTargetCreator -known dbsnp.vcf
GATK -T IndelRealigner -known dbsnp.vcf
GATK -T BaseRecalibrator -knownSites dbsnp.vcf
GATK -T PrintReads

A closer look at the output of the above toolchain revealed changes in read counts that I did not quite understand.

I have 85,767,226 read pairs = 171,534,452 sequences in the FASTQ files.

BWA reports this number, and the cleaned SAM file has 171,534,452 alignments, as expected.

MarkDuplicates reports:

Read 165619516 records. 2 pairs never matched. Marking 20272927 records as duplicates. Found 2919670 optical duplicate clusters.

so nearly 6 million reads seem to be missing.

CreateTargets MicroScheduler reports

35915555 reads were filtered out during traversal out of 166579875 total (21.56%)
-> 428072 reads (0.26% of total) failing BadMateFilter
-> 16077607 reads (9.65% of total) failing DuplicateReadFilter
-> 19409876 reads (11.65% of total) failing MappingQualityZeroFilter

so nearly 5 million reads seem to be missing.
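One thing worth checking before worrying about the logs: the RealignerTargetCreator numbers quoted above are internally consistent. The three filter counts sum exactly to the reported filtered total, and each percentage is taken against the 166,579,875 traversed reads (why that traversal total differs from the FASTQ count is a separate question this check does not answer). Using only the figures quoted in this post:

```python
# Figures quoted verbatim from the RealignerTargetCreator MicroScheduler log.
total = 166_579_875
bad_mate = 428_072
duplicates = 16_077_607
mapq_zero = 19_409_876

# The three filters account exactly for the reported filtered total.
filtered = bad_mate + duplicates + mapq_zero
assert filtered == 35_915_555

def pct(n, total=total):
    """Percentage of the traversal total, rounded as in the GATK log."""
    return round(100 * n / total, 2)
```

So the per-filter breakdown and the headline "21.56%" agree with each other; the discrepancies in this post are between *stages*, not within any one log line.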

The Realigner MicroScheduler reports

0 reads were filtered out during traversal out of 171551640 total (0.00%)

which seems like a miracle to me, since 1) there are now even more reads than input sequences, and 2) none of the problematic reads reported by CreateTargets appear.

From Base recalibration MicroScheduler, I get

41397379 reads were filtered out during traversal out of 171703265 total (24.11%)
-> 16010068 reads (9.32% of total) failing DuplicateReadFilter
-> 25387311 reads (14.79% of total) failing MappingQualityZeroFilter

So my reads have multiplied yet again, but the duplicate reads, for example, reappear in roughly the same numbers.

I found these varying counts a little irritating -- can someone please give me a hint about the logic behind these numbers? And does the protocol look sound?

Thanks for any comments!

Comments (1)

Picard appears not to like the way BWA encodes mtDNA alignments. I am doing human exome sequencing using a copy of hg19 that I obtained from UCSC and indexed using BWA per the instructions here:

Example 1

[Tue Aug 28 12:45:16 EDT 2012] net.sf.picard.sam.SortSam done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=125435904
FAQ: http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page
Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Non-numeric value in ISIZE column; Line 3982
Line: FCC0CHTACXX:1:1101:14789:3170#TAGCTTAT 117 chrM 304415842 0 100M = -1610645157 2379906297 TGCGACTTGGAAGCGGATTCAGAGGACAGGACAGAACACTTGGGCAAGTGAATCTCTGTCTGTCTGTCTGTCTCATTGGTTGGTTTATTTCCATTTTCTT B@<:>CDDDBDDBDEEEEEEFEFCCHHFHHGGIIIHIGJJJIIGGGIIIIJJJIIGJIJGG@CEIFJIJJJJIJIJIJJJJIJJJGIHHGHFFEFFFCCC RG:Z:1868 XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:2 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:39G45G14 XA:Z:chrM,-391302964,100M,2;
at net.sf.samtools.SAMTextReader.reportFatalErrorParsingLine(SAMTextReader.java:223)
at net.sf.samtools.SAMTextReader.access$400(SAMTextReader.java:40)
at net.sf.samtools.SAMTextReader$RecordIterator.parseInt(SAMTextReader.java:293)
at net.sf.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:394)
at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:278)
at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:250)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:641)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:619)
at net.sf.picard.sam.SortSam.doWork(SortSam.java:68)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
at net.sf.picard.sam.SortSam.main(SortSam.java:57)
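One observation about the failing line quoted above (an observation from the quoted record, not a confirmed diagnosis): its TLEN/ISIZE value, 2379906297, exceeds the signed 32-bit maximum of 2,147,483,647 that the SAM spec allows for this field, and the mate position -1610645157 looks like the negative half of the same overflow. That would be consistent with Picard's "Non-numeric value in ISIZE column" failure. A simple range check:

```python
# Signed 32-bit bounds, as mandated for SAM POS/PNEXT/TLEN fields.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def isize_in_range(tlen_text):
    """True if a SAM TLEN/ISIZE text field parses as a signed 32-bit int."""
    try:
        return INT32_MIN <= int(tlen_text) <= INT32_MAX
    except ValueError:
        return False

# Value from the failing line above: it overflows int32.
suspect = isize_in_range("2379906297")
```

Filtering on this predicate (e.g. over column 9 of `samtools view` output) would locate every record the parser will choke on.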

Example 2

java -jar ~/bin/picard-tools-1.74/MarkDuplicates.jar \
INPUT=1sorted.bam \
OUTPUT=1dedup.bam \
ASSUME_SORTED=true \
METRICS_FILE=metrics \
CREATE_INDEX=true \
VALIDATION_STRINGENCY=LENIENT

...
Ignoring SAM validation error: ERROR: Record 691, Read name FCC0CHTACXX:1:1302:4748:176644#GGCTACAT, Mate Alignment start (436154938) must be <= reference sequence length (16571) on reference chrM
Ignoring SAM validation error: ERROR: Record 692, Read name FCC0CHTACXX:1:2104:8494:167812#GGCTACAT, Mate Alignment start should != 0 because reference name != *.
Ignoring SAM validation error: ERROR: Record 693, Read name FCC0CHTACXX:1:1201:21002:183608#GGCTACAT, Mate Alignment start should != 0 because reference name != *.
Ignoring SAM validation error: ERROR: Record 694, Read name FCC0CHTACXX:1:2303:3184:35872#GGCTACAT, Mate Alignment start (436154812) must be <= reference sequence length (16571) on reference chrM
...

I've truncated the output; in fact it throws such an error for every single mitochondrial read.
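The errors quoted all follow one pattern: a mate alignment start (PNEXT) far larger than chrM's length (16,571 bp in hg19). Assuming plain SAM text with `@SQ` header lines, flagging such records is straightforward; this sketch mirrors the check Picard is reporting:

```python
def mate_positions_out_of_range(sam_lines):
    """Yield (read_name, mate_pos) for SAM records whose mate alignment
    start (PNEXT) exceeds the length of the mate's reference sequence,
    as declared in the @SQ header lines."""
    ref_len = {}
    for line in sam_lines:
        if line.startswith("@SQ"):
            # e.g. "@SQ\tSN:chrM\tLN:16571" -> {"SN": "chrM", "LN": "16571"}
            fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            ref_len[fields["SN"]] = int(fields["LN"])
        elif not line.startswith("@"):
            cols = line.rstrip("\n").split("\t")
            qname, rname, rnext, pnext = cols[0], cols[2], cols[6], int(cols[7])
            mate_ref = rname if rnext == "=" else rnext  # "=" means same reference
            if mate_ref in ref_len and pnext > ref_len[mate_ref]:
                yield qname, pnext
```

Counting the flagged records would confirm whether the problem really is confined to chrM, but it does not by itself fix the upstream cause in the aligner output.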

I suspect I could solve this by writing my own script to change how that one column is encoded, but more broadly I am interested in the answer to: how do you make BWA, Picard and GATK work seamlessly together without needing to do your own scripting?