
#### Objective

Install all software packages required to follow the GATK Best Practices.

#### Prerequisites

To follow these instructions, you will need to have a basic understanding of the meaning of the following words and command-line operations. If you are unfamiliar with any of the following, you should consult a more experienced colleague or your systems administrator if you have one. There are also many good online tutorials you can use to learn the necessary notions.

• Basic Unix environment commands
• Binary / Executable
• Compiling a binary
• Command-line shell, terminal or console
• Software library

You will also need to have access to an ANSI compliant C++ compiler and the tools needed for normal compilations (make, shell, the standard library, tar, gunzip). These tools are usually pre-installed on Linux/Unix systems. On MacOS X, you may need to install the MacOS Xcode tools. See https://developer.apple.com/xcode/ for relevant information and software downloads.
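As a quick way to confirm the compilation prerequisites, you can check whether each tool is on your PATH. This is a small sketch; the list of names is ours, chosen to match the tools mentioned above (your system's C compiler may be called something other than cc):

```shell
# Check for the build tools mentioned above; report each one as found or missing.
out=$(for tool in make tar gunzip cc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "MISSING: $tool"
  fi
done)
printf '%s\n' "$out"
```

Any line reading MISSING indicates a tool you will need to install before compiling.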

Starting with version 2.6, the GATK requires Java Runtime Environment version 1.7. All Linux/Unix and MacOS X systems should have a JRE pre-installed, but the version may vary. To test your Java version, run the following command in the shell:

java -version


This should return a message along the lines of "java version 1.7.0_25" as well as some details on the Runtime Environment (JRE) and Virtual Machine (VM). If you have a version other than 1.7.x, be aware that you may run into trouble with some of the more advanced features of the Picard and GATK tools. The simplest solution is to install an additional JRE and specify which you want to use at the command-line. To find out how to do so, you should seek help from your systems administrator.
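The version check can also be scripted. The sketch below parses a sample version line; the hard-coded sample string is an assumption standing in for live output (in real use you would pipe `java -version 2>&1` into the same awk command, since Java prints its version on standard error):

```shell
# Parse the quoted version out of a line shaped like `java -version` output.
# The sample line is hard-coded as an assumption, for illustration only.
sample='java version "1.7.0_25"'
version=$(printf '%s\n' "$sample" | awk -F '"' '{print $2}')
case "$version" in
  1.7.*) echo "Java 7 detected: $version" ;;
  *)     echo "Warning: expected 1.7.x, found $version" ;;
esac
```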

#### Software packages

1. BWA
2. SAMtools
3. HTSlib (optional)
4. Picard
5. Genome Analysis Toolkit (GATK)
6. IGV
7. RStudio IDE and R libraries ggplot2 and gsalib

### 1. BWA

• Installation

Unpack the tar file using:

tar xjf bwa-0.7.5a.tar.bz2


This will produce a directory called bwa-0.7.5a containing the files necessary to compile the BWA binary. Move to this directory and compile using:

cd bwa-0.7.5a
make


The compiled binary is called bwa. You should find it within the same folder (bwa-0.7.5a in this example). You may also find other compiled binaries; at the time of writing, a second binary called bwamem-lite is also included. You can disregard this file for now. Finally, add the BWA binary to your path to make it available on the command line. This completes the installation process.
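The "add to your path" step can be done as follows, assuming you unpacked and built BWA under $HOME/src (substitute wherever you actually put it):

```shell
# Prepend the BWA build directory to PATH for the current shell session.
# $HOME/src/bwa-0.7.5a is an assumed location; adjust to your own setup.
BWA_DIR="$HOME/src/bwa-0.7.5a"
export PATH="$BWA_DIR:$PATH"
# To make this permanent, add the export line to your ~/.bashrc or ~/.profile.
```

The same pattern applies to the SAMtools and HTSlib binaries in the sections below.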

• Testing

Open a shell and run:

bwa


This should print out some version and author information as well as a list of commands. As the Usage line states, to use BWA you will always build your command lines like this:

bwa <command> [options]


This means you first make the call to the binary (bwa), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command.

### 2. SAMtools

• Installation

Unpack the tar file using:

tar xjf samtools-0.1.19.tar.bz2


This will produce a directory called samtools-0.1.19 containing the files necessary to compile the SAMtools binary. Move to this directory and compile using:

cd samtools-0.1.19
make


The compiled binary is called samtools. You should find it within the same folder (samtools-0.1.19 in this example). Finally, add the SAMtools binary to your path to make it available on the command line. This completes the installation process.

• Testing

Open a shell and run:

samtools


This should print out some version information as well as a list of commands. As the Usage line states, to use SAMtools you will always build your command lines like this:

samtools <command> [options]


This means you first make the call to the binary (samtools), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command. This is the same convention as used by BWA.

### 3. HTSlib (optional)

• Installation

Unpack the zip file using:

unzip htslib-master.zip


This will produce a directory called htslib-master containing the files necessary to compile the HTSlib binary. Move to this directory and compile using:

cd htslib-master
make


The compiled binary is called htscmd. You should find it within the same folder (htslib-master in this example). Finally, add the HTSlib binary to your path to make it available on the command line. This completes the installation process.

• Testing

Open a shell and run:

htscmd


This should print out some version information as well as a list of commands. As the Usage line states, to use HTSlib you will always build your command lines like this:

htscmd <command> [options]


This means you first make the call to the binary (htscmd), then you specify which command (method) you wish to use (e.g. index) then any options (i.e. arguments such as input files or parameters) used by the program to perform that command. This is the same convention as used by BWA and SAMtools.

### 4. Picard

• Installation

Unpack the zip file using:

unzip picard-tools-1.94.zip


This will produce a directory called picard-tools-1.94 containing the Picard jar files. Picard tools are distributed as pre-compiled Java executables (jar files) so there is no need to compile them. Finally, add the Picard directory to your path to make the tools available on the command line. This completes the installation process.

• Testing

Open a shell and run:

java -jar AddOrReplaceReadGroups.jar -h


This should print out some version and usage information about the AddOrReplaceReadGroups.jar tool. At this point you will have noticed an important difference between BWA and Picard tools. To use BWA, we called on the BWA program and specified which of its internal tools we wanted to apply. To use Picard, we called on Java itself as the main program, then specified which jar file to use, knowing that one jar file = one tool. This applies to all Picard tools; to use them you will always build your command lines like this:

java -jar <ToolName.jar> [options]
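Since every Picard tool follows this one-jar-per-tool pattern, a small wrapper function can keep command lines short. This is a sketch under the assumption that the jars live in one directory of your choosing ($HOME/src/picard-tools-1.94 here is a hypothetical location):

```shell
# Hypothetical wrapper: run any Picard jar from a fixed install directory.
PICARD_DIR="$HOME/src/picard-tools-1.94"   # assumption: where the zip was unpacked
run_picard() {
  jar="$1"
  shift
  java -jar "$PICARD_DIR/$jar" "$@"
}
# Example usage (not executed here):
#   run_picard SortSam.jar I=input.bam O=sorted.bam SORT_ORDER=coordinate
```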


Next we will see that GATK tools are called in yet another way. The reasons for how tools in a given software package are organized and invoked are largely due to the preferences of the software developers. They generally do not reflect strict technical requirements, although they can have an effect on speed and efficiency.

### 5. Genome Analysis Toolkit (GATK)

In order to access the downloads, you need to register for a free account on the GATK support forum. You will also need to read and accept the license agreement before downloading the GATK software package. Note that if you intend to use the GATK for commercial purposes, you will need to purchase a license from our commercial partner, Appistry. See Appistry's GATK FAQ page for an overview of the commercial licensing conditions.

• Installation

Unpack the tar file using:

tar xjf GenomeAnalysisTK-2.6-4.tar.bz2


This will produce a directory called GenomeAnalysisTK-2.6-4-g3e5ff60 containing the GATK jar file, which is called GenomeAnalysisTK.jar, as well as a directory of example files called resources. GATK tools are distributed as a single pre-compiled Java executable so there is no need to compile them. Finally, add the GATK directory to your path to make the tools available on the command line. This completes the installation process.

• Testing

Open a shell and run:

java -jar GenomeAnalysisTK.jar -h


This should print out some version and usage information, as well as a list of the tools included in the GATK. As the Usage line states, to use GATK you will always build your command lines like this:

java -jar GenomeAnalysisTK.jar -T <ToolName> [arguments]


This means you first make the call to Java itself as the main program, then specify the GenomeAnalysisTK.jar file, then specify which tool you want, and finally you pass whatever other arguments (input files, parameters etc.) are needed for the analysis.
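Putting those four elements together, a GATK command line can be assembled like this. The jar location, tool name, and file names below are placeholders for illustration, not a command to run as-is:

```shell
# Build a GATK command line from its four parts: java, the jar, the tool, the arguments.
GATK_JAR="$HOME/src/GenomeAnalysisTK-2.6-4-g3e5ff60/GenomeAnalysisTK.jar"  # assumed install path
TOOL="CountReads"                                                          # example tool choice
ARGS="-R reference.fasta -I input.bam"                                     # placeholder files
cmd="java -jar $GATK_JAR -T $TOOL $ARGS"
echo "$cmd"
```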

So this way of calling the program and selecting which tool to run is a little like a hybrid of how we called BWA and how we called Picard tools. To put it another way, if BWA is a standalone game device that comes preloaded with several games, Picard tools are individual game cartridges that plug into the Java console, and GATK is a single cartridge that also plugs into the Java console but contains many games.

### 6. IGV

The Integrated Genomics Viewer is a genome browser that allows you to view BAM, VCF and other genomic file information in context. It has a graphical user interface that is very easy to use, and can be downloaded for free (though registration is required) from this website.

### 7. RStudio IDE and R libraries ggplot2 and gsalib

• Installation

Follow the installation instructions provided. Binaries are provided for all major platforms; typically they just need to be placed in your Applications (or Programs) directory. Open RStudio and type the following command in the console window:

install.packages("ggplot2")


This will download and install the ggplot2 library as well as any other library packages that ggplot2 depends on for its operation. Note that some users have reported having to install one additional package themselves, called reshape, which you can do as follows:

install.packages("reshape")


Finally, do the same thing to install the gsalib library:

install.packages("gsalib")


Important note

If you are using a recent version of ggplot2 and a version of GATK older than 3.2, you may encounter an error when trying to generate the BQSR or VQSR recalibration plots. This is because until recently our scripts were still using an older version of certain ggplot2 functions. This has been fixed in GATK 3.2, so you should either upgrade your version of GATK (recommended) or downgrade your version of ggplot2. If you experience further issues generating the BQSR recalibration plots, please see this tutorial.

The picard repository on github contains all picard public tools. Libraries live under the htsjdk, which includes the samtools-jdk, tribble, and variant packages (which includes VariantContext and associated classes as well as the VCF/BCF codecs).

If you just need to check out the sources and don't need to make any commits into the picard repository, the command is:

git clone https://github.com/broadinstitute/picard.git


Then within the picard directory, clone the htsjdk.

cd picard
git clone https://github.com/samtools/htsjdk.git


Then you can attach the picard/src/java and picard/htsjdk/src/java directories in IntelliJ as a source directory (File -> Project Structure -> Libraries -> Click the plus sign -> "Attach Files or Directories" in the latest IntelliJ).

To build picard and the htsjdk all at once, type ant from within the picard directory. To run tests, type ant test.

If you do need to make commits into the picard repository, first you'll need to create a github account, fork picard or htsjdk, make your changes, and then issue a pull request. For more info on pull requests, see: https://help.github.com/articles/using-pull-requests

## 1. Overview

The Tribble project was started as an effort to overhaul our reference-ordered data system; we had many different formats that were shoehorned into a common framework that didn't really work as intended. What we wanted was a common framework that allowed for searching of reference ordered data, regardless of the underlying type. Jim Robinson had developed indexing schemes for text-based files, which was incorporated into the Tribble library.

## 2. Architecture Overview

Tribble provides a lightweight interface and API for querying features and creating indexes from feature files, while allowing iteration over known feature files that we're unable to create indexes for. The main entry point for external users is the BasicFeatureReader class. It takes in a codec, an index file, and a file containing the features to be processed. With an instance of a BasicFeatureReader, you can query for features that span a specific location, or get an iterator over all the records in the file.

## 3. Developer Overview

For developers, there are two important classes to implement: the FeatureCodec, which decodes lines of text and produces features, and the feature class, which is your underlying record type.

For developers there are two classes that are important:

• Feature

This is the genomically oriented feature that represents the underlying data in the input file. For instance, in the VCF format this is the variant call, including quality information, the reference base, and the alternate base. The required information to implement a feature is the chromosome name, the start position (one-based), and the stop position. The start and stop position represent a closed, one-based interval; i.e. the first base in chromosome one would be chr1:1-1.

• FeatureCodec

This class takes in a line of text (from an input source, whether it's a file, compressed file, or a http link), and produces the above feature.

To implement your new format into Tribble, you need to implement the two above classes (in an appropriately named subfolder in the Tribble check-out). The Feature object should know nothing about the file representation; it should represent the data as an in-memory object. The interface for a feature looks like:

public interface Feature {

    /**
     * Return the feature's reference sequence name, e.g. chromosome or contig
     */
    public String getChr();

    /**
     * Return the start position in 1-based coordinates (first base is 1)
     */
    public int getStart();

    /**
     * Return the end position following 1-based fully closed conventions. The length of a feature is
     * end - start + 1;
     */
    public int getEnd();
}


And the interface for FeatureCodec:

/**
 * The base interface for classes that read in features.
 * @param <T> The feature type this codec reads
 */
public interface FeatureCodec<T extends Feature> {
    /**
     * Decode a line to obtain just its FeatureLoc for indexing -- contig, start, and stop.
     *
     * @param line the input line to decode
     * @return the FeatureLoc encoded by the line, or null if the line does not represent a feature (e.g. is
     * a comment)
     */
    public Feature decodeLoc(String line);

    /**
     * Decode a line as a Feature.
     *
     * @param line the input line to decode
     * @return the Feature encoded by the line, or null if the line does not represent a feature (e.g. is
     * a comment)
     */
    public T decode(String line);

    /**
     * This function returns the object the codec generates. This is allowed to be Feature in the case where
     * conditionally different types are generated. Be as specific as you can, though.
     *
     * This function is used by reflection-based tools, so we can know the underlying type.
     *
     * @return the feature type this codec generates.
     */
    public Class<T> getFeatureType();
}


## 4. Supported Formats

The following formats are supported in Tribble:

• VCF Format
• DbSNP Format
• BED Format
• GATK Interval Format

## 5. Updating the Tribble, htsjdk, and/or Picard library

Updating the revision of Tribble on the system is a relatively straightforward task if the following steps are taken.

NOTE: Any directory starting with ~ may be different on your machine, depending on where you cloned the various repositories for gsa-unstable, picard, and htsjdk.

A Maven script to install picard into the local repository is located under gsa-unstable/private/picard-maven. To operate, it requires a symbolic link named picard pointing to a working checkout of the picard github repository. NOTE: compiling picard requires an htsjdk github repository checkout available at picard/htsjdk, either as a subdirectory or another symbolic link. The final full path should be gsa-unstable/private/picard-maven/picard/htsjdk.

cd ~/src/gsa-unstable
cd private/picard-maven
ln -s ~/src/picard picard


Create a git branch of Picard and/or htsjdk and make your changes. To install your changes into the GATK you must run mvn install in the private/picard-maven directory. This will compile and copy the jars into gsa-unstable/public/repo, and update gsa-unstable/gatk-root/pom.xml with the corresponding version. While making changes your revision of picard and htslib will be labeled with -SNAPSHOT.

cd ~/src/gsa-unstable
cd private/picard-maven
mvn install


Continue testing in the GATK. Once your changes and updated tests for picard/htsjdk are complete, push your branch and submit your pull request to the Picard and/or htsjdk github. After your Picard/htsjdk patches are accepted, switch your Picard/htsjdk branches back to the master branch. NOTE: Leave your gsa-unstable branch on your development branch!

cd ~/src/picard
ant clean
git checkout master
git fetch
git rebase
cd htsjdk
git checkout master
git fetch
git rebase


NOTE: The version number of old and new Picard/htsjdk will vary, and during active development will end with -SNAPSHOT. While, if needed, you may push -SNAPSHOT version for testing on Bamboo, you should NOT submit a pull request with a -SNAPSHOT version. -SNAPSHOT indicates your local changes are not reproducible from source control.

When ready, run mvn install once more to create the non -SNAPSHOT versions under gsa-unstable/public/repo. In that directory, git add the new version, and git rm the old versions.

cd ~/src/gsa-unstable
cd public/repo
git rm -r picard/picard/1.112.1452/
git rm -r samtools/htsjdk/1.112.1452/


Commit and then push your gsa-unstable branch, then issue a pull request for review.

### ReorderSam

The GATK can be particular about the ordering of a BAM file. If you find yourself in the not uncommon situation of having created or received BAM files sorted in a bad order, you can use the tool ReorderSam to generate a new BAM file where the reads have been reordered to match a well-ordered reference file.

java -jar picard/ReorderSam.jar I=lexicographic.bam O=karyotypic.bam REFERENCE=Homo_sapiens_assembly18.karyotypic.fasta


This tool requires you have a correctly sorted version of the reference sequence you used to align your reads. This tool will drop reads that don't have equivalent contigs in the new reference (potentially bad, but maybe not). If contigs have the same name in the bam and the new reference, this tool assumes that the alignment of the read in the new BAM is the same. This is not a lift over tool!

The tool, though once in the GATK, is now part of the Picard package.

Consider this a public service announcement, since most GATK users probably also use Picard tools routinely. The recently released version 1.124 of the Picard tools includes many lovely improvements, bug fixes and even a few new tools (see release notes for full details) -- but you'll want to pay attention to one major change in particular.

From this version onward, the Picard release will contain a single JAR file containing all the tools, instead of one JAR file per tool as it was before. This means that when you invoke a Picard tool, you'll invoke a single JAR, then specify which tool (which they call CLP for Command Line Program) you want to run. This should feel completely familiar if you already use GATK regularly, but it does mean you'll need to update any scripts that use Picard tools to the new paradigm. Other than that, there's no change of syntax; Picard will still use e.g. I=input.bam where GATK would use -I input.bam.
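For scripts, the migration amounts to rewriting each invocation from the per-tool jar to the single jar plus a tool name. A before/after sketch (the input and output file names are placeholders):

```shell
# Pre-1.124 style: one jar per tool.
old="java -jar SortSam.jar I=input.bam O=sorted.bam SORT_ORDER=coordinate"
# 1.124+ style: single picard.jar, tool name as the first argument.
new="java -jar picard.jar SortSam I=input.bam O=sorted.bam SORT_ORDER=coordinate"
echo "$old"
echo "$new"
```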

We will need to update some of our own documentation accordingly over the near future; please bear with us as we go through this process, and let us know by commenting in this thread if you find any docs that have yet to be updated.

Hey guys,

I was running MarkDuplicates with BWA output, and I got an error "java.lang.NoClassDefFoundError: java/lang/ref/Finalizer$2". To get around it, I tried the following:

1) Used different versions of Picard. I tried 1.90, 1.119, 1.228. None of them worked.
2) I built picard.jar from source code. Still didn't work.
3) Used -M in BWA mem, no luck. When I used aln and sampe, I still got the same error.
4) So I tried MarkDuplicates on RNA-Seq data, and it worked perfectly on TopHat output!

So the conclusion I've reached is that MarkDuplicates doesn't work well with BWA output. Anyone know how to deal with this situation? Thank you in advance!

More information: I'm using openjdk version "1.6.0-internal" on a Linux system. The error message:

Exception in thread "main" java.lang.NoClassDefFoundError: java/lang/ref/Finalizer$2
at java.lang.ref.Finalizer.runFinalization(Finalizer.java:144)
at java.lang.Runtime.runFinalization0(Native Method)
at java.lang.Runtime.runFinalization(Runtime.java:705)
at java.lang.System.runFinalization(System.java:967)
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:58)
at htsjdk.samtools.util.FileAppendStreamLRUCache$Functor.makeValue(FileAppendStreamLRUCache.java:49)
at htsjdk.samtools.util.ResourceLimitedMap.get(ResourceLimitedMap.java:76)
at htsjdk.samtools.CoordinateSortedPairInfoMap.getOutputStreamForSequence(CoordinateSortedPairInfoMap.java:180)
at htsjdk.samtools.CoordinateSortedPairInfoMap.put(CoordinateSortedPairInfoMap.java:164)
at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.put(DiskBasedReadEndsForMarkDuplicatesMap.java:65)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:290)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:114)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

Greeting all.

Currently, I have been using Picard's built-in library "BuildBamIndex" in order to index my bam files.

I have followed the manual described in Picard sites but I got error message.

Here is my command line that you can easily understand as below.

java -Xmx8g -XX:ParallelGCThreads=8 -jar $picard/BuildBamIndex.jar I=$RealignedBamDir/output6

I tried a different approach to avoid this error message, so I used "samtools index", which I think has the same function as Picard's BuildBamIndex. After using samtools, I successfully got my bam index files. I suppose that there is no major difference between a Picard bam index and a samtools bam index. I am confused as to why only the samtools index procedure works fine. Below is my error message when running "BuildBamIndex" from Picard.

[Sun Jan 18 22:15:42 KST 2015] picard.sam.BuildBamIndex INPUT=/DATA1/sclee1/data/URC_WES/U01/01U_N_Filtered_Sorted_Markdup_readgroup.bam VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Sun Jan 18 22:15:42 KST 2015] picard.sam.BuildBamIndex done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Exception creating BAM index for record HSQ-2K:530:C5PJAACXX:6:2109:18806:13902 1/2 101b aligned read.
at htsjdk.samtools.BAMIndexer.processAlignment(BAMIndexer.java:92)
at htsjdk.samtools.BAMIndexer.createIndex(BAMIndexer.java:291)
at htsjdk.samtools.BAMIndexer.createIndex(BAMIndexer.java:271)
at picard.sam.BuildBamIndex.doWork(BuildBamIndex.java:133)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Caused by: htsjdk.samtools.SAMException: BAM cannot be indexed without setting a fileSource for record HSQ-2K:530:C5PJAACXX:6:2109:18806:13902 1/2 101b aligned read.
at htsjdk.samtools.BAMIndexMetaData.recordMetaData(BAMIndexMetaData.java:130)
at htsjdk.samtools.BAMIndexer$BAMIndexBuilder.processAlignment(BAMIndexer.java:182)
at htsjdk.samtools.BAMIndexer.processAlignment(BAMIndexer.java:90)
... 6 more

I look forward to hearing positive answers from you soon.

Bye!

Hi,

I'm having trouble removing duplicates using Picard tools on SOLiD data. I get a regex not matching error.

The reads have the following names:

22_758_632_F3

604_1497_576

124_1189_1519_F5

358_1875_702_F5-DNA

And I don't think Picard tools is able to pick these read names with its default regex.

I tried to change the default regex. This time it does not throw an error, but it takes too long and times out (out of memory). I suspect I'm not giving the right regex. Here is my command:

java -jar $PICARD_TOOLS_HOME/MarkDuplicates.jar I=$FILE O=$BAMs/MarkDuplicates/$SAMPLE.MD.bam M=$BAMs/MarkDuplicates/$SAMPLE.metrics READ_NAME_REGEX="([0-9]+)([0-9]+)([0-9]+).*"

Any help is appreciated. Thanks!

Hi,

We ran HaplotypeCaller on some bams and got the same error on them: Rod span is not contained within the data shard, meaning we wouldn't get all of the data we need.

Here is my command:

java -Xmx32g -jar ~/GATK-3.2.2/GenomeAnalysisTK.jar \
  -T HaplotypeCaller \
  -R /hg19/hg19.fasta \
  -I $1 \
  -ERC GVCF \
  -nct 16 \
  --variant_index_type LINEAR \
  --variant_index_parameter 128000 \
  --pair_hmm_implementation VECTOR_LOGLESS_CACHING \
  -o ${TEMPORARY_DIR}/2

Can you help me figure out what's wrong? The error said it may be a bug, and I cannot find where this issue was previously addressed. I wonder if something went wrong with the bam processing. We received these bams from the sequencing institution already aligned. Individual reads were missing the @RG tag, so we used bamaddrg to add the @RG tags. That caused issues with the bam 'bins', so we had to run htsjdk.samtools.FixBAMFile. We ran MarkDuplicates and attempted the above HaplotypeCaller command. Thanks! Mark

Hi fellow htsjdk/picard/gatk developers! I've been thinking about this for quite some time now, so I thought I should write up a quick post about it here. I've been writing custom tools for our group using both picard and GATK for some time now. It's been working nicely, but I have been missing a set of basic tutorials and examples for users to quickly get started writing walkers. My most commonly used reference has been the 20-line life savers (http://www.slideshare.net/danbolser/20line-lifesavers-coding-simple-solutions-in-the-gatk), which is getting a bit dated. What I would like to see is something like the following:

• What's in htsjdk? What's not in htsjdk? (from a dev's perspective - in terms of frameworks)
• What's in picard? What's not in picard? (from a dev's perspective - in terms of frameworks)
• What's in gatk? What's not in gatk? (from a dev's perspective - in terms of frameworks)
• When to use htsjdk, picard and GATK. What are the strengths and weaknesses of the three. (possibly more that I've missed)
• Your first htsjdk walker
• Your first picard walker
• Your first gatk walker
• Traversing a BAM in htsjdk vs gatk - what are the differences

There might be more stuff that could go in here as well. The driving force behind this is that I'm myself a bit confused by the overlap of these three packages/frameworks. I do understand that picard uses htsjdk, and that GATK uses both as dependencies, but it's not super clear what extra functionality (for a developer) is added from htsjdk -> picard -> gatk. Could we assemble a small group of interested developers to contribute to this? We could set up a git repo with the examples and tutorials for easy collaboration and sharing online. Anyone interested? I'll count myself as the first member :)

Hi, So I've finally taken the plunge and migrated our analysis pipeline to Queue. With some great feedback from @johandahlberg, I have gotten to a state where most of the stuff is running smoothly on the cluster. I'm trying to add Picard's CalculateHsMetrics to the pipeline, but am having some issues. This code:

case class hsmetrics(inBam: File, baitIntervals: File, targetIntervals: File, outMetrics: File) extends CalculateHsMetrics with ExternalCommonArgs with SingleCoreJob with OneDayJob {
  @Input(doc="Input BAM file") val bam: File = inBam
  @Output(doc="Metrics file") val metrics: File = outMetrics
  this.input :+= bam
  this.targets = targetIntervals
  this.baits = baitIntervals
  this.output = metrics
  this.reference = refGenome
  this.isIntermediate = false
}

Gives the following error message:

ERROR 06:56:25,047 QGraph - Missing 2 values for function: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/Users/dankle/IdeaProjects/eclipse/AutoSeq/.queue/tmp' null 'INPUT=/Users/dankle/tmp/autoseqscala/exampleIND2/exampleIND2.panel.bam' 'TMP_DIR=/Users/dankle/IdeaProjects/eclipse/AutoSeq/.queue/tmp' 'VALIDATION_STRINGENCY=SILENT' 'OUTPUT=/Users/dankle/tmp/autoseqscala/exampleIND2/exampleIND2.panel.preMarkDupsHsMetrics.metrics' 'BAIT_INTERVALS=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/exampleINTERVAL.intervals' 'TARGET_INTERVALS=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/exampleINTERVAL.intervals' 'REFERENCE_SEQUENCE=/Users/dankle/IdeaProjects/eclipse/AutoSeq/resources/bwaindex0.6/exampleFASTA.fasta' 'METRIC_ACCUMULATION_LEVEL=SAMPLE'
ERROR 06:56:25,048 QGraph - @Argument: jarFile - jar
ERROR 06:56:25,049 QGraph - @Argument: javaMainClass - Main class to run from javaClasspath

And yeah, it seems that the jar file is currently set to null in the command line. However, MarkDuplicates runs fine without setting the jar:

case class dedup(inBam: File, outBam: File, metricsFile: File) extends MarkDuplicates with ExternalCommonArgs with SingleCoreJob with OneDayJob {
  @Input(doc = "Input bam file") var inbam = inBam
  @Output(doc = "Output BAM file with dups removed") var outbam = outBam
  this.REMOVE_DUPLICATES = true
  this.input :+= inBam
  this.output = outBam
  this.metrics = metricsFile
  this.memoryLimit = 3
  this.isIntermediate = false
}

Why does CalculateHsMetrics need the jar, but not MarkDuplicates? Both are imported with import org.broadinstitute.sting.queue.extensions.picard._.

Hi guys, I've seen this error has been reported other times, for different reasons. The thing is that the bam file I'm using to reduce the reads has been processed through the GATK pipeline without problems, realignment and recalibration included. Therefore, I assumed the bam file generated after BQSR would be GATK-compliant. I was running with Queue, so I just ran the exact command passed to the job in interactive mode, to see what happens. Below is the full command and error message (apologies for the lengthy output), where there's no stack trace after the error.

[fles@login07 reduced] 'java' '-Xmx12288m' '-Djava.io.tmpdir=/scratch/scratch/fles/project_analysis/reduced/tmp' '-cp' '/home/fles/applications/Queue-2.7-4-g6f46d11/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'ReduceReads' '-I' '/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam' '-R' '/home/fles/Scratch/gatkbundle_2.5/human_g1k_v37.fasta' '-o' '/scratch/scratch/fles/project_analysis/reduced/projectTrios.U1_PJ5208467.recal.reduced.bam'
INFO  09:27:21,728 HelpFormatter - --------------------------------------------------------------------------------
INFO  09:27:21,730 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-4-g6f46d11, Compiled 2013/10/10 17:29:52
INFO  09:27:21,731 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  09:27:21,735 HelpFormatter - Program Args: -T ReduceReads -I /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam -R /home/fles/Scratch/gatkbundle_2.5/human_g1k_v37.fasta -o /scratch/scratch/fles/project_analysis/reduced/projectTrios.U1_PJ5208467.recal.reduced.bam
INFO  09:27:21,735 HelpFormatter - Date/Time: 2013/11/08 09:27:21
INFO  09:27:21,735 HelpFormatter - --------------------------------------------------------------------------------
INFO  09:27:21,735 HelpFormatter - --------------------------------------------------------------------------------
INFO  09:27:34,156 GenomeAnalysisEngine - Strictness is SILENT
INFO  09:27:34,491 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 40
INFO  09:27:34,503 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  09:27:34,627 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.12
INFO  09:27:35,039 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO  09:27:35,045 GenomeAnalysisEngine - Done preparing for traversal
INFO  09:27:35,045 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  09:27:35,080 ReadShardBalancer$1 - Loading BAM index data
INFO  09:27:35,081 ReadShardBalancer$1 - Done loading BAM index data
INFO  09:28:05,059 ProgressMeter -      1:18958138        1.00e+06   30.0 s       30.0 s      0.6%        81.8 m    81.3 m
INFO  09:28:35,069 ProgressMeter -      1:46733396        2.30e+06   60.0 s       26.0 s      1.5%        66.4 m    65.4 m
INFO  09:29:05,079 ProgressMeter -      1:92187730        3.50e+06   90.0 s       25.0 s      3.0%        50.5 m    49.0 m
INFO  09:29:35,088 ProgressMeter -     1:145281942        4.90e+06  120.0 s       24.0 s      4.7%        42.7 m    40.7 m
INFO  09:30:05,098 ProgressMeter -     1:152323864        6.40e+06    2.5 m       23.0 s      4.9%        50.9 m    48.4 m
INFO  09:30:35,893 ProgressMeter -     1:181206886        7.70e+06    3.0 m       23.0 s      5.8%        51.4 m    48.4 m
INFO  09:31:05,902 ProgressMeter -     1:217604563        8.90e+06    3.5 m       23.0 s      7.0%        49.9 m    46.4 m
INFO  09:31:35,913 ProgressMeter -      2:14782401        1.02e+07    4.0 m       23.0 s      8.5%        47.0 m    43.0 m
INFO  09:32:05,922 ProgressMeter -      2:62429207        1.15e+07    4.5 m       23.0 s     10.0%        44.8 m    40.3 m
INFO  09:32:35,931 ProgressMeter -      2:97877374        1.28e+07    5.0 m       23.0 s     11.2%        44.7 m    39.7 m
INFO  09:33:06,218 ProgressMeter -     2:135574018        1.42e+07    5.5 m       23.0 s     12.4%        44.5 m    38.9 m
INFO  09:33:36,227 ProgressMeter -     2:179431307        1.56e+07    6.0 m       23.0 s     13.8%        43.5 m    37.5 m
INFO  09:34:06,237 ProgressMeter -     2:216279690        1.69e+07    6.5 m       23.0 s     15.0%        43.4 m    36.9 m
INFO  09:34:36,248 ProgressMeter -      3:14974731        1.81e+07    7.0 m       23.0 s     16.4%        42.9 m    35.9 m
INFO  09:35:07,073 ProgressMeter -      3:52443620        1.94e+07    7.5 m       23.0 s     17.6%        42.9 m    35.4 m
INFO  09:35:37,084 ProgressMeter -     3:111366536        2.05e+07    8.0 m       23.0 s     19.5%        41.3 m    33.2 m
INFO  09:36:07,094 ProgressMeter -     3:155571144        2.18e+07    8.5 m       23.0 s     20.9%        40.8 m    32.3 m
INFO  09:36:37,103 ProgressMeter -       4:3495327        2.31e+07    9.0 m       23.0 s     22.4%        40.4 m    31.3 m
INFO  09:37:07,114 ProgressMeter -      4:48178306        2.43e+07    9.5 m       23.0 s     23.8%        40.0 m    30.5 m
INFO  09:37:37,270 ProgressMeter -     4:106747046        2.56e+07   10.0 m       23.0 s     25.7%        39.0 m    29.0 m
INFO  09:38:07,483 ProgressMeter -     4:181303657        2.69e+07   10.5 m       23.0 s     28.1%        37.5 m    26.9 m
INFO  09:38:37,493 ProgressMeter -      5:41149454        2.81e+07   11.0 m       23.0 s     29.7%        37.1 m    26.1 m
INFO  09:38:51,094 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.7-4-g6f46d11):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: SAM/BAM file /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam is malformed: Read error; BinaryCodec in readmode; file: /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam
##### ERROR ------------------------------------------------------------------------------------------


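A "Read error; BinaryCodec in readmode" failure at the same point in two different tools usually indicates a truncated BAM file rather than a tool bug. Every intact BAM ends with a fixed 28-byte BGZF end-of-file block defined in the SAM/BAM specification, so a quick check (a minimal sketch, not a substitute for ValidateSamFile) is to compare the file's last 28 bytes against that magic:

```python
# Detect a truncated BAM by checking for the 28-byte BGZF EOF block
# that the SAM/BAM specification requires at the end of every file.
BGZF_EOF = bytes([
    0x1F, 0x8B, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00,
    0x00, 0xFF, 0x06, 0x00, 0x42, 0x43, 0x02, 0x00,
    0x1B, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
])

def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)                    # jump to end of file
        if fh.tell() < len(BGZF_EOF):
            return False                 # too short to be a valid BAM
        fh.seek(-len(BGZF_EOF), 2)
        return fh.read() == BGZF_EOF
```

If this returns False, the write or transfer that produced the BAM was most likely interrupted, and the fix is to regenerate or re-copy the file rather than to change GATK arguments.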
Following your usual advice, I validated the BAM file produced by BQSR with Picard. I get the exact same error, but with no more specific indication of what is wrong:

    [fles@login07 reduced]$ java -jar ~/applications/picard-tools-1.102/ValidateSamFile.jar \
        INPUT=/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam \
        IGNORE_WARNINGS=TRUE
    [Fri Nov 08 09:59:42 GMT 2013] net.sf.picard.sam.ValidateSamFile INPUT=/home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam IGNORE_WARNINGS=true MAX_OUTPUT=100 VALIDATE_INDEX=true IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Fri Nov 08 09:59:42 GMT 2013] Executing as fles@login07 on Linux 2.6.18-194.11.4.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18; Picard version: 1.102(1591)
    INFO 2013-11-08 10:01:01 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:01:18s. Time for last 10,000,000: 78s. Last read position: 1:204,966,172
    INFO 2013-11-08 10:02:19 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:02:36s. Time for last 10,000,000: 78s. Last read position: 2:232,121,396
    INFO 2013-11-08 10:03:36 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:03:54s. Time for last 10,000,000: 77s. Last read position: 4:123,140,629
    [Fri Nov 08 10:04:00 GMT 2013] net.sf.picard.sam.ValidateSamFile done. Elapsed time: 4.30 minutes.
    Runtime.totalMemory()=300941312
    To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
    Exception in thread "main" net.sf.samtools.util.RuntimeIOException: Read error; BinaryCodec in readmode; file: /home/fles/Scratch/project_analysis/recalibrated/projectTrios.U1_PJ5208467.clean.dedup.recal.bam
        at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:397)
        at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:371)
        at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:357)
        at net.sf.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:200)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:558)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:532)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
        at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:687)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:665)
        at net.sf.picard.sam.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:241)
        at net.sf.picard.sam.SamFileValidator.validateSamFile(SamFileValidator.java:177)
        at net.sf.picard.sam.SamFileValidator.validateSamFileSummary(SamFileValidator.java:104)
        at net.sf.picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:164)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
        at net.sf.picard.sam.ValidateSamFile.main(ValidateSamFile.java:100)
    Caused by: java.io.IOException: Unexpected compressed block length: 1
        at net.sf.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:358)
        at net.sf.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:113)
        at net.sf.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:238)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at net.sf.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:395)

Any suggestions on what I might be doing wrong?

I finally got the filtered VCF file from the BWA + Picard + GATK pipeline, and have 11 exome-seq data files which were processed as a list of inputs to GATK. In the process of getting the VCF, I did not see an option for separating the 11 samples. Now I have two VCF files (one for SNPs and the other for indels), each containing 11 samples. My question is how to proceed from here: should I separate the samples before annotation, or annotate first and then split the 11 samples into individual files? The big question is how to split the samples out of the VCF files. Thanks.

I happened to be readying a new reference genome and was using FastaStats to force the creation of the .dict and .fai files. The automatic .dict file creation first creates a temporary file, creates the .dict using Picard, and then copies it as appropriate.
However, newer versions of Picard won't overwrite the .dict file, so there is an error when the Java-created temporary file is used as an output. The problematic section seems to be ReferenceDataSource.java. The error manifests as:

    Index file /gs01/projects/ngs/resources/gatk/2.3/human_g1k_v37.dict does not exist but could not be created because:
    /gs01/projects/ngs/resources/gatk/2.3/dict8545428789729910550.tmp already exists.
    Delete this file and try again, or specify a different output file.

Hi there, I get an error when I try to run GATK with the following command:

    java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator \
        -R reference.fa -I merged_bam_files_indexed_markduplicate.bam -o reads.intervals

However, I get this error:

    SAM/BAM file SAMFileReader{/merged_bam_files_indexed_markduplicate.bam} is malformed: Read HWI-ST303_0093:5:5:13416:34802#0 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK.
    Please use http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem

This suggests a header issue; however, my BAM file does have a header:

    $ samtools view -h merged_bam_files_indexed_markduplicate.bam | grep ^@RG
    @RG ID:test1 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan
    @RG ID:test2 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan

When I grep the read named in the error:

    HWI-ST303_0093:5:5:13416:34802#0 99 1 1090 29 23S60M17S = 1150 160 TGTTTGGGTTGAAGATTGATACTGGAAGAAGATTAGAATTGTAGAAAGGGGAAAACGATGTTAGAAAGTTAATACGGCTTACTCCAGATCCTTGGATCTC GGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGFGGGGGGGGGDGFGFGGGGGFEDFGEGGGDGEG?FGGDDGFFDGGEDDFFFFEDG?E MD:Z:60 PG:Z:MarkDuplicates RG:Z:test1 XG:i:0 AM:i:29 NM:i:0 SM:i:29 XM:i:0 XO:i:0 XT:A:M

Following the Picard solution:

    java -XX:MaxDirectMemorySize=4G -jar picard-tools-1.85/AddOrReplaceReadGroups.jar \
        I=test.bam O=test.header.bam SORT_ORDER=coordinate RGID=test RGLB=test \
        RGPL=Illumina RGSM=test/ RGPU=HWI-ST303 RGCN=japan CREATE_INDEX=True

I get this error after 2 minutes:

    Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record 12247781, Read name HWI-ST303_0093:5:26:10129:50409#0, MAPQ should be 0 for unmapped read.

Any recommendation on how to solve this issue?
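One way to clear a "MAPQ should be 0 for unmapped read" validation error is to rewrite the offending records before re-running Picard. A minimal sketch over text SAM is below; it assumes input piped through samtools view -h, and a real pipeline would more likely use a library such as pysam:

```python
def fix_unmapped_mapq(sam_lines):
    """Set MAPQ to 0 on records whose FLAG has the unmapped bit (0x4) set.

    Takes and yields text SAM lines; header lines pass through untouched.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line                       # header line: pass through
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])                # column 2: bitwise FLAG
        if flag & 0x4 and fields[4] != "0":  # unmapped but MAPQ != 0
            fields[4] = "0"                  # column 5: MAPQ
        yield "\t".join(fields) + "\n"
```

Wrapped in a small script, this could sit in a pipe such as samtools view -h in.bam | fix_mapq.py | samtools view -bS - > out.bam, after which AddOrReplaceReadGroups should no longer trip on those records.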
My plan to resolve the issue is the following:

    picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true VALIDATION_STRINGENCY=LENIENT
    samtools index test_markduplicate.bam

While this runs I see a lot of messages like the one below, but the command keeps running:

    Ignoring SAM validation error: ERROR: Record (number), Read name HWI-ST303_0093:5:5:13416:34802#0, RG ID on SAMRecord not found in header: test1

and then I try the GATK RealignerTargetCreator. I already tried:

    java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator \
        -R reference.fa -I merged_bam_files_indexed_markduplicate.bam -o reads.intervals \
        --validation_strictness LENIENT

but I still get the same error. N.B.: the same command runs with no issue under GATK version 1.2. My pipeline in short: mapping the paired-end reads with

    bwa aln -q 20 ref.fa read > files.sai
    bwa sampe ref.fa file1.sai file2.sai read1 read2 > test1.sam
    samtools view -bS test1.sam | samtools sort - test
    samtools index test1.bam
    samtools merge -rh RG.txt test test1.bam test2.bam

where RG.txt contains:

    @RG ID:test1 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan
    @RG ID:test2 PL:Illumina PU:HWI-ST303 LB:test PI:75 SM:test CN:japan

followed by:

    samtools index test.bam
    picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true VALIDATION_STRINGENCY=SILENT
    samtools index test_markduplicate.bam

How do I use Picard to add a read group to a BAM file generated by the GS Reference Mapper?

Hi all, I am doing an exome analysis with BWA 0.6.1-r104, Picard 1.79 and GATK v2.2-8-gec077cd. I have paired-end reads, and my protocol so far is (in brief, omitting options etc.):

    bwa aln R1.fastq
    bwa aln R2.fastq
    bwa sampe R1.sai R2.sai
    picard/CleanSam.jar
    picard/SortSam.jar
    picard/MarkDuplicates.jar
    picard/AddOrReplaceReadGroups.jar
    picard/BuildBamIndex.jar
    GATK -T RealignerTargetCreator -known dbsnp.vcf
    GATK -T IndelRealigner -known dbsnp.vcf
    GATK -T BaseRecalibrator -knownSites dbsnp.vcf
    GATK -T PrintReads

A closer look at the output of the above toolchain revealed changes in read counts I do not quite understand. I have 85767226 paired-end reads = 171534452 sequences in the fastQ files; BWA reports this number, and the cleaned SAM file has 171534452 alignments as expected. MarkDuplicates reports:

    Read 165619516 records. 2 pairs never matched.
    Marking 20272927 records as duplicates.
    Found 2919670 optical duplicate clusters.

so nearly 6 million reads seem to be missing. The RealignerTargetCreator MicroScheduler reports:

    35915555 reads were filtered out during traversal out of 166579875 total (21.56%)
    -> 428072 reads (0.26% of total) failing BadMateFilter
    -> 16077607 reads (9.65% of total) failing DuplicateReadFilter
    -> 19409876 reads (11.65% of total) failing MappingQualityZeroFilter

so nearly 5 million reads seem to be missing. The IndelRealigner MicroScheduler reports:

    0 reads were filtered out during traversal out of 171551640 total (0.00%)

which appears a miracle to me, since (1) there are now even more reads than input sequences, and (2) none of the problematic reads reported by RealignerTargetCreator appear. From base recalibration's MicroScheduler I get:

    41397379 reads were filtered out during traversal out of 171703265 total (24.11%)
    -> 16010068 reads (9.32% of total) failing DuplicateReadFilter
    -> 25387311 reads (14.79% of total) failing MappingQualityZeroFilter
    .....

so my reads got even more offspring, yet, e.g., the duplicate reads reappear in roughly the same numbers. I find these varying counts a little confusing; can someone give me a hint on the logic behind these numbers? And does the protocol look reasonable? Thanks for any comments!
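Much of the apparent inconsistency above comes from each GATK walker applying its own read-filter stack to the same BAM, so the "filtered" counts are per-tool, not cumulative. The toy model below is purely illustrative (the reads are invented; only the FLAG duplicate bit 0x400 and the filter names come from the SAM spec and the GATK logs above) and shows how identical records yield different counts under different filter stacks:

```python
# Toy reads: (flag, mapq). FLAG bit 0x400 marks a duplicate per the SAM spec.
reads = [
    (0x400, 37),  # duplicate, well mapped
    (0x0,   0),   # mapping quality zero
    (0x400, 0),   # duplicate AND mapping quality zero
    (0x0,   60),  # passes everything
]

def duplicate_filter(flag, mapq):
    return bool(flag & 0x400)

def mapq_zero_filter(flag, mapq):
    return mapq == 0

def count_filtered(reads, filters):
    """Count reads rejected by the first filter in the stack that fires,
    mimicking how a walker attributes each read to one failing filter."""
    counts = {f.__name__: 0 for f in filters}
    for flag, mapq in reads:
        for f in filters:
            if f(flag, mapq):
                counts[f.__name__] += 1
                break
    return counts

# A RealignerTargetCreator-style stack rejects duplicates and MAPQ-0 reads,
# while an IndelRealigner-style empty stack rejects nothing at all.
print(count_filtered(reads, [duplicate_filter, mapq_zero_filter]))
print(count_filtered(reads, []))
```

With no filters active, every read traverses, which is consistent with IndelRealigner reporting 0 reads filtered while RealignerTargetCreator rejected over 20% of the same data.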
Picard appears not to like the way BWA codes mtDNA. I am doing human exome sequencing using a copy of hg19 which I obtained from UCSC and indexed using BWA per the instructions here:

## Example 1

    [Tue Aug 28 12:45:16 EDT 2012] net.sf.picard.sam.SortSam done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=125435904
    FAQ: http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page
    Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Non-numeric value in ISIZE column; Line 3982
    Line: FCC0CHTACXX:1:1101:14789:3170#TAGCTTAT 117 chrM 304415842 0 100M = -1610645157 2379906297 TGCGACTTGGAAGCGGATTCAGAGGACAGGACAGAACACTTGGGCAAGTGAATCTCTGTCTGTCTGTCTGTCTCATTGGTTGGTTTATTTCCATTTTCTT B@<:>CDDDBDDBDEEEEEEFEFCCHHFHHGGIIIHIGJJJIIGGGIIIIJJJIIGJIJGG@CEIFJIJJJJIJIJIJJJJIJJJGIHHGHFFEFFFCCC RG:Z:1868 XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:2 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:39G45G14 XA:Z:chrM,-391302964,100M,2;
        at net.sf.samtools.SAMTextReader.reportFatalErrorParsingLine(SAMTextReader.java:223)
        at net.sf.samtools.SAMTextReader.access$400(SAMTextReader.java:40)
        at net.sf.samtools.SAMTextReader$RecordIterator.parseInt(SAMTextReader.java:293)
        at net.sf.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:394)
        at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:278)
        at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:250)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:641)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:619)
        at net.sf.picard.sam.SortSam.doWork(SortSam.java:68)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
        at net.sf.picard.sam.SortSam.main(SortSam.java:57)


## Example 2

    java -jar ~/bin/picard-tools-1.74/MarkDuplicates.jar \
        INPUT=1sorted.bam \
        OUTPUT=1dedup.bam \
        ASSUME_SORTED=true \
        METRICS_FILE=metrics \
        CREATE_INDEX=true \
        VALIDATION_STRINGENCY=LENIENT

    ...
    Ignoring SAM validation error: ERROR: Record 691, Read name FCC0CHTACXX:1:1302:4748:176644#GGCTACAT, Mate Alignment start (436154938) must be <= reference sequence length (16571) on reference chrM
    Ignoring SAM validation error: ERROR: Record 692, Read name FCC0CHTACXX:1:2104:8494:167812#GGCTACAT, Mate Alignment start should != 0 because reference name != *.
    Ignoring SAM validation error: ERROR: Record 693, Read name FCC0CHTACXX:1:1201:21002:183608#GGCTACAT, Mate Alignment start should != 0 because reference name != *.
    Ignoring SAM validation error: ERROR: Record 694, Read name FCC0CHTACXX:1:2303:3184:35872#GGCTACAT, Mate Alignment start (436154812) must be <= reference sequence length (16571) on reference chrM
    ...

I've truncated the output; in fact it throws such an error for every single line of mitochondrial reads.

I suspect I could solve this by writing my own script to go in and change the way one column is coded, but more broadly, I am interested in the answer to "how do you make BWA, Picard and GATK work seamlessly together without needing to do your own scripting"?
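Short of full custom scripting, one pragmatic glue check between BWA output and Picard input is to verify that every record's coordinates fall within the @SQ lengths declared in the SAM header, which is exactly the invariant the chrM records above violate. A minimal sketch over text SAM (a quick sanity check, not a replacement for ValidateSamFile):

```python
def out_of_range_records(sam_lines):
    """Yield (read_name, refname, pos) for records whose POS exceeds the
    declared reference length, or that reference an undeclared sequence."""
    lengths = {}
    for line in sam_lines:
        if line.startswith("@SQ"):
            # @SQ header fields look like SN:chrM and LN:16571.
            fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            lengths[fields["SN"]] = int(fields["LN"])
        elif not line.startswith("@"):
            fields = line.rstrip("\n").split("\t")
            name, rname, pos = fields[0], fields[2], int(fields[3])
            if rname != "*" and (rname not in lengths or pos > lengths[rname]):
                yield (name, rname, pos)
```

Run over samtools view -h output, this flags records such as a chrM alignment at position 304415842 against a 16571 bp contig before Picard ever sees them, narrowing the problem to the aligner or reference rather than the downstream tools.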