Tagged with #bwa
3 documentation articles | 0 announcements | 5 forum discussions


Comments (0)

This article is part of the Best Practices workflow document. See http://www.broadinstitute.org/gatk/guide/best-practices for the full workflow.

The Best Practices variant discovery workflow depends on having sequence data in the form of reads that are aligned to a reference genome. So the very first step is of course to map your reads to the reference to produce a file in SAM/BAM format. We recommend using BWA, but depending on your data and how it was sequenced, you may need to use a different aligner. Once you have mapped the reads, you'll need to make sure they are sorted in the proper order (by coordinate).

Then you can proceed to mark duplicates. The rationale here is that during the sequencing process, the same DNA molecules can be sequenced several times. The resulting duplicate reads are not informative and should not be counted as additional evidence for or against a putative variant. The duplicate marking process (sometimes called **dedupping** in bioinformatics slang) identifies these reads as such so that the GATK tools know they should ignore them.

These steps are performed with tools such as Samtools and Picard that are not part of GATK, so we don't provide detailed documentation of all the options available. For more details, please see those tools' respective documentations.

Comments (15)

Objective

Map the read data to the reference and mark duplicates.

Prerequisites

  • TBD

Steps

  1. Identify read group information
  2. Generate a SAM file containing aligned reads
  3. Convert to BAM, sort and mark duplicates

1. Identify read group information

The read group information is key for downstream GATK functionality. The GATK will not work without a read group tag. Make sure to enter as much metadata as you know about your data in the read group fields provided. For more information about all the possible fields in the @RG tag, take a look at the SAM specification.

Action

Compose the read group identifier in the following format:

@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1\tPU:unit1 

where the \t stands for the tab character.


2. Generate a SAM file containing aligned reads

Action

Run the following BWA command:

In this command, replace read group info by the read group identifier composed in the previous step.

bwa mem -M -R ’<read group info>’ -p reference.fa raw_reads.fq > aligned_reads.sam 

replacing the <read group info> bit with the read group identifier you composed at the previous step.

The -M flag causes BWA to mark shorter split hits as secondary (essential for Picard compatibility).

Expected Result

This creates a file called aligned_reads.sam containing the aligned reads from all input files, combined, annotated and aligned to the same reference.

Note that here we are using a command that is specific for pair ended data in an interleaved fastq file, which is what we are providing to you as a tutorial file. To map other types of datasets (e.g. single-ended or pair-ended in forward/reverse read files) you will need to adapt the command accordingly. Please see the BWA documentation for exact usage and more options for these commands.


3. Convert to BAM, sort and mark duplicates

These initial pre-processing operations format the data to suit the requirements of the GATK tools.

Action

Run the following Picard command to sort the SAM file and convert it to BAM:

java -jar SortSam.jar \ 
    INPUT=aligned_reads.sam \ 
    OUTPUT=sorted_reads.bam \ 
    SORT_ORDER=coordinate 

Expected Results

This creates a file called sorted_reads.bam containing the aligned reads sorted by coordinate.

Action

Run the following Picard command to mark duplicates:

java -jar MarkDuplicates.jar \ 
    INPUT=sorted_reads.bam \ 
    OUTPUT=dedup_reads.bam \
    METRICS_FILE=metrics.txt

Expected Result

This creates a sorted BAM file called dedup_reads.bam with the same content as the input file, except that any duplicate reads are marked as such. It also produces a metrics file called metrics.txt containing (can you guess?) metrics.

Action

Run the following Picard command to index the BAM file:

java -jar BuildBamIndex.jar \ 
    INPUT=dedup_reads.bam 

Expected Result

This creates an index file for the BAM file called dedup_reads.bai.

Comments (4)

Sting BWA/C Bindings

WARNING: This tool is experimental and unsupported and just starting to be developed and used and should be considered a beta version. Feel free to report bugs but we are not supporting the tool

The GSA group has made bindings available for Heng Li's Burrows-Wheeler Aligner (BWA). Our aligner bindings present additional functionality to the user not traditionally available with BWA. BWA standalone is optimized to do fast, low-memory alignments from Fastq to BAM. While our bindings aim to provide support for reasonably fast, reasonably low memory alignment, we add the capacity to do exploratory data analyses. The bindings can provide all alignments for a given read, allowing a user to walk over the alignments and see information not typically provided in the BAM format. Users of the bindings can 'go deep', selectively relaxing alignment parameters one read at a time, looking for the best alignments at a site.

The BWA/C bindings should be thought of as alpha release quality. However, we aim to be particularly responsive to issues in the bindings as they arise. Because of the bindings' alpha state, some functionality is limited; see the Limitations section below for more details on what features are currently supported.

Contents

A note about using the bindings

Whenever native code is called from Java, the user must assist Java in finding the proper shared library. Java looks for shared libraries in two places, on the system-wide library search path and through Java properties invoked on the command line. To add libbwa.so to the global library search path, add the following to your .my.bashrc, .my.cshrc, or other startup file:

bash
export LD_LIBRARY_PATH=/humgen/gsa-scr1/GATK_Data/bwa/stable:$LD_LIBRARY_PATH
csh
setenv LD_LIBRARY_PATH /humgen/gsa-scr1/GATK_Data/bwa/stable:$LD_LIBRARY_PATH

To specify the location of libbwa.so directly on the command-line, use the java.library.path system property as follows:

java -Djava.library.path=/humgen/gsa-scr1/GATK_Data/bwa/stable \
    -jar dist/GenomeAnalysisTK.jar \
    -T AlignmentValidation \
    -I /humgen/gsa-hphome1/hanna/reference/1kg/NA12878_Pilot1_20.bwa.bam \
    -R /humgen/gsa-scr1/GATK_Data/bwa/human_b36_both.fasta

Preparing to use the aligner

Within the Broad Institute

We provide internally accessible versions of both the BWA shared library and precomputed BWA indices for two commonly used human references at the Broad (Homo_sapiens_assembly18.fasta and human_b36_both.fasta). These files live in the following directory:

/humgen/gsa-scr1/GATK_Data/bwa/stable

Outside of the Broad Institute

Two steps are required in preparing to use the aligner: building the shared library and using BWA/C to generate an index of the reference sequence.

The Java bindings to the aligner are available through the Sting repository. A precompiled version of the bindings are available for Linux; these bindings are available in c/bwa/libbwa.so.1. To build the aligner from source:

  • Fetch the latest svn of BWA from SourceForge. Configure and build BWA.
sh autogen.sh
./configure
make
  • Download the latest version of Sting from our Github repository.
  • Customize the variables at the top one of the build scripts (c/bwa/build_linux.sh,c/bwa/build_mac.sh) based on your environment. Run the build script.

To build a reference sequence, use the BWA C executable directly:

bwa index -a bwtsw <your reference sequence>.fasta

Using the existing GATK alignment walkers

Two walkers are provided for end users of the GATK. The first of the stock walkers is Align, which can align an unmapped BAM file or realign a mapped BAM file.

java \
-Djava.library.path=/humgen/gsa-scr1/GATK_Data/bwa/stable \
    -jar dist/GenomeAnalysisTK.jar \
    -T Align \
    -I NA12878_Pilot1_20.unmapped.bam \
    -R /humgen/gsa-scr1/GATK_Data/bwa/human_b36_both.fasta \
    -U \
    -ob human.unsorted.bam

Most of the available parameters here are standard GATK. -T specifies that the alignment analysis should be used; -I specifies the unmapped BAM file to align, and the -R specifies the reference to which to align. By default, this walker assumes that the bwa index support files will live alongside the reference. If these files are stored elsewhere, the optional -BWT argument can be used to specify their location. By defaults, alignments will be emitted to the console in SAM format. Alignments can be spooled to disk in SAM format using the -o option or spooled to disk in BAM format using the -ob option.

The other stock walker is AlignmentValidation, which computes all possible alignments based on the BWA default configuration settings and makes sure at least one of the top alignments matches the alignment stored in the read.

java \
-Djava.library.path=/humgen/gsa-scr1/GATK_Data/bwa/stable \
    -jar dist/GenomeAnalysisTK.jar \
    -T AlignmentValidation \
    -I /humgen/gsa-hphome1/hanna/reference/1kg/NA12878_Pilot1_20.bwa.bam \
    -R /humgen/gsa-scr1/GATK_Data/bwa/human_b36_both.fasta

Options for the AlignmentValidation walker are identical to the Alignment walker, except the AlignmentValidation walker's only output is a exception if validation fails.

Another sample walker of limited scope, CountBestAlignmentsWalker, is available for review; it is discussed in the example section below.

Writing new GATK walkers utilizing alignment bindings

BWA/C can be created on-the-fly using the org.broadinstitute.sting.alignment.bwa.c.BWACAligner constructor. The bindings have two sets of interfaces: an interface which returns all possible alignments and an interface which randomly selects an alignment from a list of the top scoring alignments as selected by BWA.

To iterate through all functions, use the following method:

    /**
     * Get a iterator of alignments, batched by mapping quality.
     * @param bases List of bases.
     * @return Iterator to alignments.
     */
    public Iterable<Alignment[]> getAllAlignments(final byte[] bases);

The call will return an Iterable which batches alignments by score. Each call to next() on the provided iterator will return all Alignments of a given score, ordered in best to worst. For example, given a read sequence with at least one match on the genome, the first call to next() will supply all exact matches, and subsequent calls to next() will give alignments judged to be inferior by BWA (alignments containing mismatches, gap opens, or gap extensions).

Alignments can be transformed to reads using the following static method in org.broadinstitute.sting.alignment.Alignment:

    /**
     * Creates a read directly from an alignment.
     * @param alignment The alignment to convert to a read.
     * @param unmappedRead Source of the unmapped read.  Should have bases, quality scores, and flags.
     * @param newSAMHeader The new SAM header to use in creating this read.  Can be null, but if so, the sequence
     *                     dictionary in the
     * @return A mapped alignment.
     */
    public static SAMRecord convertToRead(Alignment alignment, SAMRecord unmappedRead, SAMFileHeader newSAMHeader);

A convenience method is available which allows the user to get SAMRecords directly from the aligner.

    /**
     * Get a iterator of aligned reads, batched by mapping quality.
     * @param read Read to align.
     * @param newHeader Optional new header to use when aligning the read.  If present, it must be null.
     * @return Iterator to alignments.
     */
    public Iterable<SAMRecord[]> alignAll(final SAMRecord read, final SAMFileHeader newHeader);

To return a single read randomly selected by the bindings, use one of the following methods:

    /**
     * Allow the aligner to choose one alignment randomly from the pile of best alignments.
     * @param bases Bases to align.
     * @return An align
     */
    public Alignment getBestAlignment(final byte[] bases);

    /**
     * Align the read to the reference.
     * @param read Read to align.
     * @param header Optional header to drop in place.
     * @return A list of the alignments.
     */
    public SAMRecord align(final SAMRecord read, final SAMFileHeader header);

The org.broadinstitute.sting.alignment.bwa.BWAConfiguration argument allows the user to specify parameters normally specified to 'bwt aln'. Available parameters are:

  • Maximum edit distance (-n)
  • Maximum gap opens (-o)
  • Maximum gap extensions (-e)
  • Disallow an indel within INT bp towards the ends (-i)
  • Mismatch penalty (-M)
  • Gap open penalty (-O)
  • Gap extension penalty (-E)

Settings must be supplied to the constructor; leaving any BWAConfiguration field unset means that BWA should use its default value for that argument. Configuration settings can be updated at any time using the BWACAligner updateConfiguration method.

    public void updateConfiguration(BWAConfiguration configuration);

Running the aligner outside of the GATK

The BWA/C bindings were written with running outside of the GATK in mind, but this workflow has never been tested. If you would like to run the bindings outside of the GATK, you will need:

  • The BWA shared object, libbwa.so.1
  • The packaged version of Aligner.jar

To build the packaged version of the aligner, run the following command

cp $STING_HOME/lib/bcel-*.jar ~/.ant/lib
ant package -Dexecutable=Aligner

This command will extract all classes required to run the aligner and place them in $STING_HOME/dist/packages/Aligner/Aligner.jar. You can then specify this one jar in your project's dependencies.

Limitations

The BWA/C bindings are currently in an alpha state, but they are extensively supported. Because of the bindings' alpha state, some functionality is limited. The limitations of these bindings include:

  • Only single-end alignment is supported. However, a paired end module could be implemented as a simple extension that finds the jointly optimal placement of both singly-aligned ends.
  • Color space alignments are not currently supported.
  • Only a limited number of parameters BWA's extensive parameter list are supported. The current list of supported parameters is specified in the 'Writing new GATK walkers utilizing alignment bindings' section below.
  • The system is not as heavily memory-optimized as the BWA/C implementation standalone. The JVM, by default, uses slightly over 4G of resident memory when running BWA on human. We have not done extensive testing on the behavior of the BWA/C bindings under memory pressure.
  • There is a slight negative impact on performance when using the BWA/C bindings. BWA/C standalone on 6.9M reads of human data takes roughly 45min to run 'bwa aln', 5min to run 'bwa samse', and another 1.5min to convert the resulting SAM file to a BAM. Aligning the same dataset using the Java bindings takes approximately 55 minutes.
  • The GATK requires that its input BAMs be sorted and indexed. Before using the Align or AlignmentValidation walker, you must sort and index your unmapped BAM file. Note that this is a limitation of the GATK, not the aligner itself. Using the alignment support files outside of the GATK will eliminate this requirement.

Example: analysis of alignments with the BWA bindings

In order to validate that the Java bindings were computing the same number of reads as BWA/C standalone, we modified the BWA source to gather the number of equally scoring alignments and the frequency of the number of equally scoring alignments. We then implemented the same using a walker written in the GATK. We computed this distribution over a set of 36bp human reads and found the distributions to be identical.

The relevant parts of the walker follow.

public class CountBestAlignmentsWalker extends ReadWalker<Integer,Integer> {
    /**
     * The supporting BWT index generated using BWT.
     */
    @Argument(fullName="BWTPrefix",shortName="BWT",doc="Index files generated by bwa index -d bwtsw",required=false)
    String prefix = null;

    /**
     * The actual aligner.
     */
    private Aligner aligner = null;

    private SortedMap<Integer,Integer> alignmentFrequencies = new TreeMap<Integer,Integer>();

    /**
     * Create an aligner object.  The aligner object will load and hold the BWT until close() is called.
     */
    @Override
    public void initialize() {
        BWTFiles bwtFiles = new BWTFiles(prefix);
        BWAConfiguration configuration = new BWAConfiguration();
        aligner = new BWACAligner(bwtFiles,configuration);
    }

    /**
     * Aligns a read to the given reference.
     * @param ref Reference over the read.  Read will most likely be unmapped, so ref will be null.
     * @param read Read to align.
     * @return Number of alignments found for this read.
     */
    @Override
    public Integer map(char[] ref, SAMRecord read) {
        Iterator<Alignment[]> alignmentIterator = aligner.getAllAlignments(read.getReadBases()).iterator();
        if(alignmentIterator.hasNext()) {
            int numAlignments = alignmentIterator.next().length;
            if(alignmentFrequencies.containsKey(numAlignments))
                alignmentFrequencies.put(numAlignments,alignmentFrequencies.get(numAlignments)+1);
            else
                alignmentFrequencies.put(numAlignments,1);
        }
        return 1;
    }    

    /**
     * Initial value for reduce.  In this case, validated reads will be counted.
     * @return 0, indicating no reads yet validated.
     */
    @Override
    public Integer reduceInit() { return 0; }

    /**
     * Calculates the number of reads processed.
     * @param value Number of reads processed by this map.
     * @param sum Number of reads processed before this map.
     * @return Number of reads processed up to and including this map.
     */
    @Override
    public Integer reduce(Integer value, Integer sum) {
        return value + sum;
    }

    /**
     * Cleanup.
     * @param result Number of reads processed.
     */
    @Override
    public void onTraversalDone(Integer result) {
        aligner.close();
        for(Map.Entry<Integer,Integer> alignmentFrequency: alignmentFrequencies.entrySet())
            out.printf("%d\t%d%n", alignmentFrequency.getKey(), alignmentFrequency.getValue());
        super.onTraversalDone(result);
    }
}

This walker can be run within the svn version of the GATK using -T CountBestAlignments.

The resulting placement count frequency is shown in the graph below. The number of placements clearly follows an exponential.

Bwa dist.png

Validation methods

Two major techniques were used to validate the Java bindings against the current BWA implementation.

  • Fastq files from E coli and from NA12878 chr20 were aligned using BWA standalone with BWA's default settings. The aligned SAM files were sorted, indexed, and fed into the alignment validation walker. The alignment validation walker verified that one of the top scoring matches from the BWA bindings matched the alignment produced by BWA standalone.
  • Fastq files from E coli and from NA12878 chr20 were aligned using the GATK Align walker, then fed back into the GATK's alignment validation walker.
  • The distribution of the alignment frequency was compared between BWA standalone and the Java bindings and was found to be identical.

As an ongoing validation strategy, we will use the GATK integration test suite to align a small unmapped BAM file with human data. The contents of the unmapped BAM file will be aligned and written to disk. The md5 of the resulting file will be calculated and compared to a known good md5.

Unsupported: using the BWA/C bindings from within Matlab

Some users are attempting to use the BWA/C bindings from within Matlab. To run the GATK within Matlab, you'll need to add libbwa.so to your library path through the librarypath.txt file. The librarypath.txt file normally lives in $matlabroot/toolbox/local. Within the Broad Institute, the $matlabroot/toolbox/local/librarypath.txt file is shared; therefore, you'll have to create a librarypath.txt file in your working directory from which you execute matlab.

##
## FILE: librarypath.txt
##
## Entries:
##    o path_to_jnifile
##    o [alpha,glnx86,sol2,unix,win32,mac]=path_to_jnifile
##    o $matlabroot/path_to_jnifile
##    o $jre_home/path_to_jnifile
##
$matlabroot/bin/$arch
/humgen/gsa-scr1/GATK_Data/bwa/stable

Once you've edited the library path, you can verify that Matlab has picked up your modified file by running the following command:

>> java.lang.System.getProperty('java.library.path')
     
ans =
/broad/tools/apps/matlab2009b/bin/glnxa64:/humgen/gsa-scr1/GATK_Data/bwa/stable

Once the location of libbwa.so has been added to the library path, you can use the BWACAligner just as you would any other Java class in Matlab:

>> javaclasspath({'/humgen/gsa-scr1/hanna/src/Sting/dist/packages/Aligner/Aligner.jar'})
>> import org.broadinstitute.sting.alignment.bwa.BWTFiles
>> import org.broadinstitute.sting.alignment.bwa.BWAConfiguration
>> import org.broadinstitute.sting.alignment.bwa.c.BWACAligner
>> x = BWACAligner(BWTFiles('/humgen/gsa-scr1/GATK_Data/bwa/Homo_sapiens_assembly18.fasta'),BWAConfiguration())
>> y=x.getAllAlignments(uint8('CCAATAACCAAGGCTGTTAGGTATTTTATCAGCAATGTGGGATAAGCAC'));

We don't have the resources to directly support using the BWA/C bindings from within Matlab, but if you report problems to us, we will try to address them.

No posts found with the requested search criteria.
Comments (1)

Hi all,

My question is on bwa software when one want to map RNA-seq data on the entire human genome. What should be the specific settings to use to get maximum mapping? Should it be effective if no options are used in the command line?

Thank you for your time

Comments (1)

We have used bwa 0.7.4 aln and sampe to align illumina reads. Then used the following command java -Xmx6g -jar ~/GenomeAnalysisTK-2.8-1/GenomeAnalysisTK.jar -T BaseRecalibrator -I ~/temp/BIR-08_130330_I288_FCD1P68ACXX_L7_SZAIPI025187-74.sortedindelrealigned.bam -R ~/hg19/ucsc.hg19.fasta -knownSites ~/dbSNP/dbsnp_137.hg19.vcf -o ~/BIR-08_130330_I288_FCD1P68ACXX_L7_SZAIPI025187-74.sortedBQSR.grp Which gave the following error message

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: START (90) > (89) STOP -- this should never happen, please check read: FCD1P68ACXX:7:1315:19572:52424#CGCGGTGA 1/2 90b aligned read. (CIGAR: 85M4I1M2D) at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReferenceCoordinates(ReadClipper.java:537) at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipByReferenceCoordinatesRightTail(ReadClipper.java:193) at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipAdaptorSequence(ReadClipper.java:389) at org.broadinstitute.sting.utils.clipping.ReadClipper.hardClipAdaptorSequence(ReadClipper.java:392) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:245) at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:132) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:228) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:216) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:102) at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:56) at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:108) at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:313) at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245) at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152) at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.8-1-g932cd3a):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: START (90) > (89) STOP -- this should never happen, please check read: FCD1P68ACXX:7:1315:19572:52424#CGCGGTGA 1/2 90b aligned read. (CIGAR: 85M4I1M2D)

can you help me in this error message? Why its coming and how to rectify it? Thanks in advance Mayukh

Comments (1)

I have the genomes of several isolates of a parasite, and I would like to investigate synonymous/non-synonymous substitution for identifying potential antigens, as well as SNPs genome-wide and I am wondering how well BWA/GATK are suited for this purpose. I've been told that BWA is only very good with sequences <2% divergent, and some of the antigens in this specie are known to be >20% divergent. However, I also know that GATK does local realignments of indels. So I would specifically like to know - is BWA/GATK good for looking at substitutions/SNPs in highly variable genes, and if not which other alignment tools are compatible and appropriate for this purpose?

Comments (1)

Hello,

before I only used BWA and as you described in the best pratice I performed the realign step. Now I want to integrate in my pipeline Stampy associated with BWA.

Do you think, I should make the realign step ?

Thanks !

Comments (1)

Picard appears not to like the way BWA codes mtDNA. I am doing human exome sequencing using a copy of hg19 which I obtained from UCSC and indexed using BWA per the instructions here:

Example 1

[Tue Aug 28 12:45:16 EDT 2012] net.sf.picard.sam.SortSam done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=125435904
FAQ: http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page
Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. Non-numeric value in ISIZE column; Line 3982
Line: FCC0CHTACXX:1:1101:14789:3170#TAGCTTAT 117 chrM 304415842 0 100M = -1610645157 2379906297 TGCGACTTGGAAGCGGATTCAGAGGACAGGACAGAACACTTGGGCAAGTGAATCTCTGTCTGTCTGTCTGTCTCATTGGTTGGTTTATTTCCATTTTCTT B@<:>CDDDBDDBDEEEEEEFEFCCHHFHHGGIIIHIGJJJIIGGGIIIIJJJIIGJIJGG@CEIFJIJJJJIJIJIJJJJIJJJGIHHGHFFEFFFCCC RG:Z:1868 XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:2 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:39G45G14 XA:Z:chrM,-391302964,100M,2;
at net.sf.samtools.SAMTextReader.reportFatalErrorParsingLine(SAMTextReader.java:223)
at net.sf.samtools.SAMTextReader.access$400(SAMTextReader.java:40)
at net.sf.samtools.SAMTextReader$RecordIterator.parseInt(SAMTextReader.java:293)
at net.sf.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:394)
at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:278)
at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:250)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:641)
at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:619)
at net.sf.picard.sam.SortSam.doWork(SortSam.java:68)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
at net.sf.picard.sam.SortSam.main(SortSam.java:57)

Example 2

java -jar ~/bin/picard-tools-1.74/MarkDuplicates.jar \
INPUT=1sorted.bam \
OUTPUT=1dedup.bam \
ASSUME_SORTED=true \
METRICS_FILE=metrics \
CREATE_INDEX=true \
VALIDATION_STRINGENCY=LENIENT

...
Ignoring SAM validation error: ERROR: Record 691, Read name FCC0CHTACXX:1:1302:4748:176644#GGCTACAT, Mate Alignment start (436154938) must be <= reference sequence length (16571) on reference chrM
Ignoring SAM validation error: ERROR: Record 692, Read name FCC0CHTACXX:1:2104:8494:167812#GGCTACAT, Mate Alignment start should != 0 because reference name != *.
Ignoring SAM validation error: ERROR: Record 693, Read name FCC0CHTACXX:1:1201:21002:183608#GGCTACAT, Mate Alignment start should != 0 because reference name != *.
Ignoring SAM validation error: ERROR: Record 694, Read name FCC0CHTACXX:1:2303:3184:35872#GGCTACAT, Mate Alignment start (436154812) must be <= reference sequence length (16571) on reference chrM
...

I've truncated the output; in fact it throws such an error for every single line of mitochondrial reads.

I suspect I could solve this by writing my own script to go in and change the way one column is coded, but more broadly, I am interested in the answer to "how do you make BWA, Picard and GATK work seamlessly together without needing to do your own scripting"?