Tagged with #mapping
1 documentation article | 0 announcements | 5 forum discussions



Created 2016-02-11 16:20:11 | Updated 2016-02-17 05:16:37 | Tags: mapping unmapped mate-pair

Comments (0)

Mate unmapped records are identifiable by SAM flag 8 (0x8), which is set on a record when its mate is unmapped.
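As a minimal sketch, the flag can be tested with a bitwise AND on the integer FLAG value of a record. The bit values come from the SAM specification; the helper name is hypothetical and used only for illustration.

```python
# SAM FLAG bits relevant here (values from the SAM specification).
READ_PAIRED = 0x1
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def is_mate_unmapped(flag: int) -> bool:
    """Return True if the mate-unmapped bit (8) is set in a SAM FLAG."""
    return bool(flag & MATE_UNMAPPED)


# 73 = 1 (paired) + 8 (mate unmapped) + 64 (first in pair)
print(is_mate_unmapped(73))  # True
# 99 = 1 (paired) + 2 (proper pair) + 32 (mate reverse strand) + 64 (first in pair)
print(is_mate_unmapped(99))  # False
```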

It is possible for a BAM to have multiple types of mate-unmapped records. These mate unmapped records are distinct from mate missing records, where the mate is altogether absent from the BAM. Of the three types of mate unmapped records listed below, we describe only the first two in this dictionary entry.

  1. Singly mapping pair.
  2. A secondary/supplementary record is flagged as mate-unmapped but the mate is in fact mapped.
  3. Both reads in a pair are unmapped.

(1) Singly mapping pair

The unmapped mate of a mapped read is recorded in its SAM record in an unexpected manner that allows the pair to sort together. If you look at these unmapped reads, the reference name and position fields (RNAME and POS, columns 3 and 4) indicate they align, in fact identically to the mapped mate. However, what is distinct is the asterisk * in the CIGAR field (column 6), which indicates the record is unmapped. This allows us to (i) identify the unmapped read as having passed through the aligner, and (ii) keep the pair together in file manipulations that use either coordinate or queryname sorted BAMs. For example, when the reads in a genomic interval are taken to create a new BAM, the pair remains together. For file manipulations dependent on such sorting, we can deduce that these mate unmapped records are immune to becoming missing mates.
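The pattern described above can be checked on a single tab-delimited SAM line. This is only a sketch: the function name and the two example records are hypothetical, and it assumes the unmapped read carries its mate's reference name and position along with flag bit 4 and a * CIGAR.

```python
def is_placed_unmapped(sam_line: str) -> bool:
    """Return True for an unmapped record that still carries coordinates.

    Such a record has the read-unmapped bit (4) set and a '*' CIGAR,
    yet its RNAME and POS fields are filled in (borrowed from the mate).
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, pos, cigar = fields[2], int(fields[3]), fields[5]
    unmapped = bool(flag & 0x4)
    has_coordinates = rname != "*" and pos > 0
    return unmapped and has_coordinates and cigar == "*"


# Hypothetical singly mapping pair; SEQ and QUAL are abbreviated to '*'.
mapped_read = "read1\t73\tchr1\t1000\t60\t50M\t=\t1000\t0\t*\t*"
unmapped_mate = "read1\t133\tchr1\t1000\t0\t*\t=\t1000\t0\t*\t*"

print(is_placed_unmapped(mapped_read))    # False
print(is_placed_unmapped(unmapped_mate))  # True
```

Because both records share the same reference name and position, a coordinate sort places them next to each other, which is why the pair survives interval-based subsetting.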

(2) Mate unmapped record whose mate is mapped but in a pair that excludes the record

The second type of mate unmapped record applies to multimapping read sets processed through MergeBamAlignment, such as in Tutorial#6483. Besides reassigning primary and secondary flags within multimapping sets according to a user-specified strategy, MergeBamAlignment marks secondary records with the mate unmapped flag. Specifically, after BWA-MEM alignment, all records in a multimapping set are mate-mapped. After going through MergeBamAlignment, the secondary records become mate-unmapped, while the primary alignments remain mate-mapped. This effectively minimizes the association between a secondary record and its previous mate.
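To separate this second type from the other mate unmapped records, one rough approach is to inspect the FLAG bits alone. The sketch below is an assumption-laden illustration, not how any GATK or Picard tool classifies records: the function name and labels are hypothetical, and a flag by itself cannot confirm that the mate is in fact mapped elsewhere in the file.

```python
MATE_UNMAPPED = 0x8
READ_UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify_mate_unmapped(flag: int) -> str:
    """Label a record by its FLAG according to the types listed in this entry."""
    if not flag & MATE_UNMAPPED:
        return "mate mapped"
    if flag & (SECONDARY | SUPPLEMENTARY):
        return "secondary/supplementary flagged mate-unmapped"
    if flag & READ_UNMAPPED:
        return "both reads unmapped"
    return "mapped read of singly mapping pair"


print(classify_mate_unmapped(329))  # 1 + 8 + 64 + 256: secondary flagged mate-unmapped
print(classify_mate_unmapped(73))   # 1 + 8 + 64: mapped read of singly mapping pair
print(classify_mate_unmapped(77))   # 1 + 4 + 8 + 64: both reads unmapped
```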


How do tools treat them differently?

GATK tools typically exclude secondary/supplementary records from consideration. However, tools will process the mapped read in a singly mapping pair. For example, MarkDuplicates skips secondary records but still marks duplicates among singly mapping reads.



Created 2016-05-27 23:06:21 | Updated | Tags: reference mapping assembly snp-calling

Comments (1)

Hi GATK team,

I am new to bioinformatics and could you help me with this problem?

I am trying to call SNPs on my polyploid samples, which consist of diploids and triploids. However, I used a reference genome from a related species that is diploid. For this reason, I think all my SNPs only show diploid alleles, even for the triploid samples. Thus I want to create my own reference genome by assembling representative samples (a mix of triploids and diploids), as shown in the GATK slide.

I have done mapping before using bwa or bowtie, but I do not know how to assemble using my own samples... This may not be a GATK problem, but could you help me with how to assemble my own reference genome, please? Thank you.

Kind regards, Shane


Created 2015-06-01 10:54:07 | Updated | Tags: best-practices picard mapping

Comments (3)

I am trying to follow the best practices for mapping my (Paired-end Illumina HiSeq) reads to the reference, by following this presentation:

From what I understand, I should use MergeBamAlignment to clean up the output from bwa, and then use this cleaned up output for the rest of the analysis. However, when I run ValidateSamFile after running MergeBamAlignment I get a lot of errors, and running CleanSam on the file does not resolve any of them. What am I doing wrong? I've tried searching the web for more details about MergeBamAlignment but I haven't been able to find much. Please let me know if you require any additional information.

How I ran MergeBamAlignment

    picard-tools MergeBamAlignment \
        UNMAPPED_BAM=unmapped_reads.sam \
        ALIGNED_BAM=aligned_reads.sam \
        OUTPUT=aligned_reads.merged.bam \
        REFERENCE_SEQUENCE=/path/to/reference.fasta \
        PAIRED_RUN=true # Why is this needed?

Error report from ValidateSamFile

HISTOGRAM java.lang.String

Error Type Count
ERROR:INVALID_CIGAR 5261
ERROR:MATES_ARE_SAME_END 30
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 30

Created 2014-12-01 18:47:12 | Updated | Tags: mapping

Comments (5)

Hello, when I use BWA to map reads generated from targeted sequencing data (Agilent SureSelect kit), how should I prepare the reference? Is it better to use the whole genome or a selected subset (the targeted regions)?

Thanks !


Created 2014-01-20 19:28:37 | Updated | Tags: bwa mapping rna-seq

Comments (1)

Hi all,

My question is about the bwa software when one wants to map RNA-seq data to the entire human genome. What specific settings should be used to get maximum mapping? Would it be effective if no options are used on the command line?

Thank you for your time


Created 2012-11-29 08:16:23 | Updated | Tags: bam walker summary mapping reads

Comments (4)

Hi, does GATK2 provide a walker/option to summarize the read alignments in a given BAM file? A summary including total reads, reads mapped/%, reads uniquely mapped/%, reads uniquely mapped with 0mm/%, reads mapped on-target/%, reads uniquely mapped on-target/%, etc. is of great use for assessing the mapping quality of whole genome or targeted analysis. Please advise me on how I can obtain this using any of the available walkers. Thanks, Raj