Tagged with #duplicatereadfilter
0 documentation articles | 0 announcements | 8 forum discussions


Created 2016-04-25 15:09:02 | Updated | Tags: duplicatereadfilter drf

Comments (9)


I'm currently trying to call SNPs on several samples (8 BAMs, 1 pseudo-reference), and 75.15% of my reads have failed the DuplicateReadFilter. I tried to disable this filter with "-drf DuplicateRead" on the command line, but I got an error saying the -drf argument is not defined.

Any idea?


Quentin Jehanne

Created 2016-04-06 02:53:19 | Updated | Tags: duplicatereadfilter

Comments (4)

Hi, I am processing exome data, and about 10-20% of reads fail the DuplicateReadFilter during the HaplotypeCaller step. For some samples, more than 70% fail, so I would like to know what percentage of reads usually fails the DuplicateReadFilter. Maybe a silly question, but do these percentages increase as depth increases?

Command and Result for HaplotypeCaller are as below.

java -Xms2G -Xmx4G -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta -I S1.recalrealndup.bam -L targetedregions.bed -o S1.g.vcf -ERC GVCF -nct 2

INFO 05:56:26,329 MicroScheduler - 71604429 reads were filtered out during the traversal out of approximately 94595522 total reads (75.70%)
INFO 05:56:26,329 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 05:56:26,330 MicroScheduler - -> 67530996 reads (71.39% of total) failing DuplicateReadFilter
INFO 05:56:26,330 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 05:56:26,330 MicroScheduler - -> 4073433 reads (4.31% of total) failing HCMappingQualityFilter
INFO 05:56:26,331 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 05:56:26,331 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 05:56:26,331 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 05:56:26,331 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
INFO 05:56:30,377 GATKRunReport - Uploaded run statistics report to AWS S3
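On the depth question: whether the duplicate percentage should rise with depth can be sketched with a toy urn model (my own illustration, not GATK's actual accounting). If the number of distinct fragment start positions over a fixed target is finite, then sequencing ever more reads from the same pool necessarily raises the fraction that land on an already-seen position and get flagged as duplicates:

```python
# Toy model: reads start uniformly at random within a targeted region.
# With L possible start positions and N reads, the expected number of
# distinct start positions hit is L * (1 - (1 - 1/L)**N); every read
# beyond the first at a given position would be flagged as a duplicate.
def expected_duplicate_fraction(num_reads, num_start_positions):
    unique = num_start_positions * (1 - (1 - 1 / num_start_positions) ** num_reads)
    return 1 - unique / num_reads

# The duplicate fraction climbs as depth grows over a fixed target:
for depth_factor in (1, 10, 100):
    n = 10_000 * depth_factor
    print(depth_factor, round(expected_duplicate_fraction(n, 10_000), 2))
```

Under this model, 100x oversampling of a small target already pushes the expected duplicate fraction near 99%, which is consistent with the very high percentages reported for deep targeted data.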

Created 2016-02-18 16:26:07 | Updated 2016-02-18 16:26:53 | Tags: duplicatereadfilter haplotypecaller

Comments (3)

Hi, I am using GATK HC to identify variants in a target region of about 23 kb with very deep sequencing. During HC I get the message that 99.76% of reads are failing the DuplicateReadFilter. This means that most of my reads are being thrown out, and as a result I am not getting correct variant calls.

First, is GATK HC an appropriate tool to call variants in such a small region with deep sequencing (more than 100X)? Second, how can I fix this problem of 99% of reads failing the DuplicateReadFilter?

Thanks, Nitin

Created 2015-05-10 19:31:44 | Updated | Tags: unifiedgenotyper duplicatereadfilter

Comments (2)

Hi All,

My understanding of the DuplicateReadFilter is that it removes PCR duplicates (i.e., identical reads that map to the same location in the genome). Is there a way to prevent UnifiedGenotyper from using this filter?

Many of my 'duplicate' reads are not really duplicates. I have 50 bp single-end yeast RNA-seq data, and 10 samples per lane results in highly oversampled data. Removing duplicates undercounts reads at the most highly expressed genes, and therefore inflates the number of sub-clonal variants at those genes (because the reads from the major clonal population are discarded, while reads from sub-clonal populations are kept because they have a different sequence). At least I think that is what is happening.
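The collision effect described above follows from how positional duplicate marking works: for single-end reads, tools in the MarkDuplicates style key each read on (reference, 5' alignment start, strand) and flag all but one read per key. A small sketch with made-up reads (the read records and names here are illustrative, not a real API):

```python
from collections import defaultdict

# Sketch of positional duplicate marking for single-end reads:
# reads sharing (reference, 5' start, strand) form one group, and
# every read after the first in a group is flagged as a duplicate.
def mark_duplicates(reads):
    groups = defaultdict(list)
    for read in reads:
        groups[(read["chrom"], read["start"], read["strand"])].append(read)
    flagged = set()
    for group in groups.values():
        for read in group[1:]:  # keep one representative per position
            flagged.add(read["name"])
    return flagged

# Three independent fragments from a highly expressed gene that merely
# share a start position are indistinguishable from PCR duplicates:
reads = [
    {"name": "r1", "chrom": "chrII", "start": 100, "strand": "+"},
    {"name": "r2", "chrom": "chrII", "start": 100, "strand": "+"},
    {"name": "r3", "chrom": "chrII", "start": 100, "strand": "+"},
    {"name": "r4", "chrom": "chrII", "start": 250, "strand": "-"},
]
print(mark_duplicates(reads))  # r2 and r3 are flagged, r1 and r4 kept
```

With only ~50 distinct start positions available per expressed 50 bp window and deep oversampling, most genuine fragments inevitably share a key, which is why the duplicate counts look so inflated for this kind of library.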


Created 2015-03-12 12:18:55 | Updated | Tags: duplicatereadfilter mappingqualityzerofilter

Comments (1)

Hello GATK team,

BaseRecalibrator applies these filters: DuplicateReadFilter and MappingQualityZeroFilter. I've noticed that in the BAM produced by PrintReads, most of those reads were indeed filtered out, but a few were left: about 2% of reads that were marked as duplicates by Picard, and 4% of reads with mapping quality zero.

What exactly happens when a tool applies a filter?
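As I understand it, a read filter is simply a per-read predicate evaluated on the fly as the tool traverses the BAM: reads that fail are skipped by that tool's calculations, not deleted from the file. A minimal sketch of the two predicates named above, using the standard SAM duplicate flag bit (0x400) — the function names here are my own, not GATK's:

```python
SAM_FLAG_DUPLICATE = 0x400  # SAM spec: read is a PCR or optical duplicate

def passes_duplicate_read_filter(flag):
    # DuplicateReadFilter drops reads whose duplicate flag bit is set.
    return not (flag & SAM_FLAG_DUPLICATE)

def passes_mapping_quality_zero_filter(mapq):
    # MappingQualityZeroFilter drops reads with MAPQ == 0.
    return mapq != 0

# Flag 1187 = paired + proper pair + mate reverse + second in pair
# + duplicate, so it fails; flag 99 has no duplicate bit and passes.
print(passes_duplicate_read_filter(1187), passes_duplicate_read_filter(99))
```

Since the filters act per tool at traversal time, any reads still present in the PrintReads output would have to be explained by how that particular tool applied (or did not apply) the same filter stack, not by the reads having been "unmarked".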


Created 2015-03-03 16:20:02 | Updated 2015-03-03 16:21:10 | Tags: duplicatereadfilter notprimaryalignmentfilter

Comments (2)

I have a quick question: what is the difference between the DuplicateReadFilter and the NotPrimaryAlignmentFilter? The documentation for each of them is identical, i.e.

Filter out duplicate reads.
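Even though the doc blurbs read the same, the two filters test different SAM flag bits: DuplicateReadFilter looks at the duplicate bit (0x400, set by a duplicate-marking tool), while NotPrimaryAlignmentFilter looks at the secondary-alignment bit (0x100, set by the aligner). A sketch of the distinction (function names are mine, for illustration):

```python
# SAM flag bits, per the SAM format specification:
SECONDARY_ALIGNMENT = 0x100       # read is a secondary alignment
PCR_OR_OPTICAL_DUPLICATE = 0x400  # read was marked as a duplicate

def fails_duplicate_read_filter(flag):
    return bool(flag & PCR_OR_OPTICAL_DUPLICATE)

def fails_not_primary_alignment_filter(flag):
    return bool(flag & SECONDARY_ALIGNMENT)

# A secondary alignment that is not a duplicate fails only one filter:
print(fails_not_primary_alignment_filter(0x100), fails_duplicate_read_filter(0x100))
# A duplicate primary alignment fails only the other:
print(fails_not_primary_alignment_filter(0x400), fails_duplicate_read_filter(0x400))
```

So the two filters can remove entirely disjoint sets of reads, which is why both appear separately in the MicroScheduler summaries.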

Created 2013-07-03 19:08:39 | Updated 2013-07-03 19:13:43 | Tags: duplicatereadfilter flagstat info

Comments (8)

I just noticed something odd about GATK read counts. Using a tiny test data set, I generated a BAM file with marked duplicates.

This is the output for samtools flagstat:

40000 + 0 in total (QC-passed reads + QC-failed reads)
63 + 0 duplicates
38615 + 0 mapped (96.54%:-nan%)
40000 + 0 paired in sequencing
20000 + 0 read1
20000 + 0 read2
37764 + 0 properly paired (94.41%:-nan%)
38284 + 0 with itself and mate mapped
331 + 0 singletons (0.83%:-nan%)
76 + 0 with mate mapped to a different chr
54 + 0 with mate mapped to a different chr (mapQ>=5)

This is what I get as part of GATK info stats when running RealignerTargetCreator:

INFO  14:42:05,815 MicroScheduler - 5175 reads were filtered out during traversal out of 276045 total (1.87%) 
INFO  14:42:05,816 MicroScheduler -   -> 84 reads (0.03% of total) failing BadMateFilter 
INFO  14:42:05,816 MicroScheduler -   -> 1014 reads (0.37% of total) failing DuplicateReadFilter 
INFO  14:42:05,816 MicroScheduler -   -> 4077 reads (1.48% of total) failing MappingQualityZeroFilter 

This is what I get as part of GATK info stats when running DepthOfCoverage (on the original BAM, not after realignment):

INFO  15:03:17,818 MicroScheduler - 2820 reads were filtered out during traversal out of 309863 total (0.91%) 
INFO  15:03:17,818 MicroScheduler -   -> 1205 reads (0.39% of total) failing DuplicateReadFilter 
INFO  15:03:17,818 MicroScheduler -   -> 1615 reads (0.52% of total) failing UnmappedReadFilter 

Why are all of these so different? Why are there so many more total reads and duplicate reads in the GATK stats?
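Part of the discrepancy is just denominators: each tool reports percentages against its own total, and those totals differ. My working assumption (not a documented guarantee) is that GATK's traversal can tally a read once per interval or locus shard it overlaps, and each walker applies its own filter stack, so the MicroScheduler totals need not match flagstat's one-count-per-alignment-record. Normalising each tool's duplicate count by its own total makes the numbers comparable:

```python
# Recompute each tool's duplicate percentage from the figures quoted
# above, each against that tool's own reported total:
def pct(part, whole):
    return round(100 * part / whole, 2)

flagstat_dup = pct(63, 40_000)     # samtools flagstat: 63 of 40000
rtc_dup = pct(1_014, 276_045)      # RealignerTargetCreator traversal
doc_dup = pct(1_205, 309_863)      # DepthOfCoverage traversal
print(flagstat_dup, rtc_dup, doc_dup)
```

The last two recover the 0.37% and 0.39% printed in the logs, so the tools are self-consistent; the open question is only why the traversal totals (276045 and 309863) are so much larger than the 40000 records flagstat sees.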

Created 2012-08-23 15:27:54 | Updated 2013-01-07 19:11:47 | Tags: depthofcoverage duplicatereadfilter

Comments (3)

I was wondering if there is an option to remove duplicate reads when coverage is determined with DepthOfCoverage from a BAM file. Or is there an alternative way to remove the duplicate reads?

Thanks, Amin