Tagged with #diagnosetargets
1 documentation article | 2 announcements | 5 forum discussions



A new tool has been released!

Check out the documentation at DiagnoseTargets.


GATK 2.8 was released on December 6, 2013. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history

Note that this release is smaller than previous ones. We are working hard on some new tools and frameworks that we hope to make available to everyone in our next release.


Unified Genotyper

  • Fixed bug where indels in very long reads were sometimes being ignored and not used by the caller.

Haplotype Caller

  • Improved the indexing scheme for gVCF outputs using the reference calculation model.
  • The reference calculation model now works with reduced reads.
  • Fixed bug where an error was being generated at certain homozygous reference sites because the whole assembly graph was getting pruned away.
  • Fixed bug for homozygous reference records that aren't GVCF blocks and were being treated incorrectly.

Variant Recalibrator

  • Disabled tranche plots in INDEL mode.
  • Various VQSR optimizations in both runtime and accuracy. Specific changes include:
      • For very large whole-genome datasets with over 2M variants overlapping the training data, the training set used to build the model is now randomly downsampled.
      • Annotations are ordered by the difference in means between known and novel variants instead of by their standard deviation.
      • The training set quality score threshold has been removed.
      • The negative model now uses 2 Gaussians by default.
      • The numBad argument has been removed; the cutoffs are now chosen by the model itself by looking at the LOD scores.
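The annotation-ordering change can be illustrated with a small Python sketch. It assumes that annotation values observed at known and novel sites have already been collected into dictionaries (a hypothetical layout), and it assumes "difference in means" means the absolute difference, with the most separated annotation ranked first; the release note does not state the direction VQSR actually uses.

```python
from statistics import mean


def order_annotations(known, novel):
    """Rank annotation names by |mean(known) - mean(novel)|, largest first.

    `known` and `novel` map an annotation name (e.g. "QD") to the list of
    values seen at known and novel sites. This data layout is hypothetical.
    """
    def separation(name):
        return abs(mean(known[name]) - mean(novel[name]))

    return sorted(known, key=separation, reverse=True)
```

For example, with QD means of 21 (known) vs 11 (novel), MQ means of 60 vs 59 and FS means of 2 vs 2, the sketch ranks the annotations QD, MQ, FS. Ranking by separation of means, rather than by spread alone, favors annotations that actually distinguish the two site classes.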

Reduce Reads

  • Fixed bug where mapping quality was being treated as a byte instead of an int, which caused high MQs to be treated as negative.
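Why a byte breaks high MQs: GATK is written in Java, where `byte` is a signed 8-bit type holding -128 to 127, while SAM mapping qualities range from 0 to 255. The Python sketch below (the function name is illustrative) reinterprets a 0-255 value the way a signed byte would store it.

```python
def as_signed_byte(value):
    """Reinterpret an unsigned 0-255 value as a signed 8-bit (two's complement) byte."""
    value &= 0xFF  # keep only the low 8 bits, as a narrowing cast would
    return value - 256 if value > 127 else value
```

With this interpretation MQ 60 stays 60, but MQ 128 becomes -128 and MQ 200 becomes -56, which is the kind of negative value the fix eliminates by keeping MQ as an int.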

Diagnose Targets

  • Added calculation for GC content.
  • Added an option to filter the bases based on their quality scores.
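As a rough illustration of the two new DiagnoseTargets calculations, here is a Python sketch. It assumes GC content is the fraction of G/C among unambiguous A/C/G/T bases in the interval (ignoring N) and that quality filtering simply drops bases below a Phred threshold such as the one set by --minimum_base_quality. Both are assumptions about the general technique, not a description of the exact DiagnoseTargets code.

```python
def gc_content(bases):
    """Fraction of G/C among A/C/G/T bases; N and other codes are ignored."""
    bases = bases.upper()
    acgt = sum(bases.count(b) for b in "ACGT")
    gc = bases.count("G") + bases.count("C")
    return gc / acgt if acgt else 0.0


def depth_at_min_quality(base_quals, min_base_quality):
    """Count the bases covering a locus whose Phred quality meets the threshold."""
    return sum(1 for q in base_quals if q >= min_base_quality)
```

For example, `gc_content("ACGTNN")` gives 0.5, and a locus covered by bases of quality 30, 10, 20 and 40 has a filtered depth of 3 at a threshold of 20.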

Combine Variants

  • Fixed bug where, due to implicit conversion, annotation values were parsed as Doubles when they should have been parsed as Integers; submitted by Michael McCowan.
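The underlying principle is to parse each value according to the Type declared in its VCF header line rather than converting everything to a floating-point number. The fix itself lives in GATK's Java code. The Python sketch below shows only the idea, and `parse_info_value` is a hypothetical helper.

```python
def parse_info_value(raw, vcf_type):
    """Parse one INFO/FORMAT value according to its VCF header Type."""
    if raw == ".":  # missing value
        return None
    if vcf_type == "Integer":
        return int(raw)
    if vcf_type == "Float":
        return float(raw)
    return raw  # String and Character values are returned unchanged
```

With this approach an annotation declared `Type=Integer` comes back as `3` rather than `3.0`.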

Select Variants

  • Changed how PL/AD fields are handled when a record has lost one or more alternate alleles: instead of being stripped out, these fields are now fixed.
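One plausible reading of "fixed" is that PL and AD are subset to the alleles that remain. The Python sketch below assumes diploid samples, uses the VCF genotype ordering (genotype j/k with j ≤ k sits at index k(k+1)/2 + j), and also assumes the subset PLs are shifted so that the best genotype is 0. The release note does not confirm these details, and the function names are hypothetical.

```python
def diploid_pl_index(j, k):
    """VCF index of diploid genotype j/k: k*(k+1)/2 + j with j <= k."""
    if j > k:
        j, k = k, j
    return k * (k + 1) // 2 + j


def subset_pl_ad(pl, ad, keep):
    """Subset diploid PL and AD to the sorted allele indices in `keep` (0 = REF).

    Returns (new_pl, new_ad) in VCF order over the retained alleles, with the
    PLs shifted so that the smallest value is 0.
    """
    new_ad = [ad[a] for a in keep]
    new_pl = []
    for k_new, k in enumerate(keep):
        for j in keep[: k_new + 1]:  # genotypes j/k with j <= k
            new_pl.append(pl[diploid_pl_index(j, k)])
    best = min(new_pl)
    return [p - best for p in new_pl], new_ad
```

For alleles REF, A1, A2 with PL `[50, 0, 30, 40, 35, 60]` and AD `[10, 12, 1]`, dropping A1 (`keep=[0, 2]`) yields PL `[10, 0, 20]` and AD `[10, 1]`.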

Miscellaneous

  • SplitSamFile now produces an index with the BAM.
  • Length metric updates to QualifyMissingIntervals.
  • Provided close methods to clean up resources used while creating AlignmentContexts from BAM file regions; submitted by Brad Chapman.
  • Picard jar updated to version 1.104.1628.
  • Tribble jar updated to version 1.104.1628.
  • Variant jar updated to version 1.104.1628.

We have decided to continue providing and supporting DepthOfCoverage and DiagnoseTargets for the foreseeable future. Going forward, we'll try to integrate them and develop their features to address the main needs of the community. To this end we welcome your continuing feedback, so please feel free to contribute comments and ideas in this thread.

To all who took the time to tell us what you find useful about DoC and DT (and what you wish they could do), a big thank you! This feedback is always very useful to us because it helps us identify which features are most valuable to our users.


Hello,

  1. Is DiagnoseTargets counting only reads that have mapped uniquely? Is that one of the default filters?

  2. From the VCF I see the following definitions:
     • IDP - Average depth across the interval. Sum of the depth in a loci divided by interval size.
     • LL - Number of loci for this sample, in this interval with low coverage (below the minimum coverage) but not zero.
     • ZL - Number of loci for this sample, in this interval with zero coverage.

I'm interested in the total number of reads mapped to each interval.

Is it true that IDP = #reads_in_this_interval / (LL + ZL)? If so, to extract #reads_in_this_interval, can I just compute IDP * (LL + ZL)? I have a different number of reads in each sample, so I first need to normalize it.

Thanks! Moran.
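Taking the quoted header definition at face value (IDP is the summed per-locus depth divided by the interval size), the average can be turned back into a summed depth by multiplying by the interval size. The Python sketch below does that. It also shows a rough conversion to a read count and a per-million normalization. The read-length conversion assumes that every read lies entirely inside the interval, the interval is assumed to be 1-based and inclusive, and all function names are hypothetical.

```python
def total_depth(idp, interval_start, interval_end):
    """Undo the IDP average over a 1-based, inclusive interval."""
    interval_size = interval_end - interval_start + 1
    return idp * interval_size


def approx_read_count(summed_depth, read_length):
    """Rough read count; assumes every read lies fully inside the interval."""
    return summed_depth / read_length


def per_million(count, sample_total_reads):
    """Scale a count by the sample's total read count (reads per million)."""
    return count * 1_000_000 / sample_total_reads
```

For example, IDP 50.0 over positions 1001-1200 gives a summed depth of 10,000. With 100 bp reads that is roughly 100 reads, or 5.0 per million for a sample with 20,000,000 reads.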


Hi again,

Is there an automatic way to generate an interval list covering every 1 kb window of the reference sequence? I want to run DiagnoseTargets on every 1 kb window of the data.

thanks!!
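One way to generate such windows is to read the reference's FASTA index (the `.fai` file, whose first two tab-separated columns are contig name and length) and emit 1-based, inclusive `contig:start-end` lines, the interval syntax GATK accepts with `-L`. This Python sketch is an illustration rather than an official tool. The script name and file names in the usage comment are hypothetical.

```python
import sys


def fai_windows(fai_lines, window=1000):
    """Yield 1-based, inclusive 'contig:start-end' intervals of `window` bp."""
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        contig, length = fields[0], int(fields[1])
        for start in range(1, length + 1, window):
            end = min(start + window - 1, length)  # last window may be shorter
            yield f"{contig}:{start}-{end}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python make_windows.py reference.fasta.fai > windows.intervals
    with open(sys.argv[1]) as fai:
        for interval in fai_windows(fai):
            print(interval)
```

A 2,500 bp contig `chr1` yields `chr1:1-1000`, `chr1:1001-2000` and `chr1:2001-2500`.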


Hello,

Is there a way to run DiagnoseTargets with a list of BAM files, instead of multiple "-I bam_file" arguments?

thanks!
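If you keep your BAM paths in a plain text file, one line per path, a small script can expand that file into repeated `-I` arguments for you. In this Python sketch, `-T`, `-R`, `-I`, `-L` and `-o` follow GATK 2.x command-line conventions. The jar name, the file names and the rule of skipping `#` comment lines are assumptions for illustration, so check them against the DiagnoseTargets documentation.

```python
def diagnose_targets_command(bam_list_lines, reference, intervals, output):
    """Build a DiagnoseTargets command line with one -I per BAM path."""
    cmd = ["java", "-jar", "GenomeAnalysisTK.jar", "-T", "DiagnoseTargets",
           "-R", reference, "-L", intervals, "-o", output]
    for line in bam_list_lines:
        path = line.strip()
        if path and not path.startswith("#"):  # skip blank and comment lines
            cmd += ["-I", path]
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with unusual file names.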


I am running DiagnoseTargets on a list of intervals corresponding to targeted exons. In the result file, each interval is assigned a filter status (e.g. PASS, LOW_COVERAGE, COVERAGE_GAPS or NO_READS) for each sample as well as for the sample set as a whole. In the individual-sample FORMAT fields, TF (filter) and IDP (average sample depth across the interval) are given. Why do I often see a combination like this?

TF:IDP NO_READS:96.09

Filtered as NO_READS, yet an average depth of 96.09 across the interval? Of course, PART of the interval may have no reads, even more than the threshold set by --coverage_status_threshold, but that is what I understand COVERAGE_GAPS to mean. I also wondered whether all the reads might have been filtered out by the quality thresholds, so I set --minimum_base_quality and --minimum_mapping_quality to 0, but NO_READS is still flagged for a number of intervals, despite IDP being far from 0. Is this a bug, or have I misunderstood something?

L. Pihlstrom


Now that DepthOfCoverage is being retired in GATK 2.4, I decided to investigate DiagnoseTargets. I have some questions.

1) Are there plans to support output formats other than VCF? What was great about DOC is that I could send the output to anyone and it was easy to read. With DT, that requires additional processing.

2) DOC provided summaries for intervals as well as for samples. DT only does intervals. Is there a way to get per-sample info?

3) DOC output full intervals. DT only outputs the start positions. To get the end positions, an additional step is required. Can that be adjusted?

4) DOC provided coverage info for all intervals. DT only shows covered intervals, so if an interval is not covered, it will not be listed in the output. Is there a way to output all intervals?

Sorry if I sound too critical. I am a big fan of DepthOfCoverage and am disappointed to see it go.

Thank you.
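Until other output formats are available, one possible workaround for points 1 to 3 is to flatten the DT VCF into a tab-delimited table with one row per interval and sample. The Python sketch below reads the per-sample TF and IDP FORMAT fields mentioned elsewhere in this thread. It assumes that an interval end, if present, is stored in an INFO `END` key, and it leaves the column blank otherwise. That assumption and the function name are hypothetical.

```python
def dt_vcf_to_rows(vcf_lines):
    """Yield (contig, start, end, sample, TF, IDP) rows from a DiagnoseTargets-style VCF."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        contig, start, info, fmt = fields[0], fields[1], fields[7], fields[8]
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        end = info_map.get("END", "")  # assumption: END may be absent
        keys = fmt.split(":")
        for sample, values in zip(samples, fields[9:]):
            sample_map = dict(zip(keys, values.split(":")))
            yield (contig, start, end, sample,
                   sample_map.get("TF", ""), sample_map.get("IDP", ""))
```

Writing each row with `print("\t".join(row))` produces a file that opens directly in a spreadsheet, and grouping rows by the sample column gives a simple per-sample view.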