


Created 2014-01-17 15:36:06 | Updated 2016-03-05 11:20:37 | Tags: snp variants symbolicallele indel cnv mnp sv


The answer depends on what tool we're talking about, and whether we're considering variant discovery or variant manipulation.

Variant manipulation

GATK variant manipulation tools are able to recognize the following types of alleles:

  • SNP (single nucleotide polymorphism)
  • INDEL (insertion/deletion)
  • MIXED (combination of SNPs and indels at a single position)
  • MNP (multi-nucleotide polymorphism, e.g. a dinucleotide substitution)
  • SYMBOLIC (such as the <NON_REF> allele used in GVCFs produced by HaplotypeCaller, the * allele used to signify the presence of a spanning deletion, or undefined events such as a very large allele or one that is fuzzy and not fully modeled, i.e. some event is going on at this site but we don't know exactly what)
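
The following Python sketch gives a rough picture of how these categories relate to the REF and ALT columns of a VCF record. It is a simplified, hypothetical classifier written for illustration, not the actual logic in GATK or htsjdk. It first trims bases shared by REF and ALT. It then treats <...> alleles and * as SYMBOLIC. A record is called MIXED when its ALT alleles fall into more than one category.

```python
from typing import Sequence, Tuple


def _trim(ref: str, alt: str) -> Tuple[str, str]:
    """Drop bases shared at the end, then at the start, keeping >= 1 base each."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    return ref, alt


def allele_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as SNP, MNP, INDEL or SYMBOLIC (simplified)."""
    # Symbolic ALTs such as <NON_REF>, plus the spanning-deletion allele '*'.
    if alt == "*" or (alt.startswith("<") and alt.endswith(">")):
        return "SYMBOLIC"
    ref, alt = _trim(ref, alt)
    if len(ref) != len(alt):
        return "INDEL"                        # insertion or deletion
    return "SNP" if len(ref) == 1 else "MNP"  # one or several substituted bases


def record_type(ref: str, alts: Sequence[str]) -> str:
    """Classify a record; ALT alleles of more than one type make it MIXED."""
    if not alts:
        raise ValueError("record has no ALT alleles")
    types = {allele_type(ref, alt) for alt in alts}
    return types.pop() if len(types) == 1 else "MIXED"
```

For example, a record with REF AT and ALT alleles GT and A carries a SNP (A>G) and a deletion at the same position, so this sketch reports it as MIXED.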

Note that SelectVariants, the GATK tool most commonly used for subsetting VCFs, discriminates strictly between these categories. For example, if you use -selectType INDEL to pull out indels, it selects only pure INDEL records. It excludes any MIXED records that include a SNP allele in addition to the insertion or deletion alleles of interest. To include those, you also need to specify -selectType MIXED in the same command.
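
The hypothetical Python sketch below makes the strict matching concrete. It mimics the selection behavior described above on records that have already been classified. The function name select_by_type is illustrative and not part of GATK. The point is that requested types are matched exactly, so asking for INDEL does not pull in MIXED records unless MIXED is requested too.

```python
from typing import Iterable, List, Set, Tuple

# (variant ID, type such as "SNP", "INDEL", "MIXED", "MNP", "SYMBOLIC")
Record = Tuple[str, str]


def select_by_type(records: Iterable[Record], wanted: Set[str]) -> List[Record]:
    """Keep only records whose type is exactly one of the requested types."""
    return [rec for rec in records if rec[1] in wanted]


records = [("rs1", "SNP"), ("rs2", "INDEL"), ("rs3", "MIXED"), ("rs4", "MNP")]

# Analogous to requesting INDEL only: the MIXED record rs3 is left out.
indels_only = select_by_type(records, {"INDEL"})

# Analogous to requesting both INDEL and MIXED in the same command.
indels_and_mixed = select_by_type(records, {"INDEL", "MIXED"})
```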

Variant discovery

HaplotypeCaller is a sophisticated variant caller that can call different types of variants at the same time. In addition to SNPs and indels, it emits mixed records by default, as well as symbolic representations of events such as spanning deletions. It does emit physical phasing information, but the current version of HC cannot emit MNPs. If you would like to combine contiguous SNPs into MNPs, you will need to use the ReadBackedPhasing tool with its MNP merging function activated; see the tool documentation for details.

Our older (and now deprecated) variant caller, UnifiedGenotyper, was even more limited. It only called SNPs and indels, and it called them separately. Even if you ran it in calling mode BOTH, the program performed separate calling operations internally. As a result, it could not recognize that a SNP and an indel occurring at the same site should be emitted together as a joint record.
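
The following conceptual Python sketch illustrates the MNP merging mentioned above. It is not ReadBackedPhasing's implementation, and it makes two assumptions. First, the SNPs have already been phased. Second, the phase is represented by a hypothetical haplotype label. The sketch simply joins runs of SNPs at consecutive positions on the same haplotype into one REF/ALT pair. The real tool applies its own criteria, which are not reproduced here.

```python
from typing import List, NamedTuple


class Snp(NamedTuple):
    pos: int        # 1-based position
    ref: str        # reference base
    alt: str        # alternate base
    haplotype: int  # hypothetical phase label: equal values = same haplotype


class Variant(NamedTuple):
    pos: int
    ref: str
    alt: str


def merge_adjacent_snps(snps: List[Snp]) -> List[Variant]:
    """Merge runs of adjacent SNPs on the same haplotype into MNP records."""
    runs: List[List[Snp]] = []
    for snp in sorted(snps, key=lambda s: s.pos):
        prev = runs[-1][-1] if runs else None
        if prev is not None and snp.pos == prev.pos + 1 and snp.haplotype == prev.haplotype:
            runs[-1].append(snp)  # extend the current run
        else:
            runs.append([snp])    # start a new run
    # A run of length 1 stays a SNP; longer runs become a single MNP record.
    return [
        Variant(run[0].pos, "".join(s.ref for s in run), "".join(s.alt for s in run))
        for run in runs
    ]


snps = [Snp(100, "A", "G", 1), Snp(101, "C", "T", 1),
        Snp(105, "G", "A", 1), Snp(106, "T", "C", 2)]
merged = merge_adjacent_snps(snps)
```

In this usage example, the SNPs at 100 and 101 become one AC>GT record. The SNP at 106 stays separate from the one at 105 because it is on a different haplotype.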

The general release version of GATK is currently not able to detect SVs (structural variations) or CNVs (copy number variations). However, the alpha version of GATK 4 (the next generation of GATK tools) includes tools for CNV analysis in exome data. Let us know if you're interested in trying them out by commenting on this article in the forum.

There is also a third-party software package called GenomeSTRiP, built on top of GATK, that provides SV analysis capabilities.



Created 2016-05-18 09:11:40 | Updated | Tags: fastareference exome cnv


Hi, I'm trying to use ExomeCNV to detect CNVs on two chromosomes (13 and 17, for BRCA2 and BRCA1 respectively).

I am following the manual at https://secure.genome.ucla.edu/index.php/ExomeCNV_User_Guide. For the first part (the GATK part), I used the command given in the instructions, changing only the reference genome (I use hg19.fasta; is that correct?).

My output is:

INFO 13:19:22,999 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:19:23,000 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
INFO 13:19:23,001 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 13:19:23,001 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 13:19:23,003 HelpFormatter - Program Args: -T DepthOfCoverage -omitBaseOutput -omitLocusTable -R ../../../reference_genome/hg19.fasta -I ../OG040.bam -L ../../../reference_genome/exome.interval_list -o output_controllo.coverage
INFO 13:19:23,005 HelpFormatter - Executing as martina@martina-X750JB on Linux 4.4.0-22-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14.
INFO 13:19:23,005 HelpFormatter - Date/Time: 2016/05/17 13:19:22
INFO 13:19:23,005 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:19:23,006 HelpFormatter - --------------------------------------------------------------------------------
INFO 13:19:23,295 GenomeAnalysisEngine - Strictness is SILENT
INFO 13:19:23,348 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 13:19:23,352 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 13:19:23,369 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 13:19:23,381 IntervalUtils - Processing 18624 bp from intervals
INFO 13:19:23,423 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 13:19:23,442 GenomeAnalysisEngine - Done preparing for traversal
INFO 13:19:23,442 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 13:19:23,442 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 13:19:23,442 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 13:19:23,443 DepthOfCoverage - Per-Locus Depth of Coverage output was omitted
INFO 13:19:42,619 DepthOfCoverage - Printing summary info
INFO 13:19:43,028 ProgressMeter - done 45657.0 19.0 s 7.1 m 99.9% 19.0 s 0.0 s
INFO 13:19:43,028 ProgressMeter - Total runtime 19.59 secs, 0.33 min, 0.01 hours
INFO 13:19:43,030 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 482973 total reads (0.00%)
INFO 13:19:43,030 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 13:19:43,030 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 13:19:43,031 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 13:19:43,031 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 13:19:43,031 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 13:19:43,031 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
INFO 13:19:44,231 GATKRunReport - Uploaded run statistics report to AWS S3

Is this correct?

Then I proceeded with the second part, but when I try to load output.coverage.sample_interval_summary I get an error: "The line 1 doesn't have 15 elements". Where am I going wrong? Thank you for the help :)
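
One way to narrow down an error like this is to check whether every line of the summary file has the same number of fields as its header. The Python sketch below does that, assuming the file is tab-separated with a single header line. That assumption has not been checked against the DepthOfCoverage or ExomeCNV documentation, and check_field_counts is a hypothetical helper name. The quoted message resembles R's read.table error, and read.table splits on any whitespace by default. If the counts all agree, it is therefore also worth checking how the file is being read.

```python
from typing import Iterable, List, Optional, Tuple


def check_field_counts(lines: Iterable[str]) -> Tuple[int, List[Tuple[int, int]]]:
    """Return the header's field count and (line number, field count) for every
    later line whose count differs. Assumes tab-separated fields."""
    expected: Optional[int] = None
    mismatches: List[Tuple[int, int]] = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue                      # ignore blank lines
        n_fields = len(line.rstrip("\r\n").split("\t"))
        if expected is None:
            expected = n_fields           # first non-blank line is the header
        elif n_fields != expected:
            mismatches.append((lineno, n_fields))
    return (expected or 0), mismatches


# Usage on a real file (path taken from the post above):
# with open("output.coverage.sample_interval_summary") as handle:
#     print(check_field_counts(handle))
```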


Created 2015-07-01 12:08:39 | Updated | Tags: ugcallvariants indels cnv


Hi,

I notice that although UnifiedGenotyper and HaplotypeCaller identify indels, none longer than about 50 bp are reported. Obviously, integrating this ability into the already complicated algorithms is not easy, nor are such variants especially common. However, they can easily be biologically important, and they can also interfere with SNP calls, e.g. if a SNP lies in a heterozygous deletion.

Do you know of any callers for larger indels/CNVs, specifically for multi-sample NGS projects? I've looked at Pindel and BreakDancer (and probably a few others), but they seem designed mostly for single samples.


Created 2014-09-30 17:10:36 | Updated | Tags: cnv


Hi,

I am interested in finding copy number variation in my samples. I have called SNPs and indels with the GATK UnifiedGenotyper (I still have to try HaplotypeCaller). Is there a walker to find CNVs (duplications or deletions) in GATK?

Hope to hear from you soon.

Regards,
Varun


Created 2013-04-30 03:21:17 | Updated 2013-04-30 03:22:58 | Tags: cnv


Hello Geraldine et al,

I've a question about CNV calling, which you may or may not have an answer for. We're doing a case/control analysis on two cohorts, and one of the analyses we'd like to carry out is an examination of CNV length. We want to analyse it genome-wide (say, whether sites are above or below the average) and by region (same general idea).
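
The length comparison described above could be prototyped along the lines of the hypothetical Python sketch below. It assumes CNV calls are already available as (cohort, chromosome, start, end) records and uses the chromosome as a stand-in for "region". It reports three things: the mean CNV length per cohort genome-wide, the mean per region, and whether each region's mean is above its cohort's genome-wide mean. The names and input format are illustrative only.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple, Tuple


class CnvCall(NamedTuple):
    cohort: str   # e.g. "case" or "control"
    chrom: str    # used as the "region" in this sketch (an assumption)
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def length_summary(calls: List[CnvCall]):
    """Mean CNV length per cohort genome-wide and per (cohort, chromosome),
    plus whether each region's mean lies above its cohort's genome-wide mean."""
    genome: Dict[str, List[int]] = defaultdict(list)
    region: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for call in calls:
        genome[call.cohort].append(call.length)
        region[(call.cohort, call.chrom)].append(call.length)
    genome_mean = {cohort: mean(v) for cohort, v in genome.items()}
    region_mean = {key: mean(v) for key, v in region.items()}
    above = {key: m > genome_mean[key[0]] for key, m in region_mean.items()}
    return genome_mean, region_mean, above


calls = [CnvCall("case", "chr1", 1, 1000), CnvCall("case", "chr2", 1, 3000),
         CnvCall("control", "chr1", 1, 500), CnvCall("control", "chr1", 1001, 1500)]
genome_mean, region_mean, above = length_summary(calls)
```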

While calling SNPs and indels seems straightforward enough with UG, I wonder whether you have a best practice for calling CNVs, or rather, a candidate for what might become an integrated best practice. Maybe I can even help with its maturation/integration.

Thanks for your relentless work on the site, tools and community, this is the place to be.