ValidationAmplicons

Creates FASTA sequences for use in Sequenom or PCR utilities for site amplification and subsequent validation.

Category Validation Utilities

Traversal LocusWalker

PartitionBy LOCUS


Overview

ValidationAmplicons consumes a VCF and an interval list and produces FASTA sequences from which PCR primers or probe sequences can be designed. In addition, it uses BWA to check the specificity of tracts of bases within the output amplicon and lower-cases non-specific tracts, lets users provide sites to mask out, and reports reasons why a site may fail validation (nearby variation, for example).

Input

Requires a VCF containing the alleles to design amplicons for, a VCF of variants to mask out of the amplicons, and an interval list defining the size of the amplicons around the sites to be validated.

Output

Output is a FASTA-formatted file with some modifications at probe sites. For instance:

 >20:207414 INSERTION=1,VARIANT_TOO_NEAR_PROBE=1, 20_207414
 CCAACGTTAAGAAAGAGACATGCGACTGGGTgcggtggctcatgcctggaaccccagcactttgggaggccaaggtgggc[A/G*]gNNcacttgaggtcaggagtttgagaccagcctggccaacatggtgaaaccccgtctctactgaaaatacaaaagttagC
 >20:792122 Valid 20_792122
 TTTTTTTTTagatggagtctcgctcttatcgcccaggcNggagtgggtggtgtgatcttggctNactgcaacttctgcct[-/CCC*]cccaggttcaagtgattNtcctgcctcagccacctgagtagctgggattacaggcatccgccaccatgcctggctaatTT
 >20:994145 Valid 20_994145
 TCCATGGCCTCCCCCTGGCCCACGAAGTCCTCAGCCACCTCCTTCCTGGAGGGCTCAGCCAAAATCAGACTGAGGAAGAAG[AAG/-*]TGGTGGGCACCCACCTTCTGGCCTTCCTCAGCCCCTTATTCCTAGGACCAGTCCCCATCTAGGGGTCCTCACTGCCTCCC
 >20:1074230 SITE_IS_FILTERED=1, 20_1074230
 ACCTGATTACCATCAATCAGAACTCATTTCTGTTCCTATCTTCCACCCACAATTGTAATGCCTTTTCCATTTTAACCAAG[T/C*]ACTTATTATAtactatggccataacttttgcagtttgaggtatgacagcaaaaTTAGCATACATTTCATTTTCCTTCTTC
 >20:1084330 DELETION=1, 20_1084330
 CACGTTCGGcttgtgcagagcctcaaggtcatccagaggtgatAGTTTAGGGCCCTCTCAAGTCTTTCCNGTGCGCATGG[GT/AC*]CAGCCCTGGGCACCTGTNNNNNNNNNNNNNTGCTCATGGCCTTCTAGATTCCCAGGAAATGTCAGAGCTTTTCAAAGCCC
These are amplicon sequences resulting from running the tool. The flags (in the header line preceding each sequence) can be:
 Valid                     // amplicon is valid
 SITE_IS_FILTERED=1        // validation site is not marked 'PASS' or '.' in its filter field ("you are trying to validate a filtered variant")
 VARIANT_TOO_NEAR_PROBE=1  // there is a variant too near to the variant to be validated, potentially shifting the mass-spec peak
 MULTIPLE_PROBES=1,        // multiple variants to be validated found inside the same amplicon
 DELETION=6,INSERTION=5,   // 6 deletions and 5 insertions found inside the amplicon region (from the "mask" VCF), will be potentially difficult to validate
 DELETION=1,               // deletion found inside the amplicon region, could shift mass-spec peak
 START_TOO_CLOSE,          // variant is too close to the start of the amplicon region to give Sequenom a good chance to find a suitable primer
 END_TOO_CLOSE,            // variant is too close to the end of the amplicon region to give Sequenom a good chance to find a suitable primer
 NO_VARIANTS_FOUND,        // no variants found within the amplicon region
 INDEL_OVERLAPS_VALIDATION_SITE, // an insertion or deletion interferes directly with the site to be validated (i.e. an insertion directly preceding or following the site, or a deletion that spans the site itself)
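
For downstream filtering it can help to read these flags programmatically. The following is a minimal sketch, not part of GATK, that assumes every header has the three-field layout seen in the sample output above (locus, flag field, amplicon name) and that a flag without an explicit "=N" counts once. The names AmpliconHeader and parse_header are hypothetical.

```python
from typing import Dict, NamedTuple


class AmpliconHeader(NamedTuple):
    locus: str             # e.g. "20:207414"
    flags: Dict[str, int]  # e.g. {"INSERTION": 1}; empty for valid amplicons
    valid: bool            # True when the flag field is exactly "Valid"
    name: str              # e.g. "20_207414"


def parse_header(line: str) -> AmpliconHeader:
    """Split a '>LOCUS FLAGS NAME' header line into its three fields."""
    fields = line.strip().lstrip(">").split()
    locus, name = fields[0], fields[-1]
    # Everything between locus and name is the flag field. It may end in a
    # trailing comma, so empty pieces are skipped below.
    raw = "".join(fields[1:-1])
    flags: Dict[str, int] = {}
    for item in raw.split(","):
        if not item or item == "Valid":
            continue
        key, _, count = item.partition("=")
        # Assumption: a flag without "=N" (e.g. START_TOO_CLOSE) counts as 1.
        flags[key] = int(count) if count else 1
    return AmpliconHeader(locus, flags, raw.rstrip(",") == "Valid", name)


header = parse_header(">20:207414 INSERTION=1,VARIANT_TOO_NEAR_PROBE=1, 20_207414")
print(header.locus, header.flags, header.valid, header.name)
```

Keeping the flags as a name-to-count mapping makes it easy to, for example, reject amplicons with any DELETION or INSERTION count above a chosen threshold.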
 

Examples

    java \
      -jar GenomeAnalysisTK.jar \
      -T ValidationAmplicons \
      -R /humgen/1kg/reference/human_g1k_v37.fasta \
      -L:table interval_table.table \
      -ProbeIntervals:table interval_table.table \
      -ValidateAlleles:vcf sites_to_validate.vcf \
      -MaskAlleles:vcf mask_sites.vcf \
      --virtualPrimerSize 30 \
      -o probes.fasta
 

Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by ValidationAmplicons.


Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

ValidationAmplicons specific arguments

This table summarizes the command-line arguments that are specific to this tool. For details, see the argument details below the table.

Name                        Type                        Default value  Summary
Required
--MaskAlleles               RodBinding[VariantContext]  NA             A VCF containing the sites you want to MASK from the designed amplicon (e.g. by Ns or lower-cased bases)
--ProbeIntervals            RodBinding[TableFeature]    NA             A collection of intervals in table format with optional names that represent the intervals surrounding the probe sites amplicons should be designed for
--ValidateAlleles           RodBinding[VariantContext]  NA             A VCF containing the sites and alleles you want to validate. Restricted to *BI-Allelic* sites
Optional
--doNotUseBWA               boolean                     false          Do not use BWA, lower-case repeats only
--filterMonomorphic         boolean                     false          Monomorphic sites in the mask file will be treated as filtered
--ignoreComplexEvents       boolean                     false          Ignore complex genomic records.
--lowerCaseSNPs             boolean                     false          Lower case SNPs rather than replacing with 'N'
--onlyOutputValidAmplicons  boolean                     false          Only output valid sequences.
--out                       PrintStream                 stdout         An output file created by the walker. Will overwrite contents if file exists
--target_reference          File                        NA             The reference to which reads in the source file should be aligned. Alongside this reference should sit index files generated by bwa index -d bwtsw. If unspecified, will default to the reference specified via the -R argument.
--virtualPrimerSize         int                         20             Size of the virtual primer to use for lower-casing regions with low specificity

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

--doNotUseBWA ( boolean with default value false )

Do not use BWA, lower-case repeats only.

--filterMonomorphic ( boolean with default value false )

Monomorphic sites in the mask file will be treated as filtered.

--ignoreComplexEvents ( boolean with default value false )

Ignore complex genomic records. If ignoreComplexEvents is true, the output FASTA file will contain only sequences coming from SNPs and indels. Complex substitutions will be ignored.

--lowerCaseSNPs ( boolean with default value false )

Lower case SNPs rather than replacing with 'N'.

--MaskAlleles ( required RodBinding[VariantContext] )

A VCF containing the sites you want to MASK from the designed amplicon (e.g. by Ns or lower-cased bases). A mask variant overlapping a validation site will be ignored at the validation site. --MaskAlleles binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3.

--onlyOutputValidAmplicons ( boolean with default value false )

Only output valid sequences. If onlyOutputValidAmplicons is true, the output FASTA file will contain only valid sequences. Useful for producing delivery-ready files.
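
If a FASTA file was already produced without this flag, a similar selection can be approximated after the fact. The sketch below is an illustration only, not the tool's implementation: it assumes the flag field is the second whitespace-separated token of each header (as in the sample output), and the helper names read_fasta and only_valid, as well as the shortened toy sequences, are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def only_valid(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Keep records whose flag field (second header token) is 'Valid'."""
    for header, seq in records:
        tokens = header.split()
        if len(tokens) > 1 and tokens[1] == "Valid":
            yield header, seq


# Shortened toy records in the style of the tool's output.
sample = """>20:792122 Valid 20_792122
TTTTagat[-/CCC*]cccaggTT
>20:1074230 SITE_IS_FILTERED=1, 20_1074230
ACCTGATT[T/C*]ACTTATTA
""".splitlines()

for header, seq in only_valid(read_fasta(sample)):
    print(header)
    print(seq)
```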

--out / -o ( PrintStream with default value stdout )

An output file created by the walker. Will overwrite contents if file exists.

--ProbeIntervals ( required RodBinding[TableFeature] )

A collection of intervals in table format with optional names that represent the intervals surrounding the probe sites amplicons should be designed for. The table-formatted file lists the amplicon contig, start, stop, and a name for the amplicon (or probe). --ProbeIntervals binds reference ordered data. This argument supports ROD files of the following types: BEDTABLE, TABLE.
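
As an illustration of how the four values per amplicon might be derived from a list of validation sites, here is a minimal sketch. The flank of 100 bases and the function name probe_interval are assumptions made for this example; the contig_position naming merely mirrors the names in the sample output. The exact on-disk layout of a TABLE or BEDTABLE file (header line, delimiters) is not covered here and should be checked against the GATK table documentation before writing a file.

```python
from typing import List, Tuple

# (contig, start, stop, name)
Row = Tuple[str, int, int, str]


def probe_interval(contig: str, pos: int, flank: int = 100) -> Row:
    """Return an interval of +/- flank bases around a 1-based site position."""
    # Clamp at 1 so sites near the contig start do not yield invalid
    # coordinates. The stop is NOT clamped to the contig length here.
    start = max(1, pos - flank)
    return contig, start, pos + flank, f"{contig}_{pos}"


sites = [("20", 207414), ("20", 792122), ("20", 50)]
rows: List[Row] = [probe_interval(contig, pos) for contig, pos in sites]
for row in rows:
    # Tab-separated preview only; not necessarily the TABLE file format.
    print("\t".join(str(value) for value in row))
```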

--target_reference / -target_ref ( File )

The reference to which reads in the source file should be aligned. Alongside this reference should sit index files generated by bwa index -d bwtsw. If unspecified, will default to the reference specified via the -R argument.

--ValidateAlleles ( required RodBinding[VariantContext] )

A VCF containing the sites and alleles you want to validate. Restricted to bi-allelic sites. Filtered records will prompt a warning and will be flagged as filtered in the output FASTA. --ValidateAlleles binds reference ordered data. This argument supports ROD files of the following types: BCF2, VCF, VCF3.
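
Before running the tool, it may be useful to screen the validation VCF for records that break these constraints. The sketch below is a hypothetical pre-check, not part of GATK: it relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER), treats a record as bi-allelic when ALT holds exactly one allele, and treats it as filtered when FILTER is neither PASS nor '.'. The function name check_site is an assumption.

```python
from typing import Dict, Union


def check_site(vcf_line: str) -> Dict[str, Union[str, bool]]:
    """Report whether a VCF data line is bi-allelic and whether it is filtered."""
    cols = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, _ref, alt, _qual, filt = cols[:7]
    alts = alt.split(",")
    return {
        "site": f"{chrom}:{pos}",
        # Exactly one ALT allele, and not the missing-value placeholder.
        "biallelic": len(alts) == 1 and alts[0] != ".",
        # Anything other than PASS or '.' counts as filtered.
        "filtered": filt not in ("PASS", "."),
    }


records = [
    "\t".join(["20", "207414", ".", "A", "G", "50", "PASS", "."]),
    "\t".join(["20", "1074230", ".", "T", "C,G", "50", "LowQual", "."]),
]
for record in records:
    print(check_site(record))
```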

--virtualPrimerSize ( int with default value 20 )

Size of the virtual primer to use for lower-casing regions with low specificity. BWA single-end alignment is used as a primer specificity proxy. Low-complexity regions (that don't align back to themselves as a best hit) are lowercased. This changes the size of the k-mer used for alignment.
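
To make the role of the k-mer size concrete, the following toy sketch slides a window of size k over an amplicon and lower-cases every base covered by a window that fails a specificity test. This is a deliberately simplified stand-in: the tool uses BWA single-end alignment, whereas this example substitutes an exact-match uniqueness count against a short reference string. The names lowercase_nonspecific and unique_in are hypothetical, and whether the tool lower-cases the whole window in the same way is an assumption.

```python
from typing import Callable


def lowercase_nonspecific(seq: str, k: int, is_specific: Callable[[str], bool]) -> str:
    """Lower-case every base covered by a k-mer that fails is_specific."""
    mask = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if not is_specific(seq[i:i + k].upper()):
            for j in range(i, i + k):
                mask[j] = True
    return "".join(b.lower() if m else b.upper() for b, m in zip(seq, mask))


def unique_in(reference: str) -> Callable[[str], bool]:
    """Toy specificity test (NOT BWA): the k-mer occurs exactly once."""
    ref = reference.upper()

    def check(kmer: str) -> bool:
        # Count occurrences, including overlapping ones.
        count, start = 0, ref.find(kmer)
        while start != -1:
            count += 1
            start = ref.find(kmer, start + 1)
        return count == 1

    return check


reference = "ACGTACGTTTGCA"
print(lowercase_nonspecific(reference, 4, unique_in(reference)))
```

In this toy setting a larger k makes repeated windows rarer, so fewer bases end up lower-cased; how the tool itself responds to a different --virtualPrimerSize depends on the BWA alignment and is not modeled here.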



GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.