
CoveredByNSamplesSites

Prints an intervals file listing all the variant sites at which most of the samples have good coverage

Category Diagnostics and Quality Control Tools

Traversal LocusWalker

PartitionBy LOCUS


Overview

CoveredByNSamplesSites is a GATK tool that filters variant sites by coverage. Sites that pass the filter are written to an intervals file. By default, "most" samples means at least 90% of the samples (--percentageOfSamples 0.9) and "good" coverage means coverage greater than 10 (--minCoverage 10). Both thresholds can be changed on the command line.
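The filter rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's implementation: the function name `site_passes` and the per-sample depth list are hypothetical, and the comparisons simply follow the wording of the argument descriptions (coverage greater than minCoverage, at least percentageOfSamples of the samples).

```python
def site_passes(sample_depths, min_coverage=10, fraction_of_samples=0.9):
    """Return True if enough samples at a site have good coverage.

    Hypothetical helper: mirrors the documented defaults
    (--minCoverage 10, --percentageOfSamples 0.9), not GATK's source code.
    """
    if not sample_depths:
        # No samples at this site: nothing to count, so do not emit it.
        return False
    # Count samples whose coverage is greater than the threshold.
    covered = sum(1 for depth in sample_depths if depth > min_coverage)
    # Emit the site when the covered fraction reaches the required level.
    return covered / len(sample_depths) >= fraction_of_samples


# Ten samples, nine of them above the default threshold of 10.
depths = [30, 25, 18, 40, 12, 11, 22, 35, 16, 4]
print(site_passes(depths))                   # 9 of 10 samples are covered
print(site_passes(depths, min_coverage=15))  # only 7 of 10 samples exceed 15
```

Raising the coverage threshold, as the second call does, can turn a passing site into a failing one without changing the sample fraction.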

Input

A variant (VCF) file and, optionally, minimum coverage and sample percentage values.

Output

An intervals file.
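For downstream scripting, the intervals file can be read back with a small parser. The sketch below assumes the output uses GATK-style interval strings, one per line, of the form contig:position or contig:start-stop. That format is an assumption here, not something this page states, and `parse_interval` is a hypothetical helper.

```python
def parse_interval(line):
    """Parse 'contig:pos' or 'contig:start-stop' into (contig, start, stop).

    Assumes GATK-style interval strings; the exact output format of the
    tool is not specified on this page.
    """
    # Split on the last ':' so contig names containing ':' stay intact.
    contig, _, span = line.strip().rpartition(":")
    start, _, stop = span.partition("-")
    # A single-position interval has no '-stop' part.
    return contig, int(start), (int(stop) if stop else int(start))


# Illustrative lines, not real tool output.
lines = ["20:10000117", "20:10000211-10000215"]
intervals = [parse_interval(line) for line in lines]
total_bases = sum(stop - start + 1 for _, start, stop in intervals)
print(intervals)
print(total_bases)
```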

Example

 java -Xmx2g -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T CoveredByNSamplesSites \
   -V input.vcf \
   -out output.intervals \
   -minCov 15
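
When the tool is launched from a script, the same command can be assembled as an argument list. The sketch below is only an illustration: `build_command` is a hypothetical helper, the file names are the placeholders from the example above, and the flags are the short forms listed under Argument details (-minCov, -percentage, -out, -V).

```python
import shlex


def build_command(reference, vcf, out, min_cov=None, percentage=None):
    """Assemble a CoveredByNSamplesSites command line as an argument list."""
    cmd = ["java", "-Xmx2g", "-jar", "GenomeAnalysisTK.jar",
           "-R", reference,
           "-T", "CoveredByNSamplesSites",
           "-V", vcf,
           "-out", out]
    # Optional thresholds are added only when set, so the tool's
    # defaults (10 and 0.9) apply otherwise.
    if min_cov is not None:
        cmd += ["-minCov", str(min_cov)]
    if percentage is not None:
        cmd += ["-percentage", str(percentage)]
    return cmd


cmd = build_command("ref.fasta", "input.vcf", "output.intervals",
                    min_cov=15, percentage=0.8)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine where Java and the GATK jar are available.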
 

Additional Information

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by CoveredByNSamplesSites.

Parallelism options

This tool can be run in multi-threaded mode using this option.


Command-line Arguments

Inherited arguments

The arguments described in the entries below can be supplied to this tool to modify its behavior. For example, the -L argument directs the GATK engine to restrict processing to specific genomic intervals (this is an Engine capability and is therefore available to all GATK walkers).

CoveredByNSamplesSites specific arguments

This table summarizes the command-line arguments that are specific to this tool. For details, see the argument details below the table.

Name                   Type                        Default value  Summary
Required
--variant              RodBinding[VariantContext]  NA             Input VCF file
Optional
--minCoverage          int                         10             Only samples with coverage greater than minCoverage are counted
--OutputIntervals      PrintStream                 stdout         Name of file for output intervals
--percentageOfSamples  double                      0.9            Only sites where at least percentageOfSamples of the samples have good coverage are emitted

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

--minCoverage / -minCov ( int with default value 10 )

Only samples with coverage greater than minCoverage are counted.

--OutputIntervals / -out ( PrintStream with default value stdout )

Name of file for output intervals.

--percentageOfSamples / -percentage ( double with default value 0.9 )

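Only sites where at least percentageOfSamples of the samples have good coverage are emitted. Despite the name, the value is expressed as a fraction: the default of 0.9 corresponds to 90% of the samples.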

--variant / -V ( required RodBinding[VariantContext] )

Input VCF file. Variants from this VCF file are used by this tool as input. The file must contain at least the standard VCF header lines, but can otherwise be empty (i.e., contain no variants). --variant binds reference-ordered data (ROD). This argument supports ROD files of the following types: BCF2, VCF, VCF3.



GATK version 2.5-2-gdb4546e built at 2013/05/01 09:32:36.