SequenomValidationConverter

From GSA

Overview

SequenomValidationConverter converts Sequenom genotype data stored in PLINK .ped files into a VCF file. The VCF is annotated with plate quality-control information. By default, sites are soft-filtered if they have a high no-call rate or a significant Hardy-Weinberg violation.

Requirements

SequenomValidationConverter assumes that each variant identifier (the variant column headers of the .ped file) follows the expected format, as shown in the example below.

SequenomValidationConverter currently supports only text .ped files. Convert binary PLINK .bed files to their text (.ped) equivalent before using this tool. An example .ped file looks like this:

#Family ID	Individual ID	Test_File_foo_|c1_p1234567_gCT	Test_File_foo_|c1_p1239720_gAG	Test_File_foo_|c1_p1897326_gAG	Test_file_foo_|c1_p1929292_gCT	Test_file_foo_|c1_p1529710_gCG
TEST	TS1	M	T T	A A	A G	T T	C C
TEST	TS2	M	T T	A A	A A	T T	0 0
TEST	TS3	M	T T	G G	A G	T T	C C
TEST	TS4	M	T C	A A	G G	T C	0 0
TEST	TS5	M	T T	A A	A A	C C	G C
TEST	TS6	M	T T	G G	A G	T T	G G
TEST	TS7	M	T T	G G	A A	T C	G C
TEST	TS8	M	T T	A A	A G	C T	0 0
TEST	TS9	M	T T	G G	G G	C T	0 0
TEST	TS10	M	T T	A A	A G	C C	G G
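The page does not spell out the identifier format. Judging from the example header, each identifier appears to hold a free-text name, a "|" separator, and then "c<chromosome>_p<position>_g<two alleles>". The Python sketch below parses that assumed layout and a single genotype row. The regular expression, the function names, and the three leading columns per row (family, individual, and the "M" column) are assumptions read off the example, not the tool's specification. In PLINK .ped files "0" marks a missing allele, so "0 0" is treated as a no-call.

```python
import re
from typing import NamedTuple

# Assumed identifier layout, inferred only from the example header above:
#   <free-text name>|c<chrom>_p<position>_g<allele1><allele2>
ID_PATTERN = re.compile(r"\|c(?P<chrom>[^_]+)_p(?P<pos>\d+)_g(?P<alleles>[ACGT]{2})$")


class Site(NamedTuple):
    chrom: str
    pos: int
    alleles: tuple  # the two alleles encoded after "_g"


def parse_variant_id(identifier: str) -> Site:
    """Split a header such as 'Test_File_foo_|c1_p1234567_gCT' (hypothetical helper)."""
    m = ID_PATTERN.search(identifier)
    if m is None:
        raise ValueError(f"unrecognised variant identifier: {identifier!r}")
    a = m.group("alleles")
    return Site(m.group("chrom"), int(m.group("pos")), (a[0], a[1]))


def parse_ped_row(line: str, n_leading: int = 3):
    """Return (family, individual, genotypes) for one tab-separated row.

    n_leading=3 matches the example rows; a '0 0' genotype is a no-call
    and is returned as None.
    """
    fields = line.rstrip("\n").split("\t")
    family, individual = fields[0], fields[1]
    genotypes = []
    for g in fields[n_leading:]:
        a1, a2 = g.split()
        genotypes.append(None if (a1, a2) == ("0", "0") else (a1, a2))
    return family, individual, genotypes
```

For example, parse_variant_id("Test_File_foo_|c1_p1234567_gCT") gives chromosome "1", position 1234567 and alleles ("C", "T").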

Running SequenomValidationConverter

SequenomValidationConverter takes a reference sequence, an input .ped file, and the path of the output VCF. A simple example:

java -jar /path/to/GenomeAnalysisTK.jar \
  -T SequenomValidationConverter \
  -R /path/to/reference.fasta \
  -B input,Plink,/path/to/sequenom.ped \
  -vcf /path/to/output.vcf
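When the conversion is driven from a script, the same invocation can be assembled as an argument list. The Python sketch below only uses the flags documented on this page (-T, -R, -B input,Plink,<file>, -vcf, and the optional --maxHardy, --maxNoCall and --maxHomVar overrides); the function name build_command is hypothetical, and the [0,1] range checks mirror the constraints stated for the filter arguments.

```python
from typing import List, Optional


def build_command(gatk_jar: str, reference: str, ped: str, out_vcf: str,
                  max_hardy: Optional[int] = None,
                  max_no_call: Optional[float] = None,
                  max_hom_var: Optional[float] = None) -> List[str]:
    """Assemble the java command line for SequenomValidationConverter.

    Any filter argument left as None is omitted, so the tool's default applies.
    """
    cmd = ["java", "-jar", gatk_jar,
           "-T", "SequenomValidationConverter",
           "-R", reference,
           "-B", f"input,Plink,{ped}",
           "-vcf", out_vcf]
    if max_hardy is not None:
        cmd += ["--maxHardy", str(max_hardy)]
    for flag, value in (("--maxNoCall", max_no_call), ("--maxHomVar", max_hom_var)):
        if value is None:
            continue
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{flag} must be in [0,1], got {value}")
        cmd += [flag, str(value)]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True); building a list rather than a single string avoids shell-quoting problems with paths.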

Adjusting Filter Settings

SequenomValidationConverter automatically filters sites with a significant Hardy-Weinberg violation (a phred-scaled score) or a high proportion of no-calls. Use the following command-line arguments to change the default thresholds for these tests:

--maxHardy N      filters sites whose phred-scaled Hardy-Weinberg violation score is greater than the integer N. [default: 20]
--maxNoCall X     filters sites whose proportion of no-calls is greater than the fraction X. X must be in [0,1]. [default: 0.05]
--maxHomVar Y     enables a filter on homozygous variant calls, filtering sites whose proportion of hom-var calls is greater than Y. Y must be in [0,1]. [default: disabled]

NOTE: Filtered sites WILL appear in the output VCF, marked as filtered in the VCF's FILTER field.
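The three per-site tests can be sketched as follows. This page does not say which Hardy-Weinberg statistic the tool computes, so the sketch substitutes a standard 1-degree-of-freedom chi-square test and phred-scales its p-value as -10*log10(p); the tool's actual scores may differ. The filter names ("HardyWeinberg", "HighNoCallRate", "HighHomVarRate"), the function names, and the choice of called genotypes as the denominator for the hom-var proportion are hypothetical. Consistent with soft-filtering, the function only reports which filters a site fails rather than dropping it.

```python
import math
from typing import List, Optional, Tuple

# A genotype is a pair of alleles; None marks a no-call ("0 0" in the .ped).
Genotype = Optional[Tuple[str, str]]


def hw_phred(n_hom_ref: int, n_het: int, n_hom_var: int) -> float:
    """Phred-scaled HWE p-value from a 1-df chi-square test (a stand-in)."""
    n = n_hom_ref + n_het + n_hom_var
    if n == 0:
        return 0.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0  # monomorphic site: no evidence against HWE
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_var)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return math.inf if p_value == 0.0 else -10.0 * math.log10(p_value)


def site_filters(genotypes: List[Genotype], ref: str, alt: str,
                 max_hardy: float = 20, max_no_call: float = 0.05,
                 max_hom_var: Optional[float] = None) -> List[str]:
    """Return the (hypothetical) names of the filters a biallelic site fails."""
    called = [g for g in genotypes if g is not None]
    n_hom_ref = sum(1 for g in called if g == (ref, ref))
    n_hom_var = sum(1 for g in called if g == (alt, alt))
    n_het = len(called) - n_hom_ref - n_hom_var  # assumes only ref/alt alleles
    failed = []
    if hw_phred(n_hom_ref, n_het, n_hom_var) > max_hardy:
        failed.append("HardyWeinberg")
    if genotypes and (len(genotypes) - len(called)) / len(genotypes) > max_no_call:
        failed.append("HighNoCallRate")
    if max_hom_var is not None and called and n_hom_var / len(called) > max_hom_var:
        failed.append("HighHomVarRate")
    return failed
```

For the last column of the example file (c1_p1529710_gCG, taking C and G as the two alleles), 4 of the 10 samples are no-calls (40%, above the 0.05 default), while the six called genotypes fit Hardy-Weinberg proportions well, so only the no-call filter fires.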

Sample output

> java -jar dist/GenomeAnalysisTK.jar -T SequenomValidationConverter -R /broad/1KG/reference/human_b36_both.fasta -B input,Plink,indel_validation.ped -vcf output.vcf

Total number of samples assayed:			185
Total number of records processed:			152
Number of Hardy-Weinberg violations:			34 (22%)
Number of no-call violations:				12 (7%)
Number of homozygous variant violations:		0 (0%)
Number of records passing all filters:			106 (69%)
Number of passing records that are polymorphic:		98 (92%)
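The percentages in this log appear to use two denominators: the violation counts and the passing count are relative to the records processed (152), while the polymorphic share is relative to the passing records (106). They also appear to be truncated rather than rounded (12/152 is about 7.9%, printed as 7%). A small Python check reproduces the printed values under those assumptions:

```python
def truncated_percent(part: int, whole: int) -> int:
    """Integer percentage truncated toward zero, as the log above appears to use."""
    return part * 100 // whole


processed = 152  # Total number of records processed
passing = 106    # Number of records passing all filters

summary = {
    "hardy_weinberg": truncated_percent(34, processed),
    "no_call": truncated_percent(12, processed),
    "hom_var": truncated_percent(0, processed),
    "passing": truncated_percent(passing, processed),
    "polymorphic_of_passing": truncated_percent(98, passing),
}
print(summary)
```

In this particular run the counts also add up exactly (152 - 34 - 12 = 106), which suggests no record failed both the Hardy-Weinberg and the no-call test; that need not hold for other inputs.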