SequenomValidationConverter
Overview
SequenomValidationConverter is a tool for converting Sequenom .ped files (containing genotype data) into VCF files. The VCF is annotated with plate quality-control information and, by default, sites are soft-filtered for a high no-call rate or a significant Hardy-Weinberg violation.
Requirements
SequenomValidationConverter assumes that the identifier for each variant is in the proper format, as in the header line of the example .ped file below.
SequenomValidationConverter currently supports only .ped files. Please convert .bed files to their non-binary equivalent before using this tool. Here's what an example .ped file would look like:
#Family ID Individual ID Test_File_foo_|c1_p1234567_gCT Test_File_foo_|c1_p1239720_gAG Test_File_foo_|c1_p1897326_gAG Test_file_foo_|c1_p1929292_gCT Test_file_foo_|c1_p1529710_gCG
TEST TS1 M T T A A A G T T C C
TEST TS2 M T T A A A A T T 0 0
TEST TS3 M T T G G A G T T C C
TEST TS4 M T C A A G G T C 0 0
TEST TS5 M T T A A A A C C G C
TEST TS6 M T T G G A G T T G G
TEST TS7 M T T G G A A T C G C
TEST TS8 M T T A A A G C T 0 0
TEST TS9 M T T G G G G C T 0 0
TEST TS10 M T T A A A G C C G G
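To make the layout of the example concrete, here is a minimal Python sketch of how one might read it. It is not the tool's own parser. It assumes, from the example only, that each data row carries three leading columns (family, individual, sex) followed by two alleles per variant, and that an allele of 0 marks a no-call (the usual PLINK convention). The reading of the variant identifier as contig (c), position (p), and alleles (g) is likewise inferred from the header above and is not confirmed by this page. The function names are hypothetical.

```python
import re

# Hypothetical pattern, inferred only from the example header:
# "...|c<contig>_p<position>_g<alleles>"
VARIANT_NAME = re.compile(
    r"\|c(?P<contig>[^_]+)_p(?P<position>\d+)_g(?P<alleles>[ACGT]+)$"
)


def parse_variant_name(name):
    """Split a variant identifier into (contig, position, alleles)."""
    match = VARIANT_NAME.search(name)
    if match is None:
        raise ValueError(f"unrecognized variant identifier: {name!r}")
    return match["contig"], int(match["position"]), match["alleles"]


def parse_ped_row(line, n_variants):
    """Split one data row into sample columns and per-variant genotypes.

    Assumes three leading columns, as in the example rows. A genotype
    containing the allele "0" is returned as None (treated as a no-call).
    """
    fields = line.split()
    family, individual, sex = fields[:3]
    alleles = fields[3:]
    if len(alleles) != 2 * n_variants:
        raise ValueError(f"expected {2 * n_variants} alleles, got {len(alleles)}")
    pairs = zip(alleles[0::2], alleles[1::2])
    genotypes = [None if "0" in pair else pair for pair in pairs]
    return family, individual, sex, genotypes
```

For the row for sample TS2 in the example, the last genotype is 0 0, so the fifth entry comes back as None.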
Running SequenomValidationConverter
SequenomValidationConverter takes a reference, a .ped file, and an output VCF file to write to. A simple example:
java -jar /path/to/GenomeAnalysisTK.jar \
    -T SequenomValidationConverter \
    -R /path/to/reference.fasta \
    -B input,Plink,/path/to/sequenom.ped \
    -vcf /path/to/output.vcf
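When the conversion is scripted, the same command can be assembled as an argument list. The following Python sketch only rearranges the arguments shown in the example above. The helper name build_command and its extra_args parameter are hypothetical and are not part of the GATK.

```python
def build_command(gatk_jar, reference, ped_file, output_vcf, extra_args=()):
    """Return the example command line as a list of arguments.

    extra_args is a hypothetical hook for appending further
    command-line options after the required ones.
    """
    return [
        "java", "-jar", gatk_jar,
        "-T", "SequenomValidationConverter",
        "-R", reference,
        "-B", f"input,Plink,{ped_file}",
        "-vcf", output_vcf,
        *extra_args,
    ]


command = build_command(
    "/path/to/GenomeAnalysisTK.jar",
    "/path/to/reference.fasta",
    "/path/to/sequenom.ped",
    "/path/to/output.vcf",
)
```

The resulting list can be passed to subprocess.run(command, check=True), which avoids shell quoting issues with the comma-separated -B argument.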
Adjusting Filter Settings
SequenomValidationConverter automatically filters based on a significant Hardy-Weinberg violation score (phred-scaled) and a significant percentage of no-calls. Command-line arguments can be used to alter the default values for these tests:
--maxHardy N will set the Hardy-Weinberg violation test to filter those sites where the phred-scaled HW-score is greater than integer N. [default:20]
--maxNoCall X will set the no-call based test to filter those sites with a proportion of no-calls greater than fraction X. X must be in [0,1]. [default:0.05]
--maxHomVar Y will turn on a filter based on homozygous variant calls and filter those sites with a proportion of hom var calls greater than fraction Y. Y must be in [0,1]. [default:disabled]
NOTE: Filtered calls WILL appear in the output VCF, but will be marked as filtered in the FILTER field of the VCF.
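The three thresholds can be illustrated with a small Python sketch that works on per-site genotype counts. This is not the tool's implementation, and several choices here are assumptions: the Hardy-Weinberg score is computed with a one-degree-of-freedom chi-square test and phred-scaled as -10*log10(p-value), the no-call proportion is taken over all samples, the hom var proportion is taken over called samples, and the filter labels are hypothetical. The actual test and FILTER names used by SequenomValidationConverter may differ.

```python
import math


def hardy_weinberg_phred(n_hom_ref, n_het, n_hom_var):
    """Phred-scaled p-value of a chi-square HWE test (an assumed test)."""
    n = n_hom_ref + n_het + n_hom_var
    if n == 0:
        return 0.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_var)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return -10 * math.log10(max(p_value, 1e-300))


def site_filters(n_hom_ref, n_het, n_hom_var, n_no_call,
                 max_hardy=20, max_no_call=0.05, max_hom_var=None):
    """Return the (hypothetical) filter labels a site would fail."""
    failed = []
    called = n_hom_ref + n_het + n_hom_var
    total = called + n_no_call
    if hardy_weinberg_phred(n_hom_ref, n_het, n_hom_var) > max_hardy:
        failed.append("HardyWeinbergViolation")
    if total and n_no_call / total > max_no_call:
        failed.append("HighNoCallRate")
    # Disabled unless a threshold is given, mirroring the --maxHomVar default.
    if max_hom_var is not None and called and n_hom_var / called > max_hom_var:
        failed.append("HomVarExcess")
    return failed
```

In this sketch a site with counts (50 hom ref, 40 het, 10 hom var, 0 no-calls) fails no filter, while a site with no heterozygotes at all (50, 0, 50, 0) fails the Hardy-Weinberg test.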
Sample output
> java -jar dist/GenomeAnalysisTK.jar -T SequenomValidationConverter -R /broad/1KG/reference/human_b36_both.fasta -B input,Plink,indel_validation.ped -vcf output.vcf
Total number of samples assayed: 185
Total number of records processed: 152
Number of Hardy-Weinberg violations: 34 (22%)
Number of no-call violations: 12 (7%)
Number of homozygous variant violations: 0 (0%)
Number of records passing all filters: 106 (69%)
Number of passing records that are polymorphic: 98 (92%)
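The percentages in this summary appear to be truncated to whole numbers rather than rounded (12 of 152 is about 7.9%, reported as 7%), and the last line appears to be relative to the passing records rather than to all records processed. This reading is inferred from the numbers above, not from the tool's source. A short Python check under that assumption:

```python
def percent(part, whole):
    """Integer-truncated percentage (assumed to match the report)."""
    return part * 100 // whole if whole else 0


records = 152
passing = 106

summary = {
    "Hardy-Weinberg violations": percent(34, records),
    "no-call violations": percent(12, records),
    "homozygous variant violations": percent(0, records),
    "records passing all filters": percent(passing, records),
    # Denominator is the passing records, not all records.
    "passing records that are polymorphic": percent(98, passing),
}
```

With these inputs the computed values match the figures printed in the sample output.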
