I was surprised to notice that my phased VCFs produced by both phasing tools (alone or in succession) contained about 1% (PBT) or 2% (RBP) less variants than the input files; this was reproducible (PBT and RBP), and occurred with and without the -mvf option (PBT). A quick scan (
SelectVariants --discordance; thanks for providing that one ;-) indicates that the missing variants are all indels (mostly insertions, from 2-20 nt); note that I haven't tested whether the phased output file lacks all indels present in the input file.
Is this the expected behaviour of both tools or am I doing something terribly wrong?
If yes, is there an option to emit these variants together with the (phased and unphased variants) in the file specified with -o (I know I could use SelectVariants --discordance to add these back in a subsequent step, but there may be a more elegant solution)?
Sorry for the trouble, can't even remember why I counted before and after (but glad I did... )
[GATK 2.6-5] -T PhaseByTransmission -R ../human_g1k_v37_decoy.fasta -V IN.vcf -ped Trio1.ped -o OUT.vcf -mvf MV.vcf -pedValidationType SILENT