I would like to use PacBio reads to phase variants in a VCF file. With default parameters, ReadBackedPhasing puts every variant in a separate block, even variants that are obviously connected by a read. I have started experimenting with the parameters: decreasing --cacheWindowSize helps a bit, but the phased blocks still contain only a couple of variants. Before trying every possible parameter combination, I would like to ask: is there a recommended set of parameters for phasing from PacBio reads?
Alternatively, can I somehow use HaplotypeCaller for phasing only? I would like to re-use the input VCF, since it was created from Illumina reads and therefore contains variants of good quality. The PacBio reads, on the other hand, have fairly low coverage (10x-20x). They should be usable for phasing, but re-calling variants from them would decrease quality.
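To make concrete what I mean by variants being "obviously connected by a read", here is a minimal Python sketch. It is not how ReadBackedPhasing works internally; it only groups variant positions that share at least one read, which is the upper bound on block size I would expect. The function name `connected_blocks` and the toy read-to-position mapping are made up for illustration.

```python
from collections import defaultdict


def connected_blocks(read_to_variants):
    """Group variant positions into blocks linked by shared reads (union-find).

    read_to_variants maps a read name to the variant positions it covers.
    This only finds connectivity; it does not assign alleles to haplotypes.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for variants in read_to_variants.values():
        variants = list(variants)
        for v in variants:
            find(v)  # register every position, even singletons
        for v in variants[1:]:
            union(variants[0], v)

    blocks = defaultdict(list)
    for v in list(parent):
        blocks[find(v)].append(v)
    return sorted(sorted(block) for block in blocks.values())


# Hypothetical toy data: read name -> heterozygous variant positions it spans
reads = {
    "read1": [100, 250, 900],
    "read2": [900, 1500],
    "read3": [5000],
}
print(connected_blocks(reads))  # [[100, 250, 900, 1500], [5000]]
```

In this toy case read1 and read2 overlap at position 900, so four variants end up in one block, whereas what I currently get would correspond to five separate blocks.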
We have PacBio data on which we want to do variant calling. I tried both UnifiedGenotyper and HaplotypeCaller, without much success. With UnifiedGenotyper I got some output, but only SNPs and no indels (I skipped the realignment step). I tried adjusting the parameters (indelGapContinuationPenalty, indelGapOpenPenalty, min_base_quality_score). Lowering min_base_quality_score at least produces the SNP-only output mentioned above.
The only GATK documentation for PacBio data I could find is this one (https://www.broadinstitute.org/gatk/guide/topic?name=methods), but it is pretty old. Are there newer developments in GATK for PacBio data, or any more detailed tutorials?
Or would you suggest PBHoney as the right tool to use? Anything else?
Just to mention: for Illumina data your toolkit worked like a charm, so in principle we know how to work with it.