I understand that the HaplotypeCaller performs local assembly and realignment. Can someone expand on the parameters used during the local assembly? In particular, what k-mer size is used to build the assembly graph? I would like to explore using digital normalization prior to SNP calling to remove PCR artifacts, and this information would be helpful.
I am also getting the following error. What is the minimum read length required for assembly? Are 50 bp reads too short?
java.lang.IllegalStateException: Reads are too small for use in assembly.
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.DeBruijnAssembler.createDeBruijnGraphs(DeBruijnAssembler.java:139)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.DeBruijnAssembler.runLocalAssembly(DeBruijnAssembler.java:123)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:483)
    at org.broadinstitute.sting.gatk.walkers.haplotypecaller.HaplotypeCaller.map(HaplotypeCaller.java:132)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegion(TraverseActiveRegions.java:552)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.processActiveRegions(TraverseActiveRegions.java:512)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:244)
    at org.broadinstitute.sting.gatk.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.ja
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.j
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:283
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:1
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:24
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:15
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
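To make the read-length question concrete, here is a minimal Python sketch of how I understand k-mers feed a de Bruijn graph in general. It is not GATK's code: the function names `kmers` and `debruijn_edges` are made up for illustration, and the value of k is arbitrary, since the k-mer size HaplotypeCaller actually uses is exactly what I am asking about. The only point is that a read of length L contributes L - k + 1 k-mers, so a read shorter than k contributes nothing to the graph.

```python
def kmers(read, k):
    """Return all overlapping k-mers of a read, in order."""
    # A read of length L yields L - k + 1 k-mers; the list is empty if L < k.
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def debruijn_edges(reads, k):
    """Count de Bruijn edges: each k-mer links its (k-1)-mer prefix to its (k-1)-mer suffix."""
    edges = {}
    for read in reads:
        for kmer in kmers(read, k):
            edge = (kmer[:-1], kmer[1:])
            edges[edge] = edges.get(edge, 0) + 1
    return edges
```

For example, with an illustrative k of 31, a 50 bp read gives 20 k-mers, while with any k above 50 it gives none, which is the kind of situation I suspect the error message is describing.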
::::::::::::::
The Unified Genotyper calls SNPs relative to the specified, publicly-available reference assembly.
How can I call SNPs (in many samples, with UG) relative to an in-house individual that I have sequenced at high coverage?
My current solution is to perform a de novo assembly on the in-house reference individual using e.g. Velvet, and then simply use the fasta as the reference for UG.
Can the publicly available reference assembly still be useful here for speeding up the mapping and filling in missing parts?
My organism is Drosophila melanogaster.