I'm aware HLACaller is technically no longer supported, but I have a question about how the HLACaller algorithm behaves on whole genome sequencing data. As noted in the readme, the developers suggest using the -minFreq option to keep rare HLA haplotypes from being called spuriously.
While that is entirely sensible, I was hoping someone could offer some insight or suggestions, or point me to references that would clarify which rare HLA alleles tend to show up as false positives. The reason I ask is that I'm working on a large project with cohorts of African ancestry, so I am reluctant to exclude "rare" alleles entirely (they are likely rare in European populations but not necessarily in African ones). My current plan is to call alleles with the -minFreq option in a first round, then scan for individuals with potential calling errors and re-call those as a batch without the option.
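
To make the two-round plan concrete, the selection step between rounds could look roughly like the sketch below. Everything in it is an assumption for illustration: the `FirstRoundCall` record, the `confidence` score, and the `min_confidence` threshold are hypothetical names, not HLACaller outputs or options, and the sample values are placeholders rather than real data.

```python
from dataclasses import dataclass


@dataclass
class FirstRoundCall:
    """One sample's first-round result (hypothetical layout, not HLACaller's)."""
    sample_id: str
    allele_1: str
    allele_2: str
    confidence: float  # assumed per-sample score in [0, 1]


def select_for_recall(calls, min_confidence=0.9):
    """Return IDs of samples whose first-round call falls below the threshold.

    These are the samples that would be re-called as a batch
    without the frequency filter in the second round.
    """
    return [c.sample_id for c in calls if c.confidence < min_confidence]


# Placeholder records, purely illustrative.
first_round = [
    FirstRoundCall("sample_1", "allele_a", "allele_b", 0.99),
    FirstRoundCall("sample_2", "allele_a", "allele_c", 0.55),
    FirstRoundCall("sample_3", "allele_d", "allele_d", 0.93),
]

recall_batch = select_for_recall(first_round)
print(recall_batch)
```

What counts as a "potential calling error" is exactly the open question here, so the single threshold is only a stand-in for whatever criterion turns out to be appropriate.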