Downsampling Experiment
Posted in Ask the team | Last updated on


I'm trying to downsample in an orderly fashion for an experiment, and I'd like to restrict the experiment to a single chromosome. I picked chromosome 17 with -L and a coverage of 30x with -dcov 30, which produced this error:

ERROR MESSAGE: Locus-based traversals (ie., Locus and ActiveRegion walkers) require a minimum -dcov value of 200 when downsampling to coverage. Values less than this can produce problematic downsampling artifacts while providing only insignificant improvements in memory usage in most cases.

I was hoping to poke through results from running the HaplotypeCaller at many different simulated depths of coverage for several samples. I read that one can use -dfrac instead, and that it might even be more appropriate. However, I was hoping to find out what level of coverage leads to what level of results, and -dfrac feels much less specific: it appears to toss a fraction of however many reads were at a given position, rather than tossing reads above a certain coverage. So with -dfrac, I could say that my sample had an average of 30x for this chromosome and I tossed half, so theoretically I've simulated 15x depth of coverage...
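To make the difference concrete, here is a rough toy model of the two strategies as I understand them. It is only a sketch and not how GATK actually implements -dfrac or -dcov: the per-position depths are made up (roughly 30x with some spread), and the function names `downsample_by_fraction` and `downsample_to_cap` are hypothetical helpers for illustration.

```python
import random


def downsample_by_fraction(depths, fraction, rng):
    """Toy stand-in for -dfrac: keep each read independently with probability `fraction`."""
    return [sum(1 for _ in range(d) if rng.random() < fraction) for d in depths]


def downsample_to_cap(depths, cap):
    """Toy stand-in for -dcov: discard reads above `cap` at each position."""
    return [min(d, cap) for d in depths]


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)

# Assumption: invented per-position depths, roughly 30x on average with uneven coverage.
depths = [max(0, int(rng.gauss(30, 10))) for _ in range(10_000)]

frac = downsample_by_fraction(depths, 0.5, rng)  # "toss half the reads"
capped = downsample_to_cap(depths, 15)           # "never more than 15 reads"

print("original mean depth:", round(mean(depths), 2))
print("fraction 0.5 mean / max:", round(mean(frac), 2), max(frac))
print("cap 15 mean / max:", round(mean(capped), 2), max(capped))
```

In this toy model, tossing a fraction halves the average depth but keeps the position-to-position unevenness of the original data, while capping flattens every well-covered position to exactly the cap and leaves poorly covered positions untouched.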

Which approach would be more representative of reality? Using -dfrac to simulate a certain depth of coverage, or -dcov assuming I didn't have the 200 restriction?

Thanks for any help/discussion! -Tristan
