ROD walkers
ROD walkers are read-free traversals that operate over Reference Ordered Data (RODs) and the reference genome, but only at sites where there is ROD information. They are geared toward high-performance traversal of many RODs and the reference, as in walkers such as VariantEval and CallSetConcordance. Programmatically they are nearly identical to RefWalker<M,T> traversals, with the following few quirks.
Differences from a RefWalker
- RODWalkers are only called at sites where there is at least one non-interval ROD bound. For example, if you are traversing dbSNP and a GELI call set, the map function of a RODWalker will be invoked at every site where there is a dbSNP record or a GELI record.
- Because of this skipping, RODWalkers receive a context object that reports the number of reference bases skipped between map calls:
nSites += context.getSkippedBases() + 1; // the skipped bases plus the current location
To account for the skipped bases at the end of an interval (or chromosome), the map function is called one last time with null ReferenceContext and RefMetaDataTracker objects. The AlignmentContext is still available on that call and gives the number of bases skipped between the last ROD and the end of the current interval.
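The call pattern described above can be sketched as a small stand-alone simulation. This is not GATK code: MapCall, mapCalls and countSites are hypothetical names, positions are plain 1-based integers, and the sketch assumes the first call's skipped-base count is measured from the start of the interval. It builds the sequence of map calls for two ROD tracks (visiting the union of their positions, including the final null call), then applies the walker-side accounting, which should add up to the full interval length. It uses a record, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical simulation of how a ROD traversal visits one interval.
 * Not GATK code: each MapCall records the reference bases skipped since the
 * previous call; site == null marks the final call for the interval.
 */
public class RodTraversalSketch {

    record MapCall(Integer site, int skippedBases) {}

    /** Builds the map() call sequence for [start, stop] given the positions of each bound ROD track. */
    static List<MapCall> mapCalls(int start, int stop, List<List<Integer>> tracks) {
        TreeSet<Integer> sites = new TreeSet<>();
        for (List<Integer> track : tracks) {
            for (int pos : track) {
                if (pos >= start && pos <= stop) sites.add(pos); // union of all tracks inside the interval
            }
        }
        List<MapCall> result = new ArrayList<>();
        int previous = start - 1; // assumption: skipping is counted from the interval start
        for (int site : sites) {
            result.add(new MapCall(site, site - previous - 1)); // bases strictly between visited sites
            previous = site;
        }
        result.add(new MapCall(null, stop - previous)); // final call: bases after the last ROD
        return result;
    }

    /** Walker-side accounting, mirroring the nSites / map() logic on this page. */
    static int countSites(List<MapCall> calls) {
        int nSites = 0;
        for (MapCall call : calls) {
            nSites += call.skippedBases(); // skipped bases are reported on every call
            if (call.site() == null) break; // final call: there is no current site to count
            nSites++;                        // the current location
        }
        return nSites;
    }

    public static void main(String[] args) {
        List<Integer> dbsnp = List.of(3, 10);
        List<Integer> geli = List.of(10, 15);
        List<MapCall> seq = mapCalls(1, 20, List.of(dbsnp, geli));
        seq.forEach(System.out::println);
        System.out.println("nSites = " + countSites(seq));
    }
}
```

Position 10 appears in both tracks but produces a single map call, and the skipped bases plus visited sites (2 + 6 + 4 + 5 skipped, 3 visited) cover all 20 bases of the interval.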
Example change over of VariantEval
Changing to a RodWalker is very easy. Here is the new top of VariantEval, converted to a RodWalker from its old RefWalker state:
//public class VariantEvalWalker extends RefWalker<Integer, Integer> {
public class VariantEvalWalker extends RodWalker<Integer, Integer> {
The map function must now capture the number of skipped bases and protect itself from the final interval map calls:
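public Integer map(RefMetaDataTracker tracker, ReferenceContext ref, AlignmentContext context) {
    // count the reference bases skipped since the previous map call
    nMappedSites += context.getSkippedBases();
    if ( ref == null ) { // final call for this interval: only the trailing skipped bases remain
        return 0;
    }
    nMappedSites++; // count the current site itself
    // ... remainder of the existing map() body ...
}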
That's it.
Performance improvements
A ROD walker can be much faster than a RefWalker when the RODs are sparse, because map is only invoked at sites with ROD data. Timings are user CPU time in seconds:
| Data on chr1 | RODWalker user time (s) | RefWalker user time (s) |
|---|---|---|
| dbSNP and 1KG Pilot 2 SNP calls | 164 | 768 |
| 1KG Pilot 2 SNP calls only | 54 | 666 |
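As a quick worked reading of the chr1 timings in the table (interpreting the "u" figures as user CPU seconds), the speedup is simply the RefWalker time divided by the RODWalker time. The small helper below is only an illustration of that arithmetic; SpeedupSketch and speedup are hypothetical names.

```java
/** Speedup of a RODWalker over a RefWalker: RefWalker time divided by RODWalker time. */
public class SpeedupSketch {

    static double speedup(double refWalkerSeconds, double rodWalkerSeconds) {
        return refWalkerSeconds / rodWalkerSeconds;
    }

    public static void main(String[] args) {
        // chr1 user times from the table: RefWalker first, RODWalker second
        System.out.printf("dbSNP + 1KG Pilot 2 calls: %.1fx%n", speedup(768, 164));
        System.out.printf("1KG Pilot 2 calls only:    %.1fx%n", speedup(666, 54));
    }
}
```

This works out to roughly 4.7x with both call sets bound and about 12.3x with only the 1KG Pilot 2 calls, so the gain grows as the bound RODs get sparser.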
