Created 2013-02-01 03:51:56 | Updated | Tags: migration partial-files

Our cluster migrates files to tape on a schedule based on file size, so a migration can happen at any time before or during a GATK program call (e.g. when a file has not been touched for a while during a long program run). It seems to me that GATK (UnifiedGenotyper, v1.5-30-g27e7e17) does not check whether files are partial. Instead, when it reads from an offline file, it continues with binary zeroes in place of valid data. (If it were a C program, one would look for "open" system calls that use either the O_NONBLOCK or the O_NDELAY flag.)
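The flag check mentioned above can be illustrated with a minimal C sketch. It assumes, without confirmation from the DMF documentation, that a non-blocking read of an offline file fails with EAGAIN (or EWOULDBLOCK) rather than returning zero-filled data. `read_nonblocking` is a hypothetical helper and is not part of GATK or DMF. On an ordinary local file, Linux typically ignores O_NONBLOCK, so the call simply returns the data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from path without blocking.
 * Returns the number of bytes read (>= 0),
 *   -2 if the data is not available yet (EAGAIN/EWOULDBLOCK; ASSUMED to be
 *      how an offline, migrated file is reported to a non-blocking reader),
 *   -1 on any other error (errno is preserved for the caller). */
ssize_t read_nonblocking(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, len);
    int saved = errno;              /* close() may overwrite errno */
    close(fd);

    if (n < 0) {
        errno = saved;
        /* Treat "would block" as "not on disk yet" instead of as data. */
        return (saved == EAGAIN || saved == EWOULDBLOCK) ? -2 : -1;
    }
    return n;
}
```

A caller that gets -2 back could wait and retry after the file has been recalled, rather than carrying on with whatever bytes it would otherwise have used.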

GATK does not seem to throw a warning or an error, and the resulting file (VCF) looks OK at first glance. However, the DMF system appears to consider that the (offline) file GATK used was accessed suspiciously, and the overall runtime is inflated.

