Tagged with #failure
0 documentation articles | 0 announcements | 3 forum discussions




Created 2016-02-29 17:49:43 | Updated | Tags: markduplicates failure recovery

Comments (12)

A transient file system error occurred while several MarkDuplicates jobs were running, and I'm trying to figure out whether there is any way to recover without rerunning the jobs from scratch.

Specifically, what I see in the output for each job is that it reports

    ...
    INFO 2016-02-29 00:02:13 MarkDuplicates Written 670,000,000 records. Elapsed time: 04:52:49s. Time for last 10,000,000: 251s. Last read position: chr10:85,021,664
    [Mon Feb 29 00:12:24 EST 2016] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 862.99 minutes.

... a couple irrelevant lines, then

    Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: Read error; BinaryCodec in readmode; file: .../alignments/SRR371622.07.bam

... a bunch of traceback lines

SRR371622.07.bam is one of the input files, and does exist in that directory. I've verified with the sysadmin that system logs show some sort of file system error occurred at around that time.

My command is like this:

    inputArgs=" ... INPUT=alignments/SRR371622.07.bam ... "
    maxFiles=$((80 * $(ulimit -n) / 100))

    .../java -jar .../picard.jar MarkDuplicates \
        TMP_DIR=$(pwd)/temp_dmarked \
        ${inputArgs} \
        OUTPUT=alignments/XXX.dmarked.bam \
        METRICS_FILE=alignments/XXX.dmarked.metrics.txt \
        ASSUME_SORTED=true \
        REMOVE_DUPLICATES=true \
        READ_NAME_REGEX=null \
        MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=${maxFiles}

Each job has left behind the intended output file XXX.dmarked.bam, and some jobs appear to have left behind a directory full of temporary files in temp_dmarked.

That the last read position is reported as chr10:85,021,664 is worrisome; my reference is chimp.

So, my questions:

(1) Is there any hope that the created bam files are useful? Is there any easy way to verify what's in them?

(2) Failing that, is the information in the temp files useful?

(3) If Picard encountered a problem reading files before it reported "MarkDuplicates done", would it report the error? My concern is that there's only a ten minute lapse between chr10:85,021,664 and "MarkDuplicates done", which, unless chr10 is the last chromosome it's processing, seems suspiciously short.

(4) Am I correct that MarkDuplicates doesn't support operation on intervals? I'd like to be able to break this up into many subintervals so that when something fails I don't lose as much. But as near as I can tell MarkDuplicates doesn't support doing this.

Thanks for any help or suggestions. Bob H
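
For question (1), one cheap first check is whether each output file still ends with the BGZF end-of-file marker, the 28-byte empty block that the SAM/BAM specification says should terminate a BAM file. The sketch below is only an illustration: the function name `has_bgzf_eof` is made up here, and a passing result shows only that the writer closed the file cleanly, not that every record made it in. A full validation (for example with Picard's ValidateSamFile) would still be needed.

```python
import os

# The 28-byte empty BGZF block defined by the SAM/BAM specification
# as the end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker.

    `has_bgzf_eof` is a hypothetical helper name, not part of Picard or htsjdk.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Calling `has_bgzf_eof("alignments/XXX.dmarked.bam")` on each output would flag files that were cut off mid-write; a file that fails this check is almost certainly incomplete.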


Created 2016-01-20 16:11:48 | Updated | Tags: install picard failure htsjdk

Comments (13)

(If this is the wrong place to post this, please tell me where to go)

I am new to GATK, trying to build Picard because I apparently will need to build Picard sequence dictionaries for my genomes of interest.

I have installed Java6, Java8, ant, and HTSJDK. I have downloaded Picard 2a49ee2.

Following instructions at broadinstitute.github.io/picard/building.html, I have defined JAVA6_HOME, cd'd into broadinstitute-picard-2a49ee2, and when I run ant -lib lib/ant package-commands I get this message (relevant part only) "BUILD FAILED ... Basedir .../broadinstitute-picard-2a49ee2/htsjdk does not exist"

So it seems I need to do something for the build to know where my HTSJDK lives. Or did my download fail? Am I supposed to make a symbolic link from .../broadinstitute-picard-2a49ee2/htsjdk over to someplace in my HTSJDK directory? If so, to where?

Thanks for any help, Bob H


Created 2013-02-15 09:23:37 | Updated | Tags: blip failure tmp

Comments (17)

When trying to run the UnifiedGenotyper, I keep getting the following:

ERROR MESSAGE: There was a failure because temporary file /tmp/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub7040175770023361502.tmp could not be found while running the GATK with more than one thread. Possible causes for this problem include: your system's open file handle limit is too small, your output or temp directories do not have sufficient space, or just an isolated file system blip

About the suggested causes:

  • The system's open file handle limit is set to 4827982; I doubt that this is exhausted.
  • On the /tmp/ file system, there is 200GB of free space; each GATK run seems to use less than 1/1000 of that.
  • I have no idea what a file system blip would be, but apparently it occurs every time I run the UnifiedGenotyper. Any idea why this would be, and how it could be avoided?

Or could there still be a different reason for the error?

Thanks, Alex
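
The first two checks in the list above can be repeated from inside a process, which matters because the per-process open file limit can be much lower than a system-wide figure. The sketch below is a minimal illustration using only the Python standard library on a Unix-like system; the function name `report_limits` is made up here, and the temp directory defaults to whatever Python considers the system temp directory, which may differ from the one the JVM uses.

```python
import resource
import shutil
import tempfile


def report_limits(tmp_dir=None):
    """Collect the per-process open-file limits and free space in tmp_dir.

    `report_limits` is a hypothetical helper; `resource` is Unix-only.
    """
    tmp_dir = tmp_dir or tempfile.gettempdir()
    # Soft limit is what the process is actually held to; hard is the ceiling.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    usage = shutil.disk_usage(tmp_dir)
    return {
        "tmp_dir": tmp_dir,
        "open_files_soft": soft,
        "open_files_hard": hard,
        "free_bytes": usage.free,
    }


if __name__ == "__main__":
    for key, value in report_limits().items():
        print(f"{key}: {value}")
```

If the soft limit reported here is far below the number quoted in the post, the quoted number was probably a system-wide setting rather than the limit that applies to the GATK process.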