We're analysing a fairly large dataset and run time is a concern, so I ran some scaling tests on a subset of the data before starting. The results were unexpected.
When I run GATK RealignerTargetCreator with -nt 8 and give it 8 cores to work with, it takes about 2.5 times LONGER than a single-threaded run. I don't mean that the user or CPU time goes up: the real (wall-clock) time goes up. In the -nt 8 case, all 8 cores were on a single shared-memory node of our cluster.
I tested on two different kinds of subset and both ran more slowly when multithreaded. First I restricted the input by genomic region, i.e. analysing only chr22. When multithreading didn't help there, I thought GATK might be parallelising over genomic regions, so I tested instead on a single lane of input data (a 9.6G BAM file with reads spread over the whole genome). This also ran more slowly when multithreaded.
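
For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch of a timing harness. It is not the exact set of commands I ran. The jar, reference, BAM and output file names are placeholders, and the helper names (build_command, wall_time, speedup, efficiency, compare) are made up for illustration. It assumes a GATK 3-style command line (-T, -R, -I, -o, -L, -nt).

```python
import subprocess
import time


def build_command(threads, jar="GenomeAnalysisTK.jar", reference="ref.fasta",
                  bam="input.bam", region=None):
    """Build a RealignerTargetCreator command line.

    All file names are placeholders; substitute your own paths.
    """
    cmd = [
        "java", "-jar", jar,
        "-T", "RealignerTargetCreator",
        "-R", reference,
        "-I", bam,
        "-o", f"targets_nt{threads}.intervals",
        "-nt", str(threads),
    ]
    if region is not None:
        # Restrict the run to one region, e.g. "chr22".
        cmd += ["-L", region]
    return cmd


def wall_time(cmd):
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def speedup(t_single, t_multi):
    """Speedup of the multithreaded run; values below 1 mean it got slower."""
    return t_single / t_multi


def efficiency(t_single, t_multi, threads):
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup(t_single, t_multi) / threads


def compare(threads=8, region=None):
    """Time a single-threaded and a multithreaded run on the same input."""
    t1 = wall_time(build_command(1, region=region))
    tn = wall_time(build_command(threads, region=region))
    print(f"-nt 1: {t1:.1f}s  -nt {threads}: {tn:.1f}s  "
          f"speedup: {speedup(t1, tn):.2f}  "
          f"efficiency: {efficiency(t1, tn, threads):.2f}")
```

Calling compare(8, region="chr22") would run the chr22 test, and compare(8) the whole-genome one. With the numbers above (the 8-thread run taking about 2.5 times as long), the speedup works out to about 0.4 and the parallel efficiency to about 0.05.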
So my question is: should I use -nt 8 in my real analysis even though it was a bad option in testing? Is it possible that multithreading will be bad for small amounts of data, but good in the large-data case? Or, does this indicate that I'm doing something wrong when trying to run RealignerTargetCreator multithreaded?
I really would like to use the fastest option for the real data as it will be very big. Any help much appreciated.