Hi, I have 190 samples that I am running through the GATK DNAseq pipeline following the Best Practices. Since only a few genes have been sequenced for each sample, the alignment files are very small (0.5 GB BAM files), but even so, processing each sample takes about 3 hours. Is there a way to parallelize the processing of the individual samples on a multi-core machine? Since each sample is processed independently of the others, running them side by side should not change the results. Python's multiprocessing module has a Pool class that could be used for this. I tried it, but it does not seem to work for me. Does the GATK team have any guidance or information on this issue? Thanks, - Pankaj
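For anyone trying the same thing, here is a minimal sketch of per-sample parallelization in Python. It is not an official GATK recipe. It uses `multiprocessing.pool.ThreadPool` rather than the process-based `Pool` mentioned above: each worker only launches an external command and waits for it, so threads are enough, and nothing has to be pickled. The names `build_command`, `run_sample`, and `run_all` are hypothetical, and the command is a stand-in that only prints the sample name; replace it with your real per-sample pipeline invocation.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def build_command(sample):
    # Hypothetical placeholder: this only prints the sample name.
    # Swap in the real command line for one sample of your pipeline.
    return [sys.executable, "-c",
            "import sys; print('processing', sys.argv[1])", sample]


def run_sample(sample):
    """Run one sample's command as a child process; return (sample, exit code)."""
    result = subprocess.run(build_command(sample),
                            capture_output=True, text=True)
    return sample, result.returncode


def run_all(samples, workers=4):
    """Run up to `workers` samples at once; results come back in input order."""
    with ThreadPool(processes=workers) as pool:
        return pool.map(run_sample, samples)


if __name__ == "__main__":
    samples = ["sample_%03d" % i for i in range(1, 9)]  # made-up sample names
    for sample, code in run_all(samples, workers=4):
        print(sample, "ok" if code == 0 else "failed with exit code %d" % code)
```

The number of workers is an assumption to tune: each concurrent sample needs its own share of memory and disk bandwidth, so the best value is often below the core count. Checking the returned exit codes matters, because a failed sample will not stop the others.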
Wouldn't it be useful to have GATK wrapped in a Python API, say pygatk, the way pysam wraps samtools and pybedtools wraps bedtools? Is anybody developing this?