Hi, I have 190 samples that I am running through the GATK DNAseq pipeline following the Best Practices. Since only a few genes have been sequenced for each sample, the alignment files are very small (0.5 GB BAM files), but even so, processing each sample takes about 3 hours. Is there a way to parallelize the processing of the individual samples on a multi-core machine? Since each sample is processed independently of the others, running them in parallel should not change the results. Python's multiprocessing module has a Pool class that could be used for this. I tried it, but it does not seem to work for me. Does the GATK team have any guidance or information on this issue? Thanks,
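For reference, here is a minimal sketch of the per-sample parallel approach described above. It is not an official GATK recipe. The `build_command` function is a hypothetical placeholder: it just launches a tiny Python process that prints the sample name, and would need to be replaced with the real per-sample pipeline command. The sketch uses `ThreadPool` from `multiprocessing.pool` rather than `Pool` itself. It has the same `map` interface, and since each worker only starts an external process and waits for it, threads should be sufficient. The sample names and worker count are made up for illustration.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def build_command(sample):
    # Hypothetical placeholder command: replace this with the actual
    # per-sample pipeline invocation (e.g. a shell script that takes
    # the sample name as its argument).
    return [sys.executable, "-c",
            "import sys; print('processed', sys.argv[1])", sample]


def run_sample(sample):
    # Run one sample as an external process and capture its output.
    result = subprocess.run(build_command(sample),
                            capture_output=True, text=True)
    return sample, result.returncode, result.stdout.strip()


def run_all(samples, workers=4):
    # map() hands samples to the workers and returns the results
    # in the same order as the input list.
    with ThreadPool(processes=workers) as pool:
        return pool.map(run_sample, samples)


if __name__ == "__main__":
    samples = ["sample_001", "sample_002", "sample_003"]  # made-up names
    for name, code, output in run_all(samples, workers=2):
        status = "ok" if code == 0 else "FAILED"
        print(name, status, output)
```

Checking the return code of each job makes it easier to see which samples failed instead of the whole run silently appearing to "not work". The number of workers probably needs to be chosen with the memory needed per job in mind, not just the number of cores.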
Wouldn't it be useful to have GATK wrapped in a Python API, say pygatk, the way pysam wraps samtools and pybedtools wraps bedtools? Is anybody developing this?