Hello, I'm new to GATK and Queue. I understand that we can write a QScript in Queue to generate separate GATK jobs and run them on a cluster of several nodes. Can we run GATK or Queue on Google Hadoop?
Hi, I am trying to run the GATK on a single-node Hadoop cluster. I executed the command below:
hduser@ubuntu:~/apps/hadoop$ bin/hadoop jar GenomeAnalysisTK.jar -T RealignerTargetCreator -I /usr/hduser/gatkinput/exampleBAM.bam -R /usr/hduser/gatkinput/exampleFASTA.fasta -o output.list
After executing it, I got an exception, which is in the attached file "gatk error on hadoop VM.txt". Please help me resolve this issue.
Hello, I'm a developer in Korea. Recently I have been developing a bioinformatics pipeline using BWA, Samtools, Picard, and GATK, and I would like to run these tools on Hadoop, because MapReduce (MR) should be more efficient in terms of speed and memory use. As I understand it, GATK is built on a MapReduce design. If so, have you tested GATK on MR? In theory, that should be more efficient than running GATK on its own.
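To make the map/reduce idea concrete, here is a toy sketch in Python. It is not GATK code and does not use Hadoop. The shard data and the names map_locus, reduce_counts, and process_shard are made up for illustration. It only shows the general pattern: a map step applied per locus, and a reduce step that folds the results together, first within each shard and then across shards.

```python
from functools import reduce

# Hypothetical toy data: (contig, position, read depth) per locus, grouped into shards.
# Real shards would be derived from the BAM and reference; these numbers are made up.
shards = [
    [("chr1", 100, 12), ("chr1", 101, 0), ("chr1", 102, 7)],
    [("chr1", 103, 0), ("chr1", 104, 30)],
]

def map_locus(locus):
    """Map step: emit 1 if the locus is covered by at least one read, else 0."""
    _contig, _pos, depth = locus
    return 1 if depth > 0 else 0

def reduce_counts(total, value):
    """Reduce step: fold one mapped value into a running total."""
    return total + value

def process_shard(shard):
    """Run map then reduce over a single shard, starting from 0."""
    return reduce(reduce_counts, map(map_locus, shard), 0)

# Each shard is independent, so in principle it could be processed on a different node.
per_shard = [process_shard(s) for s in shards]

# Combine the per-shard results into one final answer.
covered = reduce(reduce_counts, per_shard, 0)
print(per_shard, covered)
```

Whether this pattern is actually faster on Hadoop depends on how the input is split and read, which is the part this sketch leaves out.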
Also, if GATK needs a sorted and indexed SAM file, can I just do the sorting and indexing with the Hadoop-BAM library?
Because I am a novice in bioinformatics, this issue is too complicated for me.
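For reference, "sorted" here means coordinate-sorted: records are ordered by the position of their contig in the file header, then by alignment start position. Below is a minimal Python sketch of that ordering. It assumes toy tuples instead of real SAM/BAM records, and it does not call Hadoop-BAM or any other library. The names contig_order, coordinate_key, and is_coordinate_sorted are hypothetical.

```python
# Hypothetical reference order, as it would appear in the @SQ lines of a header.
contig_order = ["chr1", "chr2", "chrX"]
contig_rank = {name: i for i, name in enumerate(contig_order)}

# Toy records: (read name, contig, 1-based leftmost position). Not real BAM records.
records = [
    ("readC", "chr2", 50),
    ("readA", "chr1", 900),
    ("readD", "chrX", 5),
    ("readB", "chr1", 120),
]

def coordinate_key(record):
    """Sort key: header order of the contig first, then start position."""
    _name, contig, pos = record
    return (contig_rank[contig], pos)

def is_coordinate_sorted(recs):
    """Return True if every record's key is >= the key of the record before it."""
    keys = [coordinate_key(r) for r in recs]
    return all(a <= b for a, b in zip(keys, keys[1:]))

sorted_records = sorted(records, key=coordinate_key)
print([name for name, _contig, _pos in sorted_records])
print(is_coordinate_sorted(records), is_coordinate_sorted(sorted_records))
```

Note that the contig order comes from the header, not from alphabetical order, so a sort done outside the usual tools has to use the same header order as the reference.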
e-mail : email@example.com phone : +821027266808
I am trying to change the I/O framework rather than the internal framework (the existing MapReduce and so on). I have already changed the I/O framework of another tool, and that worked. I think GATK's I/O framework can also be changed to add MapReduce. You said to rewrite executive.* and traversals*. Do I only need to rewrite those? I think this project is related to gatk.io.*. (Of course, the I/O process extends across the whole framework.)
I have been analyzing the framework: CommandLineGATK -> CommandLineExecutable -> GenomeAnalysisEngine -> OutputTracker -> ArgumentSource || Storage, Stub..