The GATK is an open source project that has greatly benefited from the contributions of outside users. The GATK team welcomes contributions from anyone who produces useful functionality in line with the goals of the toolkit. You are welcome to branch the GATK main repository and develop your own tools. Sometimes these tools may be useful to the GATK user community and you may want to make it part of the main GATK distribution. If so we ask you to follow our guidelines for submission of patches.
There are a few good GIT practices that you should follow to simplify the ultimate goal, which is, adding your changes to the main GATK repository.
You should always start your code by creating a new branch from the most recent version of the main repository with :
git checkout master (make sure you are in the master branch) git fetch && git rebase origin/master (you can substitute this line for "git pull" if you have no changes in the master branch) git checkout -b newtool (create a new branch for your new tool)
Note: If you have submitted a patch to the group, do not continue development on the same branch as we cannot guarantee that your changes will make it to the main repository unchanged.
Every time before you rebase, you have to update your copy of the main repository. To do this use:
If you are just trying to keep up with the changes in the main repository after a fetch, you can rebase your branch at anytime using (and this should be all you need to do):
git rebase origin/master
In case there are conflicts, resolve them as you would and do:
git rebase --continue
If you don't know how to resolve the conflicts, you can always safely abort the whole process and go back to your branch before you started rebasing:
git rebase --abort
If you are done and want to generate your patches conforming to the latest repository changes, to edit, squash and reorder your commits use :
git rebase -i origin/master
At the prompt, you can follow the instructions to squash, edit and reorder accordingly. You can also do this step from IntelliJ with a visual editor that allows you to select what to edit/squash/reorder. You can also take a look at this nice tutorial on how to use interactive rebase.
It is okay to have a list of commits (numbered) somewhat like this in your local tree:
Before you can send your tools to us, you have to organize these commits so they tell a meaningful history and are self contained. To achieve this you will need to rebase so you can squash, edit and reorder your commits. This tree makes a lot of sense for your development process, but it makes no sense in the main repository history as it becomes hard to pick/revert commits and understand the history at a glance. After rebasing, you should edit your commits to look like this:
Use your commit messages wisely to help quick processing of your patches. Make sure the first line of your commit messages have less than 50 characters (title). Add a blank line and write a paragraph or more explaining what this commit represents (now that it is a package of multiple commits. It is important to have the 50 char title because this is all we see when we look at an extended history to find bugs and it is also our quick access to remember what the commit does to the repository.
A patch should be self contained. Meaning if we decide to adopt feature X and Z but not Y, we should be able to do so by only applying patches 1 and 2. If your patches are co-dependent, you should say so in the commits and justify why you didn't squash the commits together into one tool.
To generate patches, use :
git format-patch since
The since parameter is the last commit you want to generate patches from, for example: HEAD^3 will generate patches for HEAD^2, HEAD^1 and HEAD. You can also specify the commit by its id or by using the head of a branch. This is where using branches will make your life easier. If master is always up to date with the main repo with no changes, you can do:
git format-patch master (provided your master is up to date)
This will generate a patch for each commit you've created and you can simply e-mail them as an attachment to us.
This patch release fixes several issues that we felt were annoying enough to warrant an interim version release; see detailed list below.
Oh, and in case you're curious about the patch number, the *-46 numeral refers to the number of code commits made since the 3.4-0 release. Out of this, about a dozen are related to the bug fixes listed above; another dozen relate to new work that is not yet publicly available (gotta wait 'til 3.5...) and the rest are automated commits produced when merging completed work into the master codebase.
In version 3.4, we introduced some functionality to handle cases where a site that is variant in one sample is covered by a deletion in another sample (see release notes for details). Some of that functionality involved using a new symbolic
<*:DEL> notation to represent this in multi-sample VCFs. Unfortunately we later realized that this was not the best way to represent this, so we have now switched to a more spec-compliant notation, which is simply
*. We've also added some refinements to correctly handle cases where alleles need to be remapped, and/or where there are multiple overlapping spanning deletions in the population (which we encountered twice in a population of ~10K samples, to give you an idea of how rarely that happens in humans). The new code is fully backward-compatible, so if you have any files that contain the
<*:DEL> notation, these will be understood correctly by GATK. Note however that they will not be converted to the new notation -- they will just be passed through unchanged.
Some annoying gcc compiler version issues caused many of you to be unable to utilize Intel's accelerated PairHMM library to speed up HaplotypeCaller. We've sorted those out so now everyone should be able to utilize it provided their computing platform is running reasonably recent (i.e. not Jurassic-era) software.
There was a minor bug in the ReadBackedPhasing tool that was causing some variants to be merged into MNPs when they shouldn't be; that's now fixed.
A user reported a dramatic slowdown in the DepthOfCoverage and shortly thereafter, provided a code patch that resolved to problem. Self-fixing bug reports are our favorite kind, thank you @mnw21cam!
OK, technically not a bug fix, so this shouldn't be in a patch, according to my esteemed software engineer colleagues, but I couldn't wait to share it with y'all. This is something people often ask how to do and until now it was beyond painful. But hark! You can now tell SelectVariants to filter out variants where a given number or proportion of samples have filtered genotypes. See the usage example in the tool doc (toward the end of the list). Now tell me you're not happy that we included it in the patch. C'mon, I dare you.
Again with a not-a-bug-fix thing, sorry -- there's a new read filter to get rid of reads that have too much soft-clipping going on to be honest. This is a sign that there's something seriously wrong with them, as we recently found out while analyzing a bunch of cheek swab samples that turned out to be heavily contaminated with bacterial DNA. It's always fun to see some seriously messed-up samples go by... and it's nice that now we can do something to rescue them.
We made some minor changes to documentation (VQSLOD, QD) and error handling (clarified error message for Contigs Out of Order error, made GVCF indexing parameters not required for gzipped output).