Tagged with #variantcontext
1 documentation article | 0 announcements | 1 forum discussion


Comments (1)

The picard-public repository on sourceforge, in addition to housing the sam-jdk and picard-public, is now home to tribble and the org.broadinstitute.variant package (which includes VariantContext and associated classes as well as the VCF/BCF codecs).

If you just need to check out the sources and don't need to make any commits into the picard repository, the command is:

svn checkout svn://svn.code.sf.net/p/picard/code/trunk picard-public

Then you can attach the picard-public/src/java directory in IntelliJ as a source directory (File -> Project Structure -> Libraries -> Click the plus sign -> "Attach Files or Directories" in the latest IntelliJ).

To build picard, sam-jdk, tribble, and org.broadinstitute.variant all at once, type ant from within the picard-public directory. To run tests, type ant test

If you do need to make commits into the picard repository, first you'll need to create a sourceforge account, then contact the Picard team and request access for that account. Once you've been given access, you'll need to check out their repository using a command of this form:

svn checkout --username=YOUR_USERNAME svn+ssh://YOUR_USERNAME@svn.code.sf.net/p/picard/code/trunk picard-public
No posts found with the requested search criteria.
Comments (11)

Hello all,

First post. Thank you for these amazing tools. I have spent two days pulling my hair out, trying all enumerations, searching the documentation and forums, and in the end I come to you for help. Please forgive me if these topics have been covered elsewhere.

I have several VCFs generated by SomaticSniper that I'd like to filter based on the SomaticScore (SSC in the FORMAT field). I was working with VariantFiltration and SelectVariants, and trying to use different options, as well as regular expressions, to select those calls with a SSC over 40. I have been unable to do so. I also looked into trying to figure out JEXL, and using the last command listed on the documentation page, about using the VariantContext feature to drill into an array. I cannot get it to recognize the SSC column of the FORMAT field and then filter for the TUMOR sample.

Using VariantFiltration I was using -select (but I understand now that this searches the INFO field only). I was then using the --genotypeFilterExpression, but it would not add the FT tag to the FORMAT field as it said it would, it would just apply PASS to everything.

java -Xmx4-jar GenomeAnalysisTK.jar -T VariantFiltration -R ~/Documents/reference/human_g1k_v37.fasta -V '/home/registry3/Desktop/merged/104024sniperRAWSNPS.vcf' --G_filter "SSC < 40.0" --G_filterName "myFilter" -o '/home/registry3/Desktop/merged/104024sniperFILTEREDSNPS.vcf'

Using SelectVariants, I was using -sn to select the TUMOR sample and then using -select_expressions, but I guess this also only works on the INFO field. I had been trying to use --sample_expression which gives the ability to use a regular expression, but then I never had good results; it wouldn't do any filtering, and output the entire input file. Does the regular expression only apply to the sample name, and not the content of each line? Trying to select SSC over 40 from a line like this

#CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  TUMOR
1   10177   0   A   C   0   0   AC=1;AF=0.500;AN=2;DP=62    GT:AMQ:BCOUNT:BQ:DP:DP4:GQ:IGT:MQ:SS:SSC:VAQ    0/1:16,15:40,22,0,0:28,25:62:31,9,10,12:37:0/1:16:2:19:37

I used a line such as this, to look at the second to last number in the FORMAT field based on : dividers

java -Xmx4-jar GenomeAnalysisTK.jar -R ~/Documents/reference/human_g1k_v37.fasta -T SelectVariants --variant '/home/registry3/Desktop/merged/104024sniperSNPs.vcf' -o '/home/registry3/Desktop/merged/104024sniperSSC40.vcf' -se ".*:[4,5,6,7,8,9][1,2,3,4,5,6,7,8,9]:[1234567890]{2,3}$"

I am not a coder, as you can probably see, but I'm trying to get this worked out. This output the entire file still, with SSC values above and below 40.

Looking into use the vc.getGenotype array access, I could not find much documentation about VariantContext; I was looking through the files on github, looking through the code and looking for samples, since the .getAD() from the documentation seems to work, but alas, there is no .getSCC() available. Is using vc. the best way to drill into an array (the FORMAT field) and search for what I want?

I didn't post all the code and output, trying to keep this as short as possible. I can post pastebin outputs if that would be helpful. Thank you, David