Tribble
Posted in Developer Zone | Last updated on 2014-01-15 14:06:23


Comments (4)

1. Overview

The Tribble project was started as an effort to overhaul our reference-ordered data system; we had many different formats that were shoehorned into a common framework that didn't really work as intended. What we wanted was a common framework that allowed for searching of reference ordered data, regardless of the underlying type. Jim Robinson had developed indexing schemes for text-based files, which was incorporated into the Tribble library.

2. Architecture Overview

Tribble provides a lightweight interface and API for querying features and creating indexes from feature files, while allowing iteration over know feature files that we're unable to create indexes for. The main entry point for external users is the BasicFeatureReader class. It takes in a codec, an index file, and a file containing the features to be processed. With an instance of a BasicFeatureReader, you can query for features that span a specific location, or get an iterator over all the records in the file.

3. Developer Overview

For developers, there are two important classes to implement: the FeatureCodec, which decodes lines of text and produces features, and the feature class, which is your underlying record type.

For developers there are two classes that are important:

  • Feature

    This is the genomicly oriented feature that represents the underlying data in the input file. For instance in the VCF format, this is the variant call including quality information, the reference base, and the alternate base. The required information to implement a feature is the chromosome name, the start position (one based), and the stop position. The start and stop position represent a closed, one-based interval. I.e. the first base in chromosome one would be chr1:1-1.

  • FeatureCodec

    This class takes in a line of text (from an input source, whether it's a file, compressed file, or a http link), and produces the above feature.

To implement your new format into Tribble, you need to implement the two above classes (in an appropriately named subfolder in the Tribble check-out). The Feature object should know nothing about the file representation; it should represent the data as an in-memory object. The interface for a feature looks like:

public interface Feature {

    /**
     * Return the features reference sequence name, e.g chromosome or contig
     */
    public String getChr();

    /**
     * Return the start position in 1-based coordinates (first base is 1)
     */
    public int getStart();

    /**
     * Return the end position following 1-based fully closed conventions.  The length of a feature is
     * end - start + 1;
     */
    public int getEnd();
}

And the interface for FeatureCodec:

/**
 * the base interface for classes that read in features.
 * @param <T> The feature type this codec reads
 */
public interface FeatureCodec<T extends Feature> {
    /**
     * Decode a line to obtain just its FeatureLoc for indexing -- contig, start, and stop.
     *
     * @param line the input line to decode
     * @return  Return the FeatureLoc encoded by the line, or null if the line does not represent a feature (e.g. is
     * a comment)
     */
    public Feature decodeLoc(String line);

    /**
     * Decode a line as a Feature.
     *
     * @param line the input line to decode
     * @return  Return the Feature encoded by the line,  or null if the line does not represent a feature (e.g. is
     * a comment)
     */
    public T decode(String line);

    /**
     * This function returns the object the codec generates.  This is allowed to be Feature in the case where
     * conditionally different types are generated.  Be as specific as you can though.
     *
     * This function is used by reflections based tools, so we can know the underlying type
     *
     * @return the feature type this codec generates.
     */
    public Class<T> getFeatureType();


    /**  Read and return the header, or null if there is no header.
     *
     * @return header object
     */
    public Object readHeader(LineReader reader);
}

4. Supported Formats

The following formats are supported in Tribble:

  • VCF Format
  • DbSNP Format
  • BED Format
  • GATK Interval Format

5. Updating the Tribble library

Updating the revision of Tribble on the system is a relatively straightforward task if the following steps are taken.

  • Make sure that you've checked your changes into Tribble; unversioned changes will be problematic, so you should always check in so that you have a unique version number to identify your release.

  • Once you've checked-in Tribble, make sure to svn update, and then run svnversion. This will give you a version number which you can use to name your release. Let's say it was 82. **If it contains an M (i.e. 82M) this means your version isn't clean (you have modifications that are not checked in), don't proceed`.

  • from the Tribble main directory, run ant clean, then ant (make sure it runs successfully), and ant test (also make sure it completes successfully).

  • copy dist/tribble-0.1.jar (or whatever the internal Tribble version currently is) to your checkout of the GATK, as the file ./settings/repository/org.broad/tribble-<svnversion>.jar.

  • Copy the current XML file to the new name, i.e. from the base GATK trunk directory:

    cp ./settings/repository/org.broad/tribble-.xml ./settings/repository/org.broad/tribble-.xml

  • Edit the ./settings/repository/org.broad/tribble-<svnversion>.xml with the new correct version number and release date (here we rev 81 to 82).

    This involves changing:

    <ivy-module version="1.0">
        <info organisation="org.broad" module="tribble" revision="81" status="integration" publication="20100526124200" />
    </ivy-module>
    

    To:

    <ivy-module version="1.0">
        <info organisation="org.broad" module="tribble" revision="82" status="integration" publication="20100528123456" />
    </ivy-module>
    

    Notice the change to the revision number and the publication date.

Notice the change to the revision number and the publication date.

  • Remove the old files git remove ./settings/repository/org.broad/tribble-<current_svnversion>.*

  • Add the new files git add ./settings/repository/org.broad/tribble-<new_svnversion>.*

  • Make sure you're using the new libraries to build: remove your ant cache: rm -r ~/.ant/cache.

  • Run an ant clean, and then make sure to test the build with ant integrationtest and ant test.

  • Any check-in from the base SVN directory will now rev the Tribble version.


Return to top Comment on this article in the forum