Argo File Formats: GFF3

GFF3 is the most recent flavor of GFF (General Feature Format), a simple tab-delimited format for describing genomic features. GFF3 allows multi-level grouping and multi-level descriptive attributes. If you are unfamiliar with GFF and its flavors, read the GFF overview first to decide which flavor best suits your needs. This document describes only the GFF3 flavor.

GFF3 is both more powerful and more restrictive than other GFF formats. Argo only supports a subset of GFF3 features.

GFF3 files are directly editable in Argo.

File Extensions

Give your GFF3 files the extension '.gff3' instead of '.gff' so that Argo will interpret them correctly. If you do not use '.gff3', Argo will prompt you to choose a flavor, which is annoying and error-prone.

GFF Records

A GFF file consists of one or more records, each of which represents a simple start-to-stop feature. Records are separated by newlines, one record per row. Each record has 9 fields, the last of which is optional. This last field is the only one that differs among the GFF flavors, but the difference is significant. In GFF3, this final field holds descriptive attributes and also special attributes that are used for identification and for grouping multiple records into a single composite record.

A record in GFF3 may represent many things: an exon, a transcript, a gene, etc. Arbitrary levels of hierarchy may be defined by using the 'ID' and 'Parent' attributes, described below. This is the big advantage of GFF3 over the other flavors of GFF, whose rows represent features on only a single (usually exon) level.
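
As a rough illustration of how 'ID' and 'Parent' can express a hierarchy, the following Python sketch groups records under their parents and prints the result as an indented tree. The function names `build_children` and `render`, and the simplified (feature type, attributes) pairs, are assumptions made for this example; they do not describe how Argo works internally. The sketch also assumes each record names at most one parent.

```python
from collections import defaultdict

def build_children(records):
    """Map each parent ID to the list of records that name it as Parent.

    `records` is an iterable of (feature_type, attributes) pairs, where
    attributes is a dict such as {"ID": "HsT20206", "Parent": "HsG8283"}.
    Records without a Parent attribute are filed under the key None.
    """
    children = defaultdict(list)
    for feature_type, attrs in records:
        children[attrs.get("Parent")].append((feature_type, attrs))
    return children

def render(children, parent=None, depth=0):
    """Return indented lines showing the hierarchy below `parent`."""
    lines = []
    for feature_type, attrs in children.get(parent, []):
        label = attrs.get("ID", "")
        lines.append("  " * depth + f"{feature_type} {label}".rstrip())
        if "ID" in attrs:
            # Only records with an ID can have children of their own.
            lines.extend(render(children, attrs["ID"], depth + 1))
    return lines

# Hypothetical, trimmed-down records in the style of the samples in this document.
records = [
    ("gene", {"ID": "HsG8283"}),
    ("mRNA", {"ID": "HsT20206", "Parent": "HsG8283"}),
    ("CDS", {"Parent": "HsT20206"}),
    ("CDS", {"Parent": "HsT20206"}),
]
tree_lines = render(build_children(records))
print("\n".join(tree_lines))
```

The gene appears at the top level, the mRNA is nested under it, and the CDS records are nested under the mRNA.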

GFF files contain features but no sequence. To view them in Argo, you must first load sequence data (for example, a FASTA file) and then superimpose the GFF files onto the sequence.

Examples

Here are some sample GFF3 records (note that these are formatted with spaces for easier viewing; they must be tab-delimited to be parsed by Argo):

human15.1 . gene            214301  215772 . +   . ID=HsG8283
human15.1 . mRNA            214360  215771 . +   . Comments=fixed+one+splice+junction;Parent=HsG8283;Evidence=7000000069743825;Transcript_type=Novel_Transcript;Name=Novel+Transcript%2C+variant+%28partial%29;ID=HsT20206
human15.1 . CDS             214360  214441 . +   . Parent=HsT20206
human15.1 . CDS             215299  215444 . +   . Parent=HsT20206
human15.1 . CDS             215641  215766 . +   . Parent=HsT20206
human15.1 . three_prime_UTR 215767  215771 . +   . Parent=HsT20206
human15.1 . mRNA            214590  215772 . +   . Comments=fixed+one+splice+site%0A;Parent=HsG8283;Evidence=7000000069600840;Transcript_type=Novel_Transcript;Name=Novel+Transcript%2C+variant+%28partial%29;ID=HsT20207
human15.1 . five_prime_UTR  214590  214590 . +   . Parent=HsT20207
human15.1 . CDS             214591  214660 . +   . Parent=HsT20207
human15.1 . CDS             215299  215444 . +   . Parent=HsT20207
human15.1 . CDS             215641  215769 . +   . Parent=HsT20207
human15.1 . three_prime_UTR 215770  215772 . +   . Parent=HsT20207
human15.1 . mRNA            214301  215769 . +   . Parent=HsG8283;Evidence=7000000069974357;Transcript_type=Candidates+for+Deletion;Name=Novel+Transcript+%28partial%29;ID=HsT16028
human15.1 . five_prime_UTR  214301  214302 . +   . Parent=HsT16028
human15.1 . CDS             214303  214460 . +   . Parent=HsT16028
human15.1 . CDS             215299  215467 . +   . Parent=HsT16028
human15.1 . three_prime_UTR 215468  215769 . +   . Parent=HsT16028
human15.1 . mRNA            215218  215772 . +   . Parent=HsG8283;Evidence=7000000069512231;Transcript_type=Novel_Transcript;Name=Novel+Transcript%2C+variant;ID=HsT16029
human15.1 . five_prime_UTR  215218  215233 . +   . Parent=HsT16029
human15.1 . CDS             215234  215444 . +   . Parent=HsT16029
human15.1 . CDS             215641  215735 . +   . Parent=HsT16029
human15.1 . three_prime_UTR 215736  215772 . +   . Parent=HsT16029

Note that tabs have been replaced with spaces here for easier viewing.
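
If you want to paste one of the space-aligned sample records into a real file, it first has to be turned back into a tab-delimited record. The following Python sketch shows one way to do that; the function name `to_tab_delimited` is an assumption made for this example. It relies on the fact that the attribute values in the samples are encoded and contain no raw spaces, so it would not be safe for records whose last field contains spaces.

```python
def to_tab_delimited(line):
    """Turn one whitespace-aligned sample record into a tab-delimited record.

    Splits on runs of whitespace at most 8 times, so everything after the
    eighth field stays together as field 9.
    """
    fields = line.strip().split(None, 8)
    if len(fields) not in (8, 9):
        raise ValueError(f"expected 8 or 9 fields, got {len(fields)}")
    return "\t".join(fields)

example = "human15.1 . CDS             214360  214441 . +   . Parent=HsT20206"
converted = to_tab_delimited(example)
print(converted.split("\t"))
```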

Here are a sample FASTA file and a sample GFF3 file that you can download and open in Argo.

Field Descriptions

Note: up until the last field (field 9), all GFF flavors are the same.

  1. seqname - The name of the sequence. Typically a chromosome or a contig. Argo does not care what you put here. It will superimpose gff features on any sequence you like.
  2. source - The program that generated this feature. Argo displays the value of this field in the inspector but does not do anything special with it.
  3. feature - The name of this type of feature. The official GFF3 spec states that this should be a term from the SOFA ontology, but Argo does not do anything with this value except display it.
  4. start - The starting position of the feature in the sequence. The first base is numbered 1.
  5. end - The ending position of the feature (inclusive).
  6. score - A score between 0 and 1000. If there is no score value, enter ".".
  7. strand - Valid entries include '+', '-', or '.' (for don't know/don't care).
  8. frame - If the feature is a coding exon, frame should be a number from 0 to 2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'. Argo does not do anything with this field except display its value.
  9. grouping attributes (GFF3) - Attribute keys and values are separated by '=' signs. Values must be URI-encoded. Attribute pairs are separated by semicolons. Certain special attributes are used for grouping and identification (see below). This field is the one important difference between GFF flavors.
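
The field layout above can be sketched as a small Python parser. This is a minimal illustration, not Argo's own parser: the names `FIELD_NAMES` and `parse_record` are assumptions made for this example. The sample records in this document encode spaces as '+', so the sketch decodes values with `urllib.parse.unquote_plus`; files that use a different encoding convention may need `unquote` instead.

```python
from urllib.parse import unquote_plus

# Names of the first eight fields, in column order.
FIELD_NAMES = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame")

def parse_record(line):
    """Parse one tab-delimited GFF3 record into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (8, 9):  # field 9 is optional
        raise ValueError(f"expected 8 or 9 fields, got {len(fields)}")
    record = dict(zip(FIELD_NAMES, fields))
    record["start"] = int(record["start"])  # first base is numbered 1
    record["end"] = int(record["end"])      # inclusive
    attributes = {}
    if len(fields) == 9:
        for pair in fields[8].split(";"):
            if not pair:
                continue  # tolerate a trailing semicolon
            key, _, value = pair.partition("=")
            attributes[key] = unquote_plus(value)
    record["attributes"] = attributes
    return record

line = ("human15.1\t.\tmRNA\t215218\t215772\t.\t+\t.\t"
        "Parent=HsG8283;Name=Novel+Transcript%2C+variant;ID=HsT16029")
record = parse_record(line)
# Both ends are inclusive, so the feature length is end - start + 1.
length = record["end"] - record["start"] + 1
print(record["attributes"]["Name"], length)
```

Because start and end are 1-based and inclusive, the matching Python slice of a sequence string would be `sequence[start - 1:end]`.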

Special Field 9 Attributes

The first special thing about field 9 attributes is that they now can be associated with transcripts. Previous flavors of GFF restricted attributes to the lowest level subfeature (exons).

Any key=value attribute pair will be displayed by Argo, but the following have special meaning:

The GFF3 spec specifies other "reserved" attribute keys besides the above, but Argo does not do anything special with them.

Unsupported GFF3 Features

Argo does NOT support the following GFF3 features. Files containing such records may load, but they will probably not behave as expected.

For More Information

See also the GFF3 spec.


Last Updated: Sept 18 2006
Contact: Reinhard Engels
argo-support@broad.mit.edu