Broad: Annotation: Argo: Internal: Calhoun FileBase

What is the Calhoun FileBase?

The Calhoun FileBase is a standard, central repository for feature files, accessible to internal Broad Institute users. It's like a database, except easier to use, because all you have to do to share your data is drop a file in the appropriate sequence directory. Argo knows to look in this repository, and will automatically register files under the appropriate sequence. Files can be in either CalhounXML format or GFF3 format (GFF3 is recommended because it is easier). FileBase GFF3 files must have the sequence id they are to be imposed against in column one, prefixed by 'seq.'. Besides this, only the standard GFF3 formatting rules apply.
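The column-one rule can be checked before a file is shared. The following is a minimal sketch, not part of Argo: the function name find_bad_seqid_lines and the feature rows in the sample are made up for illustration, and it assumes that 'seq.' is followed directly by the sequence id with nothing in between.

```python
def find_bad_seqid_lines(gff3_lines, sequence_id):
    """Return 1-based numbers of feature lines whose column one is not 'seq.<sequence_id>'."""
    expected = "seq." + sequence_id  # assumption: prefix directly followed by the id
    bad = []
    for number, line in enumerate(gff3_lines, start=1):
        stripped = line.rstrip("\n")
        if stripped.startswith("##FASTA"):
            break  # standard GFF3: everything after this directive is sequence data
        if not stripped.strip() or stripped.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = stripped.split("\t")
        # Standard GFF3 feature lines have nine tab-separated columns.
        if len(columns) != 9 or columns[0] != expected:
            bad.append(number)
    return bad


# Made-up feature rows; only the sequence id comes from the sample path in this document.
sample = [
    "##gff-version 3\n",
    "seq.7000000007104825\texample\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
    "7000000007104825\texample\tgene\t1200\t1800\t.\t-\t.\tID=gene2\n",
]
print(find_bad_seqid_lines(sample, "7000000007104825"))  # [3]
```

Line 3 is reported because its first column lacks the 'seq.' prefix.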

A sample GFF3 file in this format can be found in the FileBase at this location:

prod/data/features/HAWK2_TEST/7000000007104/7000000007104825/sample.gff3

To view it in Argo, open DB -> Sequence Tree, go to "Test Sequences", and then choose 'Hawk2 Masked Contig 1.3'.

If you have trouble accessing the filebase and wish to examine this file, you may also download it here.

Note that the filebase is for sharing data. You can look at your own local data files just by imposing them onto an arbitrary sequence using the Argo File -> Load Track File menu item.

Viewing FileBase Files in Argo

Windows

To view files in the FileBase using Argo, Windows users do not have to do anything special. Just open File -> Track Table for your sequence and tick any "Calhoun XML" files you wish to load.

You can browse the FileBase directory structure under Windows by choosing Start Menu -> Run and typing \\chlorine\annotation\prod\data\features

Mac OS X

Mac users will have to download this tarball to mount the FileBase onto their computers, then unpack it with tar -xvzf (sorry about the tarball step, but slinging raw .app files around results in weird OS X metadata corruption issues). The executable is called mount_annotation.app. You can either run it by double-clicking on it, or drag and drop it to your startup items so that it runs automatically when you log on. To do this, open Apple Menu -> System Preferences -> Accounts, select your user name, and then click the "Startup Items" tab. Then just drag and drop the mount_annotation script you downloaded there. If you have trouble doing this due to insufficient account privileges or for any other reason, please don't hesitate to contact Reinhard (info below).

Mount Annotation Startup Item

What the script does is mount the "Annotation" share on the host "Chlorine" at /Volumes/Annotation. If you wish, you can do this manually instead: in the Finder, press Command-K ("Connect to Server"), enter smb://Chlorine, and then select the share "Annotation".

You can browse the filebase directory structure under /Volumes/Annotation using the Finder or terminal. The production filebase root is /Volumes/Annotation/prod/data/features

Understanding the FileBase Directory Structure

The FileBase root contains subdirectories based on sequence group (a sequence group can be arbitrary, but usually means an organism or assembly). Each sequence group directory contains further subdirectories based on the first 13 digits of the sequence id. Each of these subdirectories contains further subdirectories based on the entire sequence id. This last layer of subdirectories contains the track files to be associated with the sequence. The reason for the logically unnecessary 13-digit layer is that sequence groups can have many thousands of sequences, and folders with that many subdirectories can cause technical problems.
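The layout described above can be expressed as a small path-building function. This is only a sketch: the name filebase_sequence_dir is hypothetical, and because this page does not say how sequence ids shorter than 13 digits are handled, the sketch simply rejects them.

```python
from pathlib import PurePosixPath


def filebase_sequence_dir(root, sequence_group, sequence_id):
    """Build <root>/<sequence group>/<first 13 digits of id>/<full id>."""
    if len(sequence_id) < 13:
        # Assumption: behavior for short ids is not documented, so refuse them here.
        raise ValueError("sequence id has fewer than 13 digits: " + sequence_id)
    return PurePosixPath(root) / sequence_group / sequence_id[:13] / sequence_id


# Values taken from the sample file location and the Mac mount point in this document.
print(filebase_sequence_dir(
    "/Volumes/Annotation/prod/data/features", "HAWK2_TEST", "7000000007104825"
))
# /Volumes/Annotation/prod/data/features/HAWK2_TEST/7000000007104/7000000007104825
```

The result matches the directory of the sample.gff3 file mentioned earlier on this page.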

Loading your own Data into the FileBase

Just browse the FileBase as described in the OS sections above, and drop your valid GFF3 or CalhounXML file in the appropriate sequence directory.
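If you would rather script the drop than drag the file by hand, something along these lines may work. It is a sketch under assumptions, not a supported tool: the function name drop_track_file is hypothetical, the sequence directory is assumed to exist already (this page does not say whether users should create it), and refusing to overwrite an existing file is a cautious choice made for this example.

```python
import shutil
from pathlib import Path


def drop_track_file(track_file, filebase_root, sequence_group, sequence_id):
    """Copy a track file into <root>/<group>/<first 13 digits of id>/<full id>/."""
    track_file = Path(track_file)
    target_dir = Path(filebase_root) / sequence_group / sequence_id[:13] / sequence_id
    if not target_dir.is_dir():
        # Assumption: do not create directories, since the page does not describe that.
        raise FileNotFoundError("no sequence directory at " + str(target_dir))
    destination = target_dir / track_file.name
    if destination.exists():
        # Cautious choice for this sketch: never replace a file someone else shared.
        raise FileExistsError(str(destination) + " already exists")
    shutil.copy2(track_file, destination)
    return destination
```

On a Mac with the share mounted, a call might look like drop_track_file("my_track.gff3", "/Volumes/Annotation/prod/data/features", "HAWK2_TEST", "7000000007104825"), where my_track.gff3 is a placeholder file name.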

Contact: Reinhard Engels
argo-support@broad.mit.edu
617-452-2650
320 Charles, Room 2164
