Online supplemental information for Lin, Carlson, Crosby et al. (2007)
Revisiting the protein-coding gene catalog of Drosophila melanogaster using twelve fly genomes
Address correspondence to: Manolis Kellis (manoli at mit.edu)
Unless otherwise noted, all coordinates and identifiers in the following data files refer to the BDGP Release 4 genomic assembly of D. melanogaster and FlyBase annotation release 4.3. Coordinates in heterochromatic sequence assemblies (e.g. chr2h, chrXh) refer to DHGP release 3.2b.
Prediction, curation and validation of new exons
- 1,193 new exon predictions: gff, fasta
- Manual curation and cDNA sequencing records for each prediction
- Recovered full-length cDNA sequences
- (Prediction, cDNA clone, GenBank accession no.) association table
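For readers who want to work with the exon predictions programmatically, the following is a minimal sketch of reading GFF records and deciding which assembly their coordinates refer to, following the coordinate note above. It assumes the standard tab-separated GFF column layout (sequence name, source, feature type, start, end, score, strand, frame, attributes) with 1-based inclusive coordinates. The exact contents of the supplemental file are not described here, so the example records, the `Feature` type, and the rule that heterochromatic sequence names end in "h" (generalized from chr2h and chrXh) are all illustrative assumptions.

```python
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int   # 1-based, inclusive (GFF convention)
    end: int     # 1-based, inclusive
    strand: str


def parse_gff(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one Feature per data line, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a well-formed GFF record
        yield Feature(cols[0], cols[1], cols[2],
                      int(cols[3]), int(cols[4]), cols[6])


def assembly_for(seqid: str) -> str:
    """Assumption: heterochromatic assemblies are named with a trailing 'h'."""
    return "DHGP release 3.2b" if seqid.endswith("h") else "BDGP Release 4"


def feature_length(f: Feature) -> int:
    """Length in bases; both endpoints are included."""
    return f.end - f.start + 1


# Hypothetical records for illustration only, not taken from the supplement.
example = [
    "# hypothetical records",
    "chr2L\tprediction\texon\t1000\t1200\t.\t+\t.\tID=example1",
    "chrXh\tprediction\texon\t500\t650\t.\t-\t.\tID=example2",
]
features = list(parse_gff(example))
for f in features:
    print(f.seqid, f.start, f.end, feature_length(f), assembly_for(f.seqid))
```

If the real file names its sequences differently, only `assembly_for` would need to change.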
Evaluation and classification of existing annotations
- Scores and classification of all genes
- Scores and classification of non-coding regions (random controls)
- "Rejected" genes
Note: Two of the 13,733 genes are missing from the above files: FBgn0066084=CG30425=RpL41 and FBgn0013745=CG31056=Acp98AB. Both have extremely short ORFs (25 aa and 28 aa, respectively).
Proposed refinements to existing annotations
Unusual gene structures
- Candidate readthrough genes: CG IDs, FBgns
- Candidate polycistronic genes: new, known/currently alt. tx
- Candidate translational frameshifts
