Mesoplasma florum Gene Calling
Outline
Gene Annotation
The Mesoplasma florum genome sequence was annotated manually using InforMax Vector NTI Advance. Open reading frames (ORFs) were identified using ORF Finder, available through NCBI (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), and verified with InforMax Vector NTI Advance. The set of identified protein coding sequences (CDSs) consists of all ORFs with a translated length of at least 37 amino acids (the length of rpmJ, the shortest identified gene in M. florum). Where ORFs overlapped, the longer ORF was taken unless the shorter sequence demonstrably coded for a known protein product; such cases were extremely rare. Furthermore, it was presumed that no CDS overlapped a region coding for a stable RNA species. When possible, CDSs were further determined by identification of plausible ribosomal binding sites.
Gene identity was determined using the PSI-BLAST program (http://www.ncbi.nlm.nih.gov/BLAST/) (Altschul et al. 1997). For genes of undetermined function, a motif search of the PROSITE, BLOCKS, ProDom, PRINTS, and Pfam databases (Bucher and Bairoch 1994, Henikoff et al. 1994, Servant et al. 2002, Attwood 2002, Bateman 2002) was run through the Kyoto University Bioinformatics Center GenomeNet MOTIF tool (http://motif.genome.ad.jp/). Lipoproteins were identified using software available at DOLOP (http://www.mrc-lmb.cam.ac.uk/genomes/dolop/analysis.htm) (Babu and Sankaran 2002). The Clusters of Orthologous Groups (COG) designation was determined for each CDS using the COGnitor tool (http://www.ncbi.nlm.nih.gov/COG/xognitor.html) (Tatusov et al. 1997).
An analysis of metabolic pathways and required subsystems led to the assumption that specific genes or RNA fragments were present. In several cases, these predicted genes were identified despite quite weak similarity to other known examples.
tRNAs were identified with the tRNAscan-SE program (Lowe and Eddy 1997).
Other stable RNAs were identified by specific searches for SRP, RNase P, and tmRNA species, using other Mollicute sequences as search motifs when available, together with specialized tools (Regalia et al. 2002, Samuelsson and Guindy 1990, Simoneau and Hu 1993, Ushida et al. 1994, Ushida et al. 1996, Williams 2002).
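The CDS selection rules described in this section (minimum translated length of 37 amino acids, no overlap with stable RNA regions, longer ORF preferred on overlap unless the shorter one has a known product) can be sketched roughly as follows. This is an illustration only, not the procedure actually used, which was manual. The Orf class, the select_cds function, and the known_product flag are hypothetical names. The sketch also assumes that coordinates exclude the stop codon, that strand is ignored, and that any overlap at all counts as a conflict.

```python
from dataclasses import dataclass

MIN_AA = 37  # minimum translated length, the length of rpmJ per the text


@dataclass(frozen=True)
class Orf:
    """Hypothetical ORF record; coordinates are 0-based, end-exclusive,
    and assumed to cover coding codons only (stop codon excluded)."""
    start: int
    end: int
    known_product: bool = False  # assumed flag: matches a known protein

    @property
    def aa_length(self) -> int:
        return (self.end - self.start) // 3


def overlaps(a_start, a_end, b_start, b_end):
    """True if the two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def select_cds(orfs, rna_regions=()):
    """Apply the length filter, the stable RNA exclusion, and the
    overlap rule. rna_regions is a sequence of (start, end) pairs."""
    candidates = [
        o for o in orfs
        if o.aa_length >= MIN_AA
        and not any(overlaps(o.start, o.end, s, e) for s, e in rna_regions)
    ]
    # Greedy pass: ORFs with a known product win first, then longer ORFs.
    candidates.sort(key=lambda o: (not o.known_product, -o.aa_length, o.start))
    kept = []
    for o in candidates:
        if not any(overlaps(o.start, o.end, k.start, k.end) for k in kept):
            kept.append(o)
    return sorted(kept, key=lambda o: o.start)


if __name__ == "__main__":
    demo = [
        Orf(0, 300),                          # 100 aa, kept
        Orf(100, 250),                        # 50 aa, overlaps a longer ORF
        Orf(400, 490),                        # 30 aa, too short
        Orf(800, 1100),                       # 100 aa, loses to known product
        Orf(900, 1050, known_product=True),   # 50 aa, kept
    ]
    for orf in select_cds(demo, rna_regions=[(1400, 1480)]):
        print(orf)
```

A greedy pass is only one way to resolve overlaps. Short overlaps between adjacent genes do occur in real genomes, so a practical version would probably tolerate overlaps below some threshold rather than rejecting them outright.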
Gene Locus Numbers
Every annotated gene is assigned a unique locus number of the form MFl###.
The genes were numbered sequentially across the complete genome sequence.
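As a small illustration of the numbering scheme, the sketch below assigns locus numbers in order of genome position. The function name assign_locus_numbers is hypothetical, and two details are assumptions not stated above: that numbering starts at 1, and that ### means a zero-padded three-digit number.

```python
def assign_locus_numbers(gene_starts, prefix="MFl", width=3):
    """Map each gene start coordinate to a sequential locus number.

    Assumes numbering starts at 1 and is zero-padded to `width` digits.
    """
    ordered = sorted(gene_starts)
    return {
        start: f"{prefix}{index:0{width}d}"
        for index, start in enumerate(ordered, start=1)
    }


if __name__ == "__main__":
    # Start coordinates here are made up for the example.
    for start, locus in assign_locus_numbers([500, 20, 1300]).items():
        print(start, locus)
```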
