Supplementary Info

All supplementary information is linked at:
The download page contains:

S1. Sequences and Alignments

  • a. The raw contigs and scaffolds for each species.
  • b. FASTA sequences of unambiguous ORFs and flanking intergenic regions for each species.
  • c. Multiple alignments of unambiguous ORFs and intergenic regions.
  • d. Protein alignments of unambiguous ORFs, linked from the SGD website.

S2. Annotation

  • a. All predicted ORFs for each species and their correspondence with S. cerevisiae.
  • b. Blocks of conserved gene order (synteny) between each species and S. cerevisiae.
  • c. Small and large homology groups for each species.
  • d. Other intergenic features: tRNA tables and counts per species; transposon tables.
  • e. Multiple alignments of tRNA genes, other RNA genes, centromeres, ARSs, and LTRs.

S3. Visualization of gene correspondence

  • a. Dot plots between ORFs in each assembly and ORFs in S. cerevisiae chromosomes.
  • b. Tiling of local gene order and ORF correspondence in 50-kb windows.
  • c. Interactive synteny viewer at SGD website.

S4. Mutation counts

  • a. Overall counts of transitions, transversions, insertions, and deletions along the phylogenetic tree.
  • b. Ka/Ks ratios for uninterrupted S. cerevisiae ORFs.
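To make the four mutation categories in S4a concrete, here is a minimal Python sketch of how they could be tallied for a single pairwise alignment. It is an illustration, not the pipeline that produced the tables: the function names, the '-' gap convention, and the choice to count each run of consecutive gaps as one insertion or deletion event (relative to the reference row) are assumptions.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def classify_substitution(a: str, b: str) -> str:
    """Label a mismatch between two different bases as a transition or transversion."""
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"  # purine<->purine or pyrimidine<->pyrimidine
    return "transversion"    # purine<->pyrimidine


def count_mutations(ref: str, other: str) -> dict:
    """Tally substitutions and gap events between two aligned rows.

    Assumption: equal-length rows with '-' for gaps; a run of gaps in `ref`
    is one insertion, a run of gaps in `other` is one deletion.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    counts = {"transition": 0, "transversion": 0, "insertion": 0, "deletion": 0}
    prev_gap = None  # which row carried a gap in the previous column, if any
    for a, b in zip(ref.upper(), other.upper()):
        if a == "-" and b == "-":
            continue  # column gapped in both rows carries no information
        if a == "-":
            if prev_gap != "ref":
                counts["insertion"] += 1  # start of a new insertion run
            prev_gap = "ref"
        elif b == "-":
            if prev_gap != "other":
                counts["deletion"] += 1  # start of a new deletion run
            prev_gap = "other"
        else:
            prev_gap = None
            if a != b and a in "ACGT" and b in "ACGT":
                counts[classify_substitution(a, b)] += 1
    return counts
```

For example, `count_mutations("ACGTC--ACGTA", "GCTTTAAAC-TA")` finds two transitions (A/G, C/T), one transversion (G/T), one two-base insertion event and one single-base deletion event.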

S5. Rearrangements

  • a. Distances between consecutive ORFs in each species and S. cerevisiae.
  • b. Table of genomic rearrangements for each species (translocations, inversions, segment duplications).
  • c. Table of all insertions/deletions of at least 2 kb between consecutive syntenic ORFs.
  • d. Clusters of ORFs with ambiguous correspondence, which define regions of rapid change.
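The 2-kb criterion in S5c can be sketched as a comparison of intergenic spacing between paired orthologous ORFs. This Python sketch assumes a hypothetical minimal ORF record `(name, start, end)`, both lists sorted along the chromosome, one-to-one pairing by index, and a single syntenic block in the same orientation; it ignores inversions and breakpoints, which a real analysis would have to handle.

```python
from typing import List, Tuple

# Hypothetical minimal ORF record: (name, start, end), 0-based coordinates.
Orf = Tuple[str, int, int]


def intergenic_distances(orfs: List[Orf]) -> List[int]:
    """Distance from the end of each ORF to the start of the next one."""
    return [nxt[1] - cur[2] for cur, nxt in zip(orfs, orfs[1:])]


def large_indels(ref_orfs: List[Orf], other_orfs: List[Orf],
                 min_size: int = 2000) -> List[Tuple[str, str, int]]:
    """Report consecutive syntenic ORF pairs whose spacing differs by >= min_size.

    Assumes ref_orfs[i] and other_orfs[i] are orthologs. A positive size means
    extra sequence in the other species; a negative size means extra sequence
    in the reference.
    """
    if len(ref_orfs) != len(other_orfs):
        raise ValueError("syntenic ORF lists must be paired one-to-one")
    ref_gaps = intergenic_distances(ref_orfs)
    other_gaps = intergenic_distances(other_orfs)
    hits = []
    for i, (r, o) in enumerate(zip(ref_gaps, other_gaps)):
        diff = o - r
        if abs(diff) >= min_size:
            hits.append((ref_orfs[i][0], ref_orfs[i + 1][0], diff))
    return hits
```

With reference ORFs `("A", 0, 1000), ("B", 1500, 2500), ("C", 3000, 4000)` and their orthologs at `("A", 0, 1000), ("B", 4000, 5000), ("C", 5200, 6200)`, only the A to B interval is reported, with a 2,500-bp excess in the other species.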

S6. Reading frame conservation (RFC) test

  • a. Test outcome for every ORF as kept, rejected, or no_call.
  • b. RFC score for every ORF in each species relative to S. cerevisiae.
  • c. Window-based RFC analysis for each species and S. cerevisiae.
  • d. Correlation of the RFC test outcome with open reading frame length for short ORFs.
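As a rough illustration of what a reading frame conservation score measures, the Python sketch below walks a pairwise alignment of an ORF and tracks the net frame offset introduced by indels. It is a simplified stand-in, not the exact scoring behind S6b: the function name, the '-' gap convention, and the definition (fraction of aligned reference bases seen while the net indel length is a multiple of 3) are assumptions.

```python
def rfc_score(ref_row: str, other_row: str) -> float:
    """Fraction of aligned reference bases whose reading frame is preserved.

    Simplified sketch: the frame offset is (bases in other - bases in ref)
    mod 3, accumulated over indels; a reference base aligned to a base counts
    as frame-conserved when the current offset is 0.
    """
    if len(ref_row) != len(other_row):
        raise ValueError("aligned rows must have equal length")
    offset = 0
    aligned = conserved = 0
    for a, b in zip(ref_row, other_row):
        if a == "-" and b == "-":
            continue  # ignore columns gapped in both rows
        if a == "-":
            offset = (offset + 1) % 3  # insertion in the other species
        elif b == "-":
            offset = (offset - 1) % 3  # deletion in the other species
        else:
            aligned += 1
            if offset == 0:
                conserved += 1
    return conserved / aligned if aligned else 0.0
```

A whole-codon deletion such as `rfc_score("ATGAAACCC", "ATG---CCC")` leaves the frame intact (score 1.0), whereas a single-base deletion early in the ORF, as in `rfc_score("ATGAAACCCGGG", "ATG-AACCCGGG")`, shifts the frame for every downstream base (score 3/11).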

S7. Revisiting S. cerevisiae annotation

  • a. Table of proposed changes in ORF boundaries (changed start, changed end, merged ORFs).
  • b. Alignment of proposed start/end boundary changes.
  • c. Alignment of apparent single-species frameshift mutations.
  • d. Results of resequencing proposed frameshifts in S. cerevisiae.
  • e. Alignment of resequenced regions and PCR primers used.
  • f. Table of proposed novel introns and proposed changed introns.
  • g. Alignment of known, changed, and proposed novel introns.

S8. Genome-wide motifs

  • a. Table of mini-motifs with CC1, CC2, and CC3 scores, extensions, and conservation counts.
  • b. Sequence-based motif collapsing for each test.
  • c. Co-occurrence-based collapsing of grouped consensus sequences.
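One ingredient of the conservation counts in S8a is how often a motif instance in S. cerevisiae is matched at the same aligned position in the other species. The Python sketch below shows that counting step only; it does not compute the CC1, CC2, or CC3 scores. The function name, exact (non-degenerate) matching, single-strand scanning, and the rule that the motif must appear ungapped in every row are all simplifying assumptions.

```python
from typing import List, Tuple


def motif_conservation(rows: List[str], motif: str) -> Tuple[int, int]:
    """Count motif instances in the reference row and how many are conserved.

    rows: equal-length aligned sequences, reference first, '-' for gaps.
    An instance is a window of aligned columns where the reference reads the
    motif exactly; it is conserved if every other row reads it too.
    Only the given strand is scanned; reverse complements are not checked.
    """
    if any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("aligned rows must have equal length")
    motif = motif.upper()
    rows = [r.upper() for r in rows]
    k = len(motif)
    total = conserved = 0
    for start in range(len(rows[0]) - k + 1):
        if rows[0][start:start + k] != motif:
            continue  # no instance in the reference at this column
        total += 1
        if all(r[start:start + k] == motif for r in rows[1:]):
            conserved += 1
    return total, conserved
```

For the rows `"ACGTGAACGTGA"`, `"ACGTGATCGTGA"`, and `"ACGTCAACGTGA"`, the motif `CGTG` occurs twice in the reference, and only the second instance is present in all three rows, giving `(2, 1)`.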

S9. Category-based motif discovery

  • a. Increased enrichment of known motifs by using multiple genomes.
  • b. Comparison of our category-based motif discovery with MEME on known motifs.
  • c. All category-based motifs discovered, clustered by sequence similarity.