Supplementary information: Sequencing and Comparison of Yeasts to Identify Genes and Regulatory Elements
Manolis Kellis, Nick Patterson, Matthew Endrizzi, Bruce Birren, Eric Lander
S1. Sequences and Alignments
| Section | Species | Contigs | Scaffolds |
|---|---|---|---|
| S1a Assembly | S. paradoxus | Spar_contigs (fasta, 12 Mb) | Spar_scaffolds (txt, 23 kb) |
| | S. mikatae | Smik_contigs (fasta, 12 Mb) | Smik_scaffolds (txt, 23 kb) |
| | S. bayanus | Sbay_contigs (fasta, 12 Mb) | Sbay_scaffolds (txt, 23 kb) |

| Section | Species | File | Format, size | Description |
|---|---|---|---|---|
| S1b ORFs | | Listing | | Table of ORFs with unambiguous correspondence |
| | S. cerevisiae | Scer_extended | fasta.gz, 4.0 Mb | FASTA file containing each ORF (uppercase) and flanking intergenic regions (lowercase) |
| | S. paradoxus | Spar_extended | fasta.gz, 4.0 Mb | |
| | S. mikatae | Smik_extended | fasta.gz, 3.6 Mb | |
| | S. bayanus | Sbay_extended | fasta.gz, 3.6 Mb | |

| Section | Type | File | Format, size | Description |
|---|---|---|---|---|
| S1c Alignments | Nucleotide | all_alignments | tar.gz, 23 Mb | Aligned ORFs and flanking sequences |
| | | Browse by chromosome | | |
| | Protein | | | All protein alignments are available at the SGD site |
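
The *_extended FASTA files in S1b mark each ORF in uppercase and its flanking intergenic sequence in lowercase. A minimal sketch of reading such a file and splitting a record by letter case follows; it assumes each record holds exactly one uppercase block between two (possibly empty) lowercase flanks, and the function names `read_fasta` and `split_extended` are hypothetical, not part of the released data.

```python
import re

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Assumed layout: lowercase upstream flank, uppercase ORF, lowercase downstream flank.
_EXTENDED = re.compile(r"^([a-z]*)([A-Z]+)([a-z]*)$")

def split_extended(seq):
    """Split an extended record into (upstream, orf, downstream) by letter case."""
    m = _EXTENDED.match(seq)
    if m is None:
        raise ValueError("record does not follow the lowercase-UPPERCASE-lowercase layout")
    return m.group(1), m.group(2), m.group(3)
```

For example, a record whose sequence is `acgtATGAAATAAggc` splits into `("acgt", "ATGAAATAA", "ggc")`. Records that mix cases inside the ORF raise an error rather than being silently truncated.
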
S2. Annotation
| Section | Content | Table | Files | Description |
|---|---|---|---|---|
| S2a | ORF annotation | Forward | S. paradoxus, S. mikatae, S. bayanus | Best match of every predicted ORF |
| | | Forward full | S. paradoxus, S. mikatae, S. bayanus | All matches for every predicted ORF |
| | | Reverse | S. paradoxus, S. mikatae, S. bayanus | Best match of every S. cerevisiae ORF |
| | | Reverse full | S. paradoxus, S. mikatae, S. bayanus | All matches for every S. cerevisiae ORF |
| S2b | Synteny | Synteny blocks | S. paradoxus, S. mikatae, S. bayanus | Blocks of conserved gene order (synteny) |
| S2c | Correspondence | Homology groups (large families) | S. paradoxus, S. mikatae, S. bayanus | Genes of ambiguous correspondence (large families) |
| | | Homology groups (small families) | S. paradoxus, S. mikatae, S. bayanus | Genes of ambiguous correspondence (small families) |
| S2d | Features | tRNA | S. paradoxus, S. mikatae, S. bayanus | Predicted tRNA genes (and usage) |
| | | Transposons | S. paradoxus, S. mikatae, S. bayanus | Transposable elements (best match) |
| | | Transposons full | S. paradoxus, S. mikatae, S. bayanus | Transposable elements (all matches) |
| S2e | Alignments | Aligned features | tRNA/RNA, CEN, ARS, LTR | Orthologous alignments of S. cerevisiae features |
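
The synteny blocks in S2b group genes whose order is conserved between S. cerevisiae and each other species. A minimal sketch, assuming every S. cerevisiae ORF has already been mapped to a single ortholog encoded as a hypothetical `(contig, index)` pair, where `index` is the ortholog's rank along its contig: a block continues while the contig is unchanged and the rank moves by exactly one step in a consistent direction. Real block definitions usually tolerate small gaps and local rearrangements; this strict version does not, and it is not the procedure used to build S2b.

```python
def synteny_blocks(matches):
    """Group consecutive reference ORFs whose orthologs stay adjacent on one contig.

    matches: list of (ref_orf, contig, index) in reference chromosome order.
    index is the ortholog's rank along its contig (hypothetical encoding).
    """
    blocks, current, step = [], [], None
    for orf, contig, idx in matches:
        if current:
            _, prev_contig, prev_idx = current[-1]
            delta = idx - prev_idx
            # Same contig, adjacent rank, and same direction as the block so far.
            same_run = (contig == prev_contig and abs(delta) == 1
                        and (step is None or delta == step))
            if same_run:
                step = delta
                current.append((orf, contig, idx))
                continue
            blocks.append([o for o, _, _ in current])
        current, step = [(orf, contig, idx)], None
    if current:
        blocks.append([o for o, _, _ in current])
    return blocks
```

Tracking the step direction lets the sketch accept blocks that are inverted relative to S. cerevisiae (rank decreasing) while still breaking a block when the order reverses partway through.
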
S3. Visualization of gene correspondence
| Section | Content | Item | Files | Description |
|---|---|---|---|---|
| S3a | Dotplots | | S. paradoxus, S. mikatae, S. bayanus | Dot plots of gene correspondence |
| S3b | Tiling | Example | | Figure 1, showing S. cerevisiae chromosome VII |
| | | Visualization | | 250 files tiling the S. cerevisiae genome (jpg/ps/matlab) |
| S3c | Interactive | Synteny viewer | | Interactive synteny viewer, available at the SGD site |
S4. Mutation Counts
| Section | File | Format | Size | Description |
|---|---|---|---|---|
| S4a | MutationCounts | xls | 68 kb | Overall counts of transitions, transversions, insertions and deletions along the phylogenetic tree |
| S4b | KaKs_average | xls | 435 kb | Ka/Ks ratio for uninterrupted S. cerevisiae ORFs |
| | KaKs_details | xls | 834 kb | Ka/Ks ratio for uninterrupted S. cerevisiae ORFs (all pairwise comparisons) |
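
S4a tallies transitions, transversions, insertions and deletions along the phylogenetic tree. Assigning events to branches requires the tree and ancestral reconstruction; the sketch below covers only the simpler pairwise step of classifying the columns of two aligned sequences, assuming gaps are written as `-`. Transitions are purine-purine (A/G) or pyrimidine-pyrimidine (C/T) changes; every other nucleotide change is a transversion. The function name `classify_columns` is hypothetical.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")

def classify_columns(a, b):
    """Count column types in a pairwise alignment of two equal-length gapped strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"identity": 0, "transition": 0, "transversion": 0,
              "gap_in_a": 0, "gap_in_b": 0, "other": 0}
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column gapped in both sequences: ignored
        if x == "-":
            counts["gap_in_a"] += 1
        elif y == "-":
            counts["gap_in_b"] += 1
        elif x not in "ACGT" or y not in "ACGT":
            counts["other"] += 1  # ambiguity codes such as N
        elif x == y:
            counts["identity"] += 1
        elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            counts["transition"] += 1
        else:
            counts["transversion"] += 1
    return counts
```

Gap columns are counted per position here; counting indel events instead would mean collapsing each run of consecutive gaps into one event.
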
S5. Rearrangements
| Section | File | Format | Size | Description |
|---|---|---|---|---|
| S5a | Sparadoxus_walking | txt | 444 kb | Distances between consecutive ORFs in each species and S. cerevisiae |
| | Smikatae_walking | txt | 457 kb | |
| | Sbayanus_walking | txt | 446 kb | |
| S5b | Rearrangements | xls | 24 kb | Table of genomic rearrangements for each species (translocations, inversions, segmental duplications) |
| | Rearrangements_figure | xls | 24 kb | |
| S5c | All_insertions | xls | 58 kb | Table of all insertions/deletions of at least 2 kb between consecutive syntenic ORFs |
| | Ambiguity_clusters | xls | 23 kb | Clusters of ORFs with ambiguous correspondence, which define regions of rapid change |
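
S5c lists insertions/deletions of at least 2 kb between consecutive syntenic ORFs, and S5a gives the spacing between consecutive ORFs. A minimal sketch of the filtering step, assuming a hypothetical `Interval` record holding the spacing in S. cerevisiae and in one other species (the actual column layout of the *_walking files is not described here). A two-species comparison cannot tell an insertion in one lineage from a deletion in the other, so the sketch reports only which genome has the longer interval.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    """Hypothetical record: two consecutive syntenic ORFs and their spacing in each genome."""
    left_orf: str
    right_orf: str
    scer_distance: int   # bp between the two ORFs in S. cerevisiae
    other_distance: int  # bp between the orthologous ORFs in the other species

def large_indels(intervals, min_size=2000):
    """Return (interval, size, kind) for spacing differences of at least min_size bp."""
    hits = []
    for iv in intervals:
        diff = iv.other_distance - iv.scer_distance
        if abs(diff) >= min_size:
            kind = "longer_in_other" if diff > 0 else "longer_in_scer"
            hits.append((iv, abs(diff), kind))
    return hits
```

The default `min_size=2000` mirrors the 2 kb cutoff named in S5c.
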
S6. Reading frame conservation (RFC) test
| Section | File | Format | Size | Description |
|---|---|---|---|---|
| S6a | orf_decisions | txt | 365 kb | Test outcome for every ORF: kept, rejected or no_call |
| S6b | RFC | xls | 1,445 kb | RFC score for every ORF in each species compared with S. cerevisiae |
| | RFC_Intergenic | xls | 63 kb | |
| S6c | window_based_named | txt | 2,900 kb | Window-based RFC analysis for each species compared with S. cerevisiae |
| | window_based_unnamed | txt | 1,007 kb | |
| | window_based_suspicious8 | txt | 540 kb | |
| S6d | SmallORFs | doc | 40 kb | Correlation of the RFC test with open reading frame length for short ORFs of varying lengths |
| | SmallORFs3 | doc | 40 kb | |
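
The RFC test asks whether an ORF's reading frame is preserved in the aligned orthologous sequence. One way to score this, sketched below, assumes a pairwise gapped nucleotide alignment that starts at each ORF's first codon: track each sequence's codon position (nucleotides consumed so far, modulo 3) and report the fraction of ungapped columns where the two positions agree. The windowing and thresholds behind the kept/rejected/no_call outcomes are not specified in this listing, so `rfc_score` is a hypothetical illustration, not the published procedure.

```python
def rfc_score(a, b):
    """Fraction of ungapped aligned columns where both sequences are in the same codon frame."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pos_a = pos_b = 0  # nucleotides consumed so far in each sequence
    same = total = 0
    for x, y in zip(a, b):
        if x != "-" and y != "-":
            total += 1
            if pos_a % 3 == pos_b % 3:
                same += 1
        if x != "-":
            pos_a += 1
        if y != "-":
            pos_b += 1
    return same / total if total else 0.0
```

A 3-nt deletion leaves the frame intact, so `rfc_score("ATGAAACCCTAA", "ATG---CCCTAA")` is 1.0, whereas a single-nucleotide deletion shifts every downstream column out of frame and the score drops sharply.
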
S7. Revisiting S. cerevisiae annotation
| Section | File | Format | Size | Description |
|---|---|---|---|---|
| S7a | Change_Start | xls | 434 kb | Table of proposed start codon changes |
| | Change_Stop | xls | 434 kb | Table of proposed stop codon changes in S. cerevisiae |
| | Merged_ORFs | xls | 434 kb | Table of proposed merges between consecutive S. cerevisiae ORFs |
| S7b | Change_Start_End_alignments | txt | 1,362 kb | Alignments of proposed start/end boundary changes |
| S7c | Frameshifts | txt | 1,014 kb | Alignments of apparent single-species frameshift mutations |
| S7d | Resequencing | xls | 20 kb | Results of resequencing proposed frameshifts in S. cerevisiae |
| S7e | PCR_resequenced_regions | txt | 412 kb | Alignments of resequenced regions |
| | SEQ1000 | fasta | 48 kb | Primers used |
| S7f | Intron_table | xls | 30 kb | Table of proposed novel introns and proposed changed introns |
| S7g | introns_known | txt | 1,771 kb | Alignments of known, changed and proposed novel introns |
| | introns_known_bad | txt | 276 kb | |
| | introns_known_v3 | txt | 2,967 kb | |
| | novel_noSGD_noARES | txt | 4,110 kb | |
| | novel_noSGD_yesARES | txt | 290 kb | |
S8. Genome-wide motifs
| Section | File | Format | Size | Description |
|---|---|---|---|---|
| S8a | TopMinis | txt | 164 kb | Table of mini-motifs with CC1, CC2 and CC3 scores, extension and conservation counts |
| | AllMinis | txt | 335 kb | |
| S8b | CC1_collapsing | txt | 179 kb | Sequence-based motif collapsing for each test |
| | CC2_collapsing | txt | 137 kb | |
| | CC3_collapsing | txt | 167 kb | |
| S8c | MegaRecollapsing | xls | 313 kb | Co-occurrence-based collapsing of grouped consensus sequences |
| S8d | GenesHit | txt | 1.5 Mb | Genes containing each genome-wide motif (upstream) |
| | GenesHit_down | txt | 1.5 Mb | Genes containing each genome-wide motif (downstream) |
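
The exact definitions of the CC1, CC2 and CC3 tests are not given in this listing. The sketch below only illustrates a generic conservation count of the kind tabulated in S8a, assuming a gapped multiple alignment with the S. cerevisiae row first and a motif written as a Python regular expression. An occurrence counts as conserved when every other row matches the same pattern over exactly the same alignment columns; this strict, gap-free criterion and the name `conservation_counts` are assumptions for illustration.

```python
import re

def conservation_counts(pattern, alignment):
    """Count motif occurrences in the reference (first) row and how many are conserved.

    Conserved: every other aligned row matches the pattern over the same columns.
    Occurrences are found with re.finditer, so overlapping matches are not counted.
    """
    motif = re.compile(pattern)
    ref, others = alignment[0], alignment[1:]
    total = conserved = 0
    for m in motif.finditer(ref):
        total += 1
        start, end = m.span()
        if all(motif.fullmatch(row[start:end]) for row in others):
            conserved += 1
    return total, conserved
```

The ratio `conserved / total` then gives a simple conservation rate that can be compared across candidate motifs.
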
S9. Category-based motif discovery
| Section | File | Format | Size | Description |
|---|---|---|---|---|
| S9a | Increased_Enrichment | xls | 25 kb | Increased enrichment of known motifs obtained by using multiple genomes |
| S9b | Comparison_with_MEME | xls | 24 kb | Comparison of our category-based motif discovery with MEME for known motifs |
| S9c | Category_Based_Novel | xls | 48 kb | All category-based motifs discovered, clustered by sequence similarity |
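
Category-based motif discovery compares how often a motif occurs among genes in a category with how often it occurs genome-wide. A common way to score such enrichment, used here as an assumption rather than as the statistic behind S9a, is the upper tail of the hypergeometric distribution: the probability of seeing at least `k` motif-bearing genes in a category of `n` genes drawn from a genome of `N` genes, `K` of which carry the motif. The function name is hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    N: genes in the genome, K: genes carrying the motif,
    n: genes in the category, k: category genes carrying the motif.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total
```

Exact integer binomials keep the sum precise even for genome-sized `N`; only the final division is done in floating point.
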
