Methanosarcina Project Information
Methanogenesis, the biological production of methane, plays a pivotal role in the global carbon cycle and contributes significantly to global warming. Each year, an estimated 900 million metric tons of methane are biologically produced, the majority of which is derived from acetate. We present here the first fully-annotated genome sequence of an acetate-utilizing methanogen, Methanosarcina acetivorans C2A.
The Methanosarcineae are metabolically and physiologically the most versatile methanogens. Only Methanosarcina species possess all three known pathways for methanogenesis and are capable of utilizing no fewer than nine methanogenic substrates, including acetate. In contrast, all other orders of methanogens possess a single pathway for methanogenesis, and many utilize no more than two substrates.
Among methanogens, the Methanosarcineae display extensive environmental diversity. Individual species of Methanosarcina have been found in freshwater and marine sediments, decaying leaves and garden soils, oil wells, sewage and animal waste digestors and lagoons, thermophilic digestors, faeces of herbivorous animals, and the rumens of ungulates.
The Methanosarcineae are unique among the Archaea in forming complex multicellular structures during different phases of growth and in response to environmental change. Within the Methanosarcineae, a number of distinct morphological forms have been characterized including single cells with and without a cell envelope, as well as multicellular packets (Figure left) and lamina.
This metabolic and physiological versatility is reflected in the genome of Methanosarcina acetivorans. At 5.71Mb it is by far the largest known archaeal genome and is larger than the genomes of many sequenced bacteria. An analysis of the 4,524 open reading frames reveals a strikingly wide and unanticipated variety of metabolic and cellular capabilities. The results of these analyses are presented in Galagan et al. (2002), Genome Research 12(4):532-542.
Available Genome Data
We are happy to freely provide the entire fully-annotated Methanosarcina acetivorans genome:
- DNA sequence (5.71Mb), available for download and Blast
- Protein sequence (4526 genes), available for download and Blast
- Graphical views of the annotated sequence, via
- FeatureMap: lightweight graphical viewer showing sequence, genes, protein families, and homologous proteins
- GenomeBrowser: more sophisticated, Java-based browser for viewing larger regions (requires Java plug-in, and not available on Macintosh)
- Annotated Features, available for interactive search:
- Find genes (based on name, locus, symbol abbreviation, genbank number, protein family domain, cellular functional category)
- Find Open Reading Frames predicted by the microbial gene-finding tool GLIMMER
- Find BLASTX alignments to protein sequences (NR)
- Find BLASTN alignments to DNA sequences (NT)
- Find protein domains, based on identifying PFAM and TIGRFAM domains using the HMMER program
- Find tRNA sequences, based on results of the tRNAscan-SE program
- View genes categorized by cell function, KEGG Category, EC Number, Multigene Family, or PFAM domain.
Dr. William Metcalf from the University of Illinois. Principal investigators from the Broad Institute include James Galagan, Bruce Birren and Chad Nusbaum.
The Methanosarcina acetivorans strain C2A was grown in single cell morphology [47] at 35°C in HS broth medium containing 125 mM methanol plus 40 mM sodium acetate (HS-MA medium) [48].
Genomic DNA was isolated from M. acetivorans and was used to construct M13 (1.5kb inserts), plasmid (4kb inserts), and fosmid (40kb inserts) libraries. Plasmid and fosmid inserts were sequenced from both ends to generate paired reads. We generated sequence coverage of 7X from plasmids, 1X from M13 and 0.076X from fosmids and assembled it with Phrap.
Initial analysis of the assembly was done with the Mapper software (M.C. Zody, personal communication) to select gap-spanning clones for finishing. 200 gaps spanned by plasmid clones were closed by transposon-based sequencing using EZ::TN <KAN-2> (TM) from Epicentre. 48 gaps spanned only by fosmids were closed by sequencing fosmid-derived PCR products. Sequence from 28 unspanned gaps was obtained from fragments generated by combinatorial PCR using genomic DNA as template and pooled primers [50]. One unspanned gap was closed by sequencing a small-insert library [51] produced from an 8.5 kb PCR product. Regions of low sequence quality were resolved by:
- use of ABI dGTP Big Dye Terminator sequencing mix
- transposon-primed sequencing of plasmid clones, or
- sequencing PCR products obtained from plasmid or genomic template
- Open reading frames (ORFs) likely to encode proteins were predicted using GLIMMER2.
- All ORFs were searched against two sets of protein family Hidden Markov Models (HMM), Pfam and TIGRFAM, using the HMMER program.
- The entire genome was searched against the public protein databases using BLASTX with the threshold E < 1e-5, and against the public nucleotide databases using BLASTN with the threshold E < 1e-9.
- Transfer RNAs were identified using the tRNAscan-SE program.
- ORFs longer than 200bp and all ORFs with similarity to a protein family HMM or known proteins were annotated as genes.
- All ORFs were inspected for alternative start positions
- ORFs with no similarity to other sequences were named predicted proteins
- ORFs with similarity to sequences with unknown function were named conserved hypothetical proteins
- For ORFs with similarity to sequences of known function, we:
- Inspected all corresponding BLAST alignments in order to track biological evidence supporting function
- Consulted literature to identify those proteins experimentally characterized
- Reviewed correspondence to protein families
- Determined standard Enzyme Commission designation, if possible
- Named gene in accordance with Enzyme Commission designation
- Categorized gene by cellular function designation
- Marked unusual genes for further review
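The gene-calling and naming rules listed above can be sketched as a small decision function. This is only an illustration of the stated rules, not the pipeline actually used for the project: the `Orf` record, its field names, and the "needs manual curation" label are hypothetical, and ORFs with hits of known function are simply flagged here because the text describes that step as manual review.

```python
from dataclasses import dataclass

MIN_ORF_LENGTH_BP = 200  # length cutoff stated in the text


@dataclass
class Orf:
    """Hypothetical record for one predicted ORF (field names are assumptions)."""
    length_bp: int
    has_hmm_hit: bool = False         # similarity to a Pfam/TIGRFAM model
    has_protein_hit: bool = False     # BLAST similarity to a known protein
    hit_function_known: bool = False  # the matched sequence has a known function


def is_gene(orf: Orf) -> bool:
    """An ORF is kept as a gene if it is long enough or has any similarity."""
    return (
        orf.length_bp > MIN_ORF_LENGTH_BP
        or orf.has_hmm_hit
        or orf.has_protein_hit
    )


def provisional_name(orf: Orf) -> str:
    """Apply the naming rules for ORFs that were annotated as genes."""
    if not (orf.has_hmm_hit or orf.has_protein_hit):
        return "predicted protein"
    if not orf.hit_function_known:
        return "conserved hypothetical protein"
    # Known-function hits went through manual review in the text.
    return "needs manual curation"
```

For example, `provisional_name(Orf(length_bp=450))` returns "predicted protein", since a long ORF with no hits is still annotated as a gene but has no evidence for a more specific name.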
Multigene families were constructed by searching each annotated gene against all other genes using BLASTP, requiring matches with E < 1e-5 over 60% of the longer gene length, and subsequently clustering genes with matches.
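A minimal sketch of this kind of family construction is shown below. It assumes the BLASTP hits have already been parsed into tuples, that "over 60% of the longer gene length" is measured by alignment length against the longer protein, and that clustering is single-linkage (any passing match joins two genes into the same family). The source does not specify the clustering method, so these are assumptions, and the function and parameter names are hypothetical.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text
MIN_COVERAGE = 0.60    # fraction of the longer gene that must be matched


def cluster_families(gene_lengths, hits):
    """Group genes into families by single-linkage clustering of BLASTP hits.

    gene_lengths: dict mapping gene id -> protein length (residues)
    hits: iterable of (query, subject, evalue, alignment_length)
    Returns a list of sorted member lists, one per family of two or more genes.
    """
    parent = {gene: gene for gene in gene_lengths}

    def find(gene):
        # Union-find lookup with path halving.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for query, subject, evalue, aln_len in hits:
        if query == subject:
            continue  # ignore self-hits
        longer = max(gene_lengths[query], gene_lengths[subject])
        if evalue < E_VALUE_CUTOFF and aln_len >= MIN_COVERAGE * longer:
            parent[find(query)] = find(subject)

    families = defaultdict(set)
    for gene in gene_lengths:
        families[find(gene)].add(gene)
    return [sorted(members) for members in families.values() if len(members) > 1]
```

Single-linkage is the simplest reading of "clustering genes with matches"; a stricter scheme (for example, requiring every pair in a family to match) would produce smaller families.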
The genome was renumbered with the start at the putative origin of replication, which was identified as the point of maximum cumulative AT skew (defined as the cumulative sum of (A-T)/(A+T) on one strand).
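The cumulative AT skew calculation can be sketched as follows. The text does not say over what interval (A-T)/(A+T) was computed before summing, so this example assumes fixed, non-overlapping windows; the window size and function names are hypothetical choices for illustration.

```python
def cumulative_at_skew(sequence, window=1000):
    """Return the running sum of (A-T)/(A+T) over consecutive windows."""
    totals = []
    running = 0.0
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        a = chunk.count("A")
        t = chunk.count("T")
        if a + t:  # skip windows with no A or T to avoid dividing by zero
            running += (a - t) / (a + t)
        totals.append(running)
    return totals


def putative_origin(sequence, window=1000):
    """Return the end coordinate of the window where cumulative skew peaks."""
    totals = cumulative_at_skew(sequence, window)
    best = max(range(len(totals)), key=totals.__getitem__)
    return min((best + 1) * window, len(sequence))
```

With a toy sequence of 20 A's followed by 20 T's and `window=10`, the cumulative skew rises to 2.0 and then falls back to 0.0, so `putative_origin` reports position 20, the point where the strand bias switches sign.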
A panel of over two dozen experts was assembled to analyze the genome as part of a Community Annotation Project (CAP). The scientists in this community project could view and submit genome annotations using this website, and they drew together expert analyses of biological pathways. The project culminated in a two-day Genome Analysis Meeting at the Broad Institute. CAP participants included:
Everly Conway de Macario
William B Whitman
See Community Annotation Project for more details.
There are exciting times ahead for the Methanosarcina community as four different species are being fully sequenced and microarray projects are underway. Stay tuned for more exciting results...
Questions about the project should be directed to firstname.lastname@example.org.