Aspergillus nidulans FAQ

Answers

  • Sequencing
    1. What is whole-genome shotgun sequencing?
      Whole-genome shotgun sequencing is a technique for determining the DNA sequence of a genome by randomly shearing the DNA, sequencing many overlapping fragments, and inferring the original sequence from the overlaps between fragments. This method has been used successfully for many bacterial and fungal genome projects. See Assembly for details.
    2. What is an assembly?
      An assembly is a representation of the computationally derived relative positions of a set of sequenced fragments. When these individual sequences overlap, a consensus sequence is derived representing the most likely base at each position in the assembly. In this way, increased sequence redundancy improves the quality of the assembly and the confidence in the consensus. See Assembly for details.
    3. What does the name "Contig 1.XXX" mean?
      A contig is a sequence fragment created by assembling whole-genome shotgun reads. See Assembly for details.

      Every assembly contains multiple contigs. Assemblies are numbered sequentially, and the number preceding the decimal point indicates the assembly number. Contigs within an assembly are also numbered sequentially. Thus "Contig 1.177" indicates contig #177 within assembly 1.

    4. What is a sequence contig?
      A sequence contig is the extended contiguous sequence that is produced by the assembly process that joins overlapping sequences. See Assembly for details.
    5. Are the contigs ordered?
      Contigs within the same supercontig are ordered. See Assembly for details.
    6. What is a sequence supercontig?
      A supercontig consists of one or more sequence contigs known to occur in a specific order and orientation. Because we sequence each end of the subclones of plasmids, Fosmids, and BACs, we can recognize that when one end of a clone lies in one sequence contig and the other end of the clone lies in a different sequence contig, these two contigs probably lie close to each other. To create supercontigs we require that two or more such linking clones join two sequence contigs. See Assembly for details.
    7. Are the supercontigs ordered?
      No, the supercontigs are not ordered by number. However, we have aligned most of the genome against genetic maps, which help position supercontigs on chromosomes. See Genetic Maps for details.
    8. How big is the Aspergillus nidulans genome?
      Our current total unique contig length is 30 million base pairs (bp). The genome size is approximately 31 Mb.
    9. What strain was sequenced?
      Aspergillus nidulans, strain FGSC A4.

    10. What is the current state of the assembly?
      The current assembly contains 89 supercontigs (scaffolds) from 284 sequence contigs >2 kb. See Assembly for details.
    11. How complete is the current assembly?
      We estimate that the current release represents 96% of the Aspergillus nidulans genome and is covered to a depth of >13X. It excludes very highly conserved repetitive sequences and ribosomal RNA genes.
    12. Are the contigs ordered? For example, is contig 1.5 flanked by contigs 1.4 and 1.6?
      The contigs are numbered sequentially within larger supercontig fragments. Contigs within the same supercontig are positionally ordered. See Aspergillus nidulans Contig Numbering for details.

    13. How has the sequence been generated for the Aspergillus nidulans project?
      Our data consist of over 730,000 individual sequencing reads obtained by sequencing a combination of libraries, including plasmids (4 kb and 10 kb inserts), fosmids (40 kb inserts), and BACs (110 kb inserts). Of these, 300,327 reads were contributed by Monsanto and 19,123 were provided by Dr. Ralph Dean at North Carolina State University. See Assembly for details of the libraries used in this assembly.
    14. How will we know the assembly is correct?
      The quality of the assembly will be assessed in several ways. In addition to requiring that the paired plasmid and Fosmid ends occur in a logical manner, our assembly of the Aspergillus nidulans genome will be verified through: 1) integration of BAC end sequences, 2) comparison with available genomic sequences, and 3) correlation with the genetic map.
    15. What data are available?
      In this version of our data release, all sequence contigs over 2 kb are available. Smaller contigs are sparsely covered and often include poor-quality or contaminated DNA. Sequence contig data can be accessed in two ways: through a BLASTN or TBLASTN search with an option for contig subsequence retrieval, or through FTP download of the entire genome. Contig sequences are subject to change throughout this project, so each data release version number will be prepended to the contig number (e.g. 1.235 denotes assembly version 1, contig #235).
    16. What are the release goals for Aspergillus nidulans?
      Our goal is to release the results of our annotation and analysis of the assembly in spring 2003.
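
The contig naming convention described above ("Contig 1.177" is contig #177 within assembly 1) can be split into its two numbers programmatically. The following is a minimal sketch in Python; the names `ContigName` and `parse_contig_name` are hypothetical helpers introduced for illustration and are not part of any Broad Institute tool.

```python
import re
from typing import NamedTuple


class ContigName(NamedTuple):
    """Assembly number and contig number taken from a contig name."""
    assembly: int
    contig: int


# Accepts "Contig 1.177" as well as the bare form "1.177".
_CONTIG_RE = re.compile(r"^(?:contig\s*)?(\d+)\.(\d+)$", re.IGNORECASE)


def parse_contig_name(name: str) -> ContigName:
    """Split a contig name into (assembly, contig) integers."""
    match = _CONTIG_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a contig name: {name!r}")
    return ContigName(int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    print(parse_contig_name("Contig 1.177"))
```

The two parts are parsed as separate integers rather than as one decimal number, because "1.10" (contig #10) and "1.1" (contig #1) would otherwise be indistinguishable.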
  • Downloading
    1. What format is the download file in?
      The genome data is plain text in multi-FASTA format. The text file has been compressed using gzip. To uncompress the file:

          gunzip aspergillus_1.fasta.gz

    2. Why does gunzip tell me the file is not in gzip format?
      Some browsers (like newer versions of Netscape) automatically unzip files after download. If this is the case, the file should be 30 MB (rather than the 9.3 MB of the compressed file). You can just rename the file to remove the .gz suffix.
    3. The download fails. What should I do?
      Downloading through the browser uses the HTTP protocol. You can also try accessing the FTP site directly via the URL:

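As noted in the gunzip question above, a downloaded file may already have been decompressed by the browser even though its name still ends in .gz. The sketch below shows one way to read the multi-FASTA file in either state with Python, by checking for the two-byte gzip signature instead of trusting the file name. The function names `open_maybe_gzip` and `read_fasta` are hypothetical and introduced only for this illustration.

```python
import gzip
from typing import Iterator, TextIO, Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def open_maybe_gzip(path: str) -> TextIO:
    """Open a text file, decompressing it only if it is really gzip data."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a multi-FASTA file."""
    header = None
    chunks = []
    with open_maybe_gzip(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts; emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:].strip()
                chunks = []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

For example, `sum(len(seq) for _, seq in read_fasta("aspergillus_1.fasta.gz"))` would give the total number of bases in the download, whether or not the browser has already unzipped it.
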
  • BLASTing
    1. Why is my BLAST job taking so long?
      BLAST jobs are queued and handled with other internal Broad processes in a general Load Sharing Facility. The delay for receiving your BLAST results depends on the current load.
    2. Why are my BLAST results split into multiple email messages?
      Some email programs are configured with a maximum message size and will automatically split large files into smaller pieces. If this is undesirable, you will need to reconfigure your email program.

    3. What sequences can I BLAST against?
      You can BLAST your query sequence against our entire assembly or against special sequence sets excluded from the assembly.

    4. Why do I get the message "ERROR: BLASTSetUpSearch: Unable to calculate Karlin-Altschul params, check query sequence"?

      From the NCBI Blast FAQ:

      This will happen if your entire query sequence has been masked by low complexity filtering. You will need to turn filtering off to get hits. For further information on filtering, please read the sections of the BLAST FAQs on Q: What is low-complexity sequence? and also Q: After running a search why do I see a string of "X"s (or "N"s) in my query sequence that I did not put there?

    5. After running a search why do I see a string of "X"s (or "N"s) in my query sequence that I did not put there?
      From the NCBI Blast FAQ:
      You are seeing the result of automatic filtering of your query for low-complexity sequence that is performed to prevent artifactual hits. The filter substitutes any low-complexity sequence that it finds with the letter "N" in nucleotide sequence (e.g., "NNNNNNNNNNNNN") or the letter "X" in protein sequences (e.g., "XXXXXXXXX"). Low-complexity regions can result in high scores that reflect compositional bias rather than significant position-by-position alignment (Wootton & Federhen, 1996). Filter programs can eliminate these potentially confounding matches from the BLAST reports, leaving regions whose BLAST statistics reflect the specificity of their pairwise alignment. Queries searched with the blastn program are filtered with DUST. The other BLAST programs use SEG.

    6. What is low-complexity sequence?
      From the NCBI Blast FAQ:
      Regions with low-complexity sequence have an unusual composition and this can create problems in sequence similarity searching (Wootton & Federhen, 1996). Low-complexity sequence can often be recognized by visual inspection. For example, the protein sequence PPCDPPPPPKDKKKKDDGPP has low complexity and so does the nucleotide sequence AAATAAAAAAAATAAAAAAT. Filters are used to remove low-complexity sequence because it can cause artifactual hits (please also see Q: After running a search why do I see a string of "X"s (or "N"s) in my query sequence that I did not put there?)

      In BLAST searches performed without a filter, often certain hits will be reported with high scores only because of the presence of a low-complexity region. Most often, this type of match cannot be thought of as the result of homology shared by the sequences. Rather, it is as if the low-complexity region is "sticky" and is pulling out many sequences that are not truly related.

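To give a rough feel for what "low complexity" means, the sketch below scores a sequence by the Shannon entropy of its letter composition and masks windows that score below a cutoff. This is only an illustration of the idea: it is not the DUST or SEG algorithm used by BLAST, and the function names, window size, and threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Entropy (in bits per letter) of the letter composition of seq."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq: str, window: int = 12,
                        threshold: float = 1.0, mask_char: str = "N") -> str:
    """Replace every window whose entropy is below threshold with mask_char.

    Illustrative only; real filters (DUST, SEG) use different statistics.
    """
    masked = list(seq)
    for start in range(max(len(seq) - window + 1, 0)):
        if shannon_entropy(seq[start:start + window]) < threshold:
            masked[start:start + window] = mask_char * window
    return "".join(masked)


if __name__ == "__main__":
    # The nucleotide example quoted from the NCBI FAQ above.
    print(mask_low_complexity("AAATAAAAAAAATAAAAAAT"))
    # A compositionally balanced sequence is left unchanged.
    print(mask_low_complexity("ACGTACGTACGTACGTACGT"))
```

A composition-based score like this catches runs dominated by one or two letters, such as the two example sequences quoted above, but it does not detect every kind of repeat; a strictly periodic sequence with balanced composition passes unmasked.
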
  • Genome Browser
    1. Does the Genome Browser Java applet run on Macintosh computers?
      We are pleased to announce that the Genome Browser can now run on both Windows and Macintosh platforms.

      Requirements for Windows:
      Windows 9x & NT platforms or better
      Java 1.4
      Netscape Navigator 4+, Internet Explorer 5+, Mozilla 1.* or other browser that can display Java applets

      Requirements for Macintosh:
      OS X
      Java 1.4 (Software Update)
      Safari

  • Misc
    1. What's the Broad Institute?
      The Eli and Edythe L. Broad Institute is a partnership among MIT, Harvard and affiliated hospitals and the Whitehead Institute for Biomedical Research. Its mission is to create the tools for genomic medicine and make them freely available to the world and to pioneer their application to the study and treatment of disease.

    2. How do I cite the sequence for publication?
      Publications should include the following citation:
      Aspergillus Sequencing Project. Broad Institute of MIT and Harvard (http://www.broad.mit.edu)

    3. Who do I contact with questions about the sequencing?
      For additional help or to send feedback about the website, please email annotation-webmaster@broad.mit.edu.

Ralph Cavaliere, Gettysburg College
David Ellis, Mycology Online
Ralph Cavaliere, Gettysburg College
David Geiser, The Pennsylvania State University, from the pages at The Mycological Society of America

The color image on the project information page comes courtesy of Ron Morris, University of Medicine and Dentistry of NJ.

If you have any interesting photos of A. nidulans that you'd like to share, please let us know at annotation-webmaster@broad.mit.edu.
