Ciona Assembly

Assembly methods will be described in more detail in the paper "Whole genome assembly of polymorphic genomes", currently in preparation.

Assembly method overview

We sequenced whole genome shotgun reads totalling approximately 13x coverage from a single heterozygous diploid individual.

The rate of substitutions between the two sequenced haplotypes averages about 5%, and the overall rate of differences including large insertions and deletions is at least 19%.

We were not able to assemble the two haplotypes together in one step. Therefore we assembled the haplotypes separately into contigs and scaffolds, each of which comes entirely from one haplotype or entirely from the other.

We then established long stretches of collinearity between scaffolds of opposite haplotypes. We used this correspondence both to detect and correct potential errors in the separate haplotype assembly and as the glue for forming larger structures called paired scaffold assemblies, or PSAs.

We present an assembly of the Ciona savignyi genome in 446 PSAs totaling 164Mb, with an N50 size of 1.05Mb and an N50 contig size of 47Kb. This first tier of the assembly contains most of the estimated 180Mb Ciona savignyi genome, with no genomic region appearing more than once in the assembly.
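For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the sequence lengths from largest to smallest and report the length at which the running total first reaches half of the summed length. The function name n50 and the toy lengths are illustrative assumptions, not part of the assembly pipeline, and conventions for ties and for the denominator vary between tools.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is taken here as the largest length L such that sequences of
    length >= L together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, not Ciona data).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Applied to the list of PSA lengths or contig lengths, a function like this would give the corresponding N50 figures, provided the same convention is used.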

Some regions of the separate haplotype assembly could not be assigned a partner and hence could not be used to form paired scaffold assemblies. These regions total 105Mb. They are highly enriched for repetitive sequence, with 75% masked by RepeatMasker using a Ciona-specific library, compared to 35% for the assembly overall.

More soon.

Overview of Whole Genome Assembly in General

The Ciona genome was sequenced using the Whole Genome Shotgun methodology, whereby:
  1. Ciona DNA is shattered into small fragments (~4Kb or ~40Kb)
  2. Each fragment is inserted into a vector and cloned
  3. The two ends of the fragment are sequenced, creating paired reads
  4. The assembly process uses the paired reads to identify contiguous stretches of sequence (contigs)
  5. Contigs are ordered and linked together into larger supercontigs by using paired reads lying in different contigs
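Step 5 can be illustrated with a small sketch. It assumes each read pair has already been placed on contigs, counts the pairs that span two different contigs, and groups contigs connected by enough such links. The function name group_contigs, the min_links threshold, and the toy data are hypothetical. The sketch only groups contigs; a real assembler also uses read orientation and insert size to order and orient them, which is not shown here.

```python
from collections import Counter


def group_contigs(pair_placements, min_links=2):
    """Group contigs joined by at least `min_links` spanning read pairs.

    pair_placements: iterable of (contig_of_read1, contig_of_read2).
    Returns a sorted list of sorted contig-name lists.
    """
    links = Counter()
    contigs = set()
    for a, b in pair_placements:
        contigs.update((a, b))
        if a != b:
            # Count links without regard to which read landed where.
            links[tuple(sorted((a, b)))] += 1

    # Union-find over contigs: each contig starts as its own group.
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (a, b), count in links.items():
        if count >= min_links:
            parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(g) for g in groups.values())


# Toy placements (hypothetical): c1-c2 and c2-c3 have two links each,
# c3-c4 has only one, and the c5 pair lies within a single contig.
pairs = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"),
         ("c2", "c3"), ("c3", "c4"), ("c5", "c5")]
print(group_contigs(pairs))
```

Requiring more than one link before joining two contigs is a common guard against chimeric clones or misplaced reads; the threshold of 2 used here is an arbitrary choice for the example.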