Dressing up by pulling carbon dioxide out of the water: this is the speciality of the coccolithophore Emiliania huxleyi. Using carbon dioxide, E. huxleyi makes microscopic disks of calcite, with which it clothes itself (about.com). This accounts for ~20% of carbon fixation in some systems, which is really impressive. Read and her colleagues used one strain from the South Pacific to investigate the global distribution and the heterogeneity of the genome of this coccolithophore (Read et al.). Among other things, they revealed that “this organism is unusually diverse and has a huge genome with a large number of “optional” genes. This kind of “pan genome” has not previously been found outside the bacteria” (Alden, about.com).

What was sequenced?

A batch culture of the diploid strain Emiliania huxleyi CCMP1516 from the South Pacific

Sequencing strategy: Whole genome sequencing

  1. Libraries: 3 libraries (insert sizes: 3 kbp, 8 kbp, 20-40 kbp). The majority was sequenced using the ABI 3730 XL
  2. Read output: 3,910,095 whole genome shotgun reads (10x coverage)
  3. Data output: 6,995 scaffolds in the final nuclear genome (excluding mitochondrial, chloroplast and prokaryotic scaffolds), where 321 large scaffolds harbor 70% of the total sequence
  4. Bioinformatics: Analysis of prokaryotic-only scaffolds with total lengths greater than 100 kb -> Genome assembly with Arachne
    Note: All contigs and scaffolds < 4 kb in length were excluded from the final assembly due to the high GC content (65%) and the large number of repetitive regions in E. huxleyi
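
The length cut-off and the "321 large scaffolds harbor 70%" figure above are both simple scaffold-length statistics. Here is a minimal Python sketch of how such numbers could be computed. The function names and the toy lengths are made up for illustration and are not taken from the paper or from Arachne.

```python
def filter_scaffolds(lengths, min_len=4_000):
    """Keep only scaffold lengths (in bp) at or above the cut-off."""
    return [n for n in lengths if n >= min_len]


def scaffolds_needed(lengths, fraction=0.70):
    """Count how many of the largest scaffolds hold `fraction` of the total length."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, n in enumerate(ordered, start=1):
        running += n
        if running >= target:
            return count
    return len(ordered)


# Toy scaffold lengths in bp, invented for illustration only
toy = [500_000, 300_000, 120_000, 50_000, 20_000, 6_000, 3_500, 1_200]

kept = filter_scaffolds(toy)          # drops the two scaffolds below 4 kb
print(len(kept), scaffolds_needed(kept))
```

With the toy data, six scaffolds survive the filter and the two largest already hold more than 70% of the remaining sequence.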

2nd whole genome sequencing approach:

  1. Libraries: 13 shotgun libraries for 13 different strains using Illumina HiSeq sequencing (3 strains deeply sequenced and 10 strains moderately sequenced)
  2. Read output: ~36 x 10^9 reads per strain (strains 1-3) -> 265-352x coverage, and ~27 x 10^6 reads per strain (strains 4-13) -> 14-29x coverage
  3. Data output: total scaffold lengths: 98-117 Mb (strains 1-3) and 49-76.5 Mb (strains 4-13)
  4. Bioinformatics: De novo genome analysis using CLC Genomics & BLASTn for comparison of the deeply sequenced strains
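
The step from read counts to fold coverage in the list above is plain arithmetic: total sequenced bases divided by genome size. A rough Python sketch follows. The read length (100 bp) and the genome size (142 Mb) are my own assumptions for illustration; neither value is given in this post.

```python
def fold_coverage(n_reads, read_len_bp, genome_size_bp):
    """Expected average depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_len_bp / genome_size_bp


# Assumed values, not taken from the post
ASSUMED_READ_LEN_BP = 100
ASSUMED_GENOME_SIZE_BP = 142_000_000

# ~27 x 10^6 reads per strain, as listed for the moderately sequenced strains
coverage = fold_coverage(27_000_000, ASSUMED_READ_LEN_BP, ASSUMED_GENOME_SIZE_BP)
print(f"{coverage:.1f}x")
```

Under these assumptions the moderately sequenced strains come out at roughly 19x, which sits inside the 14-29x range quoted above. A different read length or genome size would shift the result proportionally.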

Sequencing strategy: Transcriptome analysis

  1. Libraries: 4 cDNA libraries corresponding to different developmental stages and growth conditions were prepared and sequenced using the ABI 3730
  2. Data output (filtered): 30,569 genes (these genes cover 40% of the genome)
  3. Bioinformatics: Genome annotation and alignment using BLAST and BLAT

I think one of the most interesting facts about this study is that they used Sanger sequencing for a large part of the project. According to their comparisons with, for example, the Illumina data, the scaffold completeness of the Sanger data is estimated at 96%. Sanger sequencing may well be suitable for small genomes too, but for me the question remains whether a hybrid NGS approach combining Roche GS FLX++ and Illumina HiSeq might have made the project easier.

Read the complete publication here.

