Tag Archives: sanger sequencing

Whose genome has been sequenced? Brassica napus

Brassica napus, also known as oilseed rape, was formed more than 7000 years ago by allopolyploidy (chromosome doubling following hybridization of two Brassica species). The genome kept changing after that event, and it is known today that during this evolution some genes were preserved and further “improved” (e.g. oil biosynthesis genes), whereas others were lost over the course of time (e.g. glucosinolate genes).

Chalhoub et al. have now sequenced the genome, because it can help to “provide insights into allopolyploid evolution and its relationship with crop domestication and improvement” (Chalhoub et al.).

What was sequenced?

Young fresh leaves from the French homozygous Brassica napus winter line “Darmor-bzh”.

Sequencing strategy: Whole genome sequencing

  1. Libraries & Sequencing:
    Roche GS FLX: ~70 million reads, average read length: ~368 bp, genome coverage: 21.2%
    Sanger BAC Seq: 141k reads, read length: 650 bp, genome coverage: 0.1%
    Illumina HiSeq: ~375 million reads, read length: 36, 76, 108 and 150 bp, genome coverage: 53.9%
  2. Data output: 44,146 contigs and 20,702 scaffolds
  3. Results: A final assembly of 849.7 Mb (using SOAP and Newbler) with 89% nongapped sequences.
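The assembly figures above can be put into proportion with a little arithmetic. The following is only a rough sketch: it takes the contig and scaffold counts as 44,146 and 20,702, and it assumes that "89% nongapped" refers to the share of the 849.7 Mb assembly that consists of actual sequence rather than gap filler. The variable names are made up for illustration.

```python
# Figures quoted in the post; the interpretation of "89% nongapped"
# as a fraction of the total assembly length is an assumption.
ASSEMBLY_MB = 849.7
NONGAPPED_FRACTION = 0.89
N_CONTIGS = 44_146
N_SCAFFOLDS = 20_702

# Sequence that is actually resolved vs. sequence that is still gaps
nongapped_mb = ASSEMBLY_MB * NONGAPPED_FRACTION
gap_mb = ASSEMBLY_MB - nongapped_mb

# Average number of contigs joined into one scaffold
contigs_per_scaffold = N_CONTIGS / N_SCAFFOLDS

print(f"non-gapped sequence: {nongapped_mb:.1f} Mb")
print(f"gaps:                {gap_mb:.1f} Mb")
print(f"contigs per scaffold (mean): {contigs_per_scaffold:.2f}")
```

Under these assumptions roughly 756 Mb of the assembly is resolved sequence, and a scaffold contains a little more than two contigs on average.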

After assembly, the genome was compared with those of related species (e.g. B. rapa and B. oleracea). This helped to identify several interesting genes and gene variations that give a better understanding of how the species evolved.

Read the complete publication here.

Whose Genome Has Been Sequenced? – Recent posts:

Whose genome has been sequenced? Thunnus orientalis

When it comes to sea life, everyone knows what sharks or whales look like and how they behave. Sadly, I think little is known about tuna, which is more or less only known as a delicious meal. So it’s all the more pleasant to see that the recent de novo genome sequencing approach of Nakamura et al. aims to learn more about the predatory behaviour of tuna and not about breeding or cultivation (Nakamura et al.). With this genome sequencing project of Thunnus orientalis the scientists could show that tuna have some unique adaptations for catching their prey.

What was sequenced?

The diploid genome of a wild-caught male Pacific bluefin tuna (T. orientalis) was sequenced.

Sequencing strategy: Whole genome sequencing

  1. Hybrid approach: Roche 454 GS FLX Titanium & Illumina GAIIx
  2. Libraries: Shotgun & paired-end libraries on Roche 454 & paired-end libraries on Illumina GAIIx
  3. Read output: 31.9 million 454 reads, including 4.9 million long paired-end reads (11.9x coverage) & 229.7 million Illumina paired-end reads (43x coverage)
  4. Data output: 192,169 contigs (> 500 bp) that could be assembled into 16,802 scaffolds (> 2 kb), totaling 740.3 Mb (= 92.5% of the estimated genome size (~800 Mb))
  5. Bioinformatics: Roche 454 read assembly with Newbler (Version 2.5) followed by mapping of the paired-end Illumina reads with Bowtie (Version 0.12.7).
    Note: 7,259 nucleotide mismatches & 312,851 short indels could be eliminated by mapping the Illumina reads onto the scaffolds with BWA (Version 0.5.9)
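The read counts and coverage values in this list are linked by the usual rule of thumb: fold coverage = number of reads × mean read length / genome size. The sketch below uses that relation to check the 92.5% figure and to back-calculate the mean read lengths implied by the numbers above. It assumes that the coverage values refer to the ~800 Mb genome size estimate; the read lengths it returns are derived values, not figures taken from the publication, and the function names are hypothetical.

```python
def fold_coverage(n_reads, mean_read_len_bp, genome_size_bp):
    """Expected fold coverage: total sequenced bases / genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def implied_mean_read_length(n_reads, coverage, genome_size_bp):
    """Invert the coverage formula to get the mean read length in bp."""
    return coverage * genome_size_bp / n_reads


GENOME_SIZE_BP = 800e6   # estimated genome size quoted in the post
ASSEMBLY_BP = 740.3e6    # total scaffold length quoted in the post

# Share of the estimated genome represented in the assembly
assembled_fraction = ASSEMBLY_BP / GENOME_SIZE_BP

# Mean read lengths implied by the quoted read counts and coverage
# (assumes coverage was calculated against the ~800 Mb estimate)
illumina_len = implied_mean_read_length(229.7e6, 43, GENOME_SIZE_BP)
roche_len = implied_mean_read_length(31.9e6, 11.9, GENOME_SIZE_BP)

print(f"assembled fraction: {assembled_fraction:.1%}")
print(f"implied Illumina read length: {illumina_len:.0f} bp")
print(f"implied 454 read length:      {roche_len:.0f} bp")
```

The same `fold_coverage` function can be used the other way round when planning a project, for example to estimate how many reads of a given length are needed to reach a target coverage.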

Sequencing strategy: Transcriptome analysis

  1. Libraries & Sequencing: Normalized cDNA libraries were sequenced with the Roche 454 FLX Titanium instrument
  2. Read output: 3.8 million 454 reads
  3. Data output: 5,741 full-length cDNA sequences
  4. Bioinformatics: Assembly was performed using Newbler (Version 2.5)

From the sequencing strategy point of view this publication shows again that the hybrid approach, combining the Roche 454 long read technology with the Illumina short read technology, is one of the most used techniques for de novo genome sequencing (Hybrid assemblies).

From a scientific point of view this publication shows that tuna has the most RH2 paralogs among the fishes studied and that three of these genes are mutated compared to the others. According to Nakamura et al. these changes might be responsible for the tuna's remarkable ability to detect blue-green contrasts and therefore to judge the distance to prey in the blue pelagic ocean.

Read the complete publication here.

Whose genome has been sequenced? Emiliania huxleyi

Dressing up by pulling carbon dioxide out of the water: this is the speciality of the coccolithophore Emiliania huxleyi. Using carbon dioxide, E. huxleyi makes microscopic disks of calcite, with which it clothes itself (about.com). This accounts for ~20% of carbon fixation in some systems, which is really impressive. Read and her colleagues used one strain from the South Pacific to investigate the global distribution and the heterogeneity of the genome of this coccolithophore (Read et al.). Amongst other things they could reveal that “this organism is unusually diverse and has a huge genome with a large number of “optional” genes. This kind of “pan genome” has not previously been found outside the bacteria” (Alden, about.com).

What was sequenced?

A batch culture of the diploid strain Emiliania huxleyi CCMP1516 from the South Pacific

Sequencing strategy: Whole genome sequencing

  1. Libraries: 3 libraries (insert sizes: 3 kbp, 8 kbp, 20-40 kbp). The majority was sequenced using the ABI 3730 XL
  2. Read output: 3,910,095 whole genome shotgun reads (10x coverage)
  3. Data output: 6,995 scaffolds of the final nuclear genome (excluding mitochondrial, chloroplast and eukaryotic scaffolds), where 321 large scaffolds harbor 70% of the total sequence
  4. Bioinformatics: Analysis of prokaryotic only scaffolds with total lengths greater than 100 kb -> Genome assembly with Arachne
    Note: All contigs and scaffolds < 4 kb in length were excluded from the final assembly due to the high GC content (65%) and the large amount of repetitive regions in E. huxleyi

2nd whole genome sequencing approach:

  1. Libraries: 13 shotgun libraries for 13 different strains using Illumina HiSeq sequencing (3 strains deeply sequenced and 10 strains moderately sequenced)
  2. Read output: ~36 x 10^9 reads per strain (strain 1-3) -> 265-352x coverage and ~27 x 10^6 reads per strain (strain 4-13) -> 14-29x coverage
  3. Data output: total scaffold lengths: 98-117 Mb (strain 1-3) & 49-76.5 Mb (strain 4-13)
  4. Bioinformatics: De novo genome analysis using CLC Genomics & BLASTn for comparison of the deeply sequenced strains

Sequencing strategy: Transcriptome analysis

  1. Libraries: 4 cDNA libraries corresponding to different development stages and growth conditions were prepared and sequenced using the ABI 3730
  2. Data output (filtered): 30,569 genes  (these genes cover 40% of the genome)
  3. Bioinformatics: Genome annotation and alignment using BLAST and BLAT

I think one of the most interesting facts about this study is that they used Sanger sequencing for a large part of the project. According to their comparisons with, for example, the Illumina data, the scaffold completeness of the Sanger data is estimated at 96%. So Sanger sequencing still seems to be suitable for small genomes, but for me the question remains whether a hybrid NGS approach consisting of Roche GS FLX++ and Illumina HiSeq might have made the project easier.

Read the complete publication here.

Amplicon Sequencing Strategy: What Is Your Technology Of Choice?

We asked for your favourite technology for amplicon sequencing.

Please find the results here:

  • The majority (41 people) voted for Illumina MiSeq due to the data output
  • 29 prefer Ion Torrent for amplicon sequencing
  • 24 favour Roche 454 because of the long reads
  • 6 people say that classical Sanger sequencing is their technology of choice
  • Just 4 are using other technologies

104 NGS experts took part in the voting.
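
For readers who prefer percentages, the vote counts above can be converted into shares with a few lines of Python. This is just an illustrative sketch: the dictionary keys are shorthand labels for the options listed above, and the counts are taken directly from the poll results.

```python
# Vote counts from the poll results above
votes = {
    "Illumina MiSeq": 41,
    "Ion Torrent": 29,
    "Roche 454": 24,
    "Sanger": 6,
    "Other": 4,
}

total = sum(votes.values())

# Share of each technology in percent, rounded to one decimal place
shares = {tech: round(100 * n / total, 1) for tech, n in votes.items()}

for tech, share in shares.items():
    print(f"{tech:15s} {votes[tech]:3d} votes  {share:5.1f}%")
print(f"Total: {total} votes")
```

The counts add up to the 104 participants mentioned above, and Illumina MiSeq ends up with just under 40% of the votes.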

NGS Moving Towards Forensics

To me the article “Early Adopters Say NGS-based Forensic Testing Could Lead to More Precise Identification” by GenomeWeb (http://bit.ly/WKKIPC) is a very good sign of the “maturation” of the rather young next generation sequencing technology. It may find its way into police forensic laboratories soon.

Currently, forensic profiling is based on STR analysis with capillary electrophoresis, or analysis of mitochondrial DNA with PCR and Sanger sequencing. But next gen sequencing opens up new possibilities. Researchers are now looking at moving both STR profiling and mitochondrial DNA analysis to next gen sequencing, but are also looking to develop SNP-based targeted sequencing panels for forensics, which would enable researchers to identify a person’s ancestry, hair color, or other defining characteristics based on their DNA.

So, what do you think: Will there still be classical DNA profiling or will NGS have taken over in the next decade?