
Hybrid De Novo Genome Assemblies

What do you hope to achieve with a bacterial or fungal de novo genome sequencing project?

Typical answers we get from our customers:

  • Data that is easy to work with
  • Data suitable for high quality annotation
  • Resolution of structural rearrangements
  • High consensus accuracy
  • High cost-efficiency

All of these requirements can be met by combining Roche GS FLX++ and Illumina data. The long Roche FLX++ reads of up to 1100 bp yield much longer contigs than Illumina reads alone. For scaffolding, and to resolve structural rearrangements, we sequence shotgun (SG) and LJD libraries with Illumina technology. Adding Illumina reads keeps the overall cost at a reasonable level. Furthermore, the Illumina reads correct the Roche sequencing errors at homopolymer sites and thus enable us to build a highly accurate consensus sequence.

The advantage of such a hybrid assembly quickly becomes apparent in the results of one of our proof-of-concept studies. In this de novo project, we sequenced a fungal genome of about 30 Mbp with approx. 57% GC content. Using the hybrid strategy, we obtained only 10 chromosome-sized scaffolds (see figure below) of up to 8.3 Mbp. Remarkably, these 10 scaffolds make up 99.6% of all scaffold sequence information, so they represent nearly all of the genetic information present.

Such results make the data easy to handle and are an excellent starting point for annotation and for studying gene content and rearrangements.
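The 99.6% figure is the share of total scaffold length held by the largest scaffolds. A minimal Python sketch of that calculation follows; the function name `top_n_fraction` and the scaffold lengths are purely illustrative assumptions and are not data from the study.

```python
def top_n_fraction(scaffold_lengths, n):
    """Return the share of total scaffold length held by the n longest scaffolds."""
    if n < 0:
        raise ValueError("n must not be negative")
    total = sum(scaffold_lengths)
    if total == 0:
        raise ValueError("scaffold lengths must sum to a positive value")
    # Sort from longest to shortest and keep the n longest scaffolds.
    longest = sorted(scaffold_lengths, reverse=True)[:n]
    return sum(longest) / total


# Toy lengths in bp, invented for illustration only (not the fungal assembly).
toy_lengths = [500, 300, 150, 40, 10]
print(f"Top 3 scaffolds hold {top_n_fraction(toy_lengths, 3):.1%} of the assembly")
```

With real data, the list would hold the length of every scaffold in the assembly, and `n` would be 10 for the project described here.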

Sequencing strategy: SG library on FLX++ (approx. 10-fold coverage); SG library and LJD libraries of 3 kbp, 8 kbp and 20 kbp on Illumina HiSeq 2000 with the 2x 100 bp module.

 

Another Great Ape Genome Sequenced

photography: Doris Lauscher

After the human, chimpanzee and orangutan genomes, another great ape genome was reported as sequenced and assembled in Nature last week: the gorilla genome.

Gorillas, which are in immediate danger of extinction, are humans’ closest living relatives after chimpanzees, followed by orangutans. The gorilla genome is therefore the missing piece of the puzzle for studying the origin and evolution of humans in much more detail.

The comparison did indeed have some surprises in store: it revealed that gorillas and humans are more closely related to each other than previously assumed. The two lineages separated approx. 10 million years ago. Approx. 4 million years later, chimpanzees separated from humans.

To obtain a genome assembly with contigs and scaffolds long enough to allow those comparisons, the international research team not only sequenced the genome with Illumina short-read technology (167 Gbp) but also included 5.4 Gbp of long-read Sanger sequencing data. Based on a genome size of approx. 3 Gbp, the Sanger reads correspond to a coverage of 1.8-fold. The initial assembly was produced with a de novo strategy, but in later phases the researchers used the human reference genome to improve the assembly.
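The fold-coverage figure is simply the number of sequenced bases divided by the genome size. A minimal Python sketch of this back-of-the-envelope calculation uses the numbers quoted above; the function name `fold_coverage` is an assumption for illustration, and the Illumina value is only the nominal result of the same division, not a figure taken from the publication.

```python
def fold_coverage(sequenced_bases, genome_size):
    """Return nominal fold coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_bases / genome_size


GENOME_SIZE = 3e9  # approx. 3 Gbp, as stated in the text

sanger = fold_coverage(5.4e9, GENOME_SIZE)    # 5.4 Gbp of Sanger reads
illumina = fold_coverage(167e9, GENOME_SIZE)  # 167 Gbp of Illumina reads

print(f"Sanger: {sanger:.1f}-fold, Illumina: {illumina:.1f}-fold")
```

This treats every sequenced base as usable, so it gives an upper bound; coverage after quality filtering and mapping would be lower.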

For me personally, the assembly and scaffolding approach described is really impressive. A variety of software tools were used to integrate sequence data and paired-end information from different technologies, as well as similarity to the human genome, to make the best use of all the information available. Have a look at it!