Tag Archives: scaffolding

Another Great Ape Genome Sequenced

photography: Doris Lauscher

After the human, chimpanzee and orangutan genomes, another great ape genome was reported last week in Nature as sequenced and assembled: the gorilla genome.

Gorillas, which are in immediate danger of extinction, are humans’ closest living relatives after chimpanzees, followed by orangutans. The gorilla genome is therefore the missing piece of the puzzle for studying the origin and evolution of humans in much more detail.

The comparison did indeed hold some surprises: it revealed that gorillas and humans are more closely related to each other than previously assumed. The two lineages separated approx. 10 million years ago. Approx. 4 million years after that, chimpanzees separated from humans.

To obtain a genome assembly with contigs and scaffolds long enough to allow those comparisons, the international research team not only sequenced the genome with Illumina short-read technology (167 Gbp) but also included 5.4 Gbp of long-read Sanger sequencing data. Based on a genome size of approx. 3 Gbp, the Sanger reads correspond to a coverage of 1.8-fold. The initial assembly was produced with a de novo strategy, but in later phases the researchers made use of the human reference genome to improve the assembly.
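The coverage figure quoted above is simply the amount of sequence data divided by the genome size. Here is a minimal Python sketch of that arithmetic; the function name `fold_coverage` is my own, and the 3 Gbp genome size is the approximate value used in this post, not an exact figure.

```python
def fold_coverage(sequenced_bp: float, genome_size_bp: float) -> float:
    """Return the average fold coverage: sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_bp / genome_size_bp


# Approximate genome size assumed in this post (3 Gbp).
GENOME_SIZE_BP = 3e9

# Data volumes as quoted above.
sanger_coverage = fold_coverage(5.4e9, GENOME_SIZE_BP)
illumina_coverage = fold_coverage(167e9, GENOME_SIZE_BP)

print(f"Sanger:   {sanger_coverage:.1f}-fold")
print(f"Illumina: {illumina_coverage:.1f}-fold")
```

With the same assumed genome size, the 167 Gbp of Illumina data works out to roughly 56-fold coverage.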

For me personally, the assembly and scaffolding approach described is really impressive. A variety of software tools were used to integrate sequence data and paired-end information from different technologies, as well as the similarity to the human genome, in order to make the best use of all the information available. Have a look at it!

Which sequencing strategy do you use for scaffolding of contigs?

In our latest poll, which started in mid-November 2011, we asked about your sequencing strategies for scaffolding projects. 29 ngs-expert.com readers submitted their votes.

39% of all votes agree with my own opinion that LPE and LJD libraries are the preferred method for scaffolding of contigs. Long distances of up to 40 kbp can be bridged easily and efficiently.

Despite that, it is also obvious that all the other techniques are still used for scaffolding projects. I am curious to see whether this will change with the new C2 chemistry for the PacBio RS, which has been announced for Q1.

Why Choose LJD Libraries Rather Than Mate-Pair Libraries?

Why should you choose long jumping distance (LJD) libraries rather than mate-pair libraries, especially for de novo sequencing projects? There is a simple answer to this question: because the resulting data are much more suitable for de novo assembly and scaffolding.

Why is this so? Mate-pair libraries contain a higher percentage of undesired inward-facing read pairs (if you are not familiar with inward- and outward-facing reads, just take a look at our FAQs on this subject). These read pairs are not true mate pairs; in other words, they are shotgun paired-end reads. The proportion of such reads in an LJD library is greatly reduced.

Furthermore, with mate-pair libraries it is not clear whether (and, if so, where) the changeover between the two ends lies within the resulting reads. In other words, a read may contain sequence from one AND the other end without it being known where the changeover is. As a result, chimeric reads go into the assembly. This effect is almost completely eliminated with LJD libraries because of the differences in library generation.
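To make the inward- versus outward-facing distinction mentioned above a little more concrete, here is a rough Python sketch that classifies mapped read pairs by orientation and reports the fraction of each class. Everything in it is illustrative: the `MappedRead` type and the function names are my own, and it assumes you already have the leftmost mapping position and strand of each read from whatever aligner you use.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class MappedRead(NamedTuple):
    """Hypothetical minimal record for one mapped read."""
    start: int      # leftmost mapped position on the reference (0-based)
    forward: bool   # True if the read maps to the forward strand


def classify_pair(r1: MappedRead, r2: MappedRead) -> str:
    """Classify a read pair as 'inward', 'outward' or 'same-strand'."""
    if r1.forward == r2.forward:
        return "same-strand"
    # Order the two reads by position; on a tie, r1 is treated as the left read.
    left, right = sorted((r1, r2), key=lambda r: r.start)
    # Inward-facing: the left read points right (forward strand),
    # so the two reads point towards each other.
    return "inward" if left.forward else "outward"


def orientation_fractions(
    pairs: Iterable[Tuple[MappedRead, MappedRead]]
) -> Dict[str, float]:
    """Return the fraction of pairs falling into each orientation class."""
    counts = Counter(classify_pair(a, b) for a, b in pairs)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


# Toy example: two outward-facing pairs and one inward-facing pair.
example_pairs = [
    (MappedRead(100, False), MappedRead(8100, True)),
    (MappedRead(500, False), MappedRead(8700, True)),
    (MappedRead(300, True), MappedRead(600, False)),
]
print(orientation_fractions(example_pairs))
```

Run on pairs mapped back to your contigs, a count like this gives a quick impression of how many inward-facing pairs a library contains.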

What experiences have you had with LJD or mate-pair libraries? I’d be happy to hear from you.