Hybrid De Novo Genome Assemblies

What are your goals when planning a bacterial or fungal de novo genome sequencing project?

Typical answers we get from our customers:

  • Data that is easy to work with
  • Data suitable for high quality annotation
  • Resolution of structural rearrangements
  • High consensus accuracy
  • High cost-efficiency

All these requirements can be met by combining Roche GS FLX++ and Illumina data. The long Roche FLX++ reads of up to 1100 bp yield much longer contigs than Illumina reads alone. For scaffolding and to resolve structural rearrangements, we sequence shotgun (SG) and long jumping distance (LJD) libraries with Illumina technology. Adding Illumina reads keeps the overall costs at a reasonable level. Furthermore, the Illumina reads correct the Roche sequencing errors at homopolymer sites and thereby enable us to build a consensus sequence with high accuracy.

The superiority of such a hybrid assembly quickly becomes apparent when looking at the following results from one of our proof-of-concept studies. In this de novo project, we sequenced a fungal genome of about 30 Mbp with approx. 57% GC content. Using the hybrid strategy, we obtained only 10 chromosome-sized scaffolds (see figure below) of up to 8.3 Mbp. Remarkably, these 10 scaffolds represent nearly all of the genetic information present: they make up 99.6% of all scaffold sequence.
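For readers who want to run the same kind of check on their own assemblies, here is a minimal sketch of how the share of sequence held by the largest scaffolds can be computed. The function name and the example lengths are made up for illustration; they are not the data from the project described above.

```python
def top_n_fraction(scaffold_lengths, n):
    """Return the fraction of total scaffold length held by the n longest scaffolds."""
    if not scaffold_lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(scaffold_lengths, reverse=True)
    return sum(ordered[:n]) / sum(ordered)


# Illustrative, made-up scaffold lengths in bp (NOT the project data).
example_lengths = [8_000_000, 6_000_000, 4_000_000, 1_500_000, 400_000, 100_000]

print(f"{top_n_fraction(example_lengths, 3):.1%}")  # prints 90.0%
```

Applied to a real assembly, the list would simply hold the length of every scaffold in the FASTA file.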

Such results make data handling easy and are an excellent starting point for annotation and for studying gene content and rearrangements.

Sequencing strategy: SG library with FLX++ (approx. 10-fold coverage), SG and LJD 3 kbp, 8 kbp and 20 kbp on Illumina HiSeq 2000 with 2x 100 bp module.

 

Tip: Inside The Wellcome Trust Sanger Institute

Do you know the blog of the Wellcome Trust Sanger Institute?

The Wellcome Trust Sanger Institute is one of the leading genomic research centres in Europe and was a leader in the Human Genome Project. In their blog, they discuss the role of genetics in health and disease, as studied with the latest genomic and genetic techniques.

Read more at http://sangerinstitute.wordpress.com/

Seasonal Greetings

 

Think Big: American Gut Project Based On NGS

Scientists estimate that the cells of our bodies are outnumbered 10 to 1 by bacterial cells that live in or on our bodies. A previous blog post has already pointed out the impact of this fact on sequencing the corresponding host genomes. On the other hand, microbiomes have the potential to play an important role as diagnostic markers, or to open up new ways of treating diseases, such as personalized medicine.

However, we are just beginning to understand the complex relationships of this “social network”, as Scientific American has called it. The most complex bacterial community within the human body resides inside the gut. To obtain a deeper understanding of the bacterial communities of the human gut, there have been several attempts to sequence the gut microbiomes of larger groups of individuals, such as the projects by Arumugam et al., Yatsunenko et al. and Schloissnig et al. So far, however, the number of individuals analyzed has been relatively small (up to several hundred).

A group of US scientists has now started the “American Gut Project”. As reported by Genome Web News, this project is planned as a crowd-sourcing study of 10,000 or more individuals in the US. Since this study is part of the “American Food Project”, it will mainly focus on gut microbiome patterns in relation to diet, age and lifestyle. People who would like to participate in this study need to sign up via a website and donate $99. This money will be used to cover a significant part of the cost of the study. In return, participants will receive a taxonomic profile of their gut microbiome.

The analysis itself will be based on 16S sequencing. For some of the samples, additional analyses such as sequencing the complete metagenomes and long-term surveys are planned. No doubt, this study will provide us with a huge data set. However, this data set will be highly complex, and it still needs to be put into context with data from other projects. In my opinion, interpreting the data remains the hardest part. Or, as project organizer Jeff Leach has put it in an interview with Genome Web Daily News: “We don’t expect to be able to address some questions, but because of the size of the sample and because of the broad patterns we expect to see in diet and lifestyle, we think some stuff will fall out.”

Adventitious Virus Testing Via Next Generation Sequencing

Adventitious viruses are a major safety concern in biological products. For a substance to be considered “free” of an adventitious agent, assays must demonstrate that a defined quantity of the biological product is negative for an agent at a defined level of sensitivity. In vivo animal testing, in vitro cell culture testing, transmission electron microscopy and molecular assays like quantitative PCR (qPCR) are the current gold standards for viral safety testing. However, if, for example, the cell substrate contains potential contaminating agents originating from a tumor-derived cell line, the current standard methods need to be supplemented with novel technologies.

Deep sequencing approaches based on next generation sequencing (NGS) techniques may be the method of choice. They allow the detection not only of known viruses but also of unknown viruses or viral subspecies at the detection limit of qPCR-based methods. At the Pathogen Safety Summit (Munich, Germany, November 27-28, 2012), the application of NGS testing approaches was introduced and intensely discussed. The use of NGS in routine testing of production cell banks is currently being evaluated by several companies producing biologicals and vaccines.

Currently, NGS is used for the initial characterisation of cell banks, but it is expected that this new technology will soon become a standard method for adventitious agent testing. There are still challenges to overcome with regard to the bioinformatic analyses as well as the speed of technological development. Furthermore, the biological relevance of the NGS data needs to be confirmed. The expectation is that this problem can be overcome once active viral particles can be purified and subjected to NGS analysis.

Btw: Eurofins Medigenomix offers the detection of adventitious viruses in biologicals and biotechnological products by next generation sequencing on platforms from Illumina and Roche 454.

Summary from 4th Next Generation Sequencing Congress 2012

Having attended the 4th NGS Congress 2012 at London Heathrow, I would like to share some interesting new facts and the latest NGS stories.

First of all, let’s talk about “long read technology”. Todd Arnold, Vice President R&D at Roche 454, gave the Roche 454 talk. For the Roche GS Junior, a new software version 2.7 is now available, with “improved well resolution results in better quality, more robust sequencing runs”. We can confirm these improved data outputs, as we have been running this update on our own Junior platform for a while. Depending on the nature of your samples, a good part of all reads will be longer than 400 bp, reaching up to 450-480 bp (still using the Titanium chemistry). However, the FLX+ technology is NOT available and also NOT planned for the GS Junior. This raises the question why; no concrete details or upgrade plans for the GS Junior could be given at the London congress…

The real highlight of the Roche 454 talk was the description of what we now call “FLX++” sequencing. With a software update (2.8) now available for all GS FLX systems, together with the “pimped” chemistry kits, Roche 454 is offering real “1000 bp” Sanger-like reads (as initially aimed for at launch). Some data outputs and slides were shown that demonstrate these longer read lengths and higher data outputs (figure 1). Altogether, this adds up to almost 1 Gb of sequencing data per full PTP run.

Fig 1: Todd Arnold Roche 454 Data Heathrow 2012

As one of the early access users of the FLX++ upgrades and software version 2.8, we can confirm that the new data outputs are excellent (again depending on the quality of the DNA). In fact, one can reach even better results than those shown by Roche at the 4th NGS Congress at London Heathrow. Here is an example:

Fig 2: Eurofins MWG Operon data with Roche GS FLX++

Of course one may now argue that “that’s nothing compared to Illumina data outputs”, and in terms of pure data volume you are right! But the focus here is on long read applications such as sequencing and de novo assembly. For this kind of NGS application, a modal read length of 800-950 bp or above will improve the final results tremendously. Don’t believe it? We can share some nice new project data that we have delivered for a fungal de novo sequencing project (figure 2). We were able to deliver chromosome-size scaffolds of 8.3 Mb, 6.0 Mb, 4.3 Mb, 2.8 Mb, 2.4 Mb, 2.1 Mb, … using long read FLX++ backbone sequencing at only 8x-12x coverage and combining this data with short read LJD sequencing on the HiSeq at 2x 100 bp. The complete data set missed only about 0.5% of all genetic information, and the remaining average gap length was about 240 bp. We are very interested to learn how the 2x 250 bp read length on the MiSeq will further improve these excellent results: one-shot genome sequencing at its best.

Interested in this kind of project data? Please learn more about our fascinating de novo sequencing & assembly results at our next NGS roadshow in 2013 or send me an email for further discussion about this topic…

How to benefit from our superior LJDs on the MiSeq

With the update of our MiSeq system to 250 bp reads, genome sequencing on this system becomes even more important. But long reads and a huge data output are not the only prerequisites for a great de novo assembly result.

What is missing?

Paired-end libraries that span gaps and repetitive structures can improve de novo genome assemblies tremendously. Our proprietary long jumping distance libraries (LJDs) are perfectly suited for scaffolding on Illumina sequencing devices. In contrast to other paired-end libraries (such as the Illumina mate pair library), our LJD library preparation involves an adaptor-guided ligation of the genomic fragments. This different preparation protocol offers the following advantages:

  • No hybrid reads – a unique sequence identifies the crossover points
  • No shotgun pairs – less than 1% of all LJD reads are shotgun paired-end reads
  • Distinct insert sizes – we prepare LJDs with 3, 8, 20 or even 40 kbp insert size
  • Span large repeats – large and complex repeats up to 40 kbp can be resolved

Mapped reads: All reads from a 3 kbp LJD library (grey) are aligned to a reference sequence. Two LJD read pairs are highlighted (blue + black) and their measured insert size is 3107 bp and 3002 bp respectively.
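As a rough sketch of how such an insert size can be measured from the mapped coordinates of a read pair, assuming 1-based inclusive positions on the reference: the function names, the example coordinates and the 10% tolerance below are hypothetical and only serve as illustration; they are not part of our analysis pipeline.

```python
def insert_size(read1_start, read1_end, read2_start, read2_end):
    """Outer distance spanned by a mapped read pair (1-based, inclusive coordinates)."""
    positions = (read1_start, read1_end, read2_start, read2_end)
    return max(positions) - min(positions) + 1


def within_expected(size, expected, tolerance=0.10):
    """True if the measured size deviates from the expected size by at most `tolerance`.

    The 10% default is an arbitrary choice for this example.
    """
    return abs(size - expected) <= tolerance * expected


# Made-up coordinates, chosen so that the pair spans 3107 bp.
size = insert_size(1001, 1100, 4008, 4107)
print(size, within_expected(size, expected=3000))  # prints: 3107 True
```

In practice the coordinates would come from the alignment file, and the tolerance would be chosen from the observed insert size distribution of the library.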

 

Why should I combine MiSeq long reads and LJDs?

The new features of the MiSeq (250 bp reads; data output of up to 8 Gbp) make it cost-efficient to combine shotgun and LJD libraries in one run. The MiSeq output is sufficient to sequence several bacterial genomes or single fungal genomes (up to 60 Mbp) with appropriate coverage.

  • Longer reads – more sequence information to correctly map the reads onto your contigs
  • Short delivery time – due to the shorter run time compared to the HiSeq 2000
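To make the coverage claim concrete, here is a small back-of-the-envelope sketch. The 8 Gbp output and the 60 Mbp genome size are taken from the text above; the 5 Mbp bacterial genome and the 100-fold target coverage are assumptions made purely for illustration, and the calculation ignores quality filtering and how the run is split between shotgun and LJD libraries.

```python
def mean_coverage(total_bases, genome_size):
    """Average fold coverage when total_bases of sequence cover a genome of genome_size bp."""
    return total_bases / genome_size


def genomes_per_run(run_output, genome_size, target_coverage):
    """Number of genomes of a given size that fit into one run at the target coverage."""
    return run_output // (genome_size * target_coverage)


MISEQ_OUTPUT_BP = 8_000_000_000   # up to 8 Gbp per run, as stated above
FUNGAL_GENOME_BP = 60_000_000     # upper limit mentioned above
BACTERIAL_GENOME_BP = 5_000_000   # assumed typical size, for illustration only

print(round(mean_coverage(MISEQ_OUTPUT_BP, FUNGAL_GENOME_BP)))    # prints 133
print(genomes_per_run(MISEQ_OUTPUT_BP, BACTERIAL_GENOME_BP, 100))  # prints 16
```

So even a 60 Mbp fungal genome would receive well over 100-fold raw coverage from a single full run, under the assumptions named above.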

Read more about our long jumping distance libraries on our website

150 bp, 250 bp and next year 300 bp:
Illumina keeps the competition on the go

Illumina is currently in the midst of rolling out updates for the MiSeq sequencer. The software update, the new flowcells and the new sequencing chemistry enable runs with outputs of around 8 Gbp and 250 bp read length. The first updates reached Europe just recently, and our own MiSeq received the update only a few days ago.

That’s not the end of the story for Illumina. Just a week ago, they announced the next update. In the second half of 2013, Illumina is planning to offer another MiSeq update that will increase the output to 15 Gbp. They achieve this tremendous output for a benchtop device by increasing the read length to 300 bp and resolving about 25 million clusters on the flowcell.
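The announced numbers are easy to check with a little arithmetic. The sketch below assumes that the 15 Gbp figure refers to a paired-end run (two reads per cluster); that assumption is mine and was not part of the announcement as quoted here.

```python
def run_output_bp(clusters, read_length, paired_end=True):
    """Total bases per run: clusters x reads per cluster x read length."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters * reads_per_cluster * read_length


# About 25 million clusters and 300 bp reads, as announced; paired-end is assumed.
output = run_output_bp(25_000_000, 300)
print(output / 1e9, "Gbp")  # prints 15.0 Gbp
```

With single reads instead of paired-end reads, the same cluster count would give only half of that output.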

Considering the intense competition with Life Tech’s Proton and Ion Torrent sequencers, Illumina needs to steadily improve the specs of its sequencing devices. In March, Life Tech plans to increase the output of the Proton sequencer to around 36 Gbp. That’s still more than the new MiSeq upgrade can deliver, but one also has to consider the differences in read length. While the MiSeq will be able to produce 300 bp reads soon thereafter, the Ion Proton generates reads of 100 to 150 bp. The difference is even more remarkable when sequencing on the MiSeq is performed with the paired-end module, an approach that is not possible with Life Tech’s devices. With library insert sizes of around 450-500 bp, the two overlapping reads can be merged into a single consensus read of about that size.
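Why do the two reads overlap at these insert sizes? A short sketch shows the arithmetic. The function names and the minimum overlap of 20 bp required for merging are hypothetical choices for this example, not a property of any particular merging tool.

```python
def overlap_length(read_length, insert_size):
    """Number of bases covered by both reads of a pair (0 if they do not overlap)."""
    return max(0, min(2 * read_length - insert_size, insert_size))


def merged_length(read_length, insert_size, min_overlap=20):
    """Length of the merged consensus read, or None if the overlap is too short.

    min_overlap=20 is an arbitrary threshold chosen for illustration.
    """
    if overlap_length(read_length, insert_size) < min_overlap:
        return None
    return insert_size


for insert in (450, 500, 700):
    print(insert, overlap_length(300, insert), merged_length(300, insert))
# prints:
# 450 150 450
# 500 100 500
# 700 0 None
```

With 2x 300 bp reads, inserts of 450-500 bp leave an overlap of 100-150 bp, which is why the merged read is about as long as the insert itself.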

In my opinion the Illumina MiSeq is at the forefront of the race and if Illumina’s plan works out they will be there in 2013, too. But we all know how short-lived the NGS market is. So let’s see what’s coming!

Poll Result

We asked for your opinion on whether the benchtop sequencers MiSeq and Ion Torrent can be compared directly with each other. The result was pretty interesting because the 34 votes were distributed nearly evenly.

The NGS Expert Blog On Its Own Behalf

Now that the NGS Expert Blog has grown up, it is a good time to stop for a moment and reflect on the 20 months since it went live. The NGS expert team at Eurofins MWG Operon has spent a lot of time writing posts that interest you. From time to time, it is important to align our work with your interests and needs. So, please share your feedback with us.

What posts interest you? Do you like the mobile version? What do you think about the frequency of the posts? Etc.

 

Just submit your vote (see the poll on the right-hand side) or send me an email at carolagrimminger@eurofins.com. I am looking forward to your thoughts!