Archive | January, 2013

Summary from 4th Next Generation Sequencing Congress 2012 – Part 2

Dear all,

Here is my second summary from the 4th NGS Congress, held at London Heathrow at the end of 2012. It brings you some (hopefully) interesting new facts about sequencing with the PacBio RS, the second long-read technology currently on the market and also the only system delivering reads longer than 10,000 bp…

Kevin Corcoran, Senior Vice President at Pacific Biosciences, gave an interesting talk about the most recent developments for the PacBio RS system. He also showed some detailed road maps of future aims and plans. One important thing to mention is the launch of the new “XL Chemistry”, while “C2 Chemistry” may still be used as well. The other very interesting story is “Stage Start”, a new feature that lets all sequencing detection start in parallel, similar to the well-known “hot start” technology for PCR. For most libraries, detection of the sequence then starts from a defined position rather than from somewhere in the middle. Last but not least, I’m very keen to learn how the future “Photo Protected DNA Polymerases” may develop further, an idea that is really very, very next-next-generation…

First of all, I can summarize that applying “XL Chemistry” looks really interesting, also with regard to the Eurofins MWG Operon focus on de novo sequencing and assembly. This new feature of the PacBio RS machine may also open doors to other types of applications, while the general need for extremely high data coverage may be reduced at the same time.

Currently “C2 Chemistry” is on the machine, and running a 90 min movie may deliver about 20,000-50,000 reads and a data output of 30-50 Mb; of course, higher yields may be possible for “ideal” DNA samples. The average read length is about 3,000 bp (!), while the 95th percentile is about 8,000 bp. With the new “XL Chemistry” we got an average yield of about 40,000 reads per SMRT cell with an average read length of about 4,000 bp (roughly +30%). Overall, we are very pleased with these first results, especially since we see good potential to further increase data yields using the new software pipeline launched in parallel (Hierarchical Genome Assembly Process and Quiver).

[Picture 1: Kevin Corcoran, Pacific Biosciences, slide 16]

It is also important to mention two different ways of using the XL Chemistry. 1) “XL chemistry for polymerase binding”, but “C2 chemistry for sequencing”. This allows for longer reads at the same quality (currently we still have a single-read error rate of 10% to 20%, on average maybe 15%). 2) “XL chemistry for polymerase binding” AND “XL chemistry for sequencing”. This combination can yield even longer reads, but unfortunately the error rate also increases by a few percent. This method is therefore recommended especially for de novo assembly or for finishing genomes.

[Picture 2: Kevin Corcoran, Pacific Biosciences, slide 52]

Finally, one real “next-next-gen” highlight was the presentation of a development at Pacific Biosciences that aims to protect the polymerase enzyme from being killed by the energy of the laser. A picture shows how this should work in principle: a kind of sun-blocker is put in place to shield the enzyme from the laser light. This story was really fascinating for me, and I hope to see more in future than the very promising first results …

[Picture 3: Kevin Corcoran, Pacific Biosciences, slide 56]

So overall, Pacific Biosciences keeps moving very fast in 2013 as well, and it will be very nice to see how all these improvements and new features enhance the overall data results of this fascinating very-long-read technology, which today already offers real single reads longer than 10,000 bp.

Cheers for now, and see you in our next blog post,
Axel

Goat Genome Sequenced Using Whole Genome Mapping

Goats were domesticated thousands of years ago. Nowadays they are also used as models for biomedical research. However, one thing was still missing: a reference genome. Researchers from China have now closed this gap by successfully sequencing the genome of a domestic goat.

To reveal the secrets of the goat genome, the researchers applied a hybrid approach of Illumina shotgun sequencing and whole genome mapping (WGM) using the Argus system from OpGen. As a result, the number of scaffolds was reduced from 2,090 to 315. This demonstrates that, for large genomes, whole genome mapping can replace traditional genetic maps in de novo assembly (Dong et al.).

This reference genome can now be used for mapping reads from other goats to identify SNPs and other variants that could play a role in breeding, cashmere fiber production or different goat behaviours (Dong et al.).

If you are interested in more information about optical mapping, read our dedicated blog posts “What is optical mapping?” and “Creating the perfect genome assembly”.

PacBio Sequencing Without Library Preparation

Researchers at the Wellcome Trust Sanger Institute have reported DNA sequencing on the PacBio RS sequencer without prior library preparation. As described in an article in BioTechniques last month, the method has so far been applied to sequencing single- and double-stranded viral genomes, bacterial plasmids, plasmid vector models for DNA modification analysis, as well as linear DNA fragments covering an entire bacterial genome.

The standard library preparation step was skipped and the DNA was used directly in the sequencing reaction. With this approach, the team around first author Paul Coupland was able to generate sequencing data from as little as 1 ng of starting material in only about 8 hours.

“In terms of read length and accuracy, the direct sequencing method is comparable to the standard sequencing protocol on the PacBio”, Coupland told InSequence. “There are no drawbacks in terms of read length and accuracy because PacBio is already single molecule sequencing, so it’s just skipping the library prep and going straight into the sequencing part.”

Since random hexamers can be applied as sequencing primers, and no growth of organisms is needed during sample preparation, the method can be applied without any a priori information on the organisms in the sample.

Clearly, this technique still needs to be optimised. For example, the sequence yield obtained with this approach is considerably lower than with standard methods (3,000 reads per SMRT cell, in contrast to 35,000 to 50,000 reads for standard methods).
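To put this yield gap into perspective, here is a minimal Python sketch that uses only the read counts quoted above. The helper name `yield_fraction` is made up for illustration and is not part of any PacBio software.

```python
def yield_fraction(direct_reads, standard_reads):
    """Return the direct-sequencing yield as a fraction of the standard yield."""
    if standard_reads <= 0:
        raise ValueError("standard_reads must be positive")
    return direct_reads / standard_reads


# Read counts per SMRT cell as quoted in the text above.
DIRECT_READS = 3_000
STANDARD_READS_RANGE = (35_000, 50_000)

for standard in STANDARD_READS_RANGE:
    fraction = yield_fraction(DIRECT_READS, standard)
    print(f"{DIRECT_READS:,} vs {standard:,} reads: {fraction:.1%} of the standard yield")
```

In other words, the direct method currently delivers somewhere below a tenth of the reads of a standard run.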

However, the authors think that the technique has great potential for clinical applications where unknown organisms need to be identified quickly. As Dr. Harold Swerdlow, lead author from the Wellcome Trust Sanger Institute, says in the institute’s press release: “Our technique can be performed without any prior knowledge of the sequence and with no organism specific reagents, in a short space of time. This makes it a promising alternative for clinical situations such as infection control.”

What Is In Your Genes?

Check out the presentation from SITN Boston on whole genome sequencing and its impact on personalised medicine.

Further recorded lectures, given by graduate students at Harvard and focusing on hot topics in science research and news, can be found at https://sitn.hms.harvard.edu/seminar-archive-2012/. Enjoy!

New Bid From Roche For Illumina?

The analyst and sequencing community is currently divided on whether to believe the rumors of a new bid from Roche to buy Illumina. The debate was triggered by an article in the Swiss newspaper L’Agefi, which reported at the end of December that Roche and Illumina might have agreed to a deal for Roche to acquire Illumina. Since Illumina turned down Roche’s original bid in January last year, continued interest from Roche has been reported several times, but the L’Agefi report also mentions concrete amounts for the bid. According to the newspaper, the acquisition might take place at $66 per share, valuing the deal at about $8.14 billion in total.

The offer is $15 per share higher than the previous offer of $51 in April last year. According to the analyst David Ferreiro of Oppenheimer, the new bid is definitely at a level that might lead to a final deal.
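As a quick sanity check on the figures quoted above, the following minimal Python sketch recomputes the per-share increase and the relative premium over the previous offer. The implied share count is derived from the quoted price and deal value only; it is not a figure from the L’Agefi report, and all variable names are illustrative.

```python
# Figures as quoted in the text above (US dollars).
NEW_BID_PER_SHARE = 66.0
PREVIOUS_BID_PER_SHARE = 51.0
TOTAL_DEAL_VALUE = 8.14e9

# Absolute and relative increase of the rumored bid over the previous one.
increase_per_share = NEW_BID_PER_SHARE - PREVIOUS_BID_PER_SHARE
relative_premium = increase_per_share / PREVIOUS_BID_PER_SHARE

# Share count implied by deal value and price (a derived number, not a reported one).
implied_shares = TOTAL_DEAL_VALUE / NEW_BID_PER_SHARE

print(f"Increase per share: ${increase_per_share:.2f}")
print(f"Premium over previous bid: {relative_premium:.1%}")
print(f"Implied share count: {implied_shares / 1e6:.1f} million")
```

The new bid would therefore be close to 30% above the offer Illumina rejected last year.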

With Roche holding only about 9% of the NGS market and next generation sequencing most likely becoming an important clinical diagnostic tool in the coming years, the strategic focus of Roche must be to gain better access to the NGS market and to take NGS into clinical practice. Acquiring the NGS market leader Illumina would be an ideal starting point.

We’ll see within the next two weeks whether the rumors are built on a solid foundation: L’Agefi reported that the announcement might come during the first half of January.

Hybrid De Novo Genome Assemblies

What are you looking for when you plan a bacterial or fungal de novo genome sequencing project?

Typical answers we get from our customers:

  • Easy handling of the data
  • Data suitable for high quality annotation
  • Resolution of structural rearrangements
  • High consensus accuracy
  • High cost-efficiency

All these requirements can be fulfilled when combining Roche GS FLX++ and Illumina data. The long Roche FLX++ reads of up to 1100 bp give much longer contigs than Illumina reads alone. For scaffolding, and to be able to resolve structural rearrangements, we sequence shotgun (SG) and long jumping distance (LJD) libraries with Illumina technology. Adding Illumina reads keeps the overall costs at a reasonable level. Furthermore, the Illumina reads correct the Roche sequencing errors at homopolymer sites and therefore enable us to build a consensus sequence with high accuracy.

The superiority of such a hybrid assembly quickly becomes apparent when looking at the following results from one of our proof-of-concept studies. In this de novo project, we sequenced a fungal genome of about 30 Mbp with approx. 57% GC content. Using the hybrid strategy we obtained only 10 chromosome-sized scaffolds (see figure below) of up to 8.3 Mbp. Remarkably, these 10 scaffolds represent nearly all of the genetic information present, given that they make up 99.6% of all scaffold sequence information.
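A figure like the 99.6% above is simply the share of the total scaffold length that sits in the largest scaffolds. Here is a minimal Python sketch of that calculation; `top_k_fraction` is a hypothetical helper name, and the lengths in the example are toy numbers, not the scaffold lengths from our study.

```python
def top_k_fraction(scaffold_lengths, k):
    """Return the fraction of total scaffold length held by the k longest scaffolds."""
    ordered = sorted(scaffold_lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("total scaffold length must be greater than zero")
    return sum(ordered[:k]) / total


# Toy lengths in bp, purely for illustration (NOT data from the study).
toy_lengths = [500, 300, 150, 40, 10]

# Share of the toy assembly contained in its three longest scaffolds.
print(f"{top_k_fraction(toy_lengths, 3):.1%}")
```

With real data, the list would hold the lengths of all scaffolds in the assembly and `k` would be 10.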

Such results make data handling easy and are an excellent starting point for annotation and for studying gene content and rearrangements.

Sequencing strategy: SG library on FLX++ (approx. 10-fold coverage); SG and LJD libraries (3 kbp, 8 kbp and 20 kbp) on Illumina HiSeq 2000 with the 2x 100 bp module.
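To give a feeling for what 10-fold coverage means for a genome of about 30 Mbp, here is a minimal Python sketch based on the usual relation coverage = number of reads x mean read length / genome size. The mean read length of 700 bp is an assumption for illustration only (the text above states only that FLX++ reads reach up to 1100 bp), and the function names are hypothetical.

```python
import math


def bases_needed(genome_size_bp, target_coverage):
    """Total sequenced bases required to reach the target coverage."""
    return genome_size_bp * target_coverage


def reads_needed(genome_size_bp, target_coverage, mean_read_length_bp):
    """Number of reads required, rounded up, for a given mean read length."""
    if mean_read_length_bp <= 0:
        raise ValueError("mean_read_length_bp must be positive")
    return math.ceil(bases_needed(genome_size_bp, target_coverage) / mean_read_length_bp)


GENOME_SIZE_BP = 30_000_000   # about 30 Mbp, as in the study above
FLX_COVERAGE = 10             # approx. 10-fold FLX++ coverage
ASSUMED_MEAN_READ_BP = 700    # ASSUMPTION: illustrative value, not from the study

total_bases = bases_needed(GENOME_SIZE_BP, FLX_COVERAGE)
reads = reads_needed(GENOME_SIZE_BP, FLX_COVERAGE, ASSUMED_MEAN_READ_BP)

print(f"{total_bases / 1e6:.0f} Mbp of FLX++ sequence")
print(f"{reads:,} reads at an assumed mean read length of {ASSUMED_MEAN_READ_BP} bp")
```

A longer mean read length lowers the number of reads needed for the same coverage, which is one reason long reads are attractive for de novo projects.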