Tag Archives: PacBio

News From The World Of NGS

Pacific Biosciences’ PacBio RS instrument is known as a third-generation sequencing technology, and once again the company proves its innovative character. Over the last couple of months two new chemistry packages have been released (XL & P4), and in the last couple of days two even more interesting pieces of news were announced:

1. Roche and PacBio signed an agreement to co-develop diagnostic products for the PacBio RS instrument (Genome Web). From my point of view this is a huge signal. Roche, an experienced player in the NGS market with its own sequencing instruments, sees a lot of potential in the SMRT technology. So the PacBio RS system has obviously left its teething phase behind and will gain importance in the NGS business in the coming months.

2. New England Biolabs, a big player in enzyme production, proteomics and drug discovery, is also using the PacBio RS to study bacterial methylomes and to work on new reagents for 5-mC detection. The CSO of NEB highlights that they have chosen “the PacBio system to study bacterial methylomes because of a unique feature of SMRT sequencing that enables the detection of base modifications through the system’s kinetics” (Genome Web).

Besides the great news for Pacific Biosciences, Life Technologies (or rather Thermo Fisher) also signed a major deal with the Chinese Dx firm iGenomics to install 32 Ion Proton sequencers in 2013 (Genome Web).

Clearly, all of these developments focus on molecular and clinical diagnostics. And to add the missing link in this news update: Illumina also recently announced a partnership with G3 to identify novel biomarkers and pathways in cardiovascular disease.

Ion Torrent or Illumina?

If you are choosing between Illumina and Ion Torrent for sequencing of your genome of interest, a study published recently by the Broad Institute (Ross et al 2013 Genome Biology 14:R51) may be of interest to you.

The study compares Illumina MiSeq, Ion Torrent PGM and PacBio with respect to sequencing bias in regions with extreme GC content (<10% and >75%) or long AT dinucleotide repeats in three different bacterial genomes. Relative coverage is lower for every technology in all of these difficult regions, but coverage bias was found to be most pronounced in the Ion Torrent PGM data. PacBio shows the least coverage bias, likely because of its amplification-free protocol, but a much higher error rate than on the other two platforms was observed. The results are consistent with an earlier study that compared the same sequencing platforms (Quail et al 2012 BMC Genomics. 13: 341).
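To check whether your own genome contains such difficult regions, you can scan it for windows with extreme GC content. The following is only a minimal sketch: the thresholds (<10% and >75%) come from the study as summarized above, while the window size of 100 bases, the function names and the toy sequence are assumptions made for illustration.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def extreme_gc_windows(genome, window=100, low=0.10, high=0.75):
    """Yield (start, gc) for non-overlapping windows with extreme GC content.

    The window size is an assumed value chosen for this illustration.
    """
    for start in range(0, len(genome) - window + 1, window):
        gc = gc_fraction(genome[start:start + window])
        if gc < low or gc > high:
            yield start, gc


# Toy sequence: an AT-only stretch, a balanced stretch and a GC-only stretch.
toy = "AT" * 50 + "ACGT" * 25 + "GC" * 50
flagged = list(extreme_gc_windows(toy))
print(flagged)
```

On a real genome you would read the sequence from a FASTA file and compare read coverage inside and outside the flagged windows.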

Therefore, depending on the characteristics of your genome of interest, your choice of sequencing platform will influence your downstream analyses.

Summary from 4th Next Generation Sequencing Congress 2012 – Part 2

Dear all,

Here is my second summary from the 4th NGS Congress, held at London Heathrow at the end of 2012. It brings you some (hopefully) interesting new facts about sequencing with the PacBio RS, the second long-read technology currently on the market and also the only system delivering reads even longer than 10,000 bp…

Kevin Corcoran, Senior Vice President at Pacific Biosciences, gave an interesting talk about the most recent developments for the PacBio RS system. He also showed some detailed road maps of future aims and plans. One important point to mention is the launch of the new “XL Chemistry”, while “C2 Chemistry” can still be used as well. The other very interesting story is “Stage Start”, a new feature that starts sequencing detection in parallel, similar to the well-known “hot start” technology for PCR. This way, detection for most of the libraries starts from a defined position rather than from somewhere in the middle. Last but not least, I’m very keen to learn how the future “Photo Protected DNA Polymerases” will develop, an idea that is really very, very next-next-generation…

First of all, I can say that applying “XL Chemistry” looks really promising, also with regard to the de novo sequencing and assembly focus of Eurofins MWG Operon. This new feature of the PacBio RS may also open doors to other types of applications, while the need for extremely high coverage may be reduced at the same time.

Currently “C2 Chemistry” is on the machine, and a 90 min movie delivers about 20,000-50,000 reads and a data output of 30-50 Mb; higher yields are of course possible for “ideal” DNA samples. The average read length is about 3,000 bp (!), while the 95th percentile is about 8,000 bp. With the new “XL Chemistry” we got an average yield of about 40,000 reads per SMRT cell with an average read length of about 4,000 bp (about +30%). Overall, we are very pleased with these first results, especially since we see good potential to further increase data yields using the new software pipeline introduced in parallel (Hierarchical Genome Assembly Process and Quiver).

— See picture 1: —



It is also important to mention the two different ways of working with the XL Chemistry. 1) “XL chemistry for polymerase binding” combined with “C2 chemistry for sequencing”. This allows for longer reads at the same quality (currently we still have a single-read error rate of 10% to 20%, on average maybe 15%). 2) “XL chemistry for polymerase binding” AND “XL chemistry for sequencing”. This can yield even longer reads, but unfortunately the error rate also increases by a few percent. This method is therefore recommended especially for de novo assembly or for finishing genomes.

— See picture 2: —



Finally, one real “next-next-gen” highlight was the presentation of a development at Pacific Biosciences that aims to protect the polymerase enzyme from being destroyed by the energy of the laser. A picture shows how this should work in principle: a kind of sun-blocker is put in place to shield the enzyme from the laser light. This story was really fascinating for me, and I hope to see more than the very promising first data in the future…

— See picture 3: —



So overall, Pacific Biosciences keeps moving very fast in 2013 as well, and it will be very nice to see how all these additional improvements and new features improve the overall data quality of this fascinating very-long-read technology, which today already offers real single reads longer than 10,000 bp.

Cheers now and see you on our next BLOG,

Further Improvements of PacBio Technology

Recently, we reported on a study by the Broad Institute showing that the PacBio RS system was able to outdo MiSeq sequencing in the validation of called SNPs. Now Pacific Biosciences has taken another important step to further improve its product.

Pacific Biosciences has now launched a new sample loading device for the PacBio RS, called the MagBead Station. As Michael Hunkapiller, Ph.D., President and Chief Executive Officer of Pacific Biosciences, said in the press release, they expect that with the new device customers will “be able to generate 10 kilobase-sized libraries using as little as one microgram of sample, a five to 10-fold improvement from where we were just a few months ago”. Also, because the new process is more robust, they expect sequencing results to have higher overall consistency, making it possible to run experiments on challenging samples as well.

The first experiences of early-access customers seem to support these expectations:

As Patrick Hurban of Expression Analysis told InSequence, the new loading device allowed them to recover sequence even from “difficult” samples: “we’re much more confident on a sample-by-sample basis that we will be able to get good sequence”, he said. They could also confirm that the amount of library that needs to be loaded is now significantly lower. The new loading process also seems to favor longer DNA fragments over shorter ones, excluding short contaminating fragments, which results in a greater percentage of long reads per run. In addition, loading now seems to work as efficiently for large-insert libraries as it does for smaller-insert libraries.

With the new loading device, about 50-60 % of the ZMWs are now active after loading. This is a great improvement compared to 30-45 % of active ZMWs before the upgrade.

When PacBio entered the market, I was impressed by the sophisticated new technology. However, the results of the first projects were rather disappointing. The new loading device now seems to greatly improve the sample loading step. The high error rate, about 15% for the time being, still remains a challenge, and PacBio will need to solve this issue if it wants to be successful on the market in the long run. Still, it seems that little by little PacBio is overcoming its “teething troubles”.

PacBio RS Data to Validate SNPs Called from Illumina Sequencing?

Would you have thought that PacBio RS sequences, with a single-read error rate of about 15%, can outdo MiSeq reads in validating variants called by WGS or exome sequencing? Personally, I wouldn’t have thought so. But the study by the Broad Institute published a few days ago clearly shows that they can.

Variants called in projects that aim at variant analysis definitely need validation, both to determine the rate at which mutations have been called correctly and to confirm the specific reported changes. Currently used techniques such as Sequenom genotyping and Sanger sequencing have significant drawbacks, such as the need for manual interpretation or low data throughput. For that reason, Carneiro and his colleagues studied the power of PacBio RS and MiSeq data as a validation tool and compared the results with each other.

They generated amplicons covering 98 variants called in the 1000 Genomes Project and sequenced the PCR products with both instruments, PacBio RS and MiSeq. Using the PacBio RS data, 96 of the 98 variants could be genotyped correctly, whereas the MiSeq correctly genotyped only 93 sites. The authors’ explanation is quite simple: because errors are distributed completely at random across the reads, the low single-read accuracy can be overcome if sufficient coverage is applied.
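The argument that random errors average out with coverage can be made concrete with a small calculation. The sketch below is a deliberately simplified model, not the method used in the study: it assumes that each read is wrong at a given site independently with probability 0.15 (the single-read error rate quoted above) and that the consensus call fails whenever at least half of the reads are wrong. The function name and the coverage values are chosen for illustration only.

```python
from math import comb


def consensus_error(coverage, per_read_error=0.15):
    """Probability that at least half of `coverage` reads are wrong at a site.

    Simplified model: errors are independent between reads, and a tie
    between correct and wrong reads is counted as a failed call.
    """
    first_bad = (coverage + 1) // 2  # smallest number of wrong reads that is at least half
    return sum(
        comb(coverage, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(first_bad, coverage + 1)
    )


for cov in (1, 5, 15, 31, 61):
    print(f"coverage {cov:>2}: consensus error ~ {consensus_error(cov):.2e}")
```

In real data the situation is even more favorable than this model suggests, since wrong reads rarely all agree on the same wrong base. Systematic effects such as alignment bias, on the other hand, do not average out with coverage.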

Manual checking of the sites that were miscalled using the PacBio dataset revealed that one of the two miscalls was due to reference bias (the true variation is hidden). Such bias is introduced by alignment parameters in which the gap open penalty is lower than the base mismatch penalty. The high error rate of PacBio RS reads makes these parameters necessary.

However, Carneiro told GenomeWeb that the researchers are now using a different aligner that was developed at the Broad Institute. This aligner re-aligns the reads using different parameters and therefore reduces the problem to a great extent.

For me, the study shows that there is potential for PacBio RS sequencing. Nevertheless, just like the variants, this study result itself needs to be validated. Furthermore, I think the value of the study also needs to be seen in relation to the sequencing cost for both instruments. While the consumable prices for both techniques are in a similar range, the several-fold higher cost of the PacBio RS instrument makes a remarkable difference.

Comparison of De Novo Assemblies of the Escherichia coli Outbreak

The E. coli EAHEC outbreak in Germany has been an opportunity to compare the currently available sequencing technologies with respect to data quality.

Regarding N50 contig size and the number of contigs/scaffolds, the best assembly quality so far was achieved using the long-read technologies on the market, Roche’s GS Junior and Pacific Biosciences’ PacBio RS. For the further comparison I will therefore focus on the data from these two long-read technologies. But first, please have a look at the sequencing layouts:

Sequencing layout PacBio RS (Source: Pacific Biosciences):

First library: Standard sequencing library (200-fold coverage)
Second library: Circular consensus sequencing library (35-fold coverage)
Sequencing: 56 SMRT cells

Sequencing layout Roche GS Junior (Source: UK Health Protection Agency HPA):

First library: Shotgun library
Second library: Long paired end library (LPE, 8 kbp insert length)
Sequencing: Three Roche GS Junior runs (25-fold coverage)

Comparison of the results

The de novo assembly comprises 33 contigs with PacBio RS sequencing and 13 scaffolds with Roche GS Junior. The N50 contig size of the PacBio approach is 402 kbp, and the N50 scaffold size of the Roche 454 approach is 968 kbp. The de novo assemblies with Illumina MiSeq and Ion Torrent PGM data have so far yielded a higher number of contigs and considerably shorter N50 contig sizes (95 kbp and 50 kbp, respectively).
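For readers not familiar with the N50 statistic used here: it is the length L such that contigs (or scaffolds) of length L or longer together contain at least half of all assembled bases. A minimal sketch of the calculation follows; the function name and the toy contig lengths are made up for illustration and are not data from the assemblies discussed above.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example (lengths in kbp, invented for illustration).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Note that a contig N50 and a scaffold N50 are not directly comparable, since scaffolds may contain gaps between the joined contigs.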

These data once again show that de novo sequencing strictly needs long reads. The advantage that PacBio’s very long reads (on average 2,900 bp, with 5% longer than 5,100 bp) offer for scaffolding is balanced in the other approach by sequencing the LPE library.

It is important to mention that the PacBio assembly was generated with reads from a not-yet-released chemistry (planned for the fourth quarter). In contrast, the Roche 454 assembly did not contain the long FLX+ chemistry reads that will become available for the GS FLX by the end of the month. In our experience, a read length of 650-750 bp will have an additional positive effect on the number of scaffolds and the N50 scaffold size.

Most striking for me is that as many as 56 SMRT cells were needed to generate the PacBio assembly with its high consensus accuracy of 99.998%. The standard library was sequenced at such high coverage in order to increase the number of very long reads, and the circular consensus sequencing library was used to further correct errors arising from the still low single-read accuracy.
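To put the consensus accuracy into perspective, accuracies are often expressed on the Phred scale, Q = -10 · log10(error probability). The Phred convention is not mentioned in the sources above; it is used here only as a common yardstick. The small sketch below converts an accuracy into a Phred-style quality value, with the two example accuracies taken from this blog (99.998% consensus accuracy and roughly 85% single-read accuracy).

```python
import math


def phred(accuracy):
    """Convert an accuracy between 0 and 1 (exclusive of 1) into a Phred-scaled quality value."""
    return -10 * math.log10(1 - accuracy)


print(f"consensus (99.998%): Q{phred(0.99998):.0f}")
print(f"single read (85%):   Q{phred(0.85):.1f}")
```

An accuracy of exactly 1.0 has no finite Phred value, so the function raises an error in that case.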

The New Biology – Video Series by Pacific Biosciences

Real infotainment from Pacific Biosciences about the future of biology: the series starts with a smart introduction to cutting-edge technologies that provide the opportunity to create predictive models of living systems and to gain wisdom about the fundamental nature of life itself.