Archive | August, 2012

Comparison of NGS technologies – just a waste of time?

As already mentioned in our latest blog post, Michael Quail and his team from the Sanger Institute published a comparison of the Ion Torrent PGM, the PacBio RS system and the Illumina MiSeq (BMC Genomics). Neither this study nor any of the other recent comparisons could identify one clear winner, as each system has its own advantages.

What is really interesting, though, are the statements of the companies’ spokespersons in a recent GenomeWeb article by Julia Karow. They all agree on one thing: the data in the publication were accurate in 2011 but are outdated by now, because a lot of effort has gone into innovation and every instrument now performs much better. So what is our conclusion? Are comparisons of NGS technologies just a waste of time? For the Sanger Institute, it meant investing in three new MiSeqs, since an Illumina pipeline was already available. In my view, these comparisons are valuable for other institutes as well. Even if somewhat outdated, they highlight the strengths and weaknesses of each technology and help decide where to invest thousands of dollars. What do you think?

PacBio RS Data to Validate SNPs Called from Illumina Sequencing?

Would you have thought that PacBio RS reads, with a single-read error rate of about 15%, could outperform MiSeq reads in validating variants called by whole-genome (WGS) or exome sequencing? Personally, I wouldn’t have. But a study from the Broad Institute published a few days ago clearly shows that they can.

Variants called in variant-analysis projects need validation, both to determine the rate at which mutations have been called correctly and to confirm the specific changes reported. Currently used techniques such as Sequenom genotyping and Sanger sequencing have significant drawbacks, such as the need for manual interpretation or low throughput. For that reason, Carneiro and his colleagues evaluated PacBio RS and MiSeq data as validation tools and compared the results.

They generated amplicons covering 98 variants called in the 1000 Genomes Project and sequenced the PCR products on both instruments, the PacBio RS and the MiSeq. With the PacBio RS data, 96 of the 98 variants were genotyped correctly, whereas the MiSeq data correctly genotyped only 93 sites. The authors’ explanation is quite simple: because the errors are distributed completely randomly across the reads, sufficient coverage can compensate for the low accuracy of the individual reads.
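Why coverage can rescue noisy reads can be shown with a simple binomial model. The sketch below assumes that each read errs independently at a given site with a per-base probability of 15% (the figure quoted above) and that the consensus is a simple majority vote; real genotyping of heterozygous sites, and PacBio’s indel-dominated error profile, are more complicated. The function name `majority_error_prob` is hypothetical.

```python
from math import comb


def majority_error_prob(coverage: int, error_rate: float) -> float:
    """Probability that at least half of `coverage` reads show an error at one site.

    Assumes each read errs independently with probability `error_rate` at that
    site (binomial model). This is an upper bound on the chance that a simple
    majority-vote consensus is wrong, because random errors would additionally
    have to agree on the same wrong base.
    """
    threshold = (coverage + 1) // 2  # ceil(coverage / 2) erroneous reads
    return sum(
        comb(coverage, k) * error_rate**k * (1 - error_rate) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


if __name__ == "__main__":
    # Assumed ~15% per-base error rate for single PacBio RS reads
    for cov in (1, 5, 11, 21, 51):
        print(f"coverage {cov:>2}: P(majority wrong) <= {majority_error_prob(cov, 0.15):.2e}")
```

With odd coverage the bound shrinks rapidly as more reads are added, which is the intuition behind the authors’ argument; it only holds if errors are truly random rather than systematic.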

Manual inspection of the sites miscalled in the PacBio dataset revealed that one of the two miscalls was caused by reference bias, i.e. the true variant was hidden in the alignment. Such bias is introduced by alignment parameters in which the gap open penalty is lower than the base mismatch penalty, so that the aligner may represent a true base substitution as a pair of gaps rather than as a mismatch. The high, mainly indel, error rate of PacBio RS reads makes such parameters necessary.
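To see how scoring parameters alone can hide a true variant, here is a toy Needleman-Wunsch global aligner with linear gap costs. It is not the aligner used in the study; the function name, the example sequences and the penalty values are all hypothetical. When a mismatch costs more than two single-base gaps, the cheapest alignment expresses a SNP as a deletion plus an insertion.

```python
def global_align(ref: str, read: str, mismatch: int, gap: int, match: int = 0):
    """Toy Needleman-Wunsch global alignment with linear gap costs.

    Scores are costs to be minimised: `match` per identical base, `mismatch` per
    substitution and `gap` per inserted or deleted base.
    Returns (total_cost, aligned_ref, aligned_read).
    """
    n, m = len(ref), len(read)
    # cost[i][j] = minimal cost of aligning ref[:i] with read[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if ref[i - 1] == read[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub(i, j),  # base aligned to base
                cost[i - 1][j] + gap,            # reference base missing from read
                cost[i][j - 1] + gap,            # extra base in read
            )

    # Trace back one optimal path, preferring the diagonal on ties.
    aligned_ref, aligned_read = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub(i, j):
            aligned_ref.append(ref[i - 1])
            aligned_read.append(read[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])
            aligned_read.append("-")
            i -= 1
        else:
            aligned_ref.append("-")
            aligned_read.append(read[j - 1])
            j -= 1
    return cost[n][m], "".join(reversed(aligned_ref)), "".join(reversed(aligned_read))


if __name__ == "__main__":
    # Hypothetical read carrying a G>C substitution relative to the reference
    for mismatch, gap in [(1, 3), (3, 1)]:
        print(global_align("ACGTA", "ACCTA", mismatch=mismatch, gap=gap))
```

With cheap mismatches the substitution is reported as a mismatch; with cheap gaps the optimal alignment contains gaps in both sequences and no mismatch, so a caller looking for SNPs would not see the variant.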

However, Carneiro told GenomeWeb that the researchers are now using a different aligner developed at the Broad Institute, which realigns the reads with different parameters and thereby greatly reduces the problem.

For me, the study shows that PacBio RS sequencing has potential. Nevertheless, just like the variants, this result itself needs to be validated. Furthermore, I think the value of the study also has to be seen in relation to the sequencing costs of both instruments: while the consumable prices are in a similar range, the several-fold higher price of the PacBio RS instrument makes a considerable difference.

We are not alone!

A recent article in Scientific American by Jennifer Ackerman, entitled “The Ultimate Social Network”, highlights a particular problem in sequencing the genomes of eukaryotic organisms: the organism in question, whether it is an ant, a butterfly, a polar bear, a frog or a blue whale, is not a single organism at all.

In fact, the organism in question plays host to many millions of other microorganisms, such as bacteria, viruses, fungi and parasites. In humans, for example, the genetic material of the microbiome outnumbers the human genome by at least 10 to 1. The same is expected to hold for all other eukaryotic species, which harbour and maintain a symbiotic relationship with their microbiome.

Genes from the microbiome help, for example, to process beneficial compounds and to temper the host’s immune defences. Therefore, when extracting DNA from a eukaryotic organism, you have to consider which other genomes you may be preparing and sequencing alongside the genome of interest. For example, you cannot simply freeze-dry an insect, crush it into a powder and extract the DNA: the resulting sample will contain a highly mixed and diverse set of genomes, in which the genome of interest may make up only a small fraction. So, be warned! When assembling genomes, be sure you know what your starting material actually contains.

Sequencing Performance versus Marketing Performance

Recently, a number of groups, including the Sanger Institute, a group from the University of Birmingham and BGI, have attempted to compare the two platforms PGM and MiSeq. None of these studies has conclusively named a winner, and each group comes to slightly different conclusions.

A post on GenomeWeb’s blog “The Daily Scan” discusses the different findings of the three comparison studies at length. On the one hand, different chemistries or older versions are compared with newer ones; on the other hand, different applications require different technologies.

In a report, Jon Groberg of Macquarie Equities Research cites several factors behind Life Tech’s stronger sales of the PGM compared with Illumina’s MiSeq (1,300 vs. 700 systems sold): price (the PGM sells for $75,000, while the MiSeq goes for $125,000); Life’s more extensive commercial reach; a steeper trajectory of improvement for the PGM than for the MiSeq; and the PGM’s strength in certain key applications.

Of note are the differences in sequencing cost based on list prices (see the Sanger Institute study). The MiSeq came out cheapest, at $502 per gigabase, followed by the PGM, at $1,000 per gigabase using the Ion 318 chip, and the PacBio, at $2,000 per gigabase. All three platforms produce data at a greater cost than the Illumina GAIIx, at $148 per gigabase, and the HiSeq 2000, at $41 per gigabase.
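The per-gigabase figures above are easier to compare when converted to the cost of a concrete amount of data. The sketch below only multiplies the quoted list prices; it ignores instrument purchase, library preparation and labour, and the 10 Gb project size and the function names are hypothetical.

```python
# List-price sequencing cost in US$ per gigabase, as quoted above
# (Sanger Institute comparison, 2012).
COST_PER_GB = {
    "MiSeq": 502,
    "PGM (Ion 318 chip)": 1000,
    "PacBio RS": 2000,
    "GAIIx": 148,
    "HiSeq 2000": 41,
}


def project_cost(gigabases: float) -> dict:
    """List-price cost of producing `gigabases` of data on each platform."""
    return {platform: price * gigabases for platform, price in COST_PER_GB.items()}


def relative_to(reference: str) -> dict:
    """Cost per gigabase of each platform as a multiple of the reference platform."""
    base = COST_PER_GB[reference]
    return {platform: price / base for platform, price in COST_PER_GB.items()}


if __name__ == "__main__":
    # Hypothetical project producing 10 Gb of raw data
    ratios = relative_to("HiSeq 2000")
    for platform, cost in project_cost(10).items():
        print(f"{platform:>20}: ${cost:,.0f} ({ratios[platform]:.1f}x HiSeq 2000)")
```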

What is your experience with the two systems?

The singing mouse

Next Generation Sequencing (NGS) is transforming today’s genomic research and is used in numerous areas, from clinical diagnostics to academic research. In Texas, USA, Dr. Steven Phelps and his research team recently used NGS to study how mice communicate by singing. I have to admit it sounds more like screaming than singing to me. But Phelps and his team found that a gene called FOXP2 plays a key role in this form of communication.

Phelps uses next-generation sequencing to decipher how FOXP2 interacts with DNA to regulate the function of other genes. The process involves reading tiny overlapping fragments of DNA so that the entire sequence can be deduced. According to O’Connell, the procedure generates such massive amounts of data that only the processing power of a supercomputer can handle them (source: www.tacc.utexas.edu). So data handling and storage are still among the biggest challenges in Next Generation Sequencing projects. But now take the chance and listen to the song of this little mouse.
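The idea of deducing a sequence from overlapping fragments can be illustrated with a toy greedy assembler. This is a minimal sketch under strong assumptions (error-free reads from one strand, no repeats longer than the overlap); real assemblers use graph-based methods and handle errors, repeats and both strands. The function names and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:  # no overlaps left: return the contigs
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGCGT", "CGTACG", "ACGTTA", "GTTAGC"]  # hypothetical overlapping reads
    print(greedy_assemble(reads))
```

Even this toy compares every fragment with every other one in each round, which hints at why real datasets with millions of reads need serious computing power.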

Sequencing Rather than Soaking in a Hot Spring

Japan has many volcanoes and earthquakes, but this is not always a bad thing: they are also responsible for the country’s many hot springs. Most Japanese people love soaking in a hot spring and believe that it relieves fatigue and improves health. Hot springs have also made a great contribution to biotechnology through the heat-resistant DNA polymerase from Thermus aquaticus (Taq) and its derivatives. As we all know, these heat-resistant enzymes accelerated not only PCR but also Sanger sequencing.

Scientists have started to explore the genome and transcriptome world of hot springs with NGS technologies. Murakami et al. performed 16S rRNA analysis (Sanger sequencing) and metatranscriptome analysis of small RNAs (GS FLX sequencing) on groundwater (from depths of up to 1,000 m) from the Yunohara hot spring, Japan. Their phylogenetic analysis of 16S rRNA classified 17 species, including archaea and eubacteria. Typical hot springs are dominated by only two or three species, but this one is rich in diversity. Furthermore, they found a very unusual group, the “Archaeal Richmond Mine Acidophilic Nanoorganisms” (ARMAN), whose cells are only 200 nm in size! Their small RNA analysis identified 64,194 cDNA sequences (20,057 nonredundant), among them several novel non-coding RNAs with very stable secondary structures.
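Counts such as “64,194 (20,057 nonredundant)” come from collapsing repeated sequences. The sketch below assumes that “nonredundant” means collapsing exactly identical sequences, which is our assumption and not necessarily the criterion used by Murakami et al.; the function name and the example reads are hypothetical.

```python
from collections import Counter


def collapse_reads(reads: list[str]) -> tuple[int, int, Counter]:
    """Collapse exactly identical sequences (case-insensitive).

    Returns (total reads, number of nonredundant sequences, abundance per sequence).
    """
    counts = Counter(seq.upper() for seq in reads)
    return len(reads), len(counts), counts


if __name__ == "__main__":
    total, unique, counts = collapse_reads(["ACGT", "acgt", "GGCA", "ACGT"])
    print(total, unique, counts.most_common(1))
```

The abundance of each collapsed sequence is kept because highly abundant small RNAs are often the first candidates for closer inspection.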

Hot springs may therefore still be gold mines for useful genes and for biological knowledge about unexplored underground ecosystems.

What is Optical Mapping?

Whole Genome Mapping (WGM) using the OpGen Argus technology delivers high-resolution, ordered whole genome restriction maps from single DNA molecules. To obtain such a restriction map, it is crucial to isolate long DNA fragments (200 kb and longer) and to capture the DNA on a solid surface. The DNA is then digested with a restriction enzyme, and the cleavage sites appear as gaps when the DNA is visualized with a fluorescence microscope. This optical map is then converted into digital data, the so-called “single molecule restriction maps” (see video below). The MapSolver software enables the following analyses (see details in the analysis video):

  • Perform Genome Comparisons
  • Identify Motifs, Annotate Features, and view in silico sequence data
  • Perform Sequence Placement
  • Create Similarity Clusters

Video about step 3: How to scan and assemble single molecule restriction maps (SMRM)
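The digestion step described above has a simple in silico counterpart: cutting a known sequence at every recognition site gives the fragment sizes an optical map would show. This is a minimal sketch, not MapSolver; it assumes a linear molecule, a palindromic recognition site (so one strand suffices) and exact sizes, whereas real optical maps measure fragment sizes with some error. The function name is hypothetical, and BamHI (G^GATCC) is used only as an example enzyme.

```python
import re


def restriction_map(sequence: str, site: str, cut_offset: int = 0) -> list[int]:
    """Fragment lengths (bp) from cutting `sequence` at every occurrence of `site`.

    cut_offset: position of the cut within the recognition site
    (0 = just before its first base). Assumes a linear molecule and a
    palindromic site, so scanning one strand finds every cut.
    """
    seq = sequence.upper()
    # A zero-width lookahead also finds overlapping recognition sites
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    boundaries = [0] + cuts + [len(seq)]
    # Skip zero-length pieces, e.g. when a site sits at the very start
    return [end - start for start, end in zip(boundaries, boundaries[1:]) if end > start]


if __name__ == "__main__":
    # Hypothetical 20 bp sequence with two BamHI sites (G^GATCC)
    print(restriction_map("AAGGATCCTTTTGGATCCAA", "GGATCC", cut_offset=1))
```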

We recently gained access to this innovative technology and can now combine our Next Generation Sequencing Service with WGM. Combining NGS and WGM makes it possible to order the contigs from a next generation sequencing project against the optical map scaffold, which can greatly improve sequence assemblies. If you are interested in a combined or stand-alone WGM project, please do not hesitate to contact us.
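The principle of ordering contigs against an optical map can be sketched as pattern matching on fragment sizes: each contig’s in silico restriction fragments are slid along the map until consecutive sizes agree within a tolerance. This is a simplified illustration, not OpGen’s placement algorithm. It assumes that only each contig’s interior fragments (between its first and last cut site) are used, ignores reverse orientation and sizing-error models, and keeps only unambiguous placements; all names, sizes and the 10% tolerance are hypothetical.

```python
def place_contig(map_fragments: list[int], contig_fragments: list[int],
                 tolerance: float = 0.1) -> list[int]:
    """Start indices in the optical map where the contig's consecutive fragment
    sizes all match within a relative tolerance."""
    k = len(contig_fragments)
    hits = []
    for start in range(len(map_fragments) - k + 1):
        window = map_fragments[start:start + k]
        if all(abs(m - c) <= tolerance * m for m, c in zip(window, contig_fragments)):
            hits.append(start)
    return hits


def order_contigs(map_fragments: list[int], contigs: dict[str, list[int]],
                  tolerance: float = 0.1) -> list[str]:
    """Names of uniquely placed contigs, sorted by their position on the map."""
    placed = []
    for name, fragments in contigs.items():
        hits = place_contig(map_fragments, fragments, tolerance)
        if len(hits) == 1:  # ignore contigs that match nowhere or several times
            placed.append((hits[0], name))
    return [name for _, name in sorted(placed)]


if __name__ == "__main__":
    # Hypothetical optical map (fragment sizes in bp) and two contigs
    optical_map = [12000, 8500, 30000, 4200, 15000, 22000, 9800]
    contigs = {"contig_B": [15100, 21800], "contig_A": [8400, 30200]}
    print(order_contigs(optical_map, contigs))
```

Requiring a unique match is a conservative design choice: short contigs with only one or two fragments often match several map positions and are better left unplaced than misplaced.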

We look forward to discussing WGM with you in detail.