Update on NGS and Clinical Validation

There is increasing demand for regulated diagnostic tests based on next-generation sequencing. The review I would like to draw your attention to thoroughly discusses the challenges and issues that arise when developing NGS-based diagnostic tests or even companion diagnostics (CDx). The experts from the Merck Research Laboratories take everything into account, from the choice of platform and bioinformatics through to the regulatory approval process.

Have a read, it’s really worth it!

http://journal.frontiersin.org/Journal/10.3389/fonc.2014.00078/full


Data analysis – still a bottleneck!

With so many NGS machines in the field, we produce tremendous amounts of sequencing data every day. However, at the end of the day, all these data have to be analyzed and interpreted. In many cases, this step is still a bottleneck.

Please check the video below, an interview on this topic with Lex Nederbragt, bioinformatician at the Norwegian High-Throughput Sequencing Centre in Oslo. He discusses the fact that the available analysis tools do not fully meet the needs of researchers. In this context, he also discusses the use of open source and commercial software tools.

Lex Nederbragt discussing software bottlenecks and lack of flexible reference genomes from NGS Perspectives on Vimeo.

100,000, 40,000, 25,000, 19,000 – the shrinking human genome…

Many of you surely remember old textbooks in which the total number of genes in the human genome was estimated at around 40,000 to 100,000. After the human genome was sequenced, this number shrank to 26,000 to 40,000 genes. The 19th GENCODE release further reduced it to 20,318 protein-coding genes. And as if that were not enough, a recent study suggested that the actual number of protein-coding genes in humans lies around 19,000.

This astonishing result was obtained by analyzing data from seven large MS-based proteomics studies covering more than 50 human tissues.

But the shrinking number of genes is not the only remarkable result. Below are the most important findings from this study, as described in a recent ScienceDaily blog post:

  • Close to 12,000 human genes could be unambiguously identified
  • Despite high coverage from seven analyses, 40% of the peptides from the human gene set could not be detected; possible reasons:
    • Thousands of genes annotated in the human genome did not appear in the proteomics analysis
    • Apparently 1,700 genes that were previously thought to produce proteins almost certainly don’t
  • Another hypothesis is that more than 90% of human genes produce proteins originating in metazoans or multicellular organisms living hundreds of millions of years ago
  • The difference between humans and other primates at the gene and protein level is very small
  • “The number of new genes that separate humans from mice may even be fewer than 10”
  • Physiological and developmental differences between primates are more likely caused by gene regulation than by differences in the basic functions of the proteins in question

Alfonso Valencia, the main researcher behind this project, states that “the human genome is best annotated, but we still believe that 1,700 genes may have to be re-annotated”.

According to Valencia, these results may redefine the entire mapping of the human genome.

The Common Marmoset as a Model Organism for the Study of Drug Metabolism

Several non-human primates, including Macaca mulatta and Macaca fascicularis, are well known as experimental animals in neuroscience, stem cell research, drug toxicology, and other applications. The common marmoset (Callithrix jacchus) is also a non-human primate and is suitable as an experimental animal because of its small size and high fecundity.

To develop a drug metabolism model, our collaborators and Eurofins Genomics (2014) performed transcriptome analysis of the common marmoset using long-read technology (Roche GS FLX+) and short-read sequencing (Illumina HiSeq 2000) in parallel. This parallel NGS approach enabled both the identification and the quantitative analysis of transcripts, giving insight into gene expression during drug metabolism. Finally, we obtained rich information about genes involved in drug metabolism, including 18 cytochrome P450-like and 4 flavin-containing monooxygenase (FMO)-like genes, and their tissue-specific expression patterns.

The results of this study are the foundation for future studies not limited to drug metabolism & pharmacokinetics.

First Oxford Nanopore MinION data available: Is this the end of PacBio?

Researchers from the University of Birmingham in the UK last week publicly released data they generated with Oxford Nanopore Technologies’ MinION nanopore sequencer, making them the first group to do so since the company started its early access program this spring (see In Sequence report).

The sequence is a single 8.5 kilobase read derived from a Pseudomonas aeruginosa genome. It was posted by Nick Loman from the Institute of Microbiology and Infection at the University of Birmingham. It was possible to identify the serotype as O6. The sequence can be found here. It is of low quality, with 71% identity across the spanned region.

Konrad Paszkiewicz, director of the Wellcome Trust Biomedical Informatics Hub and head of the sequencing service at Exeter, has been writing about the group’s experience on the Exeter Sequencing Service’s blog. “Even at this stage, this platform has the potential to steal large chunks out of the market from the likes of PacBio,” Paszkiewicz said.

We will have to wait for more data before we can see how useful the technique will be and how well it competes against other nanopore sequencers, e.g. the device from Genia, which was recently acquired by Roche.

Improvement of PacBio ZMW loading procedure by DNA Origami?

Since the launch of the PacBio system in 2011, the methods involved have been constantly developed and improved (e.g. former posts here).

However, efficient loading of the Zero-Mode Waveguides (ZMWs) with polymerase molecules still remains a challenge. The ZMWs are tiny wells in which the actual sequencing reactions take place. Each SMRT cell consists of 150,000 ZMWs. With current methods, however, only about one third of the ZMWs are actually usable after loading. The polymerase molecules are loaded onto the ZMWs by simple diffusion, resulting in ZMWs which carry one, more than one, or no polymerase molecule. As a consequence, each SMRT cell typically generates only approx. 50,000 reads per run.
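To illustrate why diffusion loading tops out at roughly one third, here is a minimal sketch. It assumes that the number of polymerase molecules per ZMW follows a Poisson distribution, which is a common model for random loading but is not stated in the sources above. The function names and the tested mean values are purely illustrative.

```python
import math

ZMWS_PER_CELL = 150_000  # figure quoted in the post


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k molecules in a well (Poisson assumption)."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def loading_fractions(mean: float) -> tuple[float, float, float]:
    """Return the fractions of empty, singly and multiply loaded ZMWs."""
    empty = poisson_pmf(0, mean)
    single = poisson_pmf(1, mean)
    multiple = 1.0 - empty - single
    return empty, single, multiple


# Hypothetical average numbers of polymerases per ZMW
for mean in (0.5, 1.0, 2.0):
    empty, single, multiple = loading_fractions(mean)
    usable = round(ZMWS_PER_CELL * single)
    print(f"mean={mean:.1f}: empty={empty:.1%}, single={single:.1%}, "
          f"multiple={multiple:.1%}, usable ZMWs={usable}")
```

Under this assumption, the fraction of singly loaded wells peaks at 1/e (about 37%) when the average is one molecule per well, which fits the "about one third" quoted above. Loading more or less polymerase only shifts wells into the "multiple" or "empty" category.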

A group of researchers from the Technical University of Braunschweig, Germany, has now used “DNA Origami” in order to efficiently place molecules into ZMWs.

DNA origami is a fascinating technique which uses the unique properties of DNA to create nanostructures by “folding” DNA into the required shapes. A ground-breaking article on DNA origami was published by Paul Rothemund in 2006.

The researchers from Braunschweig have now created “nanoadapters” which exactly fit the size of the ZMWs. As a consequence, there cannot be more than one molecule in a ZMW. The nanoadapters carry a fluorescent dye on top and biotin molecules on the bottom side. These biotin molecules serve in fixing the nanoadapters to the bottom of the ZMW via neutravidin. In principle, the fluorescent dye could be replaced by a polymerase molecule. This approach greatly increased the loading efficiency to approx. 60 percent.

However, according to InSequence, the research group did not co-operate with PacBio on this project. In parallel, PacBio is working on other methods to increase the loading efficiency of its SMRT cells. But I am sure that there will be (and has to be) an improvement soon, no matter by which method.

“OrigamiStar-BlackPen” by Aldaron, a.k.a. Aldaron. – From JillsArt, posted with permission. Licensed under Attribution via Wikimedia Commons.

Exome Sequencing At A Glance

Selective characterisation of the genome’s complete coding region

In humans, only 1-2% of the genome is protein coding, the so-called exome. Exome sequencing is favoured over whole genome sequencing because of its lower cost, higher efficiency and the easier interpretation of a much smaller data volume. It is gaining more and more clinical relevance in the diagnosis of rare diseases as well as in cancer research and diagnostics. Furthermore, it is a very important screening tool for genetic variations, e.g. those involved in mental disorders such as schizophrenia, and is therefore increasingly used as a genomic application in drug discovery. Exome analyses are frequently conducted as trio analyses with one patient plus healthy parents, who serve as controls to filter out benign variants. They are not only performed on behalf of companies or academic research organisations, but are also becoming more important in diagnostic applications for individuals.
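The trio idea can be sketched in a few lines. This is a deliberately simplified illustration that only keeps patient variants absent from both healthy parents; real pipelines also consider inheritance models, zygosity and call quality. The `Variant` type, the `filter_trio` function and all variant data are hypothetical.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """Minimal variant key: position plus reference and alternative allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_trio(patient: Set[Variant], mother: Set[Variant],
                father: Set[Variant]) -> Set[Variant]:
    """Keep only patient variants that neither healthy parent carries."""
    return patient - (mother | father)


# Made-up example calls, for illustration only
patient = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
    Variant("chr7", 3000, "G", "A"),
}
mother = {Variant("chr2", 2000, "C", "T")}
father = {Variant("chr7", 3000, "G", "A"), Variant("chrX", 4000, "T", "C")}

candidates = filter_trio(patient, mother, father)
for variant in sorted(candidates):
    print(variant)
```

Variants shared with a healthy parent are treated as likely benign and dropped, which leaves a much shorter candidate list for interpretation.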

The most common technologies for exome analysis are based on in-solution hybridisation. They use a protocol that first generates a whole genome library and then enriches the exome portion of the genome. The well-established kits for this kind of analysis are from NimbleGen, Agilent and Illumina. The exome-enriched DNA is then primarily sequenced on Next Generation Sequencing systems from Illumina, such as the Illumina HiSeq. This approach is typically selected for projects with large sample numbers. One limitation is the incomplete coverage of some genetic loci. More consistent sequence coverage can be achieved with a PCR-based exome capture approach offered by Ion Torrent. This approach allows a very fast and more uniform exome analysis, ideal for small to mid-size sample numbers.


Work where others go on holiday!

Prof. Leonid Moroz from the University of Florida has become the first scientist to sequence the genomes of fragile marine creatures on board a ship in real time (see Scientific Computing World).

Because of the difficulties of storing or shipping their genetic material, it has hitherto been difficult to sequence the genomes of marine species. However, researchers at the University of Florida have got round this problem by deploying a fully equipped genomic laboratory aboard a ship called the Copasetic and sending the initial data via a satellite link to the University’s new HiPerGator supercomputer.

Aboard the Copasetic in early February and later in March-April, Professor Moroz and his team were able to perform transcriptome sequencing of 22 organisms, among them rare comb jellies.

The first results of the sequencing at sea were presented at the international conference, Advances in Genome Biology and Technology, held at Marco Island, Florida in February.

Next Generation Sequencing Market Trends

The GEN report by Enal Razvi, Ph.D. provides an overview of the current NGS field in terms of application areas and utilization patterns.

Some findings of the report:

    • The exponential growth of NGS-focused publications illustrates the expansion of NGS and its penetration into research.
    • 49% of next generation sequencing methods are used for basic research.
    • 29% of researchers are using NGS for comparative genome sequencing.
    • 38% of research efforts are studying somatic mutation.
    • 33% are studying mRNA expression via RNA-Seq.

Is China breaking the dominance of Illumina?

BIGIS-4 is the name of an independently developed next generation sequencer made in China. The sequencer is intended to challenge the dominance of Illumina. On 18 April, scientists from the Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences and partner company Zixin Pharmaceutical Industrial Co Ltd demonstrated their BIGIS-4 sequencing machine in Changchun, Jilin province.

The Chinese machine has a longer read length than dominant sequencers like those made by Illumina in the US. According to Yu Jun of BIG, chief scientist of the project, it will cost one third less to manufacture than imported machines and about one fifth less to operate. Yu was also a co-founder of Shenzhen-based BGI, a spin-off of BIG and now the world’s largest sequencing service provider.

Yu’s sequencer differs from Illumina’s in that the fluorescent tag is cleaved from the newly synthesised DNA as it is incorporated, so that the reading speed is much quicker. This is similar to the pyrosequencing technology employed by Roche Diagnostics’ subsidiary 454 Life Sciences.

A publication on the complete genome sequencing and assembly of a Glaciecola mesophila spec. with BIGIS-4 has been published here.