Tag Archives: Exome

Exome Sequencing At A Glance

Selective characterisation of the genome’s complete coding region

In humans, only 1-2 % of the genome is protein-coding; this portion is the so-called exome. Exome sequencing is often favoured over whole genome sequencing because it costs less, is more efficient, and yields a much smaller data volume that is easier to interpret. It is gaining clinical relevance in the diagnosis of rare diseases as well as in cancer research and diagnostics. Furthermore, it is an important screening tool for genetic variations, e.g. those involved in mental disorders such as schizophrenia, and is therefore increasingly used as a genomic application in drug discovery. Exome analyses are frequently conducted as trio analyses of one patient plus the healthy parents, who serve as controls to filter out benign variants. They are not only performed on behalf of companies or academic research organisations, but are also becoming more important in diagnostic applications for individuals.
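The trio idea can be illustrated with a minimal Python sketch. It assumes each variant is reduced to a (chromosome, position, reference allele, alternative allele) record and simply discards every patient variant that also occurs in a healthy parent. The `Variant` type, the function name and the example calls are hypothetical and made up for illustration. Real trio pipelines also take genotypes and inheritance models into account, which this sketch ignores.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record, for illustration only."""
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_variants(patient, mother, father):
    """Return patient variants that are absent from both healthy parents."""
    seen_in_parents = set(mother) | set(father)
    return sorted(v for v in patient if v not in seen_in_parents)


# Made-up example calls, not real data.
patient = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 2000, "C", "T")]
mother = [Variant("chr1", 1000, "A", "G")]
father = []

# Only the chr2 variant remains, because the chr1 variant is shared
# with the healthy mother and is therefore treated as benign here.
print(candidate_variants(patient, mother, father))
```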

The most common technologies for exome analysis are based on in-solution hybridisation. They use a protocol that first generates a whole genome library and then enriches the exome portion of the genome. The well-established kits for this kind of analysis are from NimbleGen, Agilent and Illumina. The exome-enriched DNA is then primarily sequenced on Next Generation Sequencing systems from Illumina, such as the Illumina HiSeq. This approach is typically selected for projects with large sample numbers. One limitation is the incomplete coverage of some genetic loci. More consistent sequence coverage can be achieved with the PCR-based exome capture approach offered by Ion Torrent. This approach allows a very fast and more uniform exome analysis and is ideal for small to mid-size sample numbers.


PacBio RS Data to Validate SNPs Called from Illumina Sequencing?

Would you have thought that PacBio RS sequences with a single-read error rate of about 15% can outdo MiSeq reads in validating the variants called by WGS or exome sequencing? Personally, I wouldn’t have thought so. But a study from the Broad Institute published a few days ago clearly shows that they can.

Variants called in variant-analysis projects need validation, both to determine the rate at which mutations have been correctly called and to confirm the specific reported changes. Currently used techniques such as Sequenom genotyping and Sanger sequencing have substantial drawbacks, such as the need for manual interpretation or low data throughput. For that reason, Carneiro and his colleagues studied the power of PacBio RS and MiSeq data as validation tools and compared the results with each other.

They generated amplicons covering 98 variants called in the 1000 Genomes Project and sequenced the PCR products with both instruments, PacBio RS and MiSeq. Using PacBio RS data, 96 out of the 98 variants could be correctly genotyped, whereas the MiSeq data correctly genotyped only 93 sites. The authors’ explanation is quite simple: because errors are distributed completely at random across the reads, sufficient coverage can overcome the low accuracy of individual reads.
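Why random errors average out can be sketched with a simple binomial model in Python. This is not the authors’ method, only an illustration under stated assumptions: errors are independent between reads, each read is wrong with the same probability (0.15, taken from the error rate mentioned above), and a site counts as miscalled once at least half of the reads covering it are wrong. Treating all wrong reads as if they agreed on the same wrong base is deliberately pessimistic. The function name is hypothetical.

```python
from math import comb


def miscall_probability(coverage: int, error_rate: float) -> float:
    """Probability that at least half of the reads at a site are wrong.

    Assumes independent errors with identical probability per read
    (simple binomial model, an illustrative simplification).
    """
    threshold = (coverage + 1) // 2  # smallest read count that is >= half
    return sum(
        comb(coverage, k) * error_rate**k * (1 - error_rate) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


# Assumed per-read error rate of 15 %, as quoted for PacBio RS above.
for coverage in (1, 10, 30, 100):
    print(coverage, miscall_probability(coverage, 0.15))
```

Under these assumptions the probability of a wrong consensus drops quickly as coverage grows, which is the effect the authors describe. Systematic (non-random) errors would not cancel out in this way.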

Manual checking of the sites that were miscalled using the PacBio dataset revealed that one of the two miscalls was due to a reference bias (the true variation is hidden). Such bias is introduced by alignment parameters in which the gap open penalty is higher than the base mismatch penalty. The high error rate of PacBio RS reads makes these parameters necessary.

However, Carneiro told GenomeWeb that the researchers are now using a different aligner that was developed at the Broad Institute. This aligner re-aligns the reads using different parameters and therefore reduces the problem to a great extent.

For me, the study shows that there is potential for PacBio RS sequencing. Nevertheless, like the variants, this study result itself needs to be validated. Furthermore, I think the value of the study also needs to be seen in relation to the sequencing cost of both instruments. While the consumable prices for both techniques are in a similar range, the several-fold higher cost of the PacBio RS instrument makes a remarkable difference.

Comparison of Exome Enrichment Technologies in Nature Biotechnology

Very recently, researchers from Stanford University systematically investigated the performance of the most widely used exome enrichment platforms:

  1. Roche/NimbleGen’s SeqCap EZ Exome Library v2.0 (44 Mbp)
  2. Agilent’s SureSelect Human All Exon (50 Mbp)
  3. Illumina’s TruSeq Exome (61 Mbp)

One of the findings of the study: when coverage efficiency is compared at constant sequencing depth (80 million reads each), NimbleGen’s sequence capture performs far better than the other two platforms. With NimbleGen sequence capture, 98.6 % of all targeted bases were covered at least 10x, while Agilent’s SureSelect and Illumina’s TruSeq covered only 89.6 % and 90.0 % of their targeted bases at least 10x. In my opinion, the different target sizes of the exomes should have been taken into account, i.e. the read depth should have been normalised according to the exome sizes. Even without this normalisation, however, the paper clearly shows that the NimbleGen technology enriched a much higher percentage of its targeted bases than the other two products.
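The normalisation I have in mind can be sketched in a few lines of Python. The target sizes and the 80 million reads are taken from the figures above. Everything else is a simplifying assumption: the sketch ignores on-target rate, read length and duplicates, and just asks how many reads each platform would need to get the same number of reads per Mbp of target as NimbleGen. The function and variable names are hypothetical.

```python
TOTAL_READS = 80_000_000  # reads per platform in the comparison

# Target sizes in Mbp as listed above.
target_mbp = {"NimbleGen": 44, "Agilent": 50, "Illumina": 61}

# Reads available per Mbp of target when every platform gets the same
# total number of reads: larger targets get fewer reads per Mbp.
reads_per_mbp = {name: TOTAL_READS / size for name, size in target_mbp.items()}


def reads_for_equal_depth(reference: str = "NimbleGen") -> dict:
    """Reads each platform needs to match the reference's reads per Mbp."""
    per_mbp = TOTAL_READS / target_mbp[reference]
    return {name: round(per_mbp * size) for name, size in target_mbp.items()}


equal_depth = reads_for_equal_depth()
for name in target_mbp:
    print(name, round(reads_per_mbp[name]), equal_depth[name])
```

With these numbers, Agilent would need roughly 91 million and Illumina roughly 111 million reads to match the reads per Mbp that NimbleGen gets from 80 million reads.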

Other criteria that were compared are the off-target enrichment rate (NimbleGen performed best) as well as the enrichment bias owing to GC content (Agilent performed best).

The decision as to which platform is best for a specific scientific question should also be influenced by the individual target regions covered by the different exome kits. Agilent’s and NimbleGen’s exomes share 38 Mbp of their target regions. Apart from that, Agilent’s exome covers Ensembl genes better, while NimbleGen’s exome covers a greater portion of miRNAs. Illumina’s exome, although displaying lower coverage efficiency, is designed to additionally capture UTRs, which so far are barely covered by the other designs; it is therefore the platform of choice if those regions are of interest.

Differences in performance come from the different oligonucleotide designs. I therefore expect similar key parameters when the customised versions of these capture technologies are used.