Think Big: The UK 100,000 Genome Project

The 100,000 Genome Project was launched in late 2012. UK Prime Minister David Cameron announced a new initiative, led by the National Health Service, to sequence the genomes of up to 100,000 people and to use their genomic information in the treatment and study of cancer and other diseases. The government set aside 100 million GBP for the project.

Genomics England, which is heading the project, has now named 10 firms that have been selected for assessment in the next phase of the project. The companies are Congenica; Diploid; NantOmics; Genomics Ltd.; Illumina; Qiagen; Lockheed Martin; NextCode Health; Omicia; and Personalis.

As part of the recently completed stage, Genomics England in February sent out a questionnaire to 28 participants in relation to 10 cancer/normal samples and 15 rare disease trio samples.

Illumina is also a partner and will contribute its ultra-high-throughput sequencing platform, the HiSeq X Ten.

What will be the next step? Sequencing everyone?

More Updates: Illumina & IonTorrent

The fourth quarter of 2014 looks like another exciting one for Next Generation Sequencing. Besides the chemistry update for the PacBio RS II, Illumina and IonTorrent / Thermo Fisher have also announced two major improvements:

  • Chemistry update for the Illumina HiSeq X Ten and the HiSeq 2500 Rapid Run
    The new v2 reagent kit for the HiSeq X Ten supports a PCR-free sample preparation kit, which eliminates amplification during library preparation. Until now only sample preparation kits with PCR could be used, which sometimes results in lower quality in challenging genomic regions.
    The new v2 reagent kit for the HiSeq 2500 enables users to sequence 2x 250 bp, so the new chemistry delivers up to 300 Gbp of data in only 60 hours. (Press Release)
    In my opinion Illumina proves once more that NGS is highly dynamic and that continuous updates for existing systems are the key to their success (the latest financial report confirms that Q3 of 2014, with growth of 10%, is the strongest quarter for Illumina since 2011 (Fierce Medical Devices)).
  • IonTorrent goes diagnostic
    The Ion PGM Dx System is now also CE-Marked for in vitro diagnostic (IVD) use in Europe. Thermo Fisher Scientific believes that the CE-mark “will enable European clinical laboratories to more easily [...] implement new [...] diagnostic assays” (Press Release).
    They had already announced in September that the PGM is now listed with the U.S. FDA as a Class II Medical Device.
    In my opinion the clearance for diagnostic use in Europe as well as in the U.S. will further strengthen the position of the Ion PGM in clinical laboratories.

PacBio launches new chemistry and software

In a press release, Pacific Biosciences announced the latest enhancement for the PacBio RS II single molecule DNA sequencer. The new polymerase 6 and chemistry 4 (P6 – C4) release, in combination with improved software, increases the performance and output of the platform by 45%. The average read length is now 10,000 – 15,000 bases, with the longest reads reaching up to 40,000 bases. Depending on the nature of the DNA, a single SMRT cell will deliver 500 million to 1 billion bases.
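As a rough planning aid, the per-cell yield quoted above can be turned into an estimate of how many SMRT cells a project would need. The sketch below is only back-of-the-envelope arithmetic: the 100 Mbp genome size and the 50x target coverage are hypothetical values chosen for illustration, not figures from the press release.

```python
def smrt_cells_needed(genome_size_bp: int, target_coverage: int, yield_per_cell_bp: int) -> int:
    """Smallest number of SMRT cells whose combined yield reaches the target coverage."""
    required_bases = genome_size_bp * target_coverage
    # Integer ceiling division: a partially used cell still counts as a whole cell.
    return -(-required_bases // yield_per_cell_bp)


GENOME_SIZE_BP = 100_000_000    # hypothetical genome size (assumption)
TARGET_COVERAGE = 50            # hypothetical target coverage (assumption)
LOW_YIELD_BP = 500_000_000      # lower end of the quoted per-cell yield
HIGH_YIELD_BP = 1_000_000_000   # upper end of the quoted per-cell yield

cells_worst_case = smrt_cells_needed(GENOME_SIZE_BP, TARGET_COVERAGE, LOW_YIELD_BP)
cells_best_case = smrt_cells_needed(GENOME_SIZE_BP, TARGET_COVERAGE, HIGH_YIELD_BP)

print(f"SMRT cells needed: {cells_best_case} to {cells_worst_case}")
```

Under these assumptions the factor-of-two spread in per-cell yield translates directly into a factor-of-two spread in the number of cells to budget for.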

The new chemistry will replace the current P5 – C3 chemistry and is recommended for all SMRT sequencing applications.

This new release also includes improvements to the SMRT Analysis software suite for long amplicon analysis and the Iso-Seq™ method. Together with chemistry enhancements, these advances boost accuracy, speed up analysis, and support sequencing of multiplexed amplicons of different sizes.

Do you want to share your biggest secret?

Should we all get our genome sequenced? And share the information? Just today I read two articles on GenomeWeb about human genome sequencing which, in my opinion, take opposite views on sharing information from human genomes.

The first article is about the 23andMe project. Two different groups of people reported that ticking the “check for close relatives” box led to a real crisis in their families. In one case the parents divorced after the close-relatives feature showed that the husband already had a child with another woman (from before the marriage). In the other case a girl found out that she has a brother, whom her mother had given up for adoption.

For me this is a clear indication that simply sharing genome information may cause more problems than it solves.

George Church asks for exactly the opposite. From his point of view, public access to as many genomes as possible (human and non-human) is a prerequisite for eradicating diseases, creating unlimited energy sources and so on.

I think I could partially agree with that when it comes to bacterial or plant genomes. But I do not think we are ready for wide sharing of human genome information.

What also became clear to me is that we are not much further along than we were 2 years ago (Genomics – A Curse Or A Blessing?).

Genome sequencing identified Jack the Ripper

The murders committed by Jack the Ripper are very likely the best-known crime series in the world. The London police had six key suspects for the murders, and one of them could now be identified as the killer (MailOnline).

The piece of evidence used to identify the murderer was a shawl found by one of the victims, which contained DNA from the victim as well as from the suspect. Using a whole genome sequencing approach, Dr. Louhelainen and his group extracted the 126-year-old DNA and compared it with DNA from descendants of the suspect. Read the complete article at DailyMail Online.

Are you ready to have your genome sequenced?

Last month we asked whether you would be interested in having your genome sequenced. The majority said “YES”, provided the costs were lower.

More than 20% answered that their genome has already been sequenced. Personally, I would be very interested to know what they did with the data output.

 

If you are one of those who voted “I already have”, please leave a comment explaining why you decided to have your genome sequenced.

Whose Genome Has Been Sequenced? Belgica antarctica

Extreme conditions require extreme actions, and that is exactly what the midge Belgica antarctica has done. The midge lives exclusively in the Antarctic and, in order to survive, has shrunk its genome to the smallest possible size. As of today, this is the smallest insect genome that has been sequenced.

Kelley et al. have now sequenced the genome of Belgica antarctica with the aim of learning more about how insects in general can adapt to the most extreme conditions.

What was sequenced?

Two fourth-instar larvae (Belgica antarctica) collected near Palmer Station, Antarctica.

Sequencing strategy: Whole genome sequencing & RNA-sequencing

  1. Libraries & Sequencing: one channel of 2x 100 bp on an Illumina HiSeq 2000 (shotgun library, 400 bp insert) and one SMRT cell of a 10 kb fragment library on a PacBio RS II (P4 DNA polymerase)
  2. Data output: 92 M paired-end reads from the Illumina shotgun sequencing, which were assembled into 5,422 contigs. Using the paired-end RNA-Seq data, the number of contigs was reduced to 5,064. Genome coverage with Illumina sequencing: ~100x.
  3. Results: The total genome size is ~99 Mbp.
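The coverage figure in the list above can be checked with the usual estimate, coverage = number of reads x read length / genome size. The following is a minimal sketch of that arithmetic; whether the "92 M paired-end reads" counts individual reads or read pairs is not stated in the post, so both interpretations are computed and the choice between them is an assumption.

```python
def mean_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp


READS = 92_000_000           # "92 M paired-end reads" from the post
READ_LENGTH_BP = 100         # 2x 100 bp run
GENOME_SIZE_BP = 99_000_000  # ~99 Mbp genome

# Interpretation 1 (assumption): 92 M is the number of individual reads.
coverage_if_reads = mean_coverage(READS, READ_LENGTH_BP, GENOME_SIZE_BP)

# Interpretation 2 (assumption): 92 M is the number of read pairs, i.e. twice as many reads.
coverage_if_pairs = mean_coverage(2 * READS, READ_LENGTH_BP, GENOME_SIZE_BP)

print(f"Coverage if 92 M individual reads: {coverage_if_reads:.1f}x")
print(f"Coverage if 92 M read pairs:       {coverage_if_pairs:.1f}x")
```

The first interpretation gives roughly 93x, which is in line with the ~100x stated above.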

A second larva was used for the PacBio sequencing. But due to the low input of genomic DNA, the PacBio data yielded only a modest improvement in the assembly. This underlines the need for a long-read sequencing technology that works with low amounts of input DNA.

The de novo sequencing of the midge Belgica antarctica revealed that the small genome size is achieved by a reduction in repeats, transposable elements (TEs) and intron size.

Read the complete publication here.

A major upgrade of SAMTools: CRAM format to reduce NGS data load

SAMTools, one of the most popular NGS sequence analysis tools, has recently been upgraded by computer scientists at the Wellcome Trust Sanger Institute. SAMTools is a set of utilities for manipulating alignments in the SAM/BAM format. SAM stands for Sequence Alignment/Map format, and BAM is simply the binary form of SAM. SAM can be regarded as the worldwide standard for storing large nucleotide sequence alignments.

SAMTools 1.0, the revised version of the free program suite, gives researchers improved handling of their sequencing data. In addition to the existing SAM and BAM file formats, SAMTools now supports the new CRAM format. Basically, CRAM files are alignment files, just like BAM files, except that their size is reduced by 10-30%. Even greater compression, up to 100-fold, can be achieved in the “lossy” mode, which still preserves the most important information. These storage savings are achieved with data compression techniques developed jointly by the Sanger Institute and the EMBL-European Bioinformatics Institute.
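To get a feeling for what the quoted reductions mean in practice, the size figures above can be applied to a single alignment file. This is only an illustrative sketch: the 100 GB BAM file is a hypothetical example, and the percentages are simply the ones quoted in this post, not measurements.

```python
def cram_size_range_gb(bam_size_gb: float) -> tuple[float, float]:
    """Expected CRAM size range for the quoted 10-30% reduction (smallest, largest)."""
    smallest = bam_size_gb * (1 - 0.30)  # 30% reduction
    largest = bam_size_gb * (1 - 0.10)   # 10% reduction
    return smallest, largest


def lossy_cram_size_gb(bam_size_gb: float, fold_compression: float = 100.0) -> float:
    """Best-case size in lossy mode, using the quoted 'up to 100-fold' compression."""
    return bam_size_gb / fold_compression


BAM_SIZE_GB = 100.0  # hypothetical BAM file size (assumption)

cram_min_gb, cram_max_gb = cram_size_range_gb(BAM_SIZE_GB)
lossy_gb = lossy_cram_size_gb(BAM_SIZE_GB)

print(f"CRAM: {cram_min_gb:.0f} to {cram_max_gb:.0f} GB")
print(f"Lossy CRAM (best case): {lossy_gb:.0f} GB")
```

For the hypothetical 100 GB file this works out to about 70 to 90 GB as CRAM, and about 1 GB in the best lossy case.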

“This major rebuild of SAMTools reflects our commitment to supporting the global use of sequencing data,” says Dr. Richard Durbin, Head of Computational Genomics at the Sanger Institute. “Genome science worldwide relies on fast and efficient data analysis and storage, and SAMTools 1.0 fulfills this need by supporting new sequencing and analysis technologies.” Dr. John Marshall from the Sanger Institute is highly optimistic that widespread uptake of the new format will lead to lower data storage costs on a global scale (complete article).

I am curious to see how the new format will be adopted by the genomics community. By the way, did you know that SAMTools has been downloaded more than 225,000 times?

Quality Before Quantity?

The “$1,000 genome” is, to my knowledge, the buzzword everyone thinks of when it comes to Next Generation Sequencing.

And quite often I ask myself: What will be the future of NGS? Whole genome sequencing of everyone and everything?

I am confident that this is part of Illumina’s strategy for their new HiSeq X Ten instruments, at least for humans.

On the other hand, all that data still needs to be analysed, and an interview with Lex Nederbragt highlights that data analysis remains a bottleneck. In addition, the latest Markets&Markets report on whole exome sequencing predicts strong growth in targeted sequencing: it estimates that the whole exome sequencing market will grow from $326.6 million in 2013 to $884.1 million by 2018.
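The two market figures above imply an annual growth rate that is easy to work out. The sketch below computes the compound annual growth rate (CAGR), assuming the two values are the start and end points of five years of compounding (2013 to 2018); that reading of the report is an assumption.

```python
def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """CAGR as a fraction: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


MARKET_2013_MUSD = 326.6  # whole exome sequencing market in 2013, million USD
MARKET_2018_MUSD = 884.1  # predicted market in 2018, million USD
YEARS = 2018 - 2013       # assumes five full years of compounding

cagr = compound_annual_growth_rate(MARKET_2013_MUSD, MARKET_2018_MUSD, YEARS)

print(f"Implied annual growth: {cagr:.1%}")
```

Under that assumption the prediction corresponds to growth of roughly 22% per year.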

So will quality, in the form of sequencing distinct regions, outcompete the $1,000 genome? What are your thoughts on that?

Large genome sequencing studies in the USA

The launch of the Illumina HiSeq X Ten has enabled them to put their plans and grand visions into practice. I am speaking of two highly influential people in the United States. This year, Dr. J. Craig Venter, known for being one of the first to sequence the human genome, and Patrick Soon-Shiong, considered the world’s richest doctor, both revealed some details of their large-scale sequencing projects:

  • J. Craig Venter founded the company Human Longevity, which aims to develop treatments for cancer and age-related conditions. To uncover the mechanisms that allow people to live long and healthy lives, the company will become one of the largest DNA sequencing facilities in the world. The plan is to set up a sequencing center capable of sequencing 40,000 human genomes a year.
  • Patrick Soon-Shiong recently announced that his company NantHealth is purchasing sequencers capable of sequencing 22,000 genomes annually. The samples will come from the 22,000 patients diagnosed with cancer each year at the 34 hospitals owned by Providence Health & Services. Consequently, Soon-Shiong’s company will probably become the first to use its sequencing capacity for clinical sequencing on a large scale.

Such huge collections of sequencing data make it possible to uncover, in a general approach, the molecular causes of a process as complex as aging or a disease as diverse and complex as cancer. Large and very valuable databases will be created that may contribute to the development of new pharmaceuticals or personalized therapies.