Archive by Author

Whole genome sequencing a complete island

Two days ago, a groundbreaking study was published in Nature Genetics: whole genome sequencing of 2,636 Icelanders and genotyping of 104,220 Icelanders.

The advantage of using a small population like the Icelanders for this kind of study is that it carries fewer rare variants overall, while some of these variants occur at a higher frequency.

For the study, genomic DNA was isolated from white blood cells, and subsequent sequencing was performed on GAIIx and HiSeq instruments. The resulting reads were aligned to the human reference genome (NCBI Build 36, hg18).

Gudbjartsson et al. then examined the data from different angles. For example, they looked for geographical dependencies of specific variants and at how the data can be used to learn more about phenotypes and their underlying genomic patterns. They also report an example of “how rare variants […] can be used to analyze clinical problems” (Gudbjartsson et al.).

Since every human being has a unique genomic pattern, I think studies like this are of high importance for learning more about disease-related genotypes. This will help us gain confidence in the results we get from molecular diagnostic assays for disease treatment, now and in the future.

Read the complete publication here.

How to handle variants in a reference genome

When talking about genome sequencing, the Human Genome Project is one of the best-known projects. “Building” a reference genome that helps to identify disease-causing mutations is only one of the many goals of the human reference genome.

But I am sure that all of you have already asked the question: how can a reference genome even exist? There are more than 7 billion people on earth, with a huge variety of characteristics among them. So how can one human reference serve for all mankind?

The Global Alliance, led by David Haussler, recently won a $1 million grant to create a graph model of the human genome (BioTechniques). The graph model should help to visualise variants as alternate pathways. In this way, a more comprehensive picture of “naturally occurring variants” and disease-causing variants might be gained. To support this approach, they got access to 300 complete human genome sequences from the Broad Institute in Cambridge.

From my point of view this is a great idea, and I hope it helps to pave the way for handling and interpreting the massive amounts of sequencing data in the near future!
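To make the idea of "variants as alternate pathways" a bit more concrete, here is a minimal sketch in Python. It is not the Global Alliance's model or any real tool; the function names, the toy reference and the two variants are all made up for illustration. The reference is cut into shared segments, and each variant site gets two parallel nodes, one for the reference allele and one for the alternate allele.

```python
from collections import defaultdict


def build_variant_graph(reference, variants):
    """Build a toy sequence graph (hypothetical helper, for illustration only).

    variants: list of (position, ref_allele, alt_allele) tuples, 0-based
    and non-overlapping. Returns (nodes, edges): nodes is a list of
    sequence segments, edges maps a node id to its successor node ids.
    """
    nodes = [""]                 # node 0 is an empty source node
    edges = defaultdict(list)
    previous = [0]               # nodes that must connect to the next segment
    cursor = 0                   # first reference base not yet in the graph

    for pos, ref, alt in sorted(variants):
        if pos < cursor or reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"variant at {pos} overlaps or mismatches the reference")
        # Segment shared by all paths between two variant sites.
        nodes.append(reference[cursor:pos])
        shared = len(nodes) - 1
        for node in previous:
            edges[node].append(shared)
        # Two alternate paths: reference allele and alternate allele.
        nodes.append(ref)
        nodes.append(alt)
        ref_node, alt_node = len(nodes) - 2, len(nodes) - 1
        edges[shared].extend([ref_node, alt_node])
        previous = [ref_node, alt_node]
        cursor = pos + len(ref)

    # Remaining reference sequence after the last variant.
    nodes.append(reference[cursor:])
    for node in previous:
        edges[node].append(len(nodes) - 1)
    return nodes, dict(edges)


def spell_paths(nodes, edges, node=0):
    """Return every sequence that can be spelled by walking the graph."""
    successors = edges.get(node, [])
    if not successors:
        return [nodes[node]]
    return [nodes[node] + rest
            for nxt in successors
            for rest in spell_paths(nodes, edges, nxt)]


# Toy data: one SNP (G>T) and one insertion (C>CA).
nodes, edges = build_variant_graph("ACGTACGT", [(2, "G", "T"), (5, "C", "CA")])
haplotypes = spell_paths(nodes, edges)
```

With two variant sites the graph spells four possible sequences, one of which is the plain reference. Real graph genomes are of course far more involved, but the principle of a shared backbone with alternate routes is the same.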

Read the complete article at


PacBio Forecast 2015

As already predicted, Illumina is not the only company communicating innovations for its NGS portfolio. Here you can read about the improvements Pacific Biosciences plans for this year. I think the good news for many users of PacBio machines is that they are not talking about new instruments, but about improvements that affect already installed machines (GenomeWeb):

  • PacBio plans to improve the sequencing chemistry, including the active loading of single polymerase enzymes onto the chip
  • PacBio plans to improve the workflows for an easier and faster handling of samples
  • PacBio plans to improve bioinformatics for faster de novo genome assemblies and better full-length HLA analysis

With these changes PacBio wants to extend the data output to more than 4 gigabases per SMRT cell and increase the average read length to 15-20 kbp.

Read more about it here.

I still wonder whether there will be news from PacBio this year about a new system. Maybe a benchtop instrument like everyone else has?

I will keep you updated!

New Illumina Instruments

New Year – New Innovations. Illumina starts off 2015 with a huge announcement: the launch of 4 new systems (GenomeWeb, 12th Jan).

Here is a short overview of the new systems:

  • HiSeq X Five – scaled-down version of the X Ten; costs: $6 million
  • HiSeq 3000 – uses a single flow cell and offers a lower price per data point than the HiSeq 2500; half the throughput (750 Gb) of the HiSeq 4000; costs: $740,000
  • HiSeq 4000 – uses a dual flow cell and can sequence up to 12 genomes or 180 exomes in 3.5 days or less; costs: $900,000
  • NextSeq 550 – combines microarray scanning with NGS; applications: cytogenetics & prenatal genetic diagnosis; costs: $275,000

Now I am curious to see whether other providers will have news as surprising as Illumina's. We will keep you posted…

Transcriptome assemblers put to the test

Next Generation Sequencing produces millions and billions of reads – and the interpretation of these reads relies on bioinformatic tools.

Especially for de novo assemblies of genomes or transcriptomes, the results can vary depending on the quality of the assembly.

In a recent publication, Shorash Amin and his co-workers sequenced the transcriptome of the non-model gastropod Nerita melanotragus with the Ion PGM. Afterwards they used different software tools and compared the quality of the resulting transcriptome assemblies (Amin et al.).

Oases, Trinity, Velvet and Geneious Pro were the four de novo transcriptome assemblers used for this study. The assemblers were compared on different parameters such as contig length, N50 statistics, BLAST and annotation success.
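For readers who have not met the N50 statistic before: it is the contig length at which contigs of that length or longer contain at least half of all assembled bases. Here is a minimal sketch in Python; the function name and the contig lengths are made up for illustration and are not taken from the publication.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig downwards until half of all
    # assembled bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp) from two imaginary assemblies.
assembly_a = [100, 200, 300, 400, 500]
assembly_b = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(assembly_a), n50(assembly_b))
```

A higher N50 means the assembled bases sit in longer contigs, which is why it is commonly reported next to the longest contig when assemblers are compared.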

The longest contig was created with the Oases assembler (1700 bp), and overall Trinity and Oases delivered much better results than the de novo assembly of Ion PGM reads with Velvet or Geneious Pro.

Furthermore, the mapping to a reference genome showed that Ion PGM transcriptome sequencing and subsequent de novo assembly with either Trinity or Oases generates reliable and accurate results.

Read the complete publication here.

Prepare NGS for clinical use

Molecular diagnostics (MDx) is, in my opinion, the most sensitive application for all kinds of molecular biology techniques such as PCR, Sanger sequencing or Next Generation Sequencing. Today, NGS is still a niche application and needs further improvement to become a common tool for MDx. One thing that is lacking is the standardisation of NGS for clinical use.

The NGS Working Group, established by the Friends of Cancer Research, worked out a master plan (The ASCO Post) with critical points that need to be addressed before NGS can be used more commonly:

1. Define a regulatory pathway for cancer panels (a selection of multimarker gene assays) intended to identify actionable oncogenic alterations (those with supporting data to create risk-benefit assessment of treatment choice) that allow flexibility in the appropriate FDA medical device pathway—for instance, one based on risk classification of different panel components depending on the specific marker.

2. Approaches to validation studies should be based on the types of alterations measured by the assay rather than on every alteration individually.

3. Determine the contents of a cancer panel by classifying potential markers based on current utility in clinical care and clinical trials and peer-reviewed publications, as well as recognized clinical guidelines. Draw upon various sources to determine the recommended marker set for an actionable cancer panel.

4. Promote standardization of cancer panels through development and use of a common set of samples to ensure reproducibility on each platform.

5. Establish a framework for determining an appropriate reference method rather than relying on any single method for all studies.

Get more information on each proposal here.

Whose genome has been sequenced? Brassica napus

Brassica napus, also known as oilseed rape, was formed more than 7000 years ago by allopolyploidy (chromosome doubling from two Brassica species). Of course the genome continued to mutate, and so it is known today that during this evolution some genes were preserved and further “improved” (e.g. oil biosynthesis genes), whereas others were lost over the course of time (e.g. glucosinolate genes).

Chalhoub et al. have now sequenced the genome, because it can help to “provide insights into allopolyploid evolution and its relationship with crop domestication and improvement” (Chalhoub et al.).

What was sequenced?

Young fresh leaves from the Brassica napus French homozygous winter line “Darmor-bzh“.

Sequencing strategy: Whole genome sequencing

  1. Libraries & Sequencing:
    Roche GS FLX: ~70 million reads; average read length: ~368 bp; genome coverage: 21.2%
    Sanger BAC Seq: 141k reads; read length: 650 bp; genome coverage: 0.1%
    Illumina HiSeq: ~375 million reads; read length: 36, 76, 108 and 150 bp; genome coverage: 53.9%
  2. Data output: 44,146 contigs and 20,702 scaffolds
  3. Results: A final assembly of 849.7 Mb (using SOAP and Newbler) with 89% nongapped sequences.

After assembly, the genome was mapped against those of other species (e.g. B. rapa and B. oleracea), which helped to find several interesting genes and gene variations that improve our understanding of its evolution.

Read the complete publication here.


More Updates: Illumina & IonTorrent

Quarter 4 of 2014 seems to be another exciting one for Next Generation Sequencing. Besides the chemistry update for the PacBio RSII, Illumina and IonTorrent / ThermoFisher have also announced two major improvements / achievements:

  • Chemistry update for the Illumina HiSeq X Ten and the HiSeq 2500 Rapid Run
    The new v2 reagent kit for the HiSeq X Ten supports a PCR-free sample preparation kit, which eliminates amplification during library preparation. So far only sample preparation kits with PCR were possible, which sometimes results in lower quality for challenging genomic regions.
    The new v2 reagent kit for the HiSeq 2500 enables users to sequence 2 x 250 bp, and the new chemistry therefore delivers up to 300 Gbp of data in only 60 hours. (Press Release)
    In my opinion Illumina proves once more that NGS is highly dynamic and that their continuous updates for existing systems are the key to their success (the latest financial report confirms that Q3 of 2014, with a growth of 10%, is the strongest quarter since 2011 for Illumina (Fierce Medical Devices)).
  • IonTorrent goes diagnostic
    The Ion PGM Dx System is now also CE-Marked for in vitro diagnostic (IVD) use in Europe. Thermo Fisher Scientific believes that the CE-mark “will enable European clinical laboratories to more easily […] implement new […] diagnostic assays” (Press Release).
    Back in September they had already announced that the PGM is now listed with the U.S. FDA as a Class II Medical Device.
    In my opinion the clearance for diagnostic use in Europe as well as in the U.S. will further strengthen the position of the Ion PGM in clinical laboratories.

Do you want to share your biggest secret?

Should we all get our genomes sequenced? And share the information? Just today I read two articles on GenomeWeb about human genome sequencing, which in my opinion take opposite views on sharing information from human genomes.

The first article is about 23andMe: two different groups of people reported that the “check for close relatives” box plunged their families into a real crisis. In one case the parents divorced after the close-relatives box revealed that the husband already had a child with another woman (prior to this marriage). In the other case a girl found out that she has a brother whom her mother had given up for adoption.

So for me this is a clear indicator that simply sharing genome information might cause more problems than it solves.

George Church asks for exactly the opposite. From his point of view, public access to as many genomes (human and non-human) as possible is a prerequisite for eradicating diseases, creating unlimited energy sources and so on.

I think I could partially agree with that if we are talking about bacterial or plant genomes. But I think we are not ready for a wide sharing of human genome information.

What also became clear to me is that we have not come much further than we were 2 years ago (Genomics – A Curse Or A Blessing?).

Genome sequencing identified Jack the Ripper

The murders committed by Jack the Ripper are very likely by far the best-known crime series in the world. The London police had six key suspects for the murders, and one of them could now be identified as the killer (MailOnline).

The piece of evidence used to identify the murderer was a shawl found beside one of the victims, which contained DNA from the victim as well as from the suspect. Using a whole genome sequencing approach, Dr. Louhelainen and his group extracted the 126-year-old DNA and compared it with DNA from descendants of the suspect. Read the complete article at DailyMail Online.