Tag Archives: HiSeq 2000

Whose Genome Has Been Sequenced? Belgica antarctica

Extreme conditions require extreme actions, and this is what the midge Belgica antarctica has done. The midge lives exclusively in the Antarctic and, in order to survive, has shrunk its genome to an extremely small size. As of today, it is the smallest insect genome that has been sequenced.

Kelley et al. have now sequenced the genome of Belgica antarctica with the aim of learning more about how insects in general can adapt to the most extreme conditions.

What was sequenced?

Two fourth-instar larvae of Belgica antarctica, collected near Palmer Station, Antarctica.

Sequencing strategy: Whole genome sequencing & RNA-sequencing

  1. Libraries & Sequencing: one lane of 2x 100 bp Illumina HiSeq 2000 sequencing (shotgun library, 400 bp insert) and one SMRT cell of a 10 kb fragment library on the PacBio RS II (P4 DNA polymerase)
  2. Data output: 92 M paired-end reads from the Illumina shotgun sequencing, which assembled into 5,422 contigs. Using the paired-end RNA-Seq data, the number of contigs was reduced to 5,064. Genome coverage with Illumina sequencing: ~100x.
  3. Results: The total genome size is ~99 Mbp.
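
The ~100x figure can be checked with the Lander-Waterman relation for average coverage, C = N × L / G (number of reads × read length / genome size). The post does not say whether "92 M paired-end reads" counts individual reads or read pairs, so this minimal Python sketch computes both readings as an assumption; only the first lands close to the quoted ~100x. Raw read counts ignore filtering and duplicates, so this is a back-of-the-envelope check, not the authors' own calculation.

```python
def sequencing_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth of coverage, C = N * L / G (Lander-Waterman)."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return num_reads * read_length_bp / genome_size_bp


GENOME_BP = 99_000_000  # ~99 Mbp B. antarctica genome (from the post)
READ_LEN = 100          # each mate of a 2x 100 bp run is 100 bp long

# Reading 1 (assumption): 92 M counts individual reads, i.e. 46 M pairs.
as_reads = sequencing_coverage(92_000_000, READ_LEN, GENOME_BP)
# Reading 2 (assumption): 92 M counts read pairs, i.e. 184 M individual reads.
as_pairs = sequencing_coverage(2 * 92_000_000, READ_LEN, GENOME_BP)

print(f"92 M reads: ~{as_reads:.0f}x")  # prints "92 M reads: ~93x"
print(f"92 M pairs: ~{as_pairs:.0f}x")  # prints "92 M pairs: ~186x"
```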

For the PacBio sequencing, a second larva was used. However, because of the low amount of input genomic DNA, the PacBio data yielded only a modest improvement in the assembly. This underlines the need for a long-read sequencing technology that works with low DNA input.

The de novo sequencing of the midge Belgica antarctica revealed that the small genome size is achieved through reduced repeat content, fewer transposable elements (TEs) and shorter introns.

Read the complete publication here.

Whose Genome Has Been Sequenced? – Recent posts:

Whose Genome Has Been Sequenced? Aquila chrysaetos

Every day an enormous amount of NGS data is generated. Even so, the number of avian genomes sequenced so far is still quite small (Doyle et al., Table 1). Doyle et al. have added one more avian genome to this list: the golden eagle, Aquila chrysaetos.

What was sequenced?

A male golden eagle (Aquila chrysaetos canadensis) captured in the southern Sierra Nevada.

Sequencing strategy: Whole genome sequencing

  1. Libraries & Sequencing: one lane of 2x 100 bp shotgun paired-end sequencing and one lane of 2x 100 bp mate-pair sequencing on the Illumina HiSeq platform
  2. Data output: 68.4 Gb of raw data (25.3 Gb from the shotgun library and 43.1 Gb from the mate-pair library). Total genome size (including mtDNA): ~1.28 Gbp. Overall genome coverage: ~40x. Longest scaffold: 11,517,212 bp.
  3. Results: The mitochondrial genome contains 13 protein-coding genes, 2 rRNAs and 23 tRNAs. The annotation produced a total of 16,571 predicted nuclear genes.

Besides the nuclear genome, Doyle et al. also assembled the complete mitochondrial genome. Furthermore, they identified ~800,000 novel polymorphisms, which can now help to define markers involved in carnivory or other biological processes.

Read the complete publication here.


Why should I buy Illumina stock shares?

What does winning an award do for a company? In Illumina's case, you can clearly see that it is about brand and market awareness. Only recently we reported on Illumina being named the smartest company of 2013. Today we have a short interview for you that answers the question: why buy Illumina shares?

From my point of view, the only risk of being a market leader in a highly dynamic area like next generation sequencing is that you have a lot to lose. But Illumina is working on this. One example: this year alone, Illumina launched two new next generation sequencing instruments, the HiSeq X Ten for human whole genome sequencing and the NextSeq 500, a mid-size sequencer that fills the gap between the HiSeq and the MiSeq. So let's see what happens next…

Illumina – smarter than Google

Technology Review analysed the Energy, Biotech, Computing & Communications, Internet & Digital Media, and Transportation markets in search of the smartest companies of 2013. The main criterion was a company's impact on its industry, driven mainly by innovation. The result is a list of the 50 smartest companies…

… and the winner is:  Illumina

Other important companies that everyone knows are well behind, perhaps also because reputation has no influence on the ranking. Some examples:

  • Google  #3
  • Dropbox #6
  • Amazon #10
  • Siemens #24
  • IBM #35

By the way, Illumina was not even on last year's list, whereas Complete Genomics (#11), Life Technologies (#27) and Roche (#34) were.




Congratulations, Illumina!

Whose Genome Has Been Sequenced? Hevea brasiliensis

Almost all of us have done experiments in the lab at least once, and so nearly everyone has come across latex gloves. More and more of us have also developed some form of latex allergy.

According to Rahman et al. “these allergies are triggered by certain proteins present in Hevea-derived natural rubber (NR). […] Hevea brasiliensis (Willd.) Muell.-Arg., also known as Pará rubber tree, is the primary commercial source for natural rubber (NR) production” (nearly 11 million tons in total in 2011, counting all of the ~2,500 rubber-producing plant species).

Although rubber is used in more than 50,000 products worldwide, this is the first de novo sequencing of the rubber tree genome. So far, only transcriptome studies had been performed, and these cannot capture the non-coding regions of the genome.

What was sequenced?

Young leaves of Hevea brasiliensis RRIM 600. Genome size: ~ 2.15 Gb; 18 chromosomes

De novo sequencing strategy:

  1. Libraries: shotgun and mate-pair libraries (insert size: 500 bp) on HiSeq 2000; long paired-end (LPE) libraries (insert sizes: 8 kb and 20 kb) on Roche GS FLX; paired-end library (insert size: 2 kb) on SOLiD
  2. Coverage of all sequencing strategies together: ~43x (~13x, i.e. 27.86 Gb, after filtering out reads matching repeats)
  3. Data output: 143 scaffolds (total 1,119 Mb with N50 = 2,972 bp)
  4. Bioinformatics: CLC Workbench and the Newbler assembler, using different input data and different assembly strategies
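
N50 is a standard contiguity metric for assemblies: sort the scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches half of the assembly size. Below is a minimal Python sketch of this common definition; conventions differ slightly between tools (e.g. "at least half" versus "more than half"), and the scaffold lengths in the example are made-up toy values, not data from Rahman et al.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of an assembly.

    Scaffolds are sorted longest first; N50 is the length of the scaffold at
    which the cumulative length first reaches at least half of the total.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids float rounding at exactly 50%
            return length
    raise AssertionError("unreachable for non-negative lengths")


# Hypothetical toy assembly of five scaffolds (15,000 bp in total).
toy_scaffolds = [1_000, 5_000, 3_000, 2_000, 4_000]
print(n50(toy_scaffolds))  # 5,000 + 4,000 = 9,000 >= 7,500, so prints 4000
```

A small N50 relative to the genome size indicates a fragmented assembly: half of the assembled bases sit in scaffolds no longer than that value.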

Transcriptome sequencing strategy:

  1. Libraries: cDNA libraries
  2. Sequencing with Illumina HiSeq and Roche/454
  3. Bioinformatics: CLC Workbench assembler for the Illumina reads and Newbler for combining Roche and Illumina reads.

This de novo genome sequencing approach revealed that ~78% of the genome consists of repetitive regions. The study should help to improve breeding of H. brasiliensis by enabling marker-assisted selection to further increase disease resistance and minimize allergenicity.

Read the complete publication here.




Whose Genome Has Been Sequenced? Latimeria chalumnae

The third de novo sequenced genome in our series Whose Genome Has Been Sequenced? is the “living fossil” Latimeria chalumnae.

The most difficult part of this de novo genome sequencing approach was obtaining enough starting material. The authors even reported that their first approach was to use Sanger technology, but there simply was not enough DNA available. They therefore had to wait until next generation sequencing technologies were mature enough to risk sequencing the sample (BioTechniques). Here are the sequencing facts of this study (Amemiya et al.):

What was sequenced?

A blood sample from an adult African coelacanth

De novo sequencing strategy:

  1. Libraries: shotgun library (61-fold coverage); 3 kb jumping library (88-fold coverage); 40 kb fosmid library (1-fold coverage)
  2. Illumina HiSeq 2000 (paired-end module)
  3. De novo genome assembly using the software ALLPATHS-LG
  4. RNA sequencing

RNA-Seq sequencing strategy:

  1. 4 cDNA libraries (1x mRNA-Seq library, 3x strand-specific dUTP libraries from brain, gonad/kidney and gut/liver tissue) were sequenced on a HiSeq
  2. Data output: mRNA-Seq library ~210 M paired-end reads; dUTP libraries ~3-4 Gb of sequence per tissue
  3. Assembly was performed using Trinity

The genome sequence helps to explain why the phenotype of this prehistoric fish is so similar to that of 300-million-year-old fossils, and sheds light on how vertebrates made the transition from water to dry land (BioTechniques).

Read the complete publication here.

Earlier published genomes:

Summary from 4th Next Generation Sequencing Congress 2012

Having attended the 4th NGS Congress 2012 at London Heathrow, I would like to share some interesting new facts and information about the latest NGS developments.

First of all, let's talk about “long read technology”. Todd Arnold, Vice President R&D at Roche 454, gave a talk on Roche 454. A new software version 2.7 is now available for the Roche GS Junior, with “improved well resolution results in better quality, more robust sequencing runs”. We can confirm these new data outputs, as we have been running this update on our own GS Junior for a while. Depending on the nature of your samples, a good proportion of reads will be longer than 400 bp, reaching up to 450-480 bp (still using the Titanium chemistry). However, the FLX+ technology is NOT available and also NOT planned for the GS Junior, and when asked why, Roche could give no concrete details or upgrade plans for the GS Junior at the London congress…

The real highlight regarding Roche 454 was the description of what we now call “FLX++” sequencing. With a software update (2.8) now available for all GS FLX systems, together with the “pimped chemistry kits”, Roche 454 is offering real “1000 bp” Sanger-like reads (as initially targeted at launch). Data outputs and slides were shown that demonstrate these longer read lengths and higher data outputs (figure 1). All together, this adds up to almost 1 Gb of sequencing data per full PTP (PicoTiterPlate) run.

Fig 1: Todd Arnold Roche 454 Data Heathrow 2012

As one of the early-access users of the FLX++ upgrade and software version 2.8, we can confirm that the new data outputs are excellent (again depending on DNA quality); one can even reach better results than those shown by Roche at the 4th NGS Congress in London Heathrow. Here is an example:

Fig 2: Eurofins MWG Operon data with Roche GS FLX++

Of course, one may now argue “that's nothing compared to Illumina data outputs”, and in terms of pure data volume you are right! But the focus here is on long-read applications such as de novo sequencing and assembly. For this kind of NGS application, a modal read length of 800-950 bp or above improves the final results tremendously. Don't believe it? Here are some new data from a fungal de novo sequencing project that we delivered (figure 2). We were able to deliver chromosome-size scaffolds of 8.3 Mb, 6.0 Mb, 4.3 Mb, 2.8 Mb, 2.4 Mb, 2.1 Mb, … using long-read FLX++ backbone sequencing at only 8x-12x coverage, combined with short-read LJD sequencing on the HiSeq at 2x 100 bp. The complete data set missed only about 0.5% of all genetic information, and the remaining gaps averaged about 240 bp in length. We are very interested to learn how 2x 250 bp reads on the MiSeq will further improve these excellent results: one-shot genome sequencing at its best.

Interested in this kind of project data? Please learn more about our fascinating de novo sequencing & assembly results at our next NGS roadshow in 2013 or send me an email for further discussion about this topic…

Comparison of NGS technologies – just a waste of time?

As mentioned in our latest blog post, Michael Quail and his team from the Sanger Institute published a comparison of the Ion Torrent PGM, the PacBio RS system and the Illumina MiSeq (BMC Genomics). Neither this study nor the others performed recently could determine one clear winner, as each system has its own advantages.

What is really interesting, though, are the statements from the companies' spokespersons in a recent GenomeWeb article by Julia Karow. They all agree on one thing: the data in the publication were accurate in 2011 but are outdated by now, since a lot of effort has gone into innovation and every instrument performs considerably better today. So what is our conclusion? Are comparisons of NGS technologies just a waste of time? For the Sanger Institute, the outcome was an investment in 3 new MiSeqs, since their Illumina pipeline was already available. In my view, these comparisons are valuable for all other institutes, too: although possibly outdated, they highlight the strengths and weaknesses of each technology and help to decide where to invest thousands of dollars. What do you think?

Product Launches 2011: The MiSeq at the Pole Position

Dear Blog Reader,

In our last NGS poll we asked you which of the 2011 product launches would have the most impact on your research. We are delighted that more than 50 NGS blog readers voted.

More than half of you rated the launch of the benchtop sequencer MiSeq as the most important event in 2011. 15% voted for the Roche GS FLX+ technology and 15% for the increased data output with Illumina chemistry v3.0. PacBio RS sequencing and the new exome products on the market were the least important to you (10% and 8%, respectively).

I also see the MiSeq benchtop sequencer as a door opener for many groups towards in-house Illumina sequencing. The investment is considerably lower than for the Illumina HiSeq 2000 or HiSeq 2500, and for groups without too many projects the higher consumable cost per base pair may be acceptable. Furthermore, the MiSeq stands out for its short run time of only several hours.

For a service provider running several Illumina HiSeq 2000 sequencers, the MiSeq is very interesting, too. It can be used for resequencing and scaffolding of small genomes, for quality control of sequencing libraries and especially for developing new protocols and services.

Have you already performed your first sequencing runs on the MiSeq, and for what kind of projects do you think the sequencer is most suitable? I am very much looking forward to hearing about your experiences.


Roche Versus Illumina: Where is it Going?

Six and five years, respectively, after the commercial launch of their first next generation sequencers, Roche and Illumina are still at the forefront of the next generation sequencing market. To date, Roche technology has enabled 1,209 peer-reviewed publications, while Illumina technology has enabled as many as 1,631 (according to both companies' websites).

The current sequencers of the two companies are the Roche GS FLX, with an average read length of 350-450 bp and 400 Mbp of data output per run, and the Illumina HiSeq 2000, with read lengths of up to 100 bp and up to 320 Gbp of data output per run.

Both companies focus their technological improvements mainly on increasing read number and read length. Illumina recently launched 150 bp read length (for the GAIIx only) and has announced a doubling of yield for Q2 2011. According to GenomeWeb News on April 19th, 2011, Roche plans to launch an extended average read length of 750 bp sometime before the end of June.

In my view, third generation sequencing will not replace these technologies but will enable additional applications, including the combination of third and second generation technologies for specific project designs.