Tag Archives: Roche/454

Whose genome has been sequenced? Brassica napus

Brassica napus, also known as oilseed rape, was formed more than 7,000 years ago by allopolyploidy (chromosome doubling from two Brassica species). Of course the genome mutated further, and it is known today that during this evolution some genes were preserved and further “improved” (e.g. oil biosynthesis genes), whereas others were lost over the course of time (e.g. glucosinolate genes).

Chalhoub et al. have now sequenced the genome, because it can help to “provide insights into allopolyploid evolution and its relationship with crop domestication and improvement” (Chalhoub et al.).

What was sequenced?

Young fresh leaves from the French homozygous Brassica napus winter line “Darmor-bzh”.

Sequencing strategy: Whole genome sequencing

  1. Libraries & Sequencing:
    Roche GS FLX: ~70 million reads, average read length: ~368 bp, genome coverage: 21.2×
    Sanger BAC Seq: 141k reads, read length: 650 bp, genome coverage: 0.1×
    Illumina HiSeq: ~375 million reads, read length: 36, 76, 108 and 150 bp, genome coverage: 53.9×
  2. Data output: 44,146 contigs and 20,702 scaffolds
  3. Results: A final assembly of 849.7 Mb (using SOAP and Newbler) with 89% nongapped sequences.

After assembly, the genome was compared with those of related species (e.g. B. rapa and B. oleracea). This helped to find several interesting genes and gene variations that give a better understanding of the complete evolution.

Read the complete publication here.

NGS Applications – get an insight…


Do you want to know more about projects in which your research colleagues used next generation sequencing?

Check out the Nature Reviews overview of interesting publications related to different applications of next generation sequencing.

All you need to know about NGS

While searching the web for news in the area of NGS, I found this great practical course:

The EMBL-EBI offers a 15-hour online course that should help every newbie and everyone who needs a refresher on NGS. And since several readers of this blog are new to NGS, I thought it would be worth sharing this course.

The course is divided into several subunits, so everyone can learn at their own pace and fit it around the other tasks on their desk.


The key learnings are:

  • Understand some principles behind NGS
  • Know the challenges created by NGS
  • Know how to submit and retrieve NGS data to and from databases
  • Understand the uses of NGS data in: Whole genome assembly; Gene expression analysis; Genome annotation; Gene regulation analysis; Variation studies

Thank you EMBL-EBI for this great summary!

Possibility of Ideal Intestinal Remedy

I am not able to keep my intestines in good condition without a remedy prepared from Lactobacillus, Bifidobacterium, Lactococcus, and others. Eating yogurt also works for this purpose, but personally I prefer to take these bacterial tablets and believe they are more effective. However, many people know that the effects of current intestinal remedies are mild and not fast-acting.

Several reports have noted that the intestinal bacterial flora and its regulation are not simple. A discussion is starting that innate (natural) immunity may regulate the intestinal flora; e.g. the antibacterial α-defensin peptides, which are secreted from Paneth cells in the small-intestinal epithelium, could regulate the distribution of the flora (Salzman et al., 2010; Matsuda et al., 2011). In contrast, a very simple strategy has also been reported: Clostridium difficile causes severe diarrhea that is resistant to antibiotics. Van Nood and co-workers (2013) infused feces from healthy people into patients’ guts, and the curative effect was surprisingly good, but we cannot call it a remedy!

Two NGS platforms, GS FLX/Junior and MiSeq, can perform distributional analysis via deep sequencing of 16S rRNA amplicons. But it is still difficult for both platforms to assemble metagenomes in order to obtain whole-gene information on the flora, because their read lengths are not long enough to build reliable contigs without chimeras between different bacteria. Therefore I strongly expect that super-long-read platforms, including the PacBio RS series and the upcoming nanopore technologies, will break the current limitations and contribute to developing an ideal intestinal remedy for my unstable stomach.

News From The World Of NGS

The PacBio RS instrument from Pacific Biosciences is known as a third generation sequencing technology, and once again the company proves its innovative character. Over the last couple of months, two new chemistry packages have been released (XL & P4). And during the last couple of days, two even more interesting news items were announced:

1. Roche and PacBio signed an agreement to co-develop diagnostic products for the PacBio RS instrument (Genome Web). From my point of view this is a huge signal. Roche, an experienced player in the NGS market with its own sequencing instruments, sees a lot of potential in the SMRT technology. So the PacBio RS system has obviously outgrown its teething phase and will gain importance in the NGS business in the coming months.

2. New England Biolabs, a big player in the area of enzyme production, proteomics and drug discovery, is also using the PacBio RS to study bacterial methylomes and to work on new reagents for 5-mC detection. The CSO of NEB highlights that they have chosen “the PacBio system to study bacterial methylomes because of a unique feature of SMRT sequencing that enables the detection of base modifications through the system’s kinetics” (Genome Web).

Besides the great news for Pacific Biosciences, Life Technologies (or rather Thermo Fisher) also signed a great deal with the Chinese Dx firm iGenomics to install 32 Ion Proton sequencers in 2013 (Genome Web).

Clearly, all of these developments focus on molecular diagnostics and clinical diagnostics. And to add the missing link in this news update: Illumina also recently announced a partnership with G3 to identify novel biomarkers and pathways in cardiovascular disease.

Whose genome has been sequenced? Anas platyrhynchos

Ever since bird flu attracted a great deal of attention in 2005, a potential influenza epidemic has been discussed in the media nearly every year. This leads to greater awareness of influenza research projects. Ducks are a well-suited research model for influenza viruses: they harbor nearly all hemagglutinin (HA) and neuraminidase subtypes, and the harm to the ducks is often negligible.
Huang and his research team have now sequenced the duck genome to search for defense mechanisms in ducks against influenza viruses (Huang et al., Nature Genetics).

What was sequenced?

A 10-week-old female Beijing duck (Anas platyrhynchos)

Sequencing strategy: Whole genome sequencing

  1. Libraries: 8 shotgun (SG) libraries and 5 mate-pair libraries (insert sizes: SG libraries 185–530 bp, mate-pair libraries 2–10 kb), sequenced as 50 bp reads using the Illumina GA (Solexa) technology
    note: sequencing method according to the de novo Panda Genome project
  2. Read output: >77 Gb of paired-end reads (~ 64x coverage)
  3. Data output: 78,487 scaffolds with a contig N50 length of 26 kbp and a scaffold N50 length of 1.2 Mbp; total covered length of 1.1 Gb (~95% of the genome)
  4. Bioinformatics: Genome assembly using SOAPdenovo
  5. Additional comparative studies with the duck genetic and physical map resulted in 47 superscaffolds which contained 225 scaffolds and spanned 289 Mbp
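
As a quick sanity check on the figures in this list: average sequencing depth is roughly the number of sequenced bases divided by the genome size. The following is only a minimal Python sketch using the rounded numbers quoted above; the function names are my own and nothing here is taken from the authors' pipeline.

```python
def fold_coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_genome_size(total_bases, coverage):
    """Genome size implied by a sequencing yield and a fold coverage."""
    return total_bases / coverage


# Rounded figures quoted in the post: >77 Gb of reads at ~64x coverage,
# and 1.1 Gb of assembled sequence covering ~95% of the genome.
size_from_reads = implied_genome_size(77e9, 64)
size_from_assembly = 1.1e9 / 0.95

print(f"Genome size implied by read depth: {size_from_reads / 1e9:.2f} Gb")
print(f"Genome size implied by assembly:   {size_from_assembly / 1e9:.2f} Gb")
print(f"Depth for a 1.2 Gb genome: {fold_coverage(77e9, 1.2e9):.0f}x")
```

Both estimates land at roughly 1.2 Gb, so the quoted read output, coverage and assembled fraction are consistent with each other.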

Transcriptome analysis

  1. Libraries: Infected as well as control duck transcriptomes were sequenced using the Roche GS FLX instruments. In addition cDNA-libraries  were sequenced using the Illumina GA instrument
  2. Data output after BI: 319,996 contigs with an average length of 307 bp
  3. Bioinformatics: Illumina transcriptome mapping and assembly was performed using SOAPaligner and SOAPdenovo software. Re-assembly together with 454 data was performed using Phrap software

The intensive study of the duck genome using de novo genome and transcriptome sequencing approaches helped to identify significant differences in the genetic pattern compared to other bird species: “the duck genome […] includes genes that are not present in the other three species whose genomes have been sequenced.” (Huang et al.)

I think it’s quite an interesting approach to learn more about a virus and its infectivity by studying the interaction between host and virus.

Read the complete publication here.

Whose Genome Has Been Sequenced? Theobroma cacao L.

I suppose there is no human being on the planet who does not know chocolate. “The tropical Theobroma cacao tree has been cultivated for at least three thousand years. Its earliest documented use is around 1100 BC (wiki.org).”

The latest de novo genome sequencing publication about a cacao plant focusses on the Theobroma cacao L. Matina 1-6 clone, which is the most commonly cultivated type of cacao worldwide (Motamayor et al.). Although a first draft of this clone was already published in 2010, the authors aim for an improved version of the genome to identify candidate genes regulating traits.

What was sequenced?

Leaves from Theobroma cacao L. Matina 1-6 clone; haploid genome size ~0.5 Gbp

Sequencing strategy: Whole genome sequencing plus BAC & fosmid end sequencing

  1. Libraries: shotgun and 8 long paired-end (LPE) libraries (insert size: 3 kbp; 6 kbp, 8 kbp) on the Roche GS FLX; three fosmid libraries and three independent BAC libraries with Sanger Sequencing
  2. Read output: > 32 million reads
  3. Data output: 711 scaffolds with a total scaffold length of 346 Mbp with a contig N50 length of 84.4 kbp and a scaffold N50 length of 34.4 Mbp
  4. Bioinformatics: Among other tools, Arachne, Megablast and blastx were used for genome assembly
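
For readers who are new to assembly statistics: the N50 values quoted above are the length at which the longest contigs (or scaffolds), added up from largest to smallest, first contain at least half of all assembled bases. Here is a minimal Python sketch of that calculation; the function name and the toy lengths are made up for illustration and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths in bp (hypothetical, for illustration only)
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70
```

In the toy example the total is 300 bp, and the two longest scaffolds (80 + 70 bp) already reach half of it, so the N50 is 70 bp. A higher N50 generally means a more contiguous assembly.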

Gene annotation and orthology analysis

  1. Libraries: long normalised libraries sequenced on the Roche GS FLX and short-paired reads libraries sequenced on the Illumina platform
  2. Read output: ~ 7 M reads from the Roche and ~ 1 billion reads from the Illumina sequencing
  3. Bioinformatics: Transcriptome assembly using the NCBI TSA within BioProject 51633 & final refining using PASA. Further tools were used for marker identification and comparison to other plant species

In addition, re-sequencing and qPCR expression analysis were performed to finally report a “high-quality sequence and annotation of T. cacao L. and demonstrate its utility in identifying candidate genes regulating traits.” (Motamayor et al.)

From my point of view this is a highly complex study using a comprehensive range of sequencing technologies. It shows once more that more than one sequencing strategy is needed to fully characterise a genome and start interpreting its secrets.

Read the complete publication here.

Whose Genome Has Been Sequenced? Hevea brasiliensis

All of us have done experiments in the lab at least once, so everyone has come across latex gloves. And more and more of us have developed some kind of latex allergy.

According to Rahman et al. “these allergies are triggered by certain proteins present in Hevea-derived natural rubber (NR). […] Hevea brasiliensis (Willd.) Muell.-Arg., also known as Pará rubber tree, is the primary commercial source for natural rubber (NR) production” (in total nearly 11 million tons in 2011 for all 2,500 rubber tree species).

Although rubber is used for more than 50,000 products worldwide, this is the first de novo sequencing approach. So far only transcriptome analysis studies have been performed, which lack the non-coding regions of the genome.

What was sequenced?

Young leaves of Hevea brasiliensis RRIM 600. Genome size: ~ 2.15 Gb; 18 chromosomes

De novo sequencing strategy:

  1. Libraries: shotgun and mate-pair libraries (insert size: 500 bp) on HiSeq 2000; LPE libraries (insert sizes: 8 kb and 20 kb) on Roche GS FLX; Paired-end library (insert size: 2 kb) on SOLiD
  2. Coverage of all sequencing strategies together: ~43x (after filtering repeat-matching reads: ~13x = 27.86 Gb)
  3. Data output: 143 scaffolds (total 1,119 Mb with N50 = 2,972 bp)
  4. Bioinformatics: CLC Workbench & Newbler assembler using different input data and different assembling strategies

Transcriptome sequencing strategy:

  1. Libraries: cDNA libraries
  2. Sequencing with Illumina HiSeq and Roche/454
  3. Bioinformatics: CLC Workbench assembler for the Illumina reads and Newbler for combining Roche and Illumina reads.

This de novo genome sequencing approach revealed that ~78% of the genome consists of repetitive regions. The study helps to improve the breeding of H. brasiliensis by allowing marker-assisted selection to further increase disease resistance and minimize allergenicity.

Read the complete publication here.

800 bp Read Length For Amplicon Sequencing Is Not Science Fiction

About a year ago my colleague Regina reported on the new possibilities of using the MiSeq system for amplicon sequencing (16S Amplicon Experiments: Which Platform to Choose?). Now, one year later, everything said about the advantages of amplicon sequencing on the MiSeq (e.g. lower cost/base) still holds.

The main advantage of the Roche system is its long reads, which are highly valuable for some applications. By ligating appropriate sequencing adaptors we can currently deliver average read lengths of up to 700 bp when using the GS FLX+ pipeline. Further improvements in read length can be expected with the launch of a new amplicon pipeline from Roche for the Roche GS FLX+ system (planned for summer 2013).

And besides the ultra-long reads on the GS FLX+ system, there are still some advantages of amplicon sequencing using the GS Junior system compared to other technologies:

+ short turnaround time (starting from 5-10 working days)

+ competitive pricing

+ moderate to long reads (350 – 450 bp)

+ sufficient data output for all projects with a medium number of samples (e.g. up to 24)

What is your preferred next generation sequencing technology for amplicon sequencing? Take part in our current poll.

De Novo Transcriptome of a Model Organism to Study Tissue Regeneration

Newts have an extraordinary ability to regenerate tissues. For example, they can re-grow fully functional limbs after amputation. In addition, regeneration of parts of the central nervous system, the heart, and the lens has been characterized, making them an excellent model organism for studying regenerative processes. However, because of their enormous genome size (10 times that of human), the molecular mechanisms behind this amazing regenerative process are largely unknown.

A research group at the Max Planck Institute recently published a de novo assembly of the transcriptome of the urodele amphibian Notophthalmus viridescens (Looso M. et al.). The researchers combined 454, Illumina, and Sanger sequencing data from both normalized and non-normalized cDNA libraries. The resulting transcriptome comprises over 120,000 non-redundant transcripts. Homology search using BLAST led to the annotation of 38,000 transcripts. Importantly, they found 800 transcripts, whose protein-coding potential was validated by mass spectrometry, that show no similarity to any known transcripts or show similarity only to urodele-specific EST sequences. Some of these transcripts belong to novel protein families.

It is an interesting hypothesis that some of those newt-specific proteins may provide mechanistic insights into regeneration processes unique to these animals. Their work will definitely be an important resource for subsequent studies in tissue regeneration and may benefit future research in regenerative medicine.