Tag Archives: mRNA-Seq

Whose Genome Has Been Sequenced? Latimeria chalumnae

The third de novo sequenced genome in our series “Whose genome has been sequenced?” is the “living fossil” Latimeria chalumnae.

The most difficult part of this de novo genome sequencing project was obtaining enough starting material. The authors even reported that their first plan was to use Sanger technology, but there simply was not enough DNA available. They therefore had to wait until next-generation sequencing techniques were mature enough to attempt sequencing with the limited material (BioTechniques). Here are the sequencing facts of this study (Amemiya et al.):

What was sequenced?

A blood sample from an adult African coelacanth

De novo sequencing strategy:

  1. Libraries: shotgun library, 61-fold coverage; 3 kb jumping library, 88-fold coverage; 40 kb fosmid library, 1-fold coverage
  2. Illumina HiSeq 2000 (paired-end module)
  3. De novo genome assembly using the software ALLPATHS-LG
  4. RNA sequencing
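The fold-coverage figures in the list follow from simple arithmetic: average sequence coverage is the number of reads times the read length divided by the genome size (C = N × L / G). The Python sketch below works through that relation for a shotgun library. The genome size (3 Gb) and read length (100 bp) are round illustrative assumptions, not values taken from the study.

```python
import math

# Average sequence coverage (Lander-Waterman): C = N * L / G
#   N = number of reads, L = read length in bp, G = genome size in bp.
# The genome size and read length used below are illustrative assumptions.


def coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold-coverage produced by n_reads reads of read_length bp."""
    return n_reads * read_length / genome_size


def reads_for_coverage(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Reads (single mates, not pairs) needed to reach target_coverage, rounded up."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    GENOME_SIZE = 3_000_000_000  # assumed ~3 Gb genome, for illustration only
    READ_LENGTH = 100            # assumed read length per mate (bp)

    n = reads_for_coverage(61, READ_LENGTH, GENOME_SIZE)
    print(f"reads for 61-fold: {n:,} (~{n // 2:,} pairs)")
    print(f"check: {coverage(n, READ_LENGTH, GENOME_SIZE):.1f}-fold")
```

Under these assumptions, 61-fold coverage corresponds to about 1.83 billion reads (roughly 915 million pairs). The sketch only covers sequence coverage; it does not say how the coverage of the jumping and fosmid libraries was computed in the paper.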

RNA-Seq sequencing strategy:

  1. 4 cDNA libraries (1x mRNA-Seq library, 3x strand-specific dUTP libraries from brain, gonad/kidney, and gut/liver tissue) were sequenced on an Illumina HiSeq
  2. Data output: mRNA-Seq library ~210M paired-end reads; dUTP libraries ~3-4 Gb of sequence per tissue
  3. Assembly was performed using Trinity
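The two yields in this list are given in different units (read count vs. gigabases). A minimal Python conversion makes them roughly comparable; it assumes a hypothetical read length of 100 bp per mate and treats the “210M paired-end reads” as 210M read pairs, neither of which is stated in the source.

```python
# Convert between gigabases of paired-end data and read pairs.
# READ_LENGTH is a hypothetical value, not taken from the study.


def gigabases_to_read_pairs(gigabases: float, read_length: int) -> int:
    """Approximate number of read pairs contained in `gigabases` of sequence."""
    bases_per_pair = 2 * read_length  # two mates per sequenced fragment
    return round(gigabases * 1e9 / bases_per_pair)


def read_pairs_to_gigabases(read_pairs: int, read_length: int) -> float:
    """Total sequence (Gb) produced by `read_pairs` pairs of read_length-bp mates."""
    return read_pairs * 2 * read_length / 1e9


if __name__ == "__main__":
    READ_LENGTH = 100  # assumed bp per mate
    for gb in (3, 4):  # dUTP libraries: ~3-4 Gb per tissue
        print(f"{gb} Gb ~ {gigabases_to_read_pairs(gb, READ_LENGTH):,} read pairs")
    # mRNA-Seq library, treating "210M paired-end reads" as 210M pairs
    print(f"210M pairs ~ {read_pairs_to_gigabases(210_000_000, READ_LENGTH):.0f} Gb")
```

With these assumptions, 3-4 Gb per tissue corresponds to roughly 15-20M read pairs, while the mRNA-Seq library would amount to about 42 Gb.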

The genome sequence helps to shed light on how vertebrates made the transition from water to dry land, and on why this prehistoric fish still has a phenotype so similar to that of 300-million-year-old fossils (BioTechniques).

Read the complete publication here.


Expression Profiling with 3’-Libraries

My blog article last week was about expression profiling with mRNA-Seq libraries and the sequencing depth this protocol requires. But there are other options for expression profiling, and today I especially want to highlight the 3’-fragment library protocol.

The big advantage of this protocol is that it provides a much higher resolution per sequenced read than mRNA-Seq does. The reason is that in mRNA-Seq the average transcript molecule is represented by approx. 10-25 reads spread along its whole length, while the 3’-fragment protocol generates only one read per transcript molecule. The reads from a 3’-end library map to the 3’-end of the transcripts, and expression differences are determined simply by counting the reads that map to each reference transcript.
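Counting 3’-tags can be sketched in a few lines of Python. The input format (one `(read_id, transcript_id)` pair per mapped read) and the gene IDs are hypothetical simplifications of real alignment output. Because each molecule contributes about one read regardless of transcript length, this sketch only scales raw counts by library size (counts per million) and applies no length normalization of the kind usually used for mRNA-Seq.

```python
from collections import Counter


def count_tags(alignments):
    """Count 3'-tag reads per reference transcript.

    `alignments` is an iterable of (read_id, transcript_id) pairs, one per
    mapped read; this input format is a simplified, hypothetical one.
    """
    return Counter(transcript_id for _read_id, transcript_id in alignments)


def counts_per_million(counts):
    """Scale raw tag counts by library size (counts per million)."""
    total = sum(counts.values())
    return {tid: n * 1_000_000 / total for tid, n in counts.items()}


if __name__ == "__main__":
    # Toy alignments for two samples; GENE_X / GENE_Y are made-up IDs.
    sample_a = [("r1", "GENE_X"), ("r2", "GENE_X"), ("r3", "GENE_Y"), ("r4", "GENE_X")]
    sample_b = [("r1", "GENE_X"), ("r2", "GENE_Y"), ("r3", "GENE_Y"), ("r4", "GENE_Y")]

    cpm_a = counts_per_million(count_tags(sample_a))
    cpm_b = counts_per_million(count_tags(sample_b))
    for gene in sorted(cpm_a.keys() | cpm_b.keys()):
        a, b = cpm_a.get(gene, 0.0), cpm_b.get(gene, 0.0)
        print(gene, a, b, "fold change A/B:", a / b if b else float("inf"))
```

In the toy data, GENE_X comes out 3-fold higher in sample A than in sample B. A real analysis would add replicates and a statistical test on top of these counts.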

This 10-25-fold higher resolution comes with considerably reduced project costs, as 10-25-fold less sequencing is required to reach a similar depth of analysis. Alternatively, when the same number of samples is analyzed per channel, the 10-25-fold higher resolution allows the scientist to look even at very lowly expressed genes with reliable statistical support.
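Both readings of this trade-off can be checked with simple arithmetic. The Python sketch below uses the 10-25 reads per molecule (mRNA-Seq) versus ~1 read per molecule (3’-tags) from the text. The channel output of 200M reads and the target of 2M counted molecules per sample are hypothetical planning values.

```python
# Reads-per-molecule figures from the text: ~10-25 for mRNA-Seq, ~1 for 3'-tags.
# LANE_READS and TARGET_MOLECULES are hypothetical planning values.


def reads_needed(target_molecules: int, reads_per_molecule: int) -> int:
    """Reads required to count `target_molecules` transcript molecules."""
    return target_molecules * reads_per_molecule


def samples_per_lane(lane_reads: int, target_molecules: int, reads_per_molecule: int) -> int:
    """Samples that fit into one channel at a fixed per-sample counting depth."""
    return lane_reads // reads_needed(target_molecules, reads_per_molecule)


def molecules_per_sample(lane_reads: int, n_samples: int, reads_per_molecule: int) -> int:
    """Molecules counted per sample when n_samples share one channel."""
    return lane_reads // n_samples // reads_per_molecule


if __name__ == "__main__":
    LANE_READS = 200_000_000      # assumed reads per channel
    TARGET_MOLECULES = 2_000_000  # assumed molecules to count per sample
    for label, rpm in [("mRNA-Seq (10 reads/molecule)", 10),
                       ("mRNA-Seq (25 reads/molecule)", 25),
                       ("3'-tags (1 read/molecule)", 1)]:
        print(label,
              "| samples per lane:", samples_per_lane(LANE_READS, TARGET_MOLECULES, rpm),
              "| molecules/sample with 10 samples:", molecules_per_sample(LANE_READS, 10, rpm))
```

With these numbers, one channel holds 4-10 mRNA-Seq samples but 100 3’-tag samples at the same counting depth. Alternatively, with 10 samples per channel, each 3’-tag sample gets 20M counted molecules instead of 0.8-2M.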

Of course, the mRNA-Seq protocol is still needed if further analyses are to follow, such as the study of alternative transcripts or fusion genes, because a 3’-end read carries no information about splice junctions or fusion points elsewhere in the transcript. But that is a completely different story anyway, as these applications need an even higher sequencing depth than expression profiling with mRNA-Seq does.

In conclusion, I think this protocol is definitely worth evaluating when you are planning an expression profiling experiment. We would be delighted if you shared your thoughts on this with us and the other blog readers.

Expression Profiling and Sequencing Depth

The majority of scientists performing expression studies use the mRNA-Seq protocol (random-primed cDNA synthesis after fragmentation of polyA-purified transcripts) and sequence the fragments with Illumina technology. When planning such an experiment, the question of sequencing depth immediately arises. For all of you interested in an answer, I want to share the recommendations on sequencing depth from experts in the field of transcriptome sequencing, published in GenomeWeb.

You can see that the recommendations range from 10 million single-end reads up to hundreds of millions of reads, depending on the exact aim, and that it is really tough for the experts to give a concrete number. Please keep in mind that about 80-85% of the transcript molecules in a typical transcriptome come from only a few highly expressed genes, whereas the majority of distinct transcripts are present at only a few copies each. For straightforward gene expression analysis, the interviewed scientists usually use around 20-30 million reads per sample. But if your aim is to look at really lowly expressed genes, such as some transcription factors, you definitely have to apply a higher sequencing depth. The same is true when transcript isoforms or fusion genes are to be analyzed; for these applications the required sequencing depth can be as much as a full channel per sample.
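A simple sampling model shows why lowly expressed transcripts need more depth. The Python sketch assumes that reads are drawn independently from the pool of molecules, so the read count for one transcript is Poisson-distributed with mean depth × fraction, where fraction is that transcript's share of all molecules. The fraction of 1 in 10 million and the 5-read threshold are hypothetical values chosen for illustration.

```python
import math

# Simplifying assumption: each read is an independent draw from the pool of
# transcript molecules, so reads for one transcript ~ Poisson(depth * fraction).


def expected_reads(depth: int, fraction: float) -> float:
    """Mean reads for a transcript that makes up `fraction` of all molecules."""
    return depth * fraction


def prob_at_least(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean), via the complement of the CDF."""
    term = math.exp(-mean)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term                # add P(X = i)
        term *= mean / (i + 1)     # P(X = i + 1) from P(X = i)
    return 1.0 - cdf


if __name__ == "__main__":
    FRACTION = 1e-7  # hypothetical low-abundance transcript (1 in 10 million molecules)
    MIN_READS = 5    # hypothetical threshold for a usable count
    for depth in (20_000_000, 100_000_000):
        mu = expected_reads(depth, FRACTION)
        print(f"{depth:,} reads: expected {mu:.1f}, "
              f"P(>= {MIN_READS} reads) = {prob_at_least(MIN_READS, mu):.3f}")
```

Under these assumptions, such a transcript yields only 2 reads on average at 20M reads per sample, and the chance of seeing at least 5 reads is about 5%. At 100M reads, that chance rises to about 97%.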

So, enjoy reading their comments!