
Expression Profiling Without the Need for a Reference Genome

Interested in expression profiling, but you are working with a non-model organism?

A very elegant approach is to (1) generate long cDNA contigs with NGS technologies that serve as a reference transcriptome and (2) perform expression profiling by mapping the short reads of each sample, generated on the Illumina HiSeq 2000, back onto this reference. If the expression library yields only one read per transcript, as a DGE or 3’-fragment library does, down- and up-regulated genes can easily be identified by counting the sequence hits per reference transcript.
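The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: it assumes each read has already been assigned to one reference transcript by an aligner (unmapped reads are represented as `None`), and the helper names `count_tags` and `log2_fold_change` are hypothetical. Counts are scaled to counts per million so that samples sequenced to different depths can be compared.

```python
import math
from collections import Counter

def count_tags(assignments):
    """Count mapped reads per reference transcript; None marks an unmapped read."""
    return Counter(t for t in assignments if t is not None)

def log2_fold_change(counts_a, counts_b, transcript, pseudocount=1.0):
    """log2 ratio of counts-per-million in sample B versus sample A."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    # Scaling to counts per million corrects for different sequencing depths;
    # the pseudocount avoids division by zero for transcripts absent in a sample.
    cpm_a = (counts_a[transcript] + pseudocount) / total_a * 1e6
    cpm_b = (counts_b[transcript] + pseudocount) / total_b * 1e6
    return math.log2(cpm_b / cpm_a)

# Hypothetical read-to-transcript assignments for two samples.
control = count_tags(["t1", "t1", "t2", None, "t3", "t1"])
treated = count_tags(["t1", "t2", "t2", "t2", "t3", "t3"])
for tid in sorted(control | treated):
    print(tid, control[tid], treated[tid],
          round(log2_fold_change(control, treated, tid), 2))
```

A positive value means higher expression in the second sample. This only ranks transcripts; deciding whether a difference is significant needs biological replicates and a statistical model, for example as implemented in Bioconductor packages such as edgeR or DESeq2.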

This approach was used by Mutasa-Göttgens et al., 2012 to analyze targets involved in bolting and flowering in sugar beet. Understanding the regulation of vernalization-induced bolting and of the transition to the reproductive phase is highly important, because bolting and flowering considerably reduce sugar content.

To generate the reference transcriptome of the shoot apex, a normalized random-primed cDNA library was prepared and sequenced on the Illumina HiSeq 2000 in single-read mode with 100 bp read length. De novo assembly yielded a total of 225’000 unique transcripts, 53’000 of which are large transcripts (from >500 bp up to more than 8’700 bp). For the quantitative comparison, we prepared digital gene expression (DGE) libraries for the research group from samples that had been subjected to vernalization and/or phytohormone treatment. The libraries were sequenced on the Illumina HiSeq 2000, and the reads were mapped onto the reference transcriptome.
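Figures such as the number of assembled transcripts and how many of them exceed 500 bp can be read directly from the assembly's FASTA output. A minimal sketch, assuming the assembler writes plain FASTA; `read_fasta_lengths` and `assembly_summary` are hypothetical helpers, and the toy records below stand in for a real assembly file.

```python
def read_fasta_lengths(lines):
    """Yield (id, sequence length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0  # ID = first word
        else:
            length += len(line)  # sequences may be wrapped over several lines
    if name is not None:
        yield name, length

def assembly_summary(lines, min_long=500):
    """Count transcripts, transcripts longer than `min_long`, and the longest one."""
    lengths = [n for _, n in read_fasta_lengths(lines)]
    return {
        "transcripts": len(lengths),
        "long_transcripts": sum(1 for n in lengths if n > min_long),
        "longest": max(lengths, default=0),
    }

# Toy assembly with three contigs of 300, 1000 and 600 bp.
toy = [">c1", "A" * 300, ">c2", "ACGT" * 200, "ACGT" * 50, ">c3", "G" * 600]
print(assembly_summary(toy))
```

For a real assembly, the same function can be called on an open file handle, since iterating over a text file yields its lines.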

Bioinformatics analysis identified, among other things, a potential regulator of vernalization, and therefore an interesting breeding target for the sugar beet crop.

In my opinion, this study is an excellent example of how to combine the strengths of different RNA-Seq library types most effectively. The normalized random-primed library covers transcripts along their entire length rather than only at one end. Furthermore, normalization evens out the abundances of high- and low-expressed transcripts, which allows low-expressed genes to be identified reliably and considerably facilitates de novo assembly with short-read technology. The DGE library, in contrast, produces only one tag per transcript and thus, for the same number of reads, allows a much deeper resolution than Illumina's mRNA-Seq approach, whose reads cover the whole transcript.

Meanwhile, with new NGS library types available, one would use a 3’-fragment library rather than the DGE library. At similar cost, this library type delivers longer sequence information (100 bp versus 17 bp) and consequently higher mapping accuracy and fewer non-mappable reads.
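Why longer reads map more accurately can be illustrated with a toy reference. The sketch below assumes exact substring matching (real aligners tolerate mismatches) and two hypothetical paralogs, `gene_a` and `gene_b`, that share a 40 bp stretch: a 17 bp tag taken from the shared stretch matches both transcripts and cannot be assigned, while a 100 bp read overlapping the same stretch extends into unique sequence and maps to one transcript only.

```python
def map_exact(read, reference):
    """Return the IDs of all reference transcripts that contain `read` exactly."""
    return [tid for tid, seq in reference.items() if read in seq]

# Toy reference: two hypothetical paralogs sharing their first 40 bp.
shared = "ACGTTGCA" * 5  # 40 bp common to both genes
reference = {
    "gene_a": shared + "GGGGGAAAAACCCCCTTTTT" * 5,  # 140 bp
    "gene_b": shared + "TTTTTCCCCCAAAAAGGGGG" * 5,  # 140 bp
}

short_tag = reference["gene_a"][10:27]   # 17 bp, entirely inside the shared stretch
long_read = reference["gene_a"][20:120]  # 100 bp, extends into unique sequence

print(len(short_tag), map_exact(short_tag, reference))  # 17 ['gene_a', 'gene_b']
print(len(long_read), map_exact(long_read, reference))  # 100 ['gene_a']
```

The ambiguous 17 bp tag would have to be discarded or split between the two genes, whereas the 100 bp read is counted for `gene_a` alone.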

You will find more information regarding this combined approach including the 3’-fragment library for read counting in the following Application Note.

Expression Profiling with 3’-Libraries

Last week’s blog article was about expression profiling with mRNA-Seq libraries and the sequencing depth this protocol requires. But there are other options for expression profiling, and today I especially want to highlight the 3’-fragment library protocol.

The big advantage of this protocol is that, for the same number of reads, it provides a much higher resolution than mRNA-Seq does. The reason is that in mRNA-Seq the average transcript is represented by approx. 10-25 reads distributed over its whole length, while the 3’-fragment protocol generates only one read per transcript. Reads from a 3’-fragment library map to the 3’ ends of the transcripts, and expression differences are collected simply by counting the reads that map to each reference transcript.

This 10- to 25-fold higher resolution comes with considerably reduced project costs, as 10- to 25-fold less sequencing is required to reach a similar depth of analysis. Or, put the other way round: when the same number of samples is analyzed per channel, the 10- to 25-fold higher resolution allows the scientist to look even at very low-expressed genes with reliable statistical evidence.
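The cost argument is simple arithmetic. The sketch below is illustrative only: the 25 million mRNA-Seq reads per sample is an assumption based on the 20-30 million reads per sample that experts use for straight expression profiling (see the sequencing depth article), the 10-25 reads per transcript come from the text above, and the 150 million reads per channel is a hypothetical placeholder rather than an instrument specification.

```python
def equivalent_3prime_reads(mrna_seq_reads, reads_per_transcript):
    """3'-fragment reads needed to count as many transcripts as mRNA-Seq does.

    mRNA-Seq spends `reads_per_transcript` reads on an average transcript,
    whereas a 3'-fragment library spends one.
    """
    return mrna_seq_reads / reads_per_transcript

def samples_per_channel(channel_reads, reads_per_sample):
    """Number of whole samples that fit into one sequencing channel."""
    return int(channel_reads // reads_per_sample)

MRNA_SEQ_READS = 25e6  # assumption: per-sample depth for mRNA-Seq profiling
CHANNEL_READS = 150e6  # hypothetical reads per channel, not an instrument spec

print("mRNA-Seq samples per channel:",
      samples_per_channel(CHANNEL_READS, MRNA_SEQ_READS))
for k in (10, 25):
    reads_3p = equivalent_3prime_reads(MRNA_SEQ_READS, k)
    print(f"{k} reads/transcript: {reads_3p / 1e6:.1f} M 3'-reads per sample, "
          f"{samples_per_channel(CHANNEL_READS, reads_3p)} samples per channel")
```

With these placeholder numbers, one channel holds 6 mRNA-Seq samples but 60 to 150 3’-fragment samples at comparable depth. Keeping 6 samples per channel instead gives each 3’-fragment sample 10 to 25 times more counted transcripts, which is where the gain for low-expressed genes comes from.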

Of course, the mRNA-Seq protocol is still required if further analyses are planned, such as the study of alternative transcripts or fusion genes. But that is a different story anyway, as these applications need an even higher sequencing depth than expression profiling with mRNA-Seq.

In conclusion, I think this protocol is definitely worth evaluating when planning an expression profiling experiment. And we would be delighted if you shared your thoughts on this with us and the other blog readers.

Expression Profiling and Sequencing Depth

The majority of scientists performing expression studies use the mRNA-Seq protocol (random-primed cDNA synthesis after fragmentation of polyA-purified transcripts) and sequence the fragments with Illumina technology. When planning the experiment, the question of sequencing depth immediately arises. For all of you interested in an answer, I want to share the recommendations on sequencing depth from experts in the field of transcriptome sequencing, published in GenomeWeb.

You can see that the recommendations range from 10 million single-end reads up to hundreds of millions of reads, depending on the exact need, and it is really tough for the experts to give a concrete number. Please keep in mind that about 80-85% of the transcript molecules in a typical transcriptome come from only a few highly expressed genes, whereas the majority of transcripts are present in only a few copies. For straightforward gene expression analysis, the interviewed scientists usually use around 20-30 million reads per sample. But if your aim is to look at really low-expressed genes, such as some transcription factors, you definitely have to apply a higher sequencing depth. The same is true when transcript isoforms or fusion genes are to be analyzed. For these applications, the required sequencing depth can be as much as a full channel per sample.
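Because most reads are consumed by a few highly expressed genes, a rare transcript receives only a tiny share of the reads, and a simple Poisson model shows how depth translates into detecting it. This is a rough sketch under stated assumptions: reads are sampled independently (biological variability between replicates is ignored), the read share of 1 in 10 million for a rare transcript and the threshold of 5 reads are hypothetical values, and `expected_reads` and `prob_at_least` are illustrative helper names.

```python
import math

def expected_reads(total_reads, read_share):
    """Mean read count for a transcript that receives `read_share` of all reads."""
    return total_reads * read_share

def prob_at_least(k, mean):
    """Poisson probability of observing at least k reads for a given mean."""
    return 1.0 - sum(math.exp(-mean) * mean**i / math.factorial(i)
                     for i in range(k))

RARE_SHARE = 1e-7  # hypothetical: 1 in 10 million reads, e.g. a rare transcription factor
MIN_READS = 5      # hypothetical detection threshold

for depth in (20e6, 30e6, 100e6):
    mean = expected_reads(depth, RARE_SHARE)
    print(f"{depth / 1e6:.0f} M reads: mean {mean:.1f}, "
          f"P(>= {MIN_READS} reads) = {prob_at_least(MIN_READS, mean):.2f}")
```

Under these assumptions, the chance of seeing at least 5 reads for such a transcript rises from about 5% at 20 million reads to about 97% at 100 million reads.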

So, enjoy reading their comments!

Next Generation Sequencing: Is There New Hope For Patients With Rare Diseases?

Two new studies published online in Science Translational Medicine (Sirota et al. and Dudley et al.) demonstrate the potential of genomics to find new applications for existing drugs (GenomeWeb). They detected 53 significant drug-disease interactions. In one case, they found evidence that the ulcer drug cimetidine might be effective against lung cancer.

This gives new hope for the effective treatment of rare diseases, for which no new potential orphan drugs are available.