Expression Profiling Without the Need for a Reference Genome
Interested in expression profiling, but working with a non-model organism?
A very elegant approach is to (1) generate long cDNA contigs with NGS technologies that serve as a reference transcriptome and (2) profile expression by mapping the short reads of each sample, sequenced on the Illumina HiSeq 2000, back onto this reference. Because only one read (tag) is generated per transcript, up- and down-regulated genes can easily be identified by counting the sequence hits per transcript.
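To make step (2) more concrete, here is a minimal Python sketch of counting tags per reference transcript and comparing two samples. It assumes the mapping output has already been reduced to hypothetical (read_id, transcript_id) pairs, with None marking unmapped reads, and it uses a plain counts-per-million log2 fold change with a pseudocount of 1 (our own choice). It is not the analysis pipeline used in the study; a real experiment would use biological replicates and a dedicated statistical test.

```python
import math
from collections import Counter


def count_hits(alignments):
    """Count mapped tags per reference transcript.

    `alignments` is an iterable of (read_id, transcript_id) pairs (hypothetical
    format); reads with transcript_id None are treated as unmapped and skipped.
    """
    return Counter(t for _, t in alignments if t is not None)


def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """Library-size-normalized log2 fold change (sample B over sample A)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    result = {}
    for t in set(counts_a) | set(counts_b):
        # Counts per million; the pseudocount keeps zero counts finite.
        cpm_a = (counts_a.get(t, 0) + pseudocount) / total_a * 1e6
        cpm_b = (counts_b.get(t, 0) + pseudocount) / total_b * 1e6
        result[t] = math.log2(cpm_b / cpm_a)
    return result


# Hypothetical tags from an untreated and a treated sample
control = count_hits([("r1", "tx1"), ("r2", "tx1"), ("r3", "tx2"), ("r4", None)])
treated = count_hits([("s1", "tx1"), ("s2", "tx2"), ("s3", "tx2"), ("s4", "tx2")])
for tx, lfc in sorted(log2_fold_changes(control, treated).items()):
    print(tx, round(lfc, 2))
```

A negative value marks a transcript that is down-regulated in the treated sample relative to the control, a positive value one that is up-regulated.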
This approach was used by Mutasa-Göttgens et al. (2012) to analyze targets involved in bolting and flowering in sugar beet. Understanding the regulation of vernalization-induced bolting and of the transition to the reproductive phase is highly important, because bolting and flowering considerably reduce the sugar content.
To generate the reference transcriptome of the shoot apex, a normalized, random-primed cDNA library was prepared and sequenced on the Illumina HiSeq 2000 in single-read mode with 100 bp read length. De novo assembly yielded a total of 225’000 unique transcripts, 53’000 of which are long transcripts (>500 bp, the longest exceeding 8’700 bp). For quantitative comparison, we prepared digital gene expression (DGE) libraries for the research group from samples subjected to vernalization and/or phytohormone treatment. The libraries were sequenced on the Illumina HiSeq 2000, and the reads were mapped onto the reference transcriptome.
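Summary figures like those above (number of assembled transcripts, number longer than 500 bp, longest transcript) can be derived directly from the assembly output. The following is a minimal sketch assuming the assembly is available in FASTA format; the function names and toy records are hypothetical, and the values it produces are not the study's.

```python
def read_fasta_lengths(lines):
    """Yield (header, sequence_length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:].strip(), 0
        else:
            # Sequences may be wrapped over several lines; sum their lengths.
            length += len(line)
    if name is not None:
        yield name, length


def assembly_summary(lines, long_cutoff=500):
    """Count transcripts, transcripts longer than `long_cutoff`, and the maximum length."""
    lengths = [n for _, n in read_fasta_lengths(lines)]
    return {
        "transcripts": len(lengths),
        "long_transcripts": sum(1 for n in lengths if n > long_cutoff),
        "longest_bp": max(lengths, default=0),
    }


# Toy records; for a real assembly one would pass an open file handle instead,
# e.g. assembly_summary(open("assembly.fasta")) with a hypothetical file name.
toy = [">a", "ACGT" * 100, "ACGT" * 50, ">b", "ACG", ">c", "A" * 501]
print(assembly_summary(toy))
```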
Bioinformatic analysis identified, among other candidates, a potential regulator of vernalization, which is therefore an interesting breeding target for sugar beet.
In my opinion, this study is an excellent example of how to combine the strengths of different RNA-Seq library types most effectively. Because random priming distributes reads along the entire transcript, the normalized random-primed library allows sequencing that is unbiased with respect to position within the transcript. Furthermore, normalization evens out the abundance of highly and lowly expressed transcripts, which enables accurate identification of weakly expressed genes and considerably facilitates de novo assembly from short reads. The DGE library, in contrast, produces only one tag per transcript; for a given number of reads, it therefore allows much deeper resolution than Illumina's mRNA-Seq approach, which generates reads covering the whole transcript.
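The depth argument can be illustrated with a deliberately simplified model (our assumption, not taken from the study): in a full-transcript protocol the expected share of reads a transcript receives is proportional to abundance × length, whereas a one-tag-per-transcript protocol samples in proportion to abundance alone. The transcript names, abundances and lengths below are hypothetical.

```python
def expected_reads(abundance, length, total_reads, full_length=True):
    """Expected reads per transcript under a simplified sampling model.

    full_length=True  -> reads proportional to abundance * length (mRNA-Seq-like)
    full_length=False -> reads proportional to abundance only (one tag per transcript)
    """
    weights = {t: abundance[t] * (length[t] if full_length else 1) for t in abundance}
    total = sum(weights.values())
    return {t: total_reads * w / total for t, w in weights.items()}


# Hypothetical transcriptome: one abundant long transcript, one rare short one
abundance = {"abundant_long": 1000, "rare_short": 10}  # molecules
length = {"abundant_long": 5000, "rare_short": 500}    # bp
N = 1_000_000                                          # reads sequenced

mrna = expected_reads(abundance, length, N, full_length=True)
dge = expected_reads(abundance, length, N, full_length=False)
print("full-length:", round(mrna["rare_short"]), "tag-based:", round(dge["rare_short"]))
```

With these toy numbers, the rare short transcript receives roughly ten times more reads with tag counting (about 9’900 versus about 1’000 of one million), because reads are no longer spread over the length of the abundant long transcript. The model ignores fragmentation, priming and mapping biases.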
Meanwhile, with newer NGS library types available, one would use a 3’-fragment library rather than a DGE library. At similar cost, this library type provides longer sequence information (100 bp versus 17 bp) and consequently higher mapping accuracy and fewer unmappable reads.
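One reason longer reads map more reliably can be shown with a back-of-the-envelope estimate: in a random sequence of G bp, a given k-mer is expected to occur by chance about 2(G − k + 1)/4^k times across both strands. The sketch below assumes a hypothetical 100 Mb reference; it captures only chance matches, whereas in real transcriptomes gene families, repeats and sequencing errors make short tags ambiguous far more often.

```python
def expected_random_hits(k, reference_bp):
    """Expected chance occurrences of one k-mer in a uniformly random reference.

    Counts both strands; assumes independent, equiprobable bases (a simplification).
    """
    positions = max(reference_bp - k + 1, 0)
    return 2 * positions / 4 ** k


# Hypothetical reference size; the actual size of the assembled transcriptome is not stated
REFERENCE_BP = 100_000_000
for k in (17, 100):
    print(f"{k} bp: {expected_random_hits(k, REFERENCE_BP):.3g} expected chance hits")
```

Under these assumptions, roughly 1% of 17 bp tags would be expected to match a second site by chance alone, while for 100 bp reads the expectation is effectively zero.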
You will find more information regarding this combined approach including the 3’-fragment library for read counting in the following Application Note.