
Sequencing Rather Than Soaking in Hot Springs

There are many volcanoes and earthquakes in Japan, but this is not always a bad thing: they are also responsible for the many hot springs. Most Japanese people love soaking in hot springs and believe that it relieves fatigue and improves health. Hot springs have also made a great contribution to biotechnology via the heat-resistant DNA polymerase from Thermus aquaticus (Taq) and its derivatives. As we all know, not only PCR but also Sanger sequencing was accelerated by these heat-resistant enzymes.

Scientists have started to study the genome and transcriptome world of hot springs with NGS technologies. Murakami et al. performed a 16S rRNA analysis (Sanger sequencing) and a metatranscriptome analysis of small RNAs (GS FLX sequencing) on groundwater (from depths of up to 1,000 m) from the Yunohara hot spring, Japan. Their phylogenetic analysis of 16S rRNA classified 17 species, including archaea and eubacteria. Other hot springs typically harbor only two or three dominant species, so this one is unusually rich in diversity. Furthermore, they found a highly unusual group, the “Archaeal Richmond Mine Acidophilic Nanoorganisms (ARMAN)”: tiny organisms with cells only about 200 nm in size! Their small RNA analysis identified 64,194 cDNA sequences (20,057 nonredundant), among them several novel non-coding RNAs with very stable secondary structures.
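The gap between 64,194 sequences and 20,057 nonredundant ones comes from collapsing identical reads. A minimal sketch of such dereplication, assuming exact, case-insensitive matching of whole sequences (the study's actual criteria are not described here) and using made-up toy reads:

```python
from collections import Counter

def dereplicate(reads):
    """Collapse identical sequences (case-insensitive) into (sequence, count)
    pairs, most abundant first."""
    counts = Counter(seq.upper() for seq in reads)
    return counts.most_common()

# Hypothetical toy cDNA reads, not data from the study.
reads = ["ACGTAGC", "acgtagc", "GGATCCA", "ACGTAGC", "TTAGGCA"]
unique = dereplicate(reads)
print(f"{len(reads)} sequences ({len(unique)} nonredundant)")
for seq, n in unique:
    print(seq, n)
```

Real pipelines often also cluster near-identical sequences to absorb sequencing errors; exact matching is the simplest possible criterion.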

Hot springs may therefore still be gold mines for useful genes and for biological knowledge about unknown underground ecosystems.


Survey Result: Applications Using Roche GS Technology Providing Read Lengths of up to 700 bp

We asked: For which kind of application do you use the Roche GS FLX+, GS FLX Titanium or GS Junior sequencing technology, which provides read lengths of up to 700 bp? 36 people answered the poll.


Please find our new poll on the right-hand side and add your voice!

Expression Profiling Without the Need for a Reference Genome

Interested in expression profiling, but you are working with a non-model organism?

A very elegant approach is to (1) generate long cDNA contigs with NGS technologies that serve as a reference transcriptome and (2) perform expression profiling by mapping the Illumina HiSeq 2000 short reads of each sample back onto this reference. As only one read (tag) is generated per transcript molecule, up- and down-regulated genes can easily be identified by counting the sequence hits per transcript.

This approach was used by Mutasa-Göttgens et al. (2012) to analyze targets involved in bolting and flowering in sugar beet. Understanding the regulation of vernalization-induced bolting and of the transition to the reproductive phase is highly important, because bolting and flowering considerably reduce sugar content.

To generate the reference transcriptome of the shoot apex, a normalized random-primed cDNA library was prepared and sequenced on the Illumina HiSeq 2000 with the single-read module and 100 bp read length. De novo assembly yielded a total of 225,000 unique transcripts, 53,000 of which are large transcripts (>500 bp, up to more than 8,700 bp). For quantitative comparison, we prepared digital gene expression (DGE) libraries for the research group from samples that had been subjected to vernalization and/or phytohormone treatment. The libraries were sequenced on the Illumina HiSeq 2000, and the reads were mapped onto the reference transcriptome.
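With one tag per transcript molecule, the quantitative comparison comes down to counting mapped tags per transcript. A minimal sketch, assuming the reads are already mapped and reduced to one transcript ID per tag; scaling to counts per million (CPM) and taking a log2 fold change with a pseudocount are common conventions chosen here for illustration, not necessarily what the study used. Transcript IDs and counts are hypothetical.

```python
import math
from collections import Counter

def counts_per_million(hits):
    """Turn mapped tags (one transcript ID per tag) into counts per million (CPM)."""
    counts = Counter(hits)
    total = sum(counts.values())
    return {tx: n * 1e6 / total for tx, n in counts.items()}

def log2_fold_changes(control_hits, treated_hits, pseudocount=1.0):
    """log2(treated / control) per transcript on the CPM scale;
    the pseudocount avoids division by zero for transcripts missing in one sample."""
    ctrl = counts_per_million(control_hits)
    trt = counts_per_million(treated_hits)
    return {
        tx: math.log2((trt.get(tx, 0.0) + pseudocount) / (ctrl.get(tx, 0.0) + pseudocount))
        for tx in sorted(set(ctrl) | set(trt))
    }

# Hypothetical mapped tags, not data from the sugar beet study.
control = ["tx1"] * 50 + ["tx2"] * 30 + ["tx3"] * 20
treated = ["tx1"] * 20 + ["tx2"] * 30 + ["tx3"] * 50
for tx, lfc in log2_fold_changes(control, treated).items():
    print(f"{tx}\t{lfc:+.2f}")
```

With biological replicates, one would test for differential expression with a dedicated statistical package such as edgeR or DESeq rather than rely on raw fold changes.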

Bioinformatic analysis identified, among other things, a potential regulator of vernalization, and therefore an interesting breeding target for the sugar beet crop.

In my opinion, this study is an excellent example of how to combine the strengths of different RNA-Seq library types most effectively. The normalized random-primed library allows unbiased sequencing along the entire transcript. Furthermore, normalization evens out the abundance of highly and lowly expressed transcripts, which allows accurate identification of lowly expressed genes and considerably facilitates de novo assembly with short-read technology. The DGE library, in contrast, produces only one tag per transcript molecule and therefore gives much deeper resolution for the same number of reads than Illumina's mRNA-Seq approach, which distributes reads over the whole transcript.

Meanwhile, with new NGS library types available, one would rather use a 3’-fragment library instead of the DGE library. At similar cost, this library type provides longer sequence information (100 bp versus 17 bp) and consequently higher mapping accuracy and fewer non-mappable reads.

More information on this combined approach, including the 3’-fragment library for read counting, can be found in the following Application Note.

How dirty is your office?

16S rRNA sequencing of samples from 54 common office surfaces in three cities (New York, San Francisco, Tucson) revealed that men's offices are dirtier than women's and that the offices in San Francisco are the cleanest of the three cities. These are some of the results of Hewitt et al., recently published in PLoS ONE. Overall, they found “more than 500 bacterial genera from 20 different divisions” (Hewitt et al.), most of them on chairs and phones (see graph). Interestingly, the bacterial population in Tucson was significantly different from those in San Francisco and New York, although Tucson is closer to San Francisco than New York is. In my view, this is a great study showing that bacterial distribution is not as obvious as we think and that we have not yet revealed every secret on earth.

Nacreous Luster Spun by Gene Expression

Nacreous luster has been of high industrial value since ancient times. Nacre is produced by pearl oysters and is therefore also called a “biomineral”. It consists of two kinds of layers: an inorganic crystal layer of calcium carbonate and a protein layer. The protein layer forms a laminated structure, and reflection from these multiple layers produces the characteristic luster. Today, pearls are not only used as jewels but are also gaining importance as a new functional material in nanotechnology, as a CO2 fixation carrier in environmental science, and as a model of bone formation and biocalcification in medical science.

However, the molecular makeup of the protein layer is still poorly understood. To clarify it, Kinoshita et al. performed a transcriptome analysis of the pearl oyster Pinctada fucata using a 3’-fragment library and GS FLX sequencing. They identified 29,682 novel genes, and clustering the gene expression patterns together with those of known nacre genes revealed 20 candidates that are most probably associated with biomineralization. Furthermore, Takeuchi et al. determined the 1.15 Gb draft genome sequence of P. fucata. They found 23,257 complete gene models, including the candidate genes reported by Kinoshita et al.
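The idea of finding candidates by clustering their expression with known nacre genes can be illustrated with a simple "guilt by association" screen: rank uncharacterized genes by the Pearson correlation of their expression profiles with those of known genes. This is a simplified stand-in, not Kinoshita et al.'s actual clustering method; the gene names, tissues, expression values and the 0.9 threshold are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def coexpressed_candidates(known, unknown, threshold=0.9):
    """Unknown genes whose best correlation with any known gene reaches the
    threshold, sorted from strongest to weakest."""
    hits = {}
    for gene, profile in unknown.items():
        best = max(pearson(profile, ref) for ref in known.values())
        if best >= threshold:
            hits[gene] = best
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical expression values in four tissues: mantle edge, mantle pallium, gill, foot.
known = {"known_nacre_gene": [90, 80, 5, 3]}
unknown = {"geneX": [70, 65, 4, 2], "geneY": [3, 5, 60, 80], "geneZ": [40, 42, 38, 41]}
for gene, r in coexpressed_candidates(known, unknown).items():
    print(f"{gene}\tr = {r:.3f}")
```

Here geneX, which peaks in the same (hypothetical) mantle tissues as the known gene, passes the screen, while geneY and the flat-ish geneZ do not.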

In the near future, the harmony of gene expression behind the spinning of nacreous luster will surely be clarified!

Survey result: What do you think about Nanopore sequencing?

Earlier this year, Oxford Nanopore Technologies presented its solution for next-generation sequencing: the MinION and GridION instruments, which are said to outperform currently available technologies such as the Illumina or Roche systems in read length, hands-on time and price. But since the technology has not been launched yet, we don’t know whether these specs are realistic.

This is why we asked for your opinion in our latest poll (Nanopore sequencing from Oxford Nanopore Technologies sounds really fascinating. What is your opinion regarding this technology?). More than 50 voters took part in this survey, and 42% share my opinion: “I prefer to wait and check out the real system before judging it”.

“Paper doesn’t blush” is what 15% think of this announcement: as for any other company, the first presentation had to be spectacular, but let’s see what happens once the instrument is really on the market.

Still, some of you are convinced that this will change a lot in the NGS market, and I agree it would be great if it turns out to be true.

Some of you had not heard of this technology yet; if you are interested in learning more, you might start by reading our recent blog post about it.

Thanks again to all of you who participated in our poll, and please have a look at our new one.

RAD-Seq – A brief technical overview

Some time ago I introduced a new approach combining restriction site associated DNA (RAD) marker genotyping with next-generation sequencing technology. This method was originally developed for microarray platforms. However, the combination of RAD and NGS (Illumina), resulting in RAD sequencing (RAD-Seq), enabled massively parallel and multiplexed sample sequencing. RAD-Seq is becoming more and more powerful and has the potential to revolutionize agrigenomics, because it allows one to discover and screen thousands of SNPs and to genotype large populations in a high-throughput manner at the same time. The following section gives a short technical overview of how this can be accomplished:

The genomic DNA of each sample is digested in parallel with a given restriction enzyme, and a P1 adapter is ligated to the restriction fragments. Each sample receives its own P1 adapter, which contains a sample-specific molecular identifier (barcode) as well as Illumina adapter sequences (the forward amplification primer site and the Illumina sequencing primer site). If multiplexing is desired, the adapter-ligated fragments of several samples can now be pooled; the level of multiplexing depends on the number of different P1 adapters used. In a further step, the RAD pool is sheared, size-selected and ligated to a second adapter (P2). The P2 adapter is a divergent “Y” adapter containing the reverse amplification primer site, and it is designed such that fragments lacking the P1 adapter cannot be amplified. This ensures that only fragments carrying both a P1 and a P2 adapter are selectively and robustly enriched during the subsequent amplification step. The overall length of the RAD tags available for analysis depends mainly on the size-selection step and the sequencing run mode (single vs. paired end).
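Two steps of this workflow lend themselves to a small sketch: estimating how many RAD tags a digest will yield, and sorting the pooled reads back to their samples by barcode. The code is illustrative only and rests on assumptions not stated in the text above: the enzyme is SbfI (recognition site CCTGCAGG, cut after CCTGCA), so each read 1 starts with the barcode followed by the site remnant TGCAGG; the barcodes are hypothetical 5 bp sequences of equal length; and one barcode mismatch is tolerated. Each cut site is counted as two RAD tags, one per flank.

```python
import random
import re

SBFI_SITE = "CCTGCAGG"   # assumed enzyme: SbfI (palindromic, so scanning one strand suffices)
SBFI_REMNANT = "TGCAGG"  # part of the site expected right after the barcode in read 1

def cut_sites(seq, site=SBFI_SITE):
    """0-based start positions of every recognition site (lookahead allows overlaps)."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]

def expected_sites(genome_size, site=SBFI_SITE, gc=0.5):
    """Approximate expected site count in random sequence with the given GC content."""
    p = 1.0
    for base in site:
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return genome_size * p

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def assign_read(read, barcodes, remnant=SBFI_REMNANT, max_mismatches=1):
    """Return (sample, read without barcode), or None if the barcode or remnant check fails."""
    bc_len = len(next(iter(barcodes)))
    if len(read) < bc_len + len(remnant):
        return None  # too short to contain barcode plus remnant
    matches = [bc for bc in barcodes if hamming(read[:bc_len], bc) <= max_mismatches]
    if len(matches) != 1:
        return None  # unknown or ambiguous barcode
    if read[bc_len:bc_len + len(remnant)] != remnant:
        return None  # restriction-site remnant missing: not a RAD tag
    return barcodes[matches[0]], read[bc_len:]

def demultiplex(reads, barcodes):
    """Group reads by sample; unassignable reads are collected under None."""
    bins = {}
    for read in reads:
        hit = assign_read(read, barcodes)
        sample, seq = hit if hit else (None, read)
        bins.setdefault(sample, []).append(seq)
    return bins

if __name__ == "__main__":
    random.seed(0)
    genome = "".join(random.choice("ACGT") for _ in range(1_000_000))  # toy random genome
    n = len(cut_sites(genome))
    print(f"{n} sites -> {2 * n} RAD tags (expected ~{2 * expected_sites(len(genome)):.0f})")

    barcodes = {"AACCA": "sample_1", "CCTTG": "sample_2"}  # hypothetical barcodes
    reads = ["AACCATGCAGGTTAGC", "CCTTCTGCAGGACGT", "GGGGGTGCAGGAAAA", "AACCAACGTACGTAC"]
    for sample, seqs in demultiplex(reads, barcodes).items():
        print(sample, seqs)
```

For an 8-cutter in random sequence with 50% GC, one site is expected roughly every 4^8 = 65,536 bp, which is why a large genome yields tens of thousands of RAD loci. Dedicated tools additionally handle paired reads, quality filtering and variable-length barcodes.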


Expression Profiling and Sequencing Depth

The majority of scientists performing expression studies use the mRNA-Seq protocol (random-primed cDNA synthesis after fragmentation of poly(A)-purified transcripts) and sequence the fragments with Illumina technology. When planning such an experiment, the question of sequencing depth immediately arises. For all of you interested in an answer, I want to share the recommendations on sequencing depth from experts in transcriptome sequencing, published in GenomeWeb.

The recommendations range from 10 million single-end reads up to hundreds of millions of reads, depending on the exact need, and the experts find it really tough to give a concrete number. Please keep in mind that about 80-85% of the transcript molecules in a typical transcriptome come from only a few highly expressed genes, whereas the majority of transcripts are present in only a few copies. For straightforward gene expression analysis, the interviewed scientists usually use around 20-30 million reads per sample. But if your aim is to look at genes with really low expression, such as some transcription factors, you definitely need a higher sequencing depth. The same is true when transcript isoforms or fusion genes are to be analyzed: for these applications, the required sequencing depth can be as much as a full lane per sample.

So, enjoy reading their comments!
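Why rare transcripts call for deeper sequencing can be made concrete with a simple Poisson sampling model (our assumption, not the experts' method): if a transcript accounts for a fraction f of all reads, a library of N reads yields on average λ = N·f reads from it, and the probability of observing at least k reads is 1 minus the sum of e^(−λ)·λ^i / i! for i < k. The fraction of 1e-7 and the threshold of 10 reads below are hypothetical, and the model ignores transcript length, library bias and biological variation.

```python
import math

def prob_at_least(k, n_reads, fraction):
    """P(X >= k) for X ~ Poisson(n_reads * fraction), i.e. the chance of seeing
    at least k reads from a transcript that makes up `fraction` of the library."""
    lam = n_reads * fraction
    term = math.exp(-lam)  # P(X = 0); underflows for very large lam, fine for rare transcripts
    cdf = 0.0
    for i in range(k):
        cdf += term              # add P(X = i)
        term *= lam / (i + 1)    # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)

# Hypothetical rare transcript: 1 in 10 million reads; require >= 10 reads to call it.
for depth in (10_000_000, 30_000_000, 100_000_000, 300_000_000):
    p = prob_at_least(10, depth, 1e-7)
    print(f"{depth // 1_000_000:>4} M reads: P(>= 10 reads) = {p:.3f}")
```

Under these assumptions, a transcript this rare is almost never seen 10 times at 10 million reads but is reliably detected at a few hundred million, which matches the experts' advice to go deeper for low-expressed genes.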

16S Amplicon Experiments: Which Platform to Choose?

Since 2010, several studies have been published that analyze microbial community composition by amplicon sequencing on the Illumina Genome Analyzer (GA). However, these protocols cannot be adapted directly for the HiSeq 2000, currently the predominant Illumina sequencer, because the two systems use different base-calling pipelines. Amplicon sequencing on the HiSeq 2000 is therefore still left to very experienced users, and only a few publications on it are available.

Meanwhile, Illumina has introduced the MiSeq as the optimal platform for this kind of project and has published an application note presenting sequencing of the V4 region of 16S rRNA genes on the MiSeq system.

I fully agree that the MiSeq is a very good tool for these studies. For me, its most important advantages over sequencing on the Illumina HiSeq 2000 are as follows:

  • Shorter turnaround time: The sequencing run itself takes a little more than one full day, while a HiSeq 2000 run takes up to 12 days.
  • More information per read: By overlapping the two 150 bp reads of a read pair, full-length reads of about 250 bp can be generated.
  • Potential for even longer reads: Illumina has announced read lengths of 250 bp by the end of the year, which should then allow merged reads of up to 450 bp.
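The read lengths in these bullets follow from simple overlap arithmetic: two reads of length L covering an amplicon of length A overlap by 2L − A bases, so 2 × 150 bp reads give about 250 bp with a 50 bp overlap, and 2 × 250 bp reads would give up to 450 bp with the same overlap. A minimal merging sketch, assuming error-free reads, exact matching, a caller-chosen minimum overlap (20 bp by default, an arbitrary choice) and amplicons longer than a single read, reverse-complements read 2 and keeps the longest suffix/prefix match. Dedicated tools such as FLASH or PEAR additionally tolerate mismatches and use base qualities.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=20):
    """Merge a read pair via an exact overlap between the end of read 1 and the
    reverse complement of read 2; return None if no overlap >= min_overlap exists."""
    r2 = reverse_complement(read2)
    for olap in range(min(len(read1), len(r2)), min_overlap - 1, -1):  # longest first
        if read1[-olap:] == r2[:olap]:
            return read1 + r2[olap:]
    return None

def merged_length(read_len, overlap):
    """Amplicon length covered by two reads of read_len bases with the given overlap."""
    return 2 * read_len - overlap

print(merged_length(150, 50), merged_length(250, 50))  # 250 450

# Toy example: a 16 bp "amplicon" read from both ends with 10 bp reads.
amplicon = "ACGTTGCAAGGCTTAC"
read1 = amplicon[:10]
read2 = reverse_complement(amplicon[6:])
print(merge_pair(read1, read2, min_overlap=3))  # ACGTTGCAAGGCTTAC
```

Searching from the longest overlap downward favors the most conservative merge; with real data, short spurious overlaps are the main risk, which is why a minimum overlap is enforced.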

Nevertheless, Roche GS FLX+ sequencing still generates much longer reads, averaging up to 500-600 bp. The longer reads provide deeper insight into the microbiome of interest or, more precisely, higher classification efficiency down to the species level. However, Roche sequencing comes with higher costs per base, so whether read length or sequencing depth is the most important factor will always be decided by the individual experiment.

Human Ancestors?

The Neandertals lived until around 30,000 years ago; Oetzi died around 5,000 years ago. Researchers have now sequenced the genomes of both of these human ancestors to a large extent. Prof. Pääbo and his group from the MPI in Leipzig published about 60% of the Neandertal genome in Science (2010). What impresses me is that they give everyone full access to the genome: they simply put all the data on their website. What also fascinated me is how difficult it is to study the resemblance between Neandertals and modern humans, since most of the Neandertal bones found are “contaminated” with modern human DNA. This is hardly surprising: anthropologists do not wear gloves by default, so everyone who touches the bones to study the age and lifestyle of our ancestors leaves their DNA on them.

Another eye-opener for me is that the most obvious things we discover in a genome are always the impairments of a species. A good example is the recent publication of the Iceman’s (Oetzi’s) genome: 96% of it has been sequenced, and what did we learn? He had blood group O, was lactose intolerant, probably had a genetic predisposition to coronary heart disease, and carried the Lyme disease pathogen.

But researchers also found interesting information on how Oetzi and the Neandertals are related to modern humans:

Oetzi’s genome was compared with those of 1,300 Europeans, 125 North Africans and 20 people from the Arabian Peninsula. The study revealed that his closest living kin are found on Sardinia and Corsica.

For the Neandertal, the genomes of five modern humans from different populations were used for comparison. The stunning result is that some Neandertals and early modern humans interbred, since 1 to 4% of the DNA of many humans living outside Africa originates from Neandertals.

In all the discussions about our ancestors and close relatives, I sometimes wonder whether we will be someone’s close relatives in, let’s say, 1 million years. Couldn’t a new population or species of humans develop? It sounds absurd or like science fiction, but who are we to think there is nothing “after us”?