Tag Archives: Data storage

Data analysis – still a bottleneck!

With so many NGS machines in use, the field produces tremendous amounts of sequencing data every day. However, at the end of the day, all the data have to be analyzed and interpreted. In many cases, this step is still a bottleneck.

Please check the video below, an interview on this topic with Lex Nederbragt, bioinformatician at the Norwegian High-Throughput Sequencing Centre in Oslo. He discusses the fact that the available analysis tools do not fully meet the needs of researchers. In this context, he also discusses the use of open source and commercial software tools.

Lex Nederbragt discussing software bottlenecks and lack of flexible reference genomes from NGS Perspectives on Vimeo.

Synthetic DNA – Data Storage For Eternity?

In the April issue of the journal Spektrum der Wissenschaft I found a very interesting article by Jan Dönges about storing information with the help of synthetic DNA (oligonucleotides). He describes the work of Ewan Birney and Nick Goldman from the European Bioinformatics Institute (EBI) in Hinxton, UK, who have developed a strategy for encoding data in strings of A, C, G and T nucleotides (Nature 494, 77-80, February 7 2013). They encoded all of Shakespeare's sonnets, a photo of the institute, the original paper by Watson and Crick on the structure of DNA, an audio recording of Martin Luther King's speech “I have a dream” and a file with the coding instructions; altogether 739 kilobytes of information. They ordered the oligos and sequenced them on an Illumina HiSeq 2000. The result was a text file of the letters A, C, G and T that could be converted back into the original data. The complete code and sequence can be found here.
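To make the idea of writing data as A, C, G and T concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping, two bits per nucleotide, which is an illustration only and not the encoding used in the Nature paper; the function names are made up for this example.

```python
# Illustrative mapping only: 2 bits per nucleotide (00=A, 01=C, 10=G, 11=T).
# This is an assumption for the sketch, not the scheme from the Nature paper.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Decode a nucleotide string produced by bytes_to_dna back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            # str.index raises ValueError for anything other than A, C, G, T
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


message = b"I have a dream"
encoded = bytes_to_dna(message)
print(encoded)
print(dna_to_bytes(encoded) == message)
```

A real scheme would presumably also have to cope with synthesis and sequencing errors, which this sketch ignores completely.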

From sequencing experiments such as those on mammoth or Neanderthal remains, we know that DNA is stable for at least 10,000 years, longer than any other data storage medium. In addition, it is extremely dense. One gram of DNA can encode more than 2 petabytes (1 petabyte = 10^15 bytes), or about 2.3 million gigabytes. The volume of a coffee cup would be sufficient to store 100 million hours of high-resolution video. The technology can be expected to improve further, as long as mankind remains interested in DNA. The cost of the experiment was quite high compared with other storage media such as tape, HDDs or DVDs. However, because tapes have to be copied to new media at regular intervals, DNA becomes the cheaper option once data need to be kept for about 600 years. So, if we want to conserve the knowledge of mankind for very long periods and make sure that it survives possible major disasters in the future, this seems to be a reasonable strategy.
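As a quick sanity check on these numbers, the following Python sketch converts the quoted density into petabytes per gram and estimates how little DNA 739 kilobytes would need at that density. It assumes decimal (SI) units and is a back-of-the-envelope figure only; it says nothing about how much DNA the experiment actually used.

```python
# Figures quoted in the posts; decimal (SI) units are an assumption.
BYTES_PER_GRAM = 2.3e6 * 1e9   # 2.3 million gigabytes per gram of DNA
PAYLOAD_BYTES = 739e3          # 739 kilobytes encoded in the experiment


def grams_needed(n_bytes: float, bytes_per_gram: float = BYTES_PER_GRAM) -> float:
    """Mass of DNA needed for n_bytes at the given storage density."""
    return n_bytes / bytes_per_gram


petabytes_per_gram = BYTES_PER_GRAM / 1e15
nanograms = grams_needed(PAYLOAD_BYTES) * 1e9

print(f"{petabytes_per_gram:.1f} PB per gram")
print(f"{nanograms:.2f} ng for 739 kB")
```

So 2.3 million gigabytes is indeed a little more than 2 petabytes, and at that density the whole data set would fit into well under a nanogram of DNA.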

DNA as Digital Data Storage – New Ways for using NGS?

While the data output and quality of Next Generation Sequencing are continually increasing, the cost per base is steadily dropping. A survey from the National Human Genome Research Institute (NHGRI) shows that the decline in cost even outpaces Moore’s law. Thanks to this trend, new avenues of research are opening up that may not have been regarded as realistic in the past.

For example, over the past years, several attempts have been made to use DNA as a means of storing information. In a study recently published online in Science, scientists developed a strategy to encode and read digital information using DNA synthesis and Next Generation Sequencing systems.

An HTML document containing more than 50,000 words, 11 JPG images, and a JavaScript program was encoded in DNA by synthesizing nearly 55,000 oligonucleotides on high-fidelity microarrays. The information stored in the oligonucleotide library was later “read” by Illumina sequencing.
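Why so many oligonucleotides? Each one is short, so the data have to be cut into small pieces, and since sequencing returns reads in no particular order, each piece needs to carry its own position. The Python sketch below shows that general idea. The fragment sizes and function names are hypothetical and chosen for illustration; they are not taken from the Science study.

```python
import random

# Hypothetical sizes for illustration; not taken from the Science study.
PAYLOAD_BYTES = 12   # data bytes carried by each fragment
INDEX_BYTES = 3      # bytes reserved for the fragment's position


def split_into_fragments(data: bytes) -> list[bytes]:
    """Prefix each fixed-size chunk with its index so order can be recovered."""
    fragments = []
    for index, start in enumerate(range(0, len(data), PAYLOAD_BYTES)):
        header = index.to_bytes(INDEX_BYTES, "big")
        fragments.append(header + data[start:start + PAYLOAD_BYTES])
    return fragments


def reassemble(fragments: list[bytes]) -> bytes:
    """Sort fragments by their index header and concatenate the payloads."""
    ordered = sorted(
        fragments, key=lambda f: int.from_bytes(f[:INDEX_BYTES], "big")
    )
    return b"".join(f[INDEX_BYTES:] for f in ordered)


document = b"<html><body>More than 50,000 words would go here.</body></html>"
fragments = split_into_fragments(document)
random.shuffle(fragments)  # stand-in for reads coming back in arbitrary order
print(reassemble(fragments) == document)
```

Each indexed fragment would then be translated into nucleotides and synthesized as one oligonucleotide; that translation step is left out here.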

According to the authors, DNA is a very useful medium for long-term storage of information: it is very stable over many years and allows data storage at very high density in small volumes. The senior author, Kosuri, told InSequence that they used only some 50 ng of oligonucleotides to store the information in this HTML document! Kosuri admitted that the study cost several thousand dollars. However, if Next Generation Sequencing continues to develop at the same speed as today, new applications such as using DNA for (long-term) data storage may become a feasible option.

So let us see what is coming next!