Archive | Publications & Surveys

Big Data Penetrates the Pharma Industry

Everyone is talking about big data. Many IT companies focus on analyzing huge amounts of information. The pharmaceutical industry also collects very large volumes of data during drug research and discovery, and above all during the clinical phases.

The article linked below addresses the big data analysis demands of the pharma industry in a very interesting way, so I encourage everyone to read it.

Visit “Leveraging big data to solve Pharma’s hard to cure problems”

Rapid Genome Sequencing in NICUs and PICUs

STAT-Seq to rapidly detect thousands of genetic diseases.

A recently published study in The Lancet Respiratory Medicine reveals the early results of the clinical usefulness of rapid whole genome sequencing in neonatal and pediatric intensive care units (NICUs and PICUs). Children’s Mercy Kansas City’s STAT-Seq test helped diagnose a genetic disease in more than one half of 35 critically ill infants tested, compared to just 9% with standard genetic tests.

Beyond the medical impact on treatment strategies, I want to share some information about STAT-Seq itself.

STAT-Seq, which runs on Illumina’s HiSeq 2500, is first of all a research protocol. It is the fastest whole genome test: once it is fully implemented in the lab, it might take less than 50 hours from test order to delivery of an initial report. STAT-Seq can identify mutations across the genome associated with approximately 5,300 known genetic diseases.

The study showed a significantly improved diagnosis rate for whole genome sequencing versus traditional testing. But it did not show an improvement over what is typically seen in exome sequencing. The latter only examines the parts of DNA that code for proteins, the body’s basic building blocks.

Right now, exome sequencing is the more commonly used diagnostic tool because the technology is cheaper and more readily available. Cost effectiveness was not examined in the study, but the costs of genome sequencing are falling rapidly. Currently, the best available price is around €3,500 ($4,000), but many genomic researchers say it could drop to €1,500 ($1,700) by the end of this year.

Whole genome testing could become a more useful tool than exome sequencing in the long run because it provides more complete information. Genes account for less than 25% of the DNA in the genome. The remainder includes areas that control how genes are turned on and off as well as “junk” DNA whose function isn’t fully understood.

Read about other aspects of the study at GenomeWeb.


Whole genome sequencing a complete island

Two days ago, a groundbreaking study was published in Nature Genetics: whole genome sequencing of 2,636 Icelanders and genotyping of 104,220 Icelanders.

The advantage of using a small population like the Icelanders for this kind of study is that there are fewer rare variants overall, while some of these variants occur at a higher frequency.

For the study, genomic DNA was isolated from white blood cells, and sequencing was subsequently performed on GAIIx and HiSeq instruments. The resulting reads were aligned to the human reference genome (NCBI Build 36, hg18).

Gudbjartsson et al. then examined the data from different angles. For example, they looked for geographical dependencies of specific variants and at how the data can be used to learn more about phenotypes and their underlying genomic patterns. They also report an example of “how rare variants […] can be used to analyze clinical problems” (Gudbjartsson et al.).

Since every human being has a unique genomic pattern, I think studies like this are highly important for learning more about disease-related genotypes. This will help us gain confidence in the results we get from molecular diagnostic assays for disease treatment, now and in the future.

Read the complete publication here.

Transcriptome assemblers put to the test

Next Generation Sequencing produces millions or even billions of reads, and the interpretation of these reads relies on bioinformatic tools.

Especially for de novo assemblies of genomes or transcriptomes, the results can vary depending on the quality of the assembly.

In a recent publication, Shorash Amin and his co-workers sequenced the transcriptome of the non-model gastropod Nerita melanotragus with the Ion PGM. They then used different software packages and compared the quality of the resulting transcriptome assemblies (Amin et al.).

Oases, Trinity, Velvet and Geneious Pro were the four de novo transcriptome assemblers used for this study. The assemblers were compared on parameters such as contig length, N50 statistics, BLAST and annotation success.
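For readers unfamiliar with the N50 statistic mentioned above, here is a minimal sketch of how it is commonly computed. The function name `n50` and the toy contig lengths are made up for illustration; they are not data from Amin et al.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_of_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_of_total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths in bp (illustration only, not study data).
toy_assembly = [1700, 900, 600, 400, 300, 100]
print("N50:", n50(toy_assembly), "bp")
```

A higher N50 usually indicates a more contiguous assembly, but it says nothing about correctness, which is why the study also looked at BLAST and annotation success.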

The longest contig was created with the Oases assembler (1,700 bp), and overall Trinity and Oases delivered much better results than de novo assembly of Ion PGM reads with Velvet or Geneious Pro.

Furthermore, mapping to a reference genome showed that Ion PGM transcriptome sequencing and subsequent de novo assembly with either Trinity or Oases generate reliable and accurate results.

Read the complete publication here.

Don’t forget the controls!

Almost every day new data about the composition of microbiomes are published. Many of these studies analyse the human microbiome, but also environmental samples.

Today we have the ability to sequence microbiomes in much more depth than a couple of years ago. Looking deeper sheds light on an important point: contamination! In a very interesting publication, Salter et al. showed that contaminating DNA is present in DNA extraction kits and other lab reagents.

The researchers sent dilutions of pure cultures of Salmonella bongori to three different institutes for DNA extraction and PCR, followed by sequencing on Illumina MiSeq. While S. bongori was the only organism identified in the undiluted samples, contaminating bacteria increased in relative abundance with higher degrees of dilution, and finally became dominant after the fifth dilution.

They did a similar analysis using shotgun metagenomics of a pure S. bongori culture. This time, they used four different DNA extraction kits. Again, they saw that contamination increased with the degree of dilution, with contamination being the predominant feature after the fourth dilution. They also showed that each kit gave a different bacterial profile.
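Why contamination takes over at high dilution can be illustrated with a simple toy model. The sketch below assumes that each extraction adds a fixed amount of contaminant DNA while the target DNA shrinks ten-fold per dilution step. Both the ten-fold step and all numbers are assumptions chosen for illustration, not values taken from this post or measured by Salter et al.

```python
def contaminant_fraction(target_start, contaminant_amount, dilution_step, fold=10):
    """Fraction of all DNA that is contaminant after a given dilution step.

    target_start: amount of target DNA in the undiluted sample (arbitrary units)
    contaminant_amount: fixed contaminant DNA added by kit and reagents (same units)
    dilution_step: number of serial dilutions applied to the target
    fold: dilution factor per step (assumed ten-fold here)
    """
    target = target_start / fold ** dilution_step
    return contaminant_amount / (contaminant_amount + target)


# Hypothetical amounts: the target starts 100,000 times more abundant
# than the reagent contamination.
for step in range(7):
    fraction = contaminant_fraction(1e8, 1e3, step)
    print(f"dilution {step}: contaminant share = {fraction:.4f}")
```

In this toy model the contaminant share is negligible for the undiluted sample and rises with every dilution step, which mirrors the qualitative pattern described above and explains why low-biomass samples are the most vulnerable.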

They also report on a study of the nasopharyngeal microbiota of children, analyzed over two years. They showed that using four different DNA extraction kits over time led to the false conclusion that differences in the microbial spectrum were associated with age. When DNA extraction was repeated on original samples using a different kit lot, the OTUs previously identified as contaminants were no longer detected.

In conclusion, contamination affected both 16S and metagenomic shotgun sequencing projects and was especially critical for samples with low biomass. Salter et al. present a list of potential contaminating organisms, as well as recommendations on how to cope with this problem. One recommendation is very obvious, and very effective: use negative controls!

Altogether, we should plan our experiments very carefully in order to deliver results instead of artefacts. Above all, we need to be very careful when interpreting the data!

Whole Genome Sequences Of World’s Oldest Living People Published

Researchers looked at the genomes of some of the oldest living people. Although they did not find a significant association with extreme longevity, they published their findings, so the data will at least be available as a resource for future researchers looking at the “genetic basis” of longevity.

There are 74 supercentenarians (110 years or older) alive worldwide, with 22 living in the United States. The authors of this study performed whole genome sequencing on 17 of them to explore the genetic basis underlying extreme human longevity.

“We were looking for a really simple explanation in a single gene,” said Stuart K. Kim, a Stanford geneticist and molecular biologist. “And we know now that it’s a lot more complicated, and it will take a lot more experiments and a lot more data from the genes of more supercentenarians to find out just what might account for their ages.”

Because of the limited sample size, the researchers were not able to find protein-altering variants associated with extreme longevity, according to a study in PLOS ONE by Hinco Gierman from Stanford University and colleagues, published November 12, 2014. They did, however, find that one supercentenarian carried a genetic variant related to a heart condition, which had very little effect on his health considering the advanced age he reached. The researchers noted that the American College of Medical Genetics and Genomics recommends reporting such a variant as an incidental finding.

The whole genome sequences of all 17 supercentenarians are now available as a public resource so that they can be used to assist the discovery of the genetic basis of extreme longevity in future studies.


Compare to “Large genome sequencing studies in the USA” (posted August 26, 2014)

Whose genome has been sequenced? Brassica napus

Brassica napus, also known as oilseed rape, was formed more than 7,000 years ago by allopolyploidy (chromosome doubling from two Brassica species). Of course, the genome mutated further, and so it is known today that during this evolution some genes were preserved and further “improved” (e.g. oil biosynthesis genes), whereas others were lost over the course of time (e.g. glucosinolate genes).

Chalhoub et al. have now sequenced the genome because it can help to “provide insights into allopolyploid evolution and its relationship with crop domestication and improvement” (Chalhoub et al.).

What was sequenced?

Young fresh leaves from the Brassica napus French homozygous winter line “Darmor-bzh”.

Sequencing strategy: Whole genome sequencing

  1. Libraries & Sequencing:
    Roche GS FLX: ~70 million reads, Average read length: ~368 bp, Genome coverage: 21.2%
    Sanger BAC Seq: 141k reads, Read length: 650 bp, Genome coverage: 0.1%
    Illumina HiSeq: ~375 million reads, Read length: 36, 76, 108 and 150 bp, Genome coverage: 53.9%
  2. Data output: 44,146 contigs and 20,702 scaffolds
  3. Results: A final assembly of 849.7 Mb (using SOAP and Newbler) with 89% nongapped sequences.

After assembly, the genome was compared with those of other species (e.g. B. rapa and B. oleracea). This helped to find several interesting genes and gene variations that improve our understanding of the species’ evolution.

Read the complete publication here.


Whose Genome Has Been Sequenced? Belgica antarctica

Extreme conditions require extreme actions, and this is exactly what the midge Belgica antarctica has done. The midge lives exclusively in the Antarctic and, in order to survive, shrank its genome to the smallest possible size. As of today, this is the smallest insect genome that has been sequenced.

Kelley et al. have now sequenced the genome of Belgica antarctica with the aim of learning more about how insects in general can adapt to the most extreme conditions.

What was sequenced?

Two fourth-instar larvae (Belgica antarctica) collected near Palmer Station, Antarctica.

Sequencing strategy: Whole genome sequencing & RNA-sequencing

  1. Libraries & Sequencing: 1 channel 2x 100 bp Illumina HiSeq 2000 (SG library (400 bp insert)) and one SMRT-cell of a 10 kb fragment library on PacBio RSII (P4 DNA Polymerase)
  2. Data output: 92 M paired-end reads from the shotgun sequencing with Illumina. These resulted in 5,422 contigs. Using the paired-end RNA-Seq data, the number of contigs was reduced to 5,064. Genome coverage with Illumina sequencing: ~100x.
  3. Results: The total genome is ~ 99 Mbp.
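The coverage figure in the list above can be roughly cross-checked with a back-of-the-envelope calculation. The sketch below assumes that “92 M paired-end reads” counts individual reads and that every read has the full 100 bp length. Both are my assumptions, and the function name `mean_coverage` is made up for illustration.

```python
def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


# Values from the post: 92 M reads, 100 bp read length, ~99 Mbp genome.
coverage = mean_coverage(92_000_000, 100, 99_000_000)
print(f"Estimated mean coverage: {coverage:.1f}x")
```

Under these assumptions the estimate comes out at roughly 93x, which is in line with the reported ~100x. If the 92 M referred to read pairs instead, the estimate would double.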

For the PacBio sequencing, a second larva was used. However, due to the low input of genomic DNA, the PacBio data yielded only a modest improvement in the assembly. This underlines the need for a long-read sequencing technology that works with low-input DNA material.

The de novo sequencing of the midge Belgica antarctica revealed that the small genome size is achieved by a reduction in repeats, transposable elements (TEs) and intron size.

Read the complete publication here.


The Common Marmoset as a Model Organism for the Study of Drug Metabolism

Several non-human primates, including Macaca mulatta and Macaca fascicularis, are well known as experimental animals in the fields of neuroscience, stem cell research, drug toxicology, and other applications. The common marmoset (Callithrix jacchus) is also a non-human primate and is suitable as an experimental animal because of its small size and high fecundity.

To develop a drug metabolism model, our collaborators and Eurofins Genomics (2014) performed transcriptome analysis of the common marmoset using long-read technology (Roche GS FLX+) and short-read sequencing (Illumina HiSeq 2000) in parallel. This parallel NGS approach allowed both the identification and the quantitative analysis of transcripts, and thus gave insight into gene expression during drug metabolism. Finally, we obtained rich information about genes involved in drug metabolism, including 18 cytochrome P450-like and 4 flavin-containing monooxygenase (FMO)-like genes, and their tissue-specific expression patterns.

The results of this study are the foundation for future studies not limited to drug metabolism & pharmacokinetics.

Improvement of PacBio ZMW loading procedure by DNA Origami?

Since the launch of the PacBio system in 2011, the methods involved have been constantly developed and improved (see, e.g., former posts here).

However, efficient loading of the Zero-Mode Waveguides (ZMWs) with polymerase molecules still remains a challenge. The ZMWs are tiny wells in which the actual sequencing reactions take place. Each SMRT cell consists of 150,000 ZMWs, but with current methods only about one third of them are actually usable after loading. The polymerase molecules are loaded onto the ZMWs by simple diffusion, resulting in ZMWs that carry one, more than one, or no polymerase molecule. As a consequence, each SMRT cell typically generates only approx. 50,000 reads per run.
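The “about one third” limit can be made plausible with a simple statistical sketch. Here I assume that diffusion loading makes the number of polymerase molecules per ZMW follow a Poisson distribution. This is a common simplification and my own assumption; it is not stated in the sources above.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def single_occupancy_fraction(mean_polymerases_per_zmw):
    """Fraction of ZMWs expected to hold exactly one polymerase (Poisson assumption)."""
    return poisson_pmf(1, mean_polymerases_per_zmw)


ZMWS_PER_CELL = 150_000  # figure quoted in the post

# Try under-loading, optimal loading and over-loading.
for mean in (0.5, 1.0, 2.0):
    fraction = single_occupancy_fraction(mean)
    print(f"mean {mean}: {fraction:.1%} usable, "
          f"about {fraction * ZMWS_PER_CELL:,.0f} ZMWs")
```

Under this model the share of singly loaded wells peaks at 1/e (about 37%) when the average is one polymerase per well, and it drops for both lower and higher loading. That ceiling matches the order of magnitude of the approx. 50,000 reads per SMRT cell mentioned above.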

A group of researchers from the Technical University of Braunschweig, Germany, has now used “DNA Origami” in order to efficiently place molecules into ZMWs.

DNA origami is a fascinating technique which uses the unique properties of DNA to create nanostructures by “folding” DNA into the required shapes. A ground-breaking article on DNA origami was written by Paul Rothemund in 2006.

The researchers from Braunschweig have now created “nanoadapters” which exactly fit the size of the ZMWs. As a consequence, there cannot be more than one molecule in a ZMW. The nanoadapters carry a fluorescent dye on top and biotin molecules on the bottom side. These biotin molecules serve in fixing the nanoadapters to the bottom of the ZMW via neutravidin. In principle, the fluorescent dye could be replaced by a polymerase molecule. This approach greatly increased the loading efficiency to approx. 60 percent.

However, according to InSequence, the research group did not co-operate with PacBio on this project. In parallel, PacBio is working on other methods to increase the loading efficiency of their SMRT cells. But I am sure that there will be (and has to be) an improvement soon, no matter by which method.

Image: “OrigamiStar-BlackPen” by Aldaron, a.k.a. Aldaron. From JillsArt, posted with permission. Licensed under Attribution via Wikimedia Commons.