Transcriptome assemblers put to the test

Next Generation Sequencing produces millions and billions of reads, and the interpretation of these reads relies on bioinformatic tools.

Especially for de novo assemblies of genomes or transcriptomes, the results can vary depending on the quality of the assembly.

In a recent publication, Shorash Amin and his co-workers sequenced the transcriptome of the non-model gastropod Nerita melanotragus with the Ion PGM. Afterwards they used different software tools and compared the quality of the resulting transcriptome assemblies (Amin et al.).

Oases, Trinity, Velvet and Geneious Pro were the four de novo transcriptome assemblers used in this study. The assemblers were compared on parameters such as contig length, N50 statistics, and BLAST and annotation success.
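For readers who have not met the N50 statistic before, here is a minimal sketch of how it is commonly computed: sort the contigs from longest to shortest and report the length of the contig at which the running total reaches half of the assembly size. The function name and the contig lengths are made up for illustration and are not taken from Amin et al.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length of the contig at which the cumulative sum,
    counted from the longest contig downwards, first reaches at
    least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [1700, 1200, 900, 600, 400, 200]
print(n50(example_lengths))  # prints 1200
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is a popular (if imperfect) yardstick when comparing assemblers on the same data.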

The longest contig was created with the Oases assembler (1700 bp), and overall Trinity and Oases delivered much better results for the de novo assembly of Ion PGM reads than Velvet or Geneious Pro.

Furthermore, the mapping to a reference genome showed that Ion PGM transcriptome sequencing and subsequent de novo assembly with either Trinity or Oases generate reliable and accurate results.

Read the complete publication here.


Different QM/QA Levels for Genomics Analyses

High quality standards are essential for non-clinical QC testing. Since we obtained GLP certification, people have been asking me about the relevant QM/QA levels for genomics analyses. This is what I tell them:

ISO 9001 – the basis

The ISO 9001 standard is a global quality management standard that favours process orientation, customer orientation, satisfaction and continuous improvement. ISO 9001 provides the basis for a quality management system, ensuring that all processes are documented and defined in SOPs. In an ISO 9001 compliant laboratory, responsibilities are clearly defined, and the work environment and infrastructure are suited to their intended purpose. Equipment and facilities are qualified and maintained, and measuring and testing equipment is calibrated regularly. The staff is qualified and well trained, and the training is recorded. Supplier management and purchasing are controlled processes. Non-conforming work and failures are corrected and documented. Processes for corrective and preventive actions are implemented, as well as proper complaint management. In an ISO 9001 QM system all business processes are monitored (e.g. by internal and external audits). Customer feedback and all data obtained are analysed on a regular basis. These data and information are the basis for continuous improvement of the ISO 9001 QM system.

ISO 17025 – assures technically valid results

ISO 17025 is derived from ISO 9001. With an ISO 17025 accreditation, a laboratory demonstrates its technical competence and its ability to generate technically valid and correct results. In addition to the requirements of the ISO 9001 standard, participation in external proficiency tests is mandatory. Furthermore, the documentation of the lab procedures is a lot more detailed and involves dedicated record-keeping procedures.

GLP – the gold standard to conduct non-clinical safety studies

The GLP (Good Laboratory Practice) standard adds on top of that a framework in which non-clinical laboratory safety studies are planned, performed, monitored, recorded, reported and archived. GLP helps to assure regulatory authorities that submitted data are a true reflection of the results obtained during the study and can therefore be relied upon when making risk/safety assessments. In addition to the requirements of ISO 9001 and ISO 17025, GLP involves the nomination of a study director and dedicated, trained personnel for GLP compliant processes. A study always involves the creation of a study plan, which is signed by the study director. All processes applied in the study need to be described in the study plan. Any deviation from the study plan leads to an amendment of the study plan. After completion of the analyses, the study director generates a final report signed by the study director and QA/QM. It also includes a signed QA and GLP compliance statement. Each study is audited by quality assurance staff. Furthermore, laboratory access and access to relevant data need to be restricted, and dedicated archiving procedures (GLP archive) are required for all GLP documents and raw data.

GCP – similar to GLP with focus on clinical studies and patient safety

The GCP (Good Clinical Practice) standard is very similar to the GLP standard; however, it is relevant only for clinical studies and thus has a focus on patient safety and the reporting of adverse drug events. In a study that involves GCP compliance, it has to be ensured that only those analyses are performed to which the study patient has consented.

Feel free to write a comment for further clarification. I am looking forward to getting in contact with you.

Cheers, Katrin

Don’t forget the controls!

Almost every day new data about the composition of microbiomes are published. Many of these studies analyse the human microbiome, while others look at environmental samples.

Today we have the ability to sequence microbiomes in much more depth than a couple of years ago. Looking deeper sheds light on an important point: contamination! In their very interesting publication, Salter et al. showed that contaminating DNA is present in DNA extraction kits and other lab reagents.

The researchers sent dilutions of pure cultures of Salmonella bongori to three different institutes for DNA extraction and PCR, followed by sequencing on Illumina MiSeq. While S. bongori was the only organism identified in the undiluted samples, contaminating bacteria increased in relative abundance with higher degrees of dilution, and finally became dominant after the fifth dilution.
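Why dilution makes things worse can be seen with a little arithmetic. The sketch below assumes that the reagents add a constant amount of contaminating DNA to every extraction while the amount of sample DNA shrinks with each dilution step. All numbers (arbitrary units, a 10-fold dilution series) are invented for illustration and are not taken from Salter et al.

```python
def contaminant_fraction(sample_dna, contaminant_dna, dilution_factor, steps):
    """Return the contaminant share of total DNA for each dilution step.

    Assumption: the contaminant amount stays constant (it comes from
    the reagents), while the sample DNA is divided by dilution_factor
    at every step. Index 0 is the undiluted sample.
    """
    fractions = []
    for step in range(steps + 1):
        diluted_sample = sample_dna / dilution_factor ** step
        fractions.append(contaminant_dna / (contaminant_dna + diluted_sample))
    return fractions


# Illustrative values only: 1,000,000 units of sample DNA,
# 100 units of reagent-derived DNA, 10-fold dilutions.
for step, fraction in enumerate(contaminant_fraction(1_000_000, 100, 10, 5)):
    print(f"dilution step {step}: contaminant share {fraction:.4f}")
```

In this toy model the contaminants are negligible in the undiluted sample and take over once the sample DNA drops below the constant background, which is the qualitative pattern described above.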

They did a similar analysis performing shotgun metagenomics of a pure S. bongori culture. This time, they used four different DNA extraction kits. Again, they saw that contamination increased with the degree of dilution, with contamination being the predominant feature after the fourth dilution. Also, they could show that each kit gave a different bacterial profile.

They also report on a study on the nasopharyngeal microbiota of children, analyzed over 2 years. They could show that using 4 different DNA extraction kits over time led to the false conclusion that differences in the microbial spectrum were associated with age. When DNA extraction was repeated on original samples using a different kit lot, the OTUs previously identified as contaminants were no longer detected.

In conclusion, contamination affected both 16S and metagenomic shotgun sequencing projects and was especially critical for samples with low biomass. Salter et al. present a list of potential contaminating organisms, as well as recommendations on how to cope with this problem. One recommendation is very obvious, and very effective: use negative controls!
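To make the negative-control idea a bit more concrete, here is a deliberately crude sketch: every taxon that also shows up in the negative control (an extraction run without any sample) gets flagged for a closer look. The function name, the threshold parameter and the read counts are hypothetical, and this simple rule is not the procedure recommended by Salter et al.

```python
def flag_contaminants(sample_counts, control_counts, min_control_reads=1):
    """Split the taxa of a sample into (kept, flagged) sets.

    A taxon is flagged as a potential contaminant if the negative
    control contains at least min_control_reads reads for it.
    Both inputs map taxon name -> read count.
    """
    flagged = {
        taxon
        for taxon in sample_counts
        if control_counts.get(taxon, 0) >= min_control_reads
    }
    kept = set(sample_counts) - flagged
    return kept, flagged


# Hypothetical read counts, for illustration only.
sample = {"target_species": 9500, "taxon_A": 300, "taxon_B": 200}
negative_control = {"taxon_A": 250, "taxon_B": 180}

kept, flagged = flag_contaminants(sample, negative_control)
print("kept:", sorted(kept))
print("flagged:", sorted(flagged))
```

A presence/absence rule like this can also throw out real community members that leak into the control in small amounts, so in practice the flagged list is a starting point for interpretation rather than an automatic filter.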

Altogether, we should plan our experiments very carefully in order to deliver results instead of artefacts. Above all, we need to be very careful when interpreting the data!

Prepare NGS for clinical use

Molecular diagnostics (MDx) is, in my opinion, the most sensitive application for all kinds of molecular biology techniques such as PCR, Sanger sequencing or Next Generation Sequencing. Today, NGS is still a niche application and needs further improvement to become a common tool for MDx. One thing that is lacking is the standardisation of NGS for clinical use.

The NGS Working Group, established by the Friends of Cancer Research, worked out a master plan (The ASCO Post) with critical points that need to be addressed so that NGS can be used more commonly:

1. Define a regulatory pathway for cancer panels (a selection of multimarker gene assays) intended to identify actionable oncogenic alterations (those with supporting data to create risk-benefit assessment of treatment choice) that allow flexibility in the appropriate FDA medical device pathway—for instance, one based on risk classification of different panel components depending on the specific marker.

2. Approaches to validation studies should be based on the types of alterations measured by the assay rather than on every alteration individually.

3. Determine the contents of a cancer panel by classifying potential markers based on current utility in clinical care and clinical trials and peer-reviewed publications, as well as recognized clinical guidelines. Draw upon various sources to determine the recommended marker set for an actionable cancer panel.

4. Promote standardization of cancer panels through development and use of a common set of samples to ensure reproducibility on each platform.

5. Establish a framework for determining an appropriate reference method rather than relying on any single method for all studies.

Get more information on each proposal here.

Whole Genome Sequences Of World’s Oldest Living People Published

Researchers looked at the genomes of some of the oldest living people. While they did not find a significant association with extreme longevity, the researchers published their genome findings. At least the data will be available as a resource for future researchers looking at the “genetic basis” of longevity.

There are 74 supercentenarians (110 years or older) alive worldwide, with 22 living in the United States. The authors of this study performed whole genome sequencing on 17 of them to explore the genetic basis underlying extreme human longevity.

“We were looking for a really simple explanation in a single gene,” said Stuart K. Kim, a Stanford geneticist and molecular biologist. “And we know now that it’s a lot more complicated, and it will take a lot more experiments and a lot more data from the genes of more supercentenarians to find out just what might account for their ages.”

Given the limited sample size, the researchers were not able to find protein-altering variants associated with extreme longevity, according to a study in PLOS ONE by Hinco Gierman from Stanford University and colleagues, published November 12, 2014. But they did find that one supercentenarian had a genetic variant related to a heart condition, which had very little effect on his health considering he reached such an advanced age. The researchers noted that the American College of Medical Genetics and Genomics recommends reporting such a variant as an incidental finding.

The whole genome sequences of all 17 supercentenarians are now available as a public resource so that they can be used to assist the discovery of the genetic basis of extreme longevity in future studies.

 

Compare to: Large genome sequencing studies in the USA (posted August 26, 2014)

Whose genome has been sequenced? Brassica napus

Brassica napus, also known as oilseed rape, was formed more than 7000 years ago by allopolyploidy (chromosome doubling after the hybridisation of two Brassica species). Of course the genome mutated further, and so it is known today that during this evolution some genes were preserved and further “improved” (e.g. oil biosynthesis genes), whereas others were lost over the course of time (e.g. glucosinolate genes).

Chalhoub et al. have now sequenced the genome, because it can help to “provide insights into allopolyploid evolution and its relationship with crop domestication and improvement” (Chalhoub et al.).

What was sequenced?

Young fresh leaves from the Brassica napus French homozygous winter line “Darmor-bzh“.

Sequencing strategy: Whole genome sequencing

  1. Libraries & Sequencing:
    Roche GS FLX: ~70 million reads, Average read length: ~368 bp, Genome coverage: 21.2%
    Sanger BAC Seq: 141k reads, Read length: 650 bp, Genome coverage: 0.1%
    Illumina HiSeq: ~375 million reads, Read length: 36, 76, 108 and 150 bp, Genome coverage: 53.9%
  2. Data output: 44,146 contigs and 20,702 scaffolds
  3. Results: A final assembly of 849.7 Mb (using SOAP and Newbler) with 89% nongapped sequences.

After assembly, the genome was compared with those of other species (e.g. B. rapa and B. oleracea), which helped to find several interesting genes and gene variations that help to better understand its evolution.

Read the complete publication here.


Think Big: The UK 100,000 Genome Project

In late 2012 the 100,000 genome project was launched. UK Prime Minister David Cameron announced a new initiative led by the National Health Service to sequence the genomes of up to 100,000 people and to use their genomic information in treatment and studies of cancer and other diseases. The government set aside 100 million GBP for this project.

Genomics England, which is heading the project, has now named 10 firms that have been selected for the assessment in the next phase of the project. The companies are Congenica; Diploid; NantOmics; Genomics Ltd.; Illumina; Qiagen; Lockheed Martin; NextCode Health; Omicia; and Personalis.

As part of the recently completed stage, Genomics England in February sent out a questionnaire to 28 participants in relation to 10 cancer/normal samples and 15 rare disease trio samples.

Illumina is partnering as well and will contribute its ultra-high throughput sequencing platform, the HiSeq X™ Ten.

What will be the next step? Sequencing everyone?

More Updates: Illumina & IonTorrent

Quarter 4 of 2014 seems to be another exciting one for Next Generation Sequencing. Besides the chemistry update for the PacBio RS II, Illumina and IonTorrent / ThermoFisher also announced two major improvements / achievements:

  • Chemistry update for the Illumina HiSeq X Ten and the HiSeq 2500 Rapid Run
    The new v2 reagent kit for the HiSeq X Ten supports a PCR-free sample preparation kit, which eliminates amplification during library preparation. Until now, only sample preparation kits with PCR were supported, which sometimes resulted in lower quality for challenging genomic regions.
    The new v2 reagent kit for the HiSeq 2500 enables users to sequence 2 x 250 bp, and the new chemistry therefore delivers up to 300 Gbp of data in only 60 hours. (Press Release)
    In my opinion, Illumina proves once more that NGS is highly dynamic and that continuous updates for existing systems are the key to its success. The latest financial report confirms that Q3 of 2014, with a growth of 10%, is the strongest quarter for Illumina since 2011 (Fierce Medical Devices).
  • IonTorrent goes diagnostic
    The Ion PGM Dx System is now also CE-Marked for in vitro diagnostic (IVD) use in Europe. Thermo Fisher Scientific believes that the CE-mark “will enable European clinical laboratories to more easily […] implement new […] diagnostic assays” (Press Release).
    In September they had already announced that the PGM is now listed with the U.S. FDA as a Class II Medical Device.
    In my opinion, the clearance for diagnostic use in Europe as well as in the U.S. will further strengthen the position of the Ion PGM in clinical laboratories.

PacBio launches new chemistry and software

In a press release, Pacific Biosciences announced the latest enhancement for the PacBio RS II single molecule DNA sequencer. The new polymerase 6 and chemistry 4 (P6 – C4) release, in combination with improved software, enhances the performance and output of the platform by 45%. The average read length is now 10,000 – 15,000 bases, with up to 40,000 bases for the longest reads. Depending on the nature of the DNA, a single SMRT cell will deliver 500 million to 1 billion bases.

The new chemistry will replace the current P5 – C3 chemistry and is recommended for all SMRT sequencing applications.

This new release also includes improvements to the SMRT Analysis software suite for long amplicon analysis and the Iso-Seq™ method. Together with chemistry enhancements, these advances boost accuracy, speed up analysis, and support sequencing of multiplexed amplicons of different sizes.


Do you want to share your biggest secret?

Should we all get our genomes sequenced? And share the information? Just today I read two articles in GenomeWeb about human genome sequencing, with, in my opinion, opposite views on sharing information from human genomes.

The first article is about 23andMe: two different groups of people reported that ticking the “check for close relatives” box plunged their families into a real crisis. In one case the parents divorced because the close relatives feature showed that the husband already had a child with another woman (from before the marriage). In the other case a girl found out that she has a brother whom her mother had given up for adoption.

So for me this is a clear indicator that simply sharing genome information might cause more problems than it solves.

George Church asks for exactly the opposite. From his point of view, public access to as many genomes (human and non-human) as possible is a prerequisite for eradicating diseases, creating unlimited energy sources and so on.

I think I could partially agree with that if we are talking about bacterial or plant genomes. But I do not think we are ready for wide sharing of human genome information.

What also became clear to me is that we have not come much further than two years ago (Genomics – A Curse Or A Blessing?).