Whose genome has been sequenced? Latimeria chalumnae

The third de novo sequenced genome in our series Whose genome has been sequenced? is the “living fossil” Latimeria chalumnae.

The most difficult part of this de novo genome sequencing approach was obtaining enough starting material. The authors even reported that their first plan was to use Sanger technology, but there simply was not enough DNA available. They therefore had to wait until next generation sequencing techniques were stable enough to risk the sequencing (BioTechniques). Here are the sequencing facts of this study (Amemiya et al.):

What was sequenced?

A blood sample from an adult African coelacanth

De novo sequencing strategy:

  1. Libraries: shotgun library, 61-fold coverage; 3 kb jumping library, 88-fold coverage; 40 kb fosmid library, 1-fold coverage
  2. Illumina HiSeq 2000 (paired-end module)
  3. De novo genome assembly using the software ALLPATHS-LG
  4. RNA sequencing

RNA-Seq sequencing strategy:

  1. 4 cDNA libraries (1x mRNA-Seq library, 3x strand specific dUTP libraries from brain, gonad/kidney, gut/liver tissue) were sequenced using a HiSeq
  2. Data output: mRNA-Seq library ~ 210M paired-end reads; dUTP libraries ~ 3-4 Gb of sequence/tissue
  3. Assembly was performed using Trinity

The genome sequence helps to understand how vertebrates were able to make the transition to dry land, and why the phenotype of this prehistoric fish is still so similar to that of 300-million-year-old fossils (BioTechniques).

Read the complete publication here.

Earlier published genomes: Goat genome (Capra hircus); Chickpea plant (Cicer arietinum)

Samba In The World Of NGS

Today I was reading a publication about sequencing error profiles in Ion Torrent PGM data when I came upon a detail in the PGM sequencing workflow that I find funny and interesting at the same time, and that I want to share with you.

You may know that the sequencing method of the Ion Torrent PGM is quite similar to that of the Roche 454 devices. In both technologies, beads that carry the clonally amplified template with appropriate sequencing adaptors are loaded onto a plate with millions of wells. The loading is performed in a way that ensures that most wells hold a single bead (the size of the wells does not allow two beads per well). In the next step, dNTPs are flowed over the surface in a predetermined order, only one type of nucleotide at a time, with washing steps before the next dNTP is flowed. The way the incorporation of the nucleotide is measured is the substantial difference between the two technologies:

With the Roche 454 technology, each nucleotide incorporation releases pyrophosphate, which an enzymatic cascade converts into light. The light intensity is proportional to the number of nucleotides that were incorporated (if any) and is detected by the camera of the system.

In contrast, the Ion Torrent PGM measures pH rather than light to detect incorporation events. A single proton is released for every dNTP incorporated during the flow, which changes the pH in the respective well, and an ion-sensitive sensor measures the pH change.

The Roche system (as well as the first generation of the PGM) cycles the 4 dNTPs in a step-wise fashion, simply repeating the sequence TACG over and over. With the second-generation PGM these 4-base cycles have been changed to 32-base cycles (TACGTACGTCTGAGCATCGATCGATGTACAGC), called the Samba sequence. The sequence starts with the same 4-nucleotide repeat, but after 2 such patterns some nucleotides recur after fewer than four flows. According to Bragg et al. this modification was implemented to improve the synchrony of clonal templates, which facilitates more accurate base calling. Unfortunately the Samba sequence is not optimized for read length as the original sequence was. It remains to be seen if Ion Torrent (now owned by Thermo Fisher) will make further modifications to the Samba sequence in order to balance the accuracy and the read length of the system.
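
To make the idea of a flow order more concrete, here is a minimal Python sketch of an idealised, error-free flowgram. It is only an illustration and not the base caller of either instrument: the function name flowgram and its parameters are made up for this example, and for simplicity the sequence passed in is treated as the strand being synthesised (i.e. the read), so a flowed nucleotide is "incorporated" whenever it matches the next base.

```python
# Flow orders taken from the text above.
CYCLIC = "TACG"
SAMBA = "TACGTACGTCTGAGCATCGATCGATGTACAGC"


def flowgram(read, flow_order, max_flows=1000):
    """Return the number of bases incorporated in each flow (idealised).

    `read` is the strand being synthesised; `flow_order` is repeated
    until the whole read has been incorporated or `max_flows` is reached.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(read) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        count = 0
        # A homopolymer run is incorporated completely within one flow.
        while pos < len(read) and read[pos] == base:
            count += 1
            pos += 1
        signals.append(count)
        flow += 1
    return signals


if __name__ == "__main__":
    read = "TTAGGCATCA"
    print("cyclic:", flowgram(read, CYCLIC))
    print("samba: ", flowgram(read, SAMBA))
```

Each number in the result corresponds to one flow: 0 means no incorporation, 2 means a homopolymer of length two, and so on. Comparing the length of the two lists for the same read shows how many flows each order needs, which is one way to see why the flow order affects the achievable read length.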

Whose genome has been sequenced? Cicer arietinum

With this new bi-weekly series we would like to highlight some, if not all, genomes that have been sequenced in the last 6 to 12 months. At this point in time I am still uncertain whether the “eye-opener” will be the diversity of organisms and species or the different technologies and strategies that have been used…

We started this series in January with a report on the de novo sequencing of the domestic goat, Capra hircus.

Today I would like to report on a plant genome, that of the chickpea Cicer arietinum:

According to the GenomeWeb article this de novo genome sequencing approach is only the 3rd one for crop legume plants. To me that is rather astonishing, since crops have been bred and optimised for many years. Maybe this is due to the huge genomes of plants, which exceed animal genomes in size by far. With 740 million base pairs, our chickpea plant has a medium-sized plant genome. But let’s focus on the sequencing approach for now (Varshney et al.):

What was sequenced?

De novo sequencing of one reference chickpea plant and re-sequencing of 90 cultivated & wild chickpea lines from 10 different countries

Sequencing strategy:

  1. De novo genome sequencing on HiSeq 2000 (paired-end module) of 1 genome with 11 shotgun and mate-pair libraries (insert sizes: ~ 170; 500; 800; 2,000; 5,000; 10,000; 20,000 bp) and BAC end sequencing
    Data output: 153.01 Gb; after filtering & correction steps only 87.65 Gb of data were used for de novo assembly
  2. Re-sequencing of genomes
    • Whole genome re-sequencing of 29 varieties using Illumina 100 bp paired-end sequencing on HiSeq 2000
    • RAD-sequencing of 61 genotypes on HiSeq 2000 (48x ApeKI; 24x HindIII)
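
As a rough plausibility check on these numbers, fold coverage can be estimated as sequenced bases divided by genome size. The short Python sketch below assumes that "Gb" means gigabases and uses the genome size of 740 million base pairs quoted above; the function name fold_coverage is only illustrative.

```python
def fold_coverage(sequenced_bases, genome_size):
    """Average number of times each genome position is covered."""
    return sequenced_bases / genome_size


GENOME_SIZE = 740e6  # chickpea genome size quoted in the post (bp)

raw = fold_coverage(153.01e9, GENOME_SIZE)       # all data produced
filtered = fold_coverage(87.65e9, GENOME_SIZE)   # data used for assembly

print(f"raw: {raw:.0f}x, filtered: {filtered:.0f}x")
# prints "raw: 207x, filtered: 118x"
```

So even after filtering and correction, the assembly was based on well over 100-fold coverage of the estimated genome size.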

According to D. Cook “the sequencing of the chickpea provides genetic information that will help plant breeders develop highly productive chickpea varieties that can better tolerate drought and resist disease — traits that are particularly important in light of the threat of global climate change”. (Davis Enterprise).

Read the complete publication here.

High-Throughput Sequencing Machines By Platform

The High-Throughput Sequencing map by James Hadfield (Cancer Research UK, Cambridge) gives us a very interesting overview of sequencing activities around the world. We ran a survey to find out whether your favourite machines correspond to the platforms listed by James in his overview.

Here are the results: Your personal favourites are nearly a perfect match with platforms in the genome centers worldwide. Great match!

The British Ash Tree Genome Project

In mid-March we wrote about the British Ash Tree Genome Project.

Yesterday, the School of Biological and Chemical Sciences, Queen Mary University of London, launched a website explaining their amazing project in detail. There you will find general information, data and tools. Further, an interview on the project from the Radio 4 Today programme on 21/12/12 can be heard here. More details on the project can be heard on NERC’s 5/2/13 Planet Earth podcast.

Visit http://ashgenome.org

800 bp Read Length For Amplicon Sequencing Is Not Science Fiction

About a year ago my colleague Regina reported on the new possibilities of using the MiSeq system for amplicon sequencing (16S Amplicon Experiments: Which Platform to Choose?). Now, one year later, everything said there about the advantages of amplicon sequencing using the MiSeq (e.g. lower cost/base) still holds.

The main advantage of the Roche system is its long reads, which are highly valuable for some applications. By ligating appropriate sequencing adaptors we can currently deliver average read lengths of up to 700 bp when using the GS FLX+ pipeline. Further improvements in read length can be expected with the launch of a new amplicon pipeline from Roche for the Roche GS FLX+ system (planned for summer 2013).

Besides the ultra-long reads on the GS FLX+ system, amplicon sequencing using the GS Junior system still has some advantages compared to other technologies:

+ short turnaround time (starting from 5-10 working days)

+ competitive pricing

+ moderate to long reads (350 – 450 bp)

+ sufficient data output for all projects with a medium number of samples (e.g. up to 24)

What is your preferred next generation sequencing technology for amplicon sequencing? Take part in our current poll.

Survival Of The Fittest – NGS Library Prep Methods

30 years of PCR in various applications have revolutionised molecular biology. But PCR also has its drawbacks. One of them is the amplification of AT- or GC-rich DNA fragments. Naturally, researchers are often interested in sequencing and studying genomes with high GC or high AT content, like S. aureus with an AT content of 67% or Streptomyces coelicolor with a GC content of 72%.
More and more NGS kit providers therefore try to circumvent PCR in the library prep. Ashley Yeager has summarised the current status of PCR-free library preps, including a comprehensive overview of the pros and cons of both methods (BioTechniques).
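
For readers who want to check the composition of their own sequences: GC content is simply the fraction of G and C bases in a sequence, and AT content is its complement. The following Python sketch is a minimal illustration (the function name gc_content is made up for this example; ambiguous bases such as N are counted in the length but not as G or C).

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


if __name__ == "__main__":
    fragment = "ATGCGCGCATTA"  # made-up example fragment
    gc = gc_content(fragment)
    print(f"GC: {gc:.0%}, AT: {1 - gc:.0%}")
```

Applied in a sliding window along a genome, the same calculation shows which regions are likely to be under-represented after PCR-based library prep.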

Summarising Yeager's findings, there is no clear champion in sight:

Library prep by using PCR methods

+ well-known lab procedure & good sequencing efficiency
- difficulties in amplifying GC- / AT-rich regions -> sequencing is biased

PCR-free library prep

+ good sequence read distribution & a more even genome coverage
- huge amounts of starting material needed & sequencing reaction is less efficient

Read the complete article at BioTechniques.

Synthetic DNA – Data Storage For Eternity?

In the April issue of the journal Spektrum der Wissenschaft I found a very interesting article by Jan Dönges about storing information with the help of synthetic DNA (oligonucleotides). He describes the work of Ewan Birney and Nick Goldman from the European Bioinformatics Institute (EBI) in Hinxton, UK, who have developed a strategy that allows coding data in strings of A, C, G and T nucleotides (Nature 494, 77-80, February 7 2013). They coded all sonnets of Shakespeare, a photo of the institute, the original paper of Watson and Crick about the structure of DNA, an audio recording of Martin Luther King's speech “I have a dream” and a file with coding instructions; all together 739 kilobytes of information. They ordered the oligos and sequenced them on an Illumina HiSeq 2000. They received a text file of the letters A, C, G and T that could be converted back into the original data. The complete code and sequence can be found here.
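
To give a feeling for how bytes can be turned into A, C, G and T, here is a strongly simplified Python sketch. It is not the published encoding: it only borrows the idea of writing the data in base 3 and letting each digit choose one of the three nucleotides that differ from the previous one, so that no base is ever repeated directly (homopolymers are a known source of sequencing errors). The fixed width of 6 base-3 digits per byte, the virtual start base and the function names are assumptions made for this example; indexing and redundancy across oligos are left out entirely.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so 6 base-3 digits hold one byte


def _choices(prev):
    """The three nucleotides that differ from the previous one."""
    return [b for b in BASES if b != prev]


def encode(data, start="A"):
    """Encode bytes as DNA without repeating a base twice in a row."""
    prev = start  # virtual previous base, not part of the output
    out = []
    for byte in data:
        trits = []
        for _ in range(TRITS_PER_BYTE):
            trits.append(byte % 3)
            byte //= 3
        for trit in reversed(trits):  # most significant digit first
            prev = _choices(prev)[trit]
            out.append(prev)
    return "".join(out)


def decode(dna, start="A"):
    """Invert encode(); raises ValueError if a base is repeated directly."""
    prev = start
    trits = []
    for base in dna:
        trits.append(_choices(prev).index(base))
        prev = base
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    message = b"I have a dream"
    dna = encode(message)
    print(dna)
    print(decode(dna))
```

Because every nucleotide depends on the one before it, the decoder has to read the sequence from the start, and a single wrong base shifts the interpretation of the next digit; a real scheme needs redundancy to cope with such errors.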

From sequencing experiments like those on the mammoth or the Neanderthal man we know that DNA is stable for at least 10,000 years, longer than any other data storage medium. In addition it is extremely dense. With 1 gram of DNA it is possible to code more than 2 petabytes (1 petabyte = 10^15 bytes), or 2.3 million gigabytes. The volume of a coffee cup would be sufficient to code 100 million hours of high resolution video. It is to be expected that the technology will improve further in the future, as long as mankind is still interested in DNA. The cost of the experiment was quite high compared to other storage media like tapes, HDDs or DVDs. However, if data have to be kept for 600 years or longer, the cost of repeatedly making new backup copies of tapes already outweighs it. So, if we want to conserve the knowledge of mankind for very long periods and make sure that it survives possible major disasters in the future, this seems to be a reasonable strategy.

Genome Sequencing Analysis of Ash Tree – Supported by £2.4 Million

To conduct genome sequencing and analysis of Ash (Fraxinus excelsior), researchers in the UK received £2.4 million ($3.6 million / €2.8 million). The major aim of this project is to increase the understanding of the rapidly spreading fungal tree disease ash dieback, which is widespread in northern Europe and has already been found at more than 300 sites across the UK (see http://www.forestry.gov.uk/chalara). The fungus attacks ash trees, but some trees resist those attacks.

For this reason many samples of the ash dieback fungus will be sequenced and, funded by an urgency grant from the Natural Environment Research Council, the complete genome sequence of Ash is expected to be available by August.

Sequencing of the approximately 900 Mb plant genome will be performed applying the latest hybrid de novo sequencing strategy, recently proven to deliver excellent scaffolding and assembly results. This new gold standard in de novo sequencing employs a combination of Roche/454 FLX++ long read technology (software version 2.8 with read lengths up to 1,100 bp) and Illumina HiSeq 2000/2500 high throughput sequencing with several ultra-accurate long jumping distance libraries (LJD of 3 kb, 8 kb, 20 kb and 40 kb), supplemented by sequencing of Illumina shotgun libraries with different fragment sizes.

With the sequenced ash tree genome the researchers hope to find clues to how some of the trees (2% are able to withstand the disease) resist attack, and to learn about the genetic differences between resistant and non-resistant trees. This knowledge could be used to develop trees that can’t be infected.

Project leader, Dr. Richard Buggs from Queen Mary’s School of Biological and Chemical Sciences: “Sequencing the ash genome is a foundational step towards discovering the genetic basis of resistance to ash dieback – the future of ash trees in Britain may depend on this”.

Read more about this exciting project: GenomeWeb covers the project in general, and Eurofins MWG Operon covers the genome sequencing.

The Galaxy of the Genomics Virtual Lab

The Genomics Virtual Lab (GVL) project – using the computing resources from the NeCTAR Research Cloud – is an Australian Government project conducted as part of the “Super Science” initiative. It is developing infrastructure supporting genome informatics research.

Their Galaxy-based NGS and HTS tutorials are really excellent:

You will love the precise explanations, the hands-on demonstration and the additional material like screenshots and in-depth information!