A major upgrade of SAMTools: CRAM format to reduce NGS data load

SAMTools, one of the most popular NGS sequence analysis tools, has recently been upgraded by computer scientists at the Wellcome Trust Sanger Institute. SAMTools is a set of utilities for manipulating alignments in the SAM/BAM format. SAM stands for Sequence Alignment/Map, and BAM is simply the binary form of SAM. SAM can be seen as the worldwide standard for storing large nucleotide sequence alignments.

SAMTools 1.0, the revised version of the free program suite, gives researchers improved handling of their sequencing data. In addition to the existing SAM and BAM file formats, SAMTools now supports the new CRAM format. Basically, CRAM files are alignment files, just like BAM files, except that their size is reduced by 10-30%. Even greater compression, up to 100-fold, can be achieved in the "lossy" mode, which still preserves the most important information. The storage savings that CRAM offers come from data compression techniques developed jointly by the Sanger Institute and the EMBL-European Bioinformatics Institute.
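For readers who want to try the new format, here is a minimal sketch of how a BAM-to-CRAM conversion command could be assembled in Python. It assumes `samtools` is installed and on the PATH; the file names (`sample.bam`, `reference.fa`, `sample.cram`) and the helper `build_cram_command` are hypothetical and only for illustration. The sketch only builds and prints the command and does not run it.

```python
def build_cram_command(bam_path, reference_fasta, cram_path):
    """Return the argument list for a BAM-to-CRAM conversion with `samtools view`.

    Hypothetical helper for illustration; all paths are supplied by the caller.
    """
    return [
        "samtools", "view",
        "-C",                   # write CRAM instead of SAM/BAM
        "-T", reference_fasta,  # reference FASTA the reads were aligned to
        "-o", cram_path,        # output file
        bam_path,               # input alignment file
    ]


# Placeholder file names, not real data.
cmd = build_cram_command("sample.bam", "reference.fa", "sample.cram")
print(" ".join(cmd))
```

To actually perform the conversion, the list could be passed to `subprocess.run(cmd, check=True)`. CRAM compression is reference-based, which is why the reference FASTA is part of the command.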

“This major rebuild of SAMTools reflects our commitment to supporting the global use of sequencing data,” says Dr Richard Durbin, Head of Computational Genomics at the Sanger Institute. “Genome science worldwide relies on fast and efficient data analysis and storage, and SAMTools 1.0 fulfills this need by supporting new sequencing and analysis technologies.” Dr John Marshall from the Sanger Institute is highly optimistic that widespread uptake of the new format will lead to lower data storage costs on a global scale (complete article).

I am curious to see how the new format will be adopted by the genomics community. By the way, did you know that SAMTools has been downloaded more than 225,000 times?