Over the last few years, exome sequencing has become a standard application. Every day, huge amounts of data are generated that need to be interpreted. But are we sure that our analysis always shows us the complete picture?
Experience shows that coverage can vary significantly across the exome. For this reason, not only the average on-target coverage should be considered, but also the local coverage at each particular site of interest. Otherwise, important information may be lost.
Researchers at the University of Edinburgh and the Wellcome Trust Sanger Institute have carried out a study, recently published in BMC Bioinformatics, that analysed how sequencing depth relates to the sensitivity of SNV detection. They used a set of 30 captured exomes that had been sequenced to a high depth. As a basis for the analyses, they selected a set of verified “gold standard” SNVs for each sample. They then generated randomly selected subsets of each data set and called SNVs on both the full data sets and the downsampled sets.
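The downsampling procedure described above can be sketched roughly as follows. This is a minimal illustration and not the study's pipeline: the read representation, the function names and the toy variant caller with its `min_alt_reads` threshold are all assumptions made for the example.

```python
import random


def downsample(reads, fraction, seed=0):
    """Return a random subset containing roughly `fraction` of the reads."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    k = round(len(reads) * fraction)
    return rng.sample(reads, k)


def toy_call_snvs(reads, min_alt_reads=2):
    """Toy stand-in for a variant caller (hypothetical, for illustration only).

    `reads` is a list of (site, is_alt) pairs. A site is called as an SNV
    when at least `min_alt_reads` reads carry the alternate allele.
    """
    alt_counts = {}
    for site, is_alt in reads:
        if is_alt:
            alt_counts[site] = alt_counts.get(site, 0) + 1
    return {site for site, n in alt_counts.items() if n >= min_alt_reads}


def sensitivity(called, gold):
    """Fraction of gold-standard SNVs that were recovered by the caller."""
    if not gold:
        raise ValueError("gold standard set must not be empty")
    return len(called & gold) / len(gold)


# Made-up data: three heterozygous gold-standard sites, each covered by
# 10 reference reads and 10 alternate reads.
gold = {"site1", "site2", "site3"}
reads = [(site, is_alt) for site in sorted(gold)
         for is_alt in [False] * 10 + [True] * 10]

for fraction in (1.0, 0.5, 0.25, 0.1):
    subset = downsample(reads, fraction)
    print(fraction, sensitivity(toy_call_snvs(subset), gold))
```

Repeating the loop with different seeds gives a spread of sensitivity values per downsampling fraction, which is the kind of relationship between depth and detection the study examined on real data.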
From these analyses, they estimated that detecting at least 95% of the heterozygous SNVs requires a local coverage of at least 13-fold at the site of interest, while 3-fold coverage would be sufficient to detect a homozygous SNV. By contrast, at an average on-target coverage of 20-fold, 5-15% of the heterozygous and 1-4% of the homozygous SNVs would be missed.
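To see why the average can be misleading, one can compare the mean depth with the per-site depth. The sketch below is a simple illustration under stated assumptions: the 13-fold and 3-fold thresholds are the estimates quoted above, while the function name, the input format and the example depths are made up for the example.

```python
from statistics import mean

# Thresholds taken from the estimates quoted above.
HET_MIN_DEPTH = 13  # local depth for detecting >= 95% of heterozygous SNVs
HOM_MIN_DEPTH = 3   # local depth reported as sufficient for homozygous SNVs


def summarize_coverage(depths):
    """Summarize a mapping of (chromosome, position) -> read depth.

    Returns the mean depth plus the sites that fall below each threshold.
    """
    below_het = sorted(s for s, d in depths.items() if d < HET_MIN_DEPTH)
    below_hom = sorted(s for s, d in depths.items() if d < HOM_MIN_DEPTH)
    return {
        "mean_depth": mean(depths.values()),
        "below_het": below_het,
        "below_hom": below_hom,
    }


# Made-up depths for five sites of interest.
example = {
    ("chr1", 100): 45,
    ("chr1", 101): 40,
    ("chr1", 102): 12,
    ("chr1", 103): 2,
    ("chr1", 104): 31,
}

summary = summarize_coverage(example)
print(summary["mean_depth"])  # 26, comfortably above 20-fold on average
print(summary["below_het"])   # two sites are still below 13-fold
print(summary["below_hom"])   # one site is even below 3-fold
```

In this toy example the mean depth looks fine, yet two of the five sites are too thinly covered to reliably detect a heterozygous SNV. In practice the per-site depths would come from a depth report of the aligned reads rather than a hand-written dictionary.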
They concluded that excessively high coverage is not necessarily required for exome sequencing, but that one should consider how likely it is that a polymorphism remains undetected.
The same considerations apply when looking at whole-genome data.
The group has developed software to help researchers check their data. It can be applied to determine the local and overall SNV detection sensitivity of a given data set. The software is available for free download.
What is your experience? Share your expert knowledge with us!