Question: Coverage For De Novo SNP Detection In Microbes?
8.8 years ago by
Northeastern University
Paul_Muller (70) wrote:


I'm interested in searching for mutations associated with an altered phenotype in a bacterium via resequencing (probably Illumina). This particular bacterial genome is ~7Mb and there is a reference available. I figure I should aim for single-nucleotide resolution to be able to detect nearly 100% of SNPs. My question is, how can I determine the amount of coverage necessary to be able to detect 100% of SNPs? I found a reference from Holt et al. 2009 in Bioinformatics where they state they can detect 80% of SNPs at 45X coverage (

A paper that spells it out would be best, but if that isn't available do you think I could use Lander-Waterman and the error rate associated with Illumina to estimate the necessary coverage?
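To make the Lander-Waterman idea concrete, here is a minimal sketch that treats per-base depth as Poisson-distributed around the mean coverage and asks what fraction of bases reach a chosen minimum depth. The function names and the `min_depth` threshold (how many reads you want over a site before trusting a SNP call) are assumptions for illustration, not something taken from a paper. The model also ignores sequencing error, GC bias, and unmappable repeats, so treat the result as an optimistic lower bound on the coverage you need.

```python
import math

def frac_bases_at_depth(mean_cov, min_depth):
    """Poisson estimate of P(depth >= min_depth) at a given mean coverage."""
    below = sum(
        math.exp(-mean_cov) * mean_cov ** i / math.factorial(i)
        for i in range(min_depth)
    )
    return 1.0 - below

def coverage_needed(target_frac, min_depth, max_cov=1000):
    """Smallest integer mean coverage whose Poisson estimate meets target_frac."""
    for cov in range(1, max_cov + 1):
        if frac_bases_at_depth(cov, min_depth) >= target_frac:
            return cov
    return None  # target not reachable within max_cov

def reads_needed(genome_size, read_len, mean_cov):
    """Number of reads for a mean coverage, from coverage = N * L / G."""
    return math.ceil(mean_cov * genome_size / read_len)

if __name__ == "__main__":
    # Hypothetical settings: 7 Mb genome, 100 bp reads, at least 10 reads per site.
    cov = coverage_needed(target_frac=0.999, min_depth=10)
    print("mean coverage:", cov)
    print("reads:", reads_needed(7_000_000, 100, cov))
```

Raising `min_depth` or `target_frac` shows how quickly the required coverage grows. It also shows that no finite coverage gives exactly 100% under this model.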

I welcome opinions and other considerations.

Thank you

bacteria coverage snp • 1.8k views
8.8 years ago by
Pablo (1.9k) wrote:

In the referenced paper they used pooled sampling and a GA_I sequencer.

If you use non-pooled samples and a HiSeq, I'm pretty sure you should achieve quite good coverage (probably not 100%, but maybe you can reach 99%). A simple exercise is to take your reference genome and check whether every 100 bp window (or whatever read length you'll use) is uniquely mappable. This is quite easy to do since your reference is only 7Mb, and it will give you an idea of what read length you need to map all reads (and whether you need paired-end).
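A rough way to try this exercise without a mapper is to count exact k-mer occurrences in the reference. This sketch is an exact-match proxy only: a real aligner tolerates mismatches, so true mappability will differ. The function names are made up for illustration, the input is assumed to be a single uppercase ACGT string, and holding every 100-mer of a 7 Mb genome in memory as Python strings is heavy, so this is meant for small tests or shorter k.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]

def canonical(kmer):
    """Strand-independent form: the smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))

def unique_fraction(genome, k):
    """Fraction of k-length windows whose sequence occurs exactly once on either strand."""
    n_windows = len(genome) - k + 1
    if n_windows <= 0:
        return 0.0
    counts = Counter(canonical(genome[i:i + k]) for i in range(n_windows))
    # Each k-mer seen exactly once corresponds to exactly one unique window.
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_unique / n_windows

if __name__ == "__main__":
    toy = "ACGACGTTT"  # toy sequence, not real data
    for k in (3, 5):
        print(k, unique_fraction(toy, k))
```

Running it over several values of `k` gives a feel for how much longer reads help in the repetitive parts of the genome.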


Thank you, Pablo, for the suggestion. I think I'll chop up the reference genome into pieces of various lengths, resample with replacement up to various levels of coverage, map the pieces back, and see what fraction are unique.
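The resampling half of that plan can be sketched as below. It only models depth: read start positions are drawn uniformly with replacement on a linear genome, and no mapping is done, so the uniqueness check would still need a real aligner. The function name, the fixed seed, and the `min_depth` threshold are assumptions for illustration.

```python
import random

def simulate_depth_fraction(genome_len, read_len, mean_cov, min_depth, seed=0):
    """Sample read starts with replacement; return the fraction of bases with depth >= min_depth."""
    rng = random.Random(seed)
    n_reads = int(mean_cov * genome_len / read_len)
    diff = [0] * (genome_len + 1)  # difference array: +1 at read start, -1 just past read end
    max_start = genome_len - read_len
    for _ in range(n_reads):
        start = rng.randint(0, max_start)
        diff[start] += 1
        diff[start + read_len] -= 1
    depth = 0
    covered = 0
    for i in range(genome_len):
        depth += diff[i]  # running sum gives the depth at base i
        if depth >= min_depth:
            covered += 1
    return covered / genome_len

if __name__ == "__main__":
    # Hypothetical settings: 100 kb toy genome, 100 bp reads, at least 10 reads per site.
    for cov in (10, 20, 45):
        print(cov, simulate_depth_fraction(100_000, 100, cov, min_depth=10))
```

Because the genome is treated as linear, bases near the ends are undersampled. For a circular bacterial chromosome the read start positions would need to wrap around.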

Reply written 8.8 years ago by Paul_Muller (70)
8.8 years ago by
The Netherlands
ALchEmiXt (1.9k) wrote:

Personally, I think you will never be able to generate enough coverage to detect 100% of the SNPs (that's why 80% is often quoted). The sequencing technique itself already has enough difficulty getting through the hard/repetitive regions anyway.

Did you also consider structural variations? For those, paired-end libraries should be beneficial.


ALchEmiXt, I think structural variations like CNVs might be too difficult to detect without paired ends and with reads on the shorter end of the spectrum. We haven't made our final choice of platform, and I'm prepping for the "worst case." But if we can use 100 bp paired ends, then I'll definitely look for CNVs.

Reply written 8.8 years ago by Paul_Muller (70)