Question: What Coverage For Genome Re-Sequencing By Illumina ?
7.6 years ago by
University Paris South
helene.badouin20 wrote:

Hello,

I was wondering what coverage you need for genome re-sequencing on Illumina (Illumina HiSeq 2000)?

I was told 100x, which seems high, but I read that people often seem to use a 20-30x coverage.

Moreover, is it necessary to have higher coverage to look for intra-population selective sweeps (from individual samples) than to investigate the genomic architecture of differentiation between sister species?

Thank you in advance for your answer.

illumina mapping coverage • 4.3k views
ADD COMMENTlink written 7.6 years ago by helene.badouin20

Which species do you intend to work on? Do you know the quality of its reference genome?

ADD REPLYlink written 7.6 years ago by Biojl1.7k

It's the genome of a phytopathogenic fungus with a high GC content, so I think we're going to re-sequence several individuals at high coverage at first (100x). Then we'll subsample the reads to see how far we can lower the coverage for further experiments without decreasing sensitivity.
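A minimal sketch of the arithmetic behind that subsampling plan: mean coverage is reads times read length divided by genome size, and the fraction of reads to keep is the target coverage divided by the current one. The genome size, read length, and read count below are hypothetical placeholders, not values from this project.

```python
def mean_coverage(n_reads, read_length, genome_size):
    """Expected mean depth: total sequenced bases divided by genome length."""
    return n_reads * read_length / genome_size


def subsample_fraction(current_cov, target_cov):
    """Fraction of reads to keep at random to go from current_cov to target_cov."""
    if target_cov > current_cov:
        raise ValueError("cannot subsample up to a higher coverage")
    return target_cov / current_cov


# Hypothetical example: 40 Mb genome, 40 million single 100 bp reads.
full_cov = mean_coverage(n_reads=40_000_000, read_length=100, genome_size=40_000_000)
for target in (50, 30, 20, 10):
    frac = subsample_fraction(full_cov, target)
    print(f"{target}X: keep {frac:.0%} of the reads")
```

The resulting fractions can then be passed to whatever read-subsampling tool is in use, and variant calls at each depth compared against the full-coverage calls.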

ADD REPLYlink written 7.6 years ago by helene.badouin20
7.6 years ago by
United States
Zev.Kronenberg11k wrote:

Just an example of whole genome coverage:


Rather than giving you a hard number, here are two articles that answer your questions.

Assessing the accuracy and power of population genetic inference from low-pass next-generation sequencing data. Crawford & Lazzaro 2012.

Low-coverage sequencing: Implications for design of complex trait association studies. Li et al 2011.

Whole genome depth modeling:

Exome dist:

ADD COMMENTlink modified 7.6 years ago • written 7.6 years ago by Zev.Kronenberg11k

Thank you for the link. The first one in particular is very relevant for my interests (non human populations with small sample sizes).

ADD REPLYlink written 7.6 years ago by helene.badouin20
7.6 years ago by
Philadelphia, PA
Jeremy Leipzig19k wrote:

Coverage should follow a Poisson distribution, so if your mean coverage is 30X, a given position will be at 20X or below about 3.5% of the time. In theory, to get at least 30X at 99% of locations you will need a mean coverage of about 45X.

Unfortunately the genome does not respect this distribution and you will often see deserts and hotspots with thousands of reads, although this is largely a mappability issue.
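The Poisson figures above can be checked with a few lines of standard-library Python. This is only a sketch under the idealized assumption that per-base depth is exactly Poisson; the helper names are made up for illustration.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean), summing the pmf in log space."""
    return sum(
        math.exp(i * math.log(mean) - mean - math.lgamma(i + 1))
        for i in range(k + 1)
    )


def min_mean_for_depth(min_depth, fraction):
    """Smallest integer mean coverage with P(depth >= min_depth) >= fraction."""
    mean = min_depth
    while 1.0 - poisson_cdf(min_depth - 1, mean) < fraction:
        mean += 1
    return mean


p_at_or_below_20 = poisson_cdf(20, 30)   # depth <= 20 at a 30X mean
p_below_20 = poisson_cdf(19, 30)         # depth strictly < 20 at a 30X mean
needed_mean = min_mean_for_depth(30, 0.99)

print(f"P(depth <= 20 | mean 30X) = {p_at_or_below_20:.4f}")
print(f"P(depth <  20 | mean 30X) = {p_below_20:.4f}")
print(f"mean needed for >= 30X at 99% of positions: {needed_mean}X")
```

Real depth distributions are wider than Poisson, so these numbers are best read as a lower bound on the mean coverage required.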

ADD COMMENTlink modified 7.6 years ago • written 7.6 years ago by Jeremy Leipzig19k

Yup, it is naughty data. I often see that a negative binomial is a better fit.

ADD REPLYlink modified 7.6 years ago • written 7.6 years ago by Zev.Kronenberg11k

Using the negative binomial, what mean coverage is necessary to have 99% of bases covered at 30X?

ADD REPLYlink written 7.6 years ago by Jeremy Leipzig19k

I guess I should have been more clear: this was for exome data. I also added a plot for WG data in my original post.

Exome depth histograms often look more like:

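n <- 100000                            # number of simulated positions
hist(rpois(n, rgamma(n, 2, 0.0333)))   # Poisson depths with gamma-distributed means (shape 2, rate 0.0333)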
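A gamma-Poisson mixture like the one simulated above is a negative binomial, so the earlier question (what mean gives 30X at 99% of bases) can be sketched directly. The assumption here, which is mine and not established in this thread, is that the shape parameter stays at 2 while the mean is scaled up; real data may be much less dispersed, especially for whole genomes.

```python
import math


def nbinom_cdf(k, size, mean):
    """P(X <= k) for a negative binomial with dispersion `size` and the given mean."""
    p = size / (size + mean)  # success probability in the (size, p) parameterization
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(i + size) - math.lgamma(i + 1) - math.lgamma(size)
            + size * math.log(p) + i * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return total


def min_mean_nb(min_depth, fraction, size):
    """Smallest integer mean with P(depth >= min_depth) >= fraction at fixed dispersion."""
    mean = min_depth
    while 1.0 - nbinom_cdf(min_depth - 1, size, mean) < fraction:
        mean += 1
    return mean


SHAPE = 2.0      # gamma shape from the simulation above
RATE = 0.0333    # gamma rate from the simulation above (mean depth = SHAPE / RATE)

frac_below_30 = nbinom_cdf(29, SHAPE, SHAPE / RATE)
needed_mean = min_mean_nb(30, 0.99, SHAPE)

print(f"fraction of bases below 30X at the simulated mean: {frac_below_30:.3f}")
print(f"mean needed for >= 30X at 99% of bases (shape fixed at 2): {needed_mean}X")
```

With this much overdispersion the required mean is far above the Poisson figure, which is one reason a hard coverage number is difficult to give without looking at the actual depth distribution.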

ADD REPLYlink modified 7.6 years ago • written 7.6 years ago by Zev.Kronenberg11k
7.6 years ago by
Atlanta, GA
Lee Katz3.0k wrote:

With bacteria, we are aiming for something like 50x. For high quality SNPs, we aim for 100x so that even the lower-coverage bases will have good coverage.

ADD COMMENTlink written 7.6 years ago by Lee Katz3.0k
7.6 years ago by
United States
swbarnes27.8k wrote:

It also depends on what you are looking for. For homozygous SNPs, 30x average will do pretty well. For heterozygous or mixed SNPs, 50x is more like it.
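One way to see why heterozygous sites need more depth: each read at a het site carries either allele with probability 0.5, so both alleles must clear the caller's read threshold. The sketch below assumes Poisson depth, unbiased allele sampling, and a hypothetical threshold of 8 supporting reads per allele; actual callers use more elaborate models.

```python
from math import comb, exp, lgamma, log


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean), computed in log space."""
    return exp(k * log(mean) - mean - lgamma(k + 1))


def prob_both_alleles_seen(depth, min_reads):
    """P(each allele gets >= min_reads reads) at a het site with the given depth."""
    if depth < 2 * min_reads:
        return 0.0
    ways = sum(comb(depth, k) for k in range(min_reads, depth - min_reads + 1))
    return ways / 2 ** depth


def het_detection_rate(mean_cov, min_reads, max_depth=200):
    """Average prob_both_alleles_seen over Poisson-distributed depth."""
    return sum(
        poisson_pmf(d, mean_cov) * prob_both_alleles_seen(d, min_reads)
        for d in range(max_depth + 1)
    )


MIN_READS = 8  # hypothetical per-allele read threshold
for cov in (20, 30, 50):
    rate = het_detection_rate(cov, MIN_READS)
    print(f"{cov}X mean: both alleles reach {MIN_READS} reads at {rate:.1%} of het sites")
```

A homozygous variant only needs one allele to clear the threshold with all reads supporting it, which is why a lower mean depth goes further there.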

ADD COMMENTlink written 7.6 years ago by swbarnes27.8k