What is the conventional approach to calculate depth of coverage?
4.1 years ago
serpalma.v ▴ 70


For WGS, when a given depth of coverage is recommended (for example, 30x for variant calling), where does that quantity derive from?

A/ from raw reads (number of reads * read length / genome size)

B/ from raw reads minus duplicates (number of reads * (1 - duplicate fraction) * read length / genome size)

C/ from raw alignments after filters are applied (i.e. map quality and base quality >= 20)

D/ from processed alignments (i.e. after base quality score recalibration) after filters are applied (i.e. map quality and base quality >= 20)
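As a rough illustration of the arithmetic behind options A and B, here is a minimal Python sketch. The function names, the read count, the 10% duplication rate and the rounded 3.0 Gb genome size are all made-up assumptions chosen to give round numbers; they are not taken from any particular dataset or tool.

```python
def raw_depth(n_reads, read_length, genome_size):
    """Option A: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def dedup_depth(n_reads, read_length, genome_size, duplicate_fraction):
    """Option B: same as A, but counting only non-duplicate reads."""
    unique_reads = n_reads * (1 - duplicate_fraction)
    return unique_reads * read_length / genome_size


# Hypothetical run: 600 million reads of 150 bp, genome rounded to 3.0 Gb.
n_reads = 600_000_000
read_length = 150
genome_size = 3_000_000_000

depth_a = raw_depth(n_reads, read_length, genome_size)
depth_b = dedup_depth(n_reads, read_length, genome_size, duplicate_fraction=0.10)

print(f"Option A: {depth_a:.1f}x")  # 30.0x
print(f"Option B: {depth_b:.1f}x")  # 27.0x
```

Options C and D cannot be computed from read counts alone, since they depend on which alignments pass the chosen filters.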

I have been searching through several threads in this forum and in the literature, but I find no consensus on how depth of coverage is calculated. I am mostly confused about whether it takes into account the reads before or after alignment.

Is there a convention?

Thanks in advance

coverage SNP WGS dnaseq depth • 935 views
4.1 years ago

Hello serpalma.v,

As Wouter says, there is no consensus. This makes it quite hard (not to say impossible) to compare coverage figures reported by different sources. I constantly have this discussion with other people.

I would be glad if the terms read depth and coverage were not mixed up.

For me, read depth is just the raw read count at a given position in my reference.

The definition of coverage I prefer is: the number of reads from different molecules, with defined quality, at a given position in my reference. This definition implies that overlapping paired reads are counted as 1, that duplicates are removed, and that I need to state what I mean by "defined quality", e.g. that reads below a certain mapping quality threshold are not counted.
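To make the difference between the two definitions concrete, here is a small Python sketch on made-up data. The `Read` record, its field names and the `min_mapq=20` default are hypothetical and only for illustration; a real pipeline would take these values from the alignment file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    fragment_id: str          # hypothetical: shared by both mates of a pair
    start: int                # 0-based, inclusive
    end: int                  # 0-based, exclusive
    mapq: int                 # mapping quality
    is_duplicate: bool = False


def overlaps(read, pos):
    return read.start <= pos < read.end


def read_depth_at(reads, pos):
    """Raw count of reads overlapping the position, no filtering."""
    return sum(1 for r in reads if overlaps(r, pos))


def coverage_at(reads, pos, min_mapq=20):
    """Distinct molecules at the position after removing duplicates
    and reads below the mapping quality threshold."""
    fragments = {
        r.fragment_id
        for r in reads
        if overlaps(r, pos) and not r.is_duplicate and r.mapq >= min_mapq
    }
    return len(fragments)


reads = [
    Read("frag1", 100, 250, mapq=60),
    Read("frag1", 200, 350, mapq=60),                     # overlapping mate
    Read("frag2", 150, 300, mapq=60, is_duplicate=True),  # marked duplicate
    Read("frag3", 180, 330, mapq=5),                      # low mapping quality
    Read("frag4", 210, 360, mapq=40),
]

print(read_depth_at(reads, 220))  # 5
print(coverage_at(reads, 220))    # 2
```

In this toy case five reads overlap the position, but only two distinct molecules pass the filters, so the two numbers differ by more than a factor of two.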

Furthermore, you have to check whether the given coverage values are meant as an average over a given region or whether they apply to each position.
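A short Python sketch shows why this distinction matters. The per-position values and the `summarize` helper are invented for illustration: the region averages 30x, yet a good part of it falls below 30x.

```python
from statistics import mean


def summarize(per_position_depth, min_depth=30):
    """Return the mean depth and the fraction of positions at or above min_depth."""
    average = mean(per_position_depth)
    n_ok = sum(1 for d in per_position_depth if d >= min_depth)
    return average, n_ok / len(per_position_depth)


# Made-up depths for a 10-position region.
depths = [60, 60, 60, 60, 0, 0, 30, 30, 0, 0]

average, fraction_ok = summarize(depths)
print(f"mean depth: {average}x")                      # 30x
print(f"positions at >= 30x: {fraction_ok:.0%}")      # 60%
```

Reporting both the mean and the fraction of positions above a threshold makes it clearer which kind of statement is being made.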

fin swimmer


In that case, read depth would be equivalent to option A and coverage would be equivalent to option D.

Would you agree?

4.1 years ago

I don't think there is a consensus, but I would take the reads after alignment, without filtering.

Note that, due to amplification biases (mainly GC content), coverage is also quite variable across the genome.

Note also that these cutoffs are quite arbitrary and not always based on a statistical analysis of sensitivity.

