Question

What is the conventional approach to calculating depth of coverage?


7.2 years ago

serpalma.v ▴ 80

Hello

For WGS, when a given depth of coverage is recommended (for example, 30x for variant calling), where does that quantity derive from?

A/ from raw reads (number of reads * read length / genome size)

B/ from raw reads minus the duplicates (number of reads * (1 - duplication fraction) * read length / genome size)

C/ from raw alignments after filters are applied (i.e. map quality and base quality >= 20)

D/ from processed alignments (i.e. after base quality score recalibration) after filters are applied (i.e. map quality and base quality >= 20)
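To make options A/ and B/ concrete, here is a minimal sketch of the arithmetic. The function name and all numbers are hypothetical and chosen only for illustration. It assumes `n_reads` counts individual reads (not pairs) and that the duplication rate is given as a fraction between 0 and 1.

```python
def expected_mean_depth(n_reads, read_length, genome_size, duplication_fraction=0.0):
    """Estimate mean depth as sequenced bases divided by genome size.

    With duplication_fraction=0.0 this is option A/ (raw reads).
    With a non-zero duplication_fraction it is option B/ (duplicates discounted).
    """
    usable_reads = n_reads * (1.0 - duplication_fraction)
    return usable_reads * read_length / genome_size


# Hypothetical run: 600 million reads of 150 bp on a 3 Gb genome.
n_reads = 600_000_000
read_length = 150
genome_size = 3_000_000_000

depth_a = expected_mean_depth(n_reads, read_length, genome_size)
depth_b = expected_mean_depth(n_reads, read_length, genome_size, duplication_fraction=0.10)

print(f"A/ raw reads:           {depth_a:.1f}x")
print(f"B/ duplicates excluded: {depth_b:.1f}x")
```

With these made-up numbers, the same run comes out at roughly 30x under A/ and roughly 27x under B/, which is why it matters which definition a recommendation is based on.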

I have been searching through several threads in this forum and in the literature, but I find no consensus on how depth of coverage is calculated. I am mostly confused about whether it takes into account the reads before or after alignment.

Is there a consensus?

Thanks in advance

coverage SNP WGS dnaseq depth • 1.9k views

updated 7.2 years ago by finswimmer 16k • written 7.2 years ago by serpalma.v ▴ 80

score 1 · Answer 1 · 2018-09-06

Hello serpalma.v,

As Wouter says, there is no consensus. This makes it quite hard (not to say impossible) to compare coverage figures reported by different sources. I constantly have this discussion with other people.

I would be glad if the terms read depth and coverage were not mixed up.

For me, read depth is just the raw read count at a given position in my reference.

The definition of coverage I prefer is: the number of reads from different molecules, with defined quality, at a given position in my reference. This definition implies that overlapping paired reads are counted as 1, that duplicates are removed, and that I need to state what I mean by "defined quality", e.g. that reads below a certain mapping quality threshold are not counted.

Furthermore, you have to check whether the reported coverage values are an average over a given region or apply to each position.
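The difference between the two definitions, and between per-position and averaged values, can be sketched on toy data. Everything below is hypothetical: the `Read` class, its field names, and the mapping quality threshold of 20 are assumptions made for illustration, not the output format of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    fragment: str            # name of the source molecule (shared by both mates of a pair)
    start: int               # 0-based, inclusive
    end: int                 # 0-based, exclusive
    mapq: int                # mapping quality
    duplicate: bool = False  # flagged as a PCR/optical duplicate


def read_depth(reads, pos):
    """Raw read count at a position: every overlapping read is counted."""
    return sum(1 for r in reads if r.start <= pos < r.end)


def coverage(reads, pos, min_mapq=20):
    """Distinct molecules at a position, ignoring duplicates and low-MAPQ reads."""
    fragments = {
        r.fragment
        for r in reads
        if r.start <= pos < r.end and not r.duplicate and r.mapq >= min_mapq
    }
    return len(fragments)


reads = [
    Read("f1", 0, 6, 60),
    Read("f1", 4, 10, 60),                 # mate of f1, overlapping it at positions 4-5
    Read("f2", 2, 8, 60),
    Read("f3", 2, 8, 60, duplicate=True),  # duplicate, excluded from coverage
    Read("f4", 3, 9, 5),                   # low mapping quality, excluded from coverage
]

region = range(0, 10)
depth_per_pos = [read_depth(reads, p) for p in region]
cov_per_pos = [coverage(reads, p) for p in region]

mean_depth = sum(depth_per_pos) / len(depth_per_pos)
mean_cov = sum(cov_per_pos) / len(cov_per_pos)

print("read depth per position:", depth_per_pos)
print("coverage per position:  ", cov_per_pos)
print("mean read depth:", mean_depth, "| mean coverage:", mean_cov, "| min coverage:", min(cov_per_pos))
```

On this toy region the mean read depth is 3.0 while the mean coverage is 1.6 and no position exceeds a coverage of 2, so the same data can be described with quite different numbers depending on the definition and on whether a mean or a per-position value is reported.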

fin swimmer

score 0 · Answer 2 · 2018-09-06

I don't think there is a consensus, but I would take the reads after alignment, without filtering.

Note that, due to amplification biases (mainly GC content), the coverage is also quite variable across the genome.

Note also that these cutoffs are quite arbitrary and not always based on a statistical analysis of sensitivity.