Question: what is the conventional approach to calculate depth of coverage?
serpalma.v (Germany), 5 weeks ago, wrote:

Hello,

For WGS, when a given depth of coverage is recommended (for example, 30x for variant calling), where does that quantity derive from?

A/ from raw reads (number of reads * read length / genome size)

B/ from raw reads minus the duplicates (number of reads * (1 - duplication fraction) * read length / genome size)

C/ from raw alignments after filters are applied (e.g. mapping quality and base quality >= 20)

D/ from processed alignments (i.e. after base quality score recalibration) after filters are applied (e.g. mapping quality and base quality >= 20)
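
For illustration, options A and B can be written as a short calculation. This is only a minimal sketch: the function names and the example numbers (read count, read length, genome size, duplication rate) are hypothetical and not taken from this thread, and option B is read here as discarding the duplicated fraction of reads.

```python
def mean_depth_raw(n_reads: int, read_length: int, genome_size: int) -> float:
    """Option A: expected mean depth computed from raw reads."""
    return n_reads * read_length / genome_size


def mean_depth_dedup(n_reads: int, read_length: int, genome_size: int,
                     duplication_fraction: float) -> float:
    """Option B: as option A, after discarding the duplicated fraction of reads."""
    unique_reads = n_reads * (1.0 - duplication_fraction)
    return unique_reads * read_length / genome_size


# Hypothetical run: 600 million 150 bp reads, a 3.0 Gbp genome, 10 % duplicates.
depth_a = mean_depth_raw(600_000_000, 150, 3_000_000_000)
depth_b = mean_depth_dedup(600_000_000, 150, 3_000_000_000, 0.10)
print(f"A: {depth_a:.1f}x  B: {depth_b:.1f}x")
```

Options C and D cannot be computed this way, since they depend on where the reads actually align and which of them pass the filters.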

I have been searching through several threads in this forum and in the literature, but I find no consensus on how depth of coverage is calculated. I am mostly confused about whether it takes into account the reads before or after alignment.

Is there a consensus?

Thanks in advance

Tags: depth, snp, coverage, dnaseq, wgs
finswimmer (Germany), 5 weeks ago, wrote:

Hello serpalma.v,

As Wouter says, there is no consensus, and this makes it quite hard (not to say impossible) to compare coverage figures from different sources. I constantly have this discussion with other people.

I would be glad if the terms read depth and coverage were not mixed up.

For me, read depth is just the raw read count at a given position in my reference.

The definition of coverage I prefer is: the number of reads from different molecules, with a defined quality, at a given position in my reference. This definition implies that overlapping paired reads are counted as 1, that duplicates are removed, and that I need to state what I mean by "defined quality", e.g. that reads below a certain mapping quality threshold are not counted.
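
The difference between the two definitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Read` record, its field names, the example reads and the mapping quality cutoff of 20 are all hypothetical, and a real analysis would take this information from the alignment file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    fragment_id: str    # shared by both mates of a pair (same molecule)
    start: int          # 0-based, inclusive
    end: int            # 0-based, exclusive
    mapq: int           # mapping quality
    is_duplicate: bool  # set by a duplicate-marking step


def read_depth(reads, pos):
    """Raw number of reads overlapping pos, with no filtering."""
    return sum(1 for r in reads if r.start <= pos < r.end)


def coverage(reads, pos, min_mapq=20):
    """Number of distinct molecules covering pos after filtering."""
    fragments = {
        r.fragment_id
        for r in reads
        if r.start <= pos < r.end
        and not r.is_duplicate
        and r.mapq >= min_mapq
    }
    return len(fragments)


# Hypothetical reads around position 220.
reads = [
    Read("frag1", 100, 250, 60, False),  # mate 1
    Read("frag1", 200, 350, 60, False),  # mate 2, overlaps mate 1
    Read("frag2", 150, 300, 60, True),   # marked as duplicate
    Read("frag3", 180, 330, 5, False),   # low mapping quality
    Read("frag4", 190, 340, 40, False),
]
print(read_depth(reads, 220), coverage(reads, 220))
```

With these example reads, five reads overlap position 220, but only two distinct molecules remain once the overlapping mates are collapsed and the duplicate and the low-quality read are dropped.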

Furthermore, you have to check whether the given coverage values are meant as an average over a given region or whether they apply to each position.
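
A small sketch of why this distinction matters, using made-up per-position values (the `summarize` helper, the depth values and the cutoff of 30 are assumptions for illustration only):

```python
from statistics import mean


def summarize(depths, min_depth):
    """Summarize per-position values as a region average and per-position stats."""
    return {
        "mean": mean(depths),
        "min": min(depths),
        "fraction_at_least": sum(d >= min_depth for d in depths) / len(depths),
    }


# Hypothetical per-position values for a 10 bp region.
depths = [60, 58, 55, 52, 30, 12, 8, 5, 10, 10]
summary = summarize(depths, min_depth=30)
print(summary)
```

In this example the region averages 30x, yet only half of its positions actually reach 30x, so a statement like "30x coverage" means something quite different depending on which of the two is reported.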

fin swimmer


In that case, read depth would be equivalent to option A and coverage would be equivalent to option D.

Would you agree?

Reply written 5 weeks ago by serpalma.v
WouterDeCoster (Belgium), 5 weeks ago, wrote:

I don't think there is a consensus, but I would take the reads after alignment, without filtering.

Note that due to amplification biases (GC content mainly) the coverage is also quite variable across the genome.

Note also that these cutoffs are quite arbitrary and not always based on a statistical analysis of sensitivity.
