Question: Bismark .cov coverage underrepresentation
0
gravatar for mehran.karimzade
4.0 years ago by
Canada
mehran.karimzade150 wrote:

Hello,

I have been using bismark for a while and I noticed that the .cov file which is produced by their "methylation extractor" add-ons, has underrepresented counts of methylated/unmethylated bases for many regions.

The two last columns of this file correponds to count of methylated, and count of unmethylated C. The software claims that the sum of these two columns is the coverage of the position.

I used bedtools genomecov (taking care that overlapping paired end reads are only counted once) and I found the while the bismark coverage highly correlates with bedtools', there are many instances bedtools having a higher coverage than bismark's.

 

I looked at many of these regions myself and found that the bedtools coverage is in fact correct.

In some of the cases, if I count only one of the strands, I get the same coverage as bismark but for many other examples I cannot understand what is the cause of this.

Anybody else has experienced a similar problem and might know where I might be wrong?

 

Thanks

Mehran

 

Answer:

Thanks to Devon Ryan, I found the reason for this. The C and MethylC counts must be specific to only one of the strands at each genomic position. For example if at positive strand is a C, then the minus strand is a G and only the C coverage should be counted. Bedtools gives the total coverage.

However it seems that there are some additional filterinngs by Bismark that I was not able to figure out.

coverage bismark wgbs bisulphite • 2.2k views
ADD COMMENTlink modified 3.9 years ago • written 4.0 years ago by mehran.karimzade150

If you can post a small reproducible example then this can be confirmed at a bug. You might want to email Feli Krueger, the author. He's quite responsive.

ADD REPLYlink written 4.0 years ago by Devon Ryan88k

I actually emailed Felix but he is on vacation for a while. I thought maybe other people had experienced something similar.

ADD REPLYlink written 4.0 years ago by mehran.karimzade150
1

Too bad. If you can post a small example dataset that reproduces this then I can have a look. Having said that, I can also recommend just using PileOMeth, which I wrote along with brentp. That won't give the same coverage metric as bedtools, but it's doing some additional quality filtering, so that's expected.

ADD REPLYlink written 4.0 years ago by Devon Ryan88k

That is very kind of you. I am not able to find your email though. Could you please send me an email:

<removed>

ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by mehran.karimzade150
1
gravatar for mehran.karimzade
3.9 years ago by
Canada
mehran.karimzade150 wrote:

Thanks to Devon Ryan, I found the reason for this. The C and MethylC counts must be specific to only one of the strands at each genomic position. For example if at positive strand is a C, then the minus strand is a G and only the C coverage should be counted. Bedtools gives the total coverage.

However it seems that there are some additional filterinngs by Bismark that I was not able to figure out.

ADD COMMENTlink written 3.9 years ago by mehran.karimzade150
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1130 users visited in the last hour