Bismark .cov coverage underrepresentation
1
0
Entering edit mode
9.1 years ago

Hello,

I have been using bismark for a while and I noticed that the .cov file which is produced by their "methylation extractor" add-ons, has underrepresented counts of methylated/unmethylated bases for many regions.

The two last columns of this file correspond to count of methylated, and count of unmethylated C. The software claims that the sum of these two columns is the coverage of the position.

I used bedtools genomecov (taking care that overlapping paired end reads are only counted once) and I found the while the bismark coverage highly correlates with bedtools', there are many instances bedtools having a higher coverage than bismark's.

I looked at many of these regions myself and found that the bedtools coverage is in fact correct.

In some of the cases, if I count only one of the strands, I get the same coverage as bismark but for many other examples I cannot understand what is the cause of this.

Anybody else has experienced a similar problem and might know where I might be wrong?

Thanks

Mehran


Answer:

Thanks to Devon Ryan, I found the reason for this. The C and MethylC counts must be specific to only one of the strands at each genomic position. For example if at positive strand is a C, then the minus strand is a G and only the C coverage should be counted. Bedtools gives the total coverage.

However it seems that there are some additional filterinngs by Bismark that I was not able to figure out.

bismark bisulphite coverage wgbs • 4.4k views
ADD COMMENT
0
Entering edit mode

If you can post a small reproducible example then this can be confirmed at a bug. You might want to email Feli Krueger, the author. He's quite responsive.

ADD REPLY
0
Entering edit mode

I actually emailed Felix but he is on vacation for a while. I thought maybe other people had experienced something similar.

ADD REPLY
1
Entering edit mode

Too bad. If you can post a small example dataset that reproduces this then I can have a look. Having said that, I can also recommend just using PileOMeth, which I wrote along with brentp. That won't give the same coverage metric as bedtools, but it's doing some additional quality filtering, so that's expected.

ADD REPLY
0
Entering edit mode

That is very kind of you. I am not able to find your email though. Could you please send me an email:

<removed>

ADD REPLY
1
Entering edit mode
9.0 years ago

Thanks to Devon Ryan, I found the reason for this. The C and MethylC counts must be specific to only one of the strands at each genomic position. For example if at positive strand is a C, then the minus strand is a G and only the C coverage should be counted. Bedtools gives the total coverage.

However it seems that there are some additional filterinngs by Bismark that I was not able to figure out.

ADD COMMENT

Login before adding your answer.

Traffic: 1817 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6