I have been using bismark for a while and I noticed that the .cov file which is produced by their "methylation extractor" add-ons, has underrepresented counts of methylated/unmethylated bases for many regions.
The two last columns of this file correponds to count of methylated, and count of unmethylated C. The software claims that the sum of these two columns is the coverage of the position.
I used bedtools genomecov (taking care that overlapping paired end reads are only counted once) and I found the while the bismark coverage highly correlates with bedtools', there are many instances bedtools having a higher coverage than bismark's.
I looked at many of these regions myself and found that the bedtools coverage is in fact correct.
In some of the cases, if I count only one of the strands, I get the same coverage as bismark but for many other examples I cannot understand what is the cause of this.
Anybody else has experienced a similar problem and might know where I might be wrong?
Thanks to Devon Ryan, I found the reason for this. The C and MethylC counts must be specific to only one of the strands at each genomic position. For example if at positive strand is a C, then the minus strand is a G and only the C coverage should be counted. Bedtools gives the total coverage.
However it seems that there are some additional filterinngs by Bismark that I was not able to figure out.