Question: Possible bug for bismark when methylation calling ?
1
gravatar for hxlei613
2.8 years ago by
hxlei61380
Bangladesh
hxlei61380 wrote:

Hi, I'm using bismark for bs-seq mapping and methylation calling. I take a glance at the coverage file generated by bismark2bedGraph. However, the result seems strange. Take CpG site of chrM for example (because chrM is the smallest chromosome). The coverage file contains sites that are not CpG sites. Here is a part of one CpG coverage file.


chrM 96 96 0 0 95

chrM 97 97 0 0 92

chrM 100 100 0 0 1

chrM 105 105 0 0 92

chrM 106 106 0 0 93

... ...

chrM 2003 2003 0 0 170

chrM 2004 2004 0 0 116

chrM 2107 2107 0 0 1

chrM 2204 2204 0 0 163

chrM 2205 2205 0 0 156

... ...

chrM 2826 2826 0 0 125

chrM 2827 2827 0 0 147

chrM 2833 2833 0 0 1

chrM 2845 2845 0 0 130

chrM 2846 2846 0 0 142


It's strange that site 100,2107,2833 are mixed in the CpG coverage file. And on the same sites of the reference, these three sites are not CpG sites. (here I call it false CpG.)

Then I count total CpG sites on chrM. There are 439 CpG sites on the plus strand, whick means 878 CpG on both strands. Here is the code.

input = open ("chrM",'r').read()
cg = 0
total = 0
print 'Num\tStart\tEnd'
for i in range(0, len(input)):
    if i <= 16570:
    substr=input[i:i+2]
    if substr.lower() == 'cg' :
        total += 1
        print '%s\t%s\t%s'%(total,i+1,i+2)

So number of lines in chrM CpG coverage file should always be equal to or less than 878. But the truth is, for two bs-seq data, one is 885, another is 900. Both of them contains false CpG. And the sites are different between them. For example, site 100 in the first file doesn't appear in the second one. Another, all of the false CpG only have one unmethylated reads.

Here are my commands (%s represents parameters):

bismark --multicore %s --output_dir %s %s -1 %s -2 %s
bismark_methylation_extractor -p --no_overlap --comprehensive --multicore %s --ignore %s --ignore_r2 %s --ignore_3prime %s --ignore_3prime_r2 %s --output  %s %s
bismark2bedGraph --dir %s --remove_spaces --CX --ample_memory --output CpG %s

First, I use --comprehensive to produce CpG,CHH and CHG context output files. Then I use bismark2bedGraph and --CX to process CpG,CHH and CHG context output files separately. In this way I can get CpG,CHH and CHG coverage files separately.

In addition, my version is v0.16.1. I haven't test it using the newest one.

Is this a bug or am I using the wrong commands ? Thank you very much for sharing your idea.

bismark bs-seq methylation • 1.5k views
ADD COMMENTlink modified 2.8 years ago by Santosh Anand4.9k • written 2.8 years ago by hxlei61380
1

Which Genomic Version? hg19 or hg38?

ADD REPLYlink written 2.8 years ago by Shicheng Guo7.7k

I suspect this is an odd bug in bismark. What happens if you use a different too (e.g., PileOMeth)?

ADD REPLYlink written 2.8 years ago by Devon Ryan91k
3
gravatar for Santosh Anand
2.8 years ago by
Santosh Anand4.9k
Santosh Anand4.9k wrote:

There is --CX in command line, which enables Cytosine in any context. From Bismark manual:

--CX/--CX_context

The sorted bedGraph output file contains information on every single cytosine that was covered in the experiment irrespective of its sequence context. This applies to both forward and reverse strands. Please be aware that this option may generate large temporary and output files and may take a long time to sort (up to many hours). Default: OFF (i.e. Default = CpG context only).

All the offending positions (100, 2107, 2833) have 'G', which means 'C' on the reverse strand. Also note that the coverage file is not in bed-format (it has 1-based start and end).

As the methylation percentage is per se not informative of the actual read coverage of detected methylated or unmethylated reads at a position, bismark2bedGraph also writes out a coverage file (using 1-based genomic genomic coordinates) that features two additional columns: <chromosome> <start position=""> <end position=""> <methylation percentage=""> <count methylated=""> <count unmethylated="">

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by Santosh Anand4.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1515 users visited in the last hour