Question: Questions About How to Plot NOMe-seq Plot
0
21 months ago by
Ashley60
China
Ashley60 wrote:
``````#!/usr/bin/python
# -*- coding: UTF-8 -*-
import os, sys, datetime
for i in range(len(sys.argv)):
"sys.argv[%d] = %s" % (i, sys.argv[i])
#print(len(sys.argv))
if len(sys.argv) != 6:
print('''python PyCalculateRatio.py ctcf NOMe-seq 'm:the x-axis>0 range' 'n:the No. of bin' counts.txt''')
exit(0)
fw= open(sys.argv[5], "w")
myDict_me = {}
fo_me = open(sys.argv[2], "r")
for line in fo_me:
line = line.rstrip()
s = line.split("\t")
key_me = s[0]+"_"+s[1]
myDict_me[key_me] = s[3]
m = int(sys.argv[3])
n = int(sys.argv[4])
delta = m / (n / 2)
result = []
time = []
pos = []
for i in range(0,n):
pos.append((-1)*m+i*delta)

for i in range(0,n):
result.append(0)
time.append(0)
fo_ctcf = open(sys.argv[1], "r")
for ctcf in fo_ctcf:
starttime = datetime.datetime.now()
ctcf_site = ctcf.strip().split()
for key in myDict_me.keys():
chrx = key.split("_")[0]
me_site = key.split("_")[1]
ratiome = myDict_me.get(key)
if((len(me_site) != "") and (len(ctcf_site)==3)):
if(chrx == ctcf_site[0]):
ratiome=float(ratiome)/100.0
x = int(me_site)
low = int(ctcf_site[1])
height = int(ctcf_site[2])
length = height - low
while(length > delta):
mid = (low+height)/2
if(x >= mid):
low = mid
else:
height = mid
mid = (low+height)/2
length = height - low
for j in range(0,n):
if((low >= (pos[j-1]+(int(ctcf_site[1])+int(ctcf_site[2]))/2)) and (low<(pos[j]+(int(ctcf_site[1])+int(ctcf_site[2]))/2))):
result[j] += float(ratiome)
time[j] += 1
else:
endtime = datetime.datetime.now()
print((endtime - starttime).seconds)
for i in range(0,n):
#    if isinstance(result,list):
if(float(time[i])==0):
fw.write(str((((-1.0)*m+delta*(i+1)+(-1.0)*m+delta*i)/2.0))+"\t"+str(result[i])+"\t"+str(time[i])+"\n")
#        fw.write(str((-1.0)*m+delta*i)+"\t"+str(1-result[i])+"\t"+str(time[i])+"\n")
else:
#        fw.write(str((-1.0)*m+delta*i)+"\t"+str(1-float(result[i])/float(time[i]))+"\t"+str(time[i])+"\n")
fw.write(str((((-1.0)*m+delta*(i+1)+(-1.0)*m+delta*i)/2.0))+"\t"+str(float(result[i])/float(time[i]))+"\t"+str(time[i])+"\n")
fw.close()
fo_ctcf.close()
fo_me.close()
``````

Could you give me some advice?

bs-seq chip-seq next-gen • 758 views
modified 21 months ago • written 21 months ago by Ashley60

I want to plot Fig. 3A in the paper "Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules'" Could anyone can help me? Any help would be great appreciate.

Thank you for joining Biostars. I have edited your post by highlighting code and pressing the `101 010` button. With the Biostars system, if you want to add subsequent information to your question, you should use the `Add comment` button (I have moved your other 2 posts to become comments).

Figure 3A can easily be drawn in R. All that you require is intensity (y axis) and distance from peak (x axis). Can you paste some of your input data? Somebody else may help with a Python solution.

So kind of you! Thank you so much! Input data for HCG methylation:

``````chr1    2104915 2104916 20      chr1    2104904 2106904
chr1    2104945 2104946 100     chr1    2104904 2106904
chr1    2104946 2104947 50      chr1    2104904 2106904
chr1    2104949 2104950 100     chr1    2104904 2106904
chr1    2104950 2104951 33.33   chr1    2104904 2106904
chr1    2104953 2104954 100     chr1    2104904 2106904
chr1    2104954 2104955 50      chr1    2104904 2106904
chr1    2104970 2104971 100     chr1    2104904 2106904
chr1    2104971 2104972 50      chr1    2104904 2106904
chr1    2104981 2104982 100     chr1    2104904 2106904
chr1    2104982 2104983 0       chr1    2104904 2106904
chr1    2104996 2104997 0       chr1    2104904 2106904
chr1    2105007 2105008 0       chr1    2104904 2106904
chr1    2105035 2105036 40      chr1    2104904 2106904
chr1    2105036 2105037 0       chr1    2104904 2106904
chr1    2105051 2105052 33.33   chr1    2104904 2106904
chr1    2105052 2105053 0       chr1    2104904 2106904
chr1    2105058 2105059 33.33   chr1    2104904 2106904
chr1    2105059 2105060 0       chr1    2104904 2106904
chr1    2105074 2105075 50      chr1    2104904 2106904
chr1    2105075 2105076 0       chr1    2104904 2106904
chr1    2105080 2105081 33.33   chr1    2104904 2106904
chr1    2105081 2105082 0       chr1    2104904 2106904
chr1    2105092 2105093 33.33   chr1    2104904 2106904
chr1    2105115 2105116 50      chr1    2104904 2106904
chr1    2105120 2105121 50      chr1    2104904 2106904
chr1    2105152 2105153 33.33   chr1    2104904 2106904
chr1    2105153 2105154 50      chr1    2104904 2106904
chr1    2105189 2105190 20      chr1    2104904 2106904
chr1    2105190 2105191 50      chr1    2104904 2106904
chr1    2105223 2105224 50      chr1    2104904 2106904
chr1    2105224 2105225 0       chr1    2104904 2106904
chr1    2105228 2105229 50      chr1    2104904 2106904
chr1    2105229 2105230 20      chr1    2104904 2106904
``````

Great! From where did you obtain the code? Is it from Scipy?

I highly appreciate your kind help. I wrote the code by myself.

And the following data is my output:

``````-990    0.0
-970    0.7154605330584559
-950    0.7154604587275448
-930    0.7154603725420188
-910    0.7154601245327105
-890    0.7154599622665291
-870    0.715460016423926
-850    0.7154600328418816
-830    0.715460122948553
-810    0.7154600694296549
-790    0.7154601793104122
-770    0.7154604874675868
-750    0.7154604381038174
-730    0.7154604585750587
-710    0.715460337675374
-690    0.7154604530500828
-670    0.7154603417560951
-650    0.7154603950752367
-630    0.7154605206410126
-610    0.7154606856893587
-590    0.7154607344842083
-570    0.7154609139750012
-550    0.7154611944122847
-530    0.7154610744060494
-510    0.7154609767048375
-490    0.7154609446312801
-470    0.7154612788209896
-450    0.7154613531160522
-430    0.7154614395256494
-410    0.7154615339744325
-390    0.7154615342128987
-370    0.715461609207974
-350    0.7154615357363073
-330    0.7154612506577616
-310    0.7154610017579309
-290    0.7154608738084999
-270    0.7154608615067723
-250    0.7154607259044948
-230    0.715460608329478
-210    0.7154606242643958
-190    0.715460781484274
-170    0.7154610061537292
-150    0.7154608138365816
-130    0.7154605750486169
-110    0.7154605716864685
-90 0.7154604129321028
-70 0.7154603255862175
-50 0.7154604887758309
-30 0.7154604117459252
-10 0.7154601891629296
10  0.7154604023299461
30  0.7154606294279887
50  0.715460503884257
70  0.715460593874396
90  0.715460606472705
110 0.715460586312209
130 0.7154605415314654
150 0.7154605865705582
170 0.715460558197969
190 0.7154606319920075
210 0.7154606179340408
230 0.7154608078230597
250 0.7154607819508018
270 0.7154606389770641
290 0.7154604161709814
310 0.7154606601749238
330 0.7154607472423898
350 0.7154605583733332
370 0.7154603133757407
390 0.7154603182697986
410 0.7154603235973366
430 0.7154604340706149
450 0.715460577760337
470 0.7154603556400093
490 0.7154604825389342
510 0.7154604975917872
530 0.7154604360009038
550 0.7154605144789659
570 0.7154603910747417
590 0.715460175844565
610 0.7154601012934135
630 0.7154602436876
650 0.7154602954457364
670 0.7154604920939924
690 0.7154608009271294
710 0.7154608056664519
730 0.7154607933149819
750 0.7154608803682699
770 0.7154609597171777
790 0.7154609352946621
810 0.7154612285746059
830 0.7154611214300414
850 0.7154611099772837
870 0.7154614036793734
890 0.7154615346065326
910 0.7154615626252193
930 0.7154615836878847
950 0.7154617997007456
970 0.7154617965938028
990 0.7154617790867362
``````

So, it looks like this (above) is the data required for the plot?

• x-axis = first column
• y-axis = second column

Is this the output of your Python script?

You can easily plot line graphs, like that in Fig 3A, using Matplotlib:

Thanks for your kind help and email. My R scripts for drawing this figure is as follows:

``````mydata1<-read.table("counts.txt")
x1<-mydata1[,1]
y1<-mydata1[,2]
plot(x1,y1,type="o",xlim=c(-1000,1000),ylim=c(0,1),col="green",main="NOMe-seq",xlab="Distance from CTCF (bp)",ylab=("HCG methylattion OR GCH protection"))
abline(v=10)
abline(v=-10)
labs <- c("me HCG", "nu 1-GCH")
legend("topright",legend = labs, cex = 0.8, lty = 1, lwd = 2, pch = c(1, 2), col = c("black", "green"), inset = 0.01, horiz = TRUE, box.col = "white")
``````

However, this figure of my is different from published figure. Could you give me some advice, any help would be highly appreciate.

Unfortunately, your figure will not look like Fig3A because all of your values are 0.7155. Your data indicates that methylation is roughly the same from point 0 for 1000 bp both up- and down-stream. Are you confident that your Python script is functioning as expected?

Thanks for your quick email. I think maybe it's ok. I didn't have the ability to figure it out. For same test dataset: My codes:

``````python PyCalculateRatio_HCG.py testCTCF.bed testHCG.bed '1000' '100' counts.txt
``````

input data for testCTCF.bed :

``````chr11   727182  729182
``````

input data for testHCG.bed:

``````chr11   727192  727193  100 chr11   727182  729182
chr11   727256  727257  100 chr11   727182  729182
chr11   727270  727271  100 chr11   727182  729182
chr11   727412  727413  50  chr11   727182  729182
chr11   727413  727414  20  chr11   727182  729182
chr11   727471  727472  75  chr11   727182  729182
chr11   727528  727529  66.67   chr11   727182  729182
chr11   727544  727545  100 chr11   727182  729182
chr11   727553  727554  50  chr11   727182  729182
chr11   727554  727555  50  chr11   727182  729182
chr11   727568  727569  0   chr11   727182  729182
chr11   727569  727570  50  chr11   727182  729182
chr11   727594  727595  50  chr11   727182  729182
chr11   727601  727602  50  chr11   727182  729182
chr11   727602  727603  75  chr11   727182  729182
chr11   727631  727632  33.33   chr11   727182  729182
chr11   727632  727633  66.67   chr11   727182  729182
chr11   727656  727657  100 chr11   727182  729182
chr11   727744  727745  50  chr11   727182  729182
chr11   727745  727746  66.67   chr11   727182  729182
chr11   727803  727804  80  chr11   727182  729182
chr11   727866  727867  100 chr11   727182  729182
chr11   727908  727909  50  chr11   727182  729182
chr11   727909  727910  100 chr11   727182  729182
chr11   727919  727920  33.33   chr11   727182  729182
chr11   727946  727947  50  chr11   727182  729182
chr11   727947  727948  0   chr11   727182  729182
chr11   727962  727963  75  chr11   727182  729182
chr11   727963  727964  100 chr11   727182  729182
chr11   727965  727966  100 chr11   727182  729182
chr11   728028  728029  100 chr11   727182  729182
chr11   728029  728030  33.33   chr11   727182  729182
chr11   728054  728055  0   chr11   727182  729182
chr11   728055  728056  0   chr11   727182  729182
chr11   728191  728192  100 chr11   727182  729182
chr11   728192  728193  25  chr11   727182  729182
chr11   728205  728206  25  chr11   727182  729182
chr11   728504  728505  0   chr11   727182  729182
chr11   728641  728642  50  chr11   727182  729182
chr11   728802  728803  75  chr11   727182  729182
chr11   728803  728804  100 chr11   727182  729182
chr11   728903  728904  100 chr11   727182  729182
chr11   728949  728950  100 chr11   727182  729182
chr11   729113  729114  0   chr11   727182  729182
chr11   729114  729115  100 chr11   727182  729182
chr11   729145  729146  0   chr11   727182  729182
chr11   729167  729168  66.67   chr11   727182  729182
``````

And my output:

``````-990.0  1.0 1
-970.0  0   0
-950.0  0   0
-930.0  1.0 1
-910.0  1.0 1
-890.0  0   0
-870.0  0   0
-850.0  0   0
-830.0  0   0
-810.0  0   0
-790.0  0   0
-770.0  0.35    2
-750.0  0   0
-730.0  0   0
-710.0  0.75    1
-690.0  0   0
-670.0  0   0
-650.0  0.6667  1
-630.0  0.666666666667  3
-610.0  0.25    2
-590.0  0.5 2
-570.0  0.75    1
-550.0  0.5 2
-530.0  1.0 1
-510.0  0   0
-490.0  0   0
-470.0  0   0
-450.0  0   0
-430.0  0.58335 2
-410.0  0   0
-390.0  0   0
-370.0  0.8 1
-350.0  0   0
-330.0  0   0
-310.0  1.0 1
-290.0  0   0
-270.0  0.6111  3
-250.0  0   0
-230.0  0.25    2
-210.0  0.916666666667  3
-190.0  0   0
-170.0  0   0
-150.0  0.66665 2
-130.0  0.0 2
-110.0  0   0
-90.0   0   0
-70.0   0   0
-50.0   0   0
-30.0   0   0
-10.0   0   0
10.0    0.625   2
-30.0   0   0
-10.0   0   0
10.0    0.625   2
30.0    0.25    1
50.0    0   0
70.0    0   0
90.0    0   0
110.0   0   0
130.0   0   0
150.0   0   0
170.0   0   0
190.0   0   0
210.0   0   0
230.0   0   0
250.0   0   0
270.0   0   0
290.0   0   0
310.0   0   0
330.0   0.0 1
350.0   0   0
370.0   0   0
390.0   0   0
410.0   0   0
430.0   0   0
450.0   0.5 1
470.0   0   0
490.0   0   0
510.0   0   0
530.0   0   0
550.0   0   0
570.0   0   0
590.0   0   0
610.0   0   0
630.0   0.875   2
650.0   0   0
670.0   0   0
690.0   0   0
710.0   0   0
730.0   1.0 1
750.0   0   0
770.0   1.0 1
790.0   0   0
810.0   0   0
830.0   0   0
850.0   0   0
870.0   0   0
890.0   0   0
910.0   0   0
930.0   0.5 2
950.0   0   0
970.0   0.0 1
990.0   0.6667  1
``````

Why is this result different from the first one that you showed? This data (above) may look more like Fig3A.

Input data for CTCF binding sites:

``````chr1    1306613 1308613
chr1    2104904 2106904
chr1    2312363 2314363
chr1    2360679 2362679
chr1    2737736 2739736
chr1    2837778 2839778
chr1    3235698 3237698
chr1    3340442 3342442
chr1    3406962 3408962
chr1    3640156 3642156
chr1    4711814 4713814
chr1    5805041 5807041
chr1    5975233 5977233
chr1    6100695 6102695
chr1    6112165 6114165
chr1    6245040 6247040
chr1    6305956 6307956
chr1    6402673 6404673
chr1    6407177 6409177
chr1    6473553 6475553
chr1    6497261 6499261
chr1    6534742 6536742
chr1    6582865 6584865
chr1    6705758 6707758
chr1    6779234 6781234
chr1    6853402 6855402
chr1    6949129 6951129
chr1    7101963 7103963
chr1    7706475 7708475
chr1    7728710 7730710
chr1    7812064 7814064
chr1    7962003 7964003
chr1    8090447 8092447
chr1    8373541 8375541
chr1    8408231 8410231
chr1    8785762 8787762
chr1    8908143 8910143
chr1    9047285 9049285
chr1    9169996 9171996
chr1    9348249 9350249
chr1    9843786 9845786
chr1    9926716 9928716
chr1    9934106 9936106
chr1    10553885        10555885
chr1    10554288        10556288
chr1    10566674        10568674
chr1    10969158        10971158
chr1    11001198        11003198
chr1    11300940        11302940
chr1    11701665        11703665
chr1    11779027        11781027
chr1    11953180        11955180:
``````

My boss gave me a month to repeat this picture. Any help about this figure would be great appreciate! Thank you so much !

1

If my boss gave me a warning like that, I would walk straight to the Human Resources department and / or the Head of Department (or her/his secretary) to report the incident, and provide any recorded evidence. Threats like that constitute bullying / abuse and I am aware that these things occur all across the World. However, I do not know how these processes function in your own university / institution.

I think my boss is motiving me to do things by putting pressure on me.

The reason why they are different is that for above result, I only selected some part of all data.

For HCG methylation, I just selected top 47 rows in chr11 for testing. And for CTCF binding result, I only selected 1 row in chr11 for testing.

I would not call it motivation... That is an archaic way of supervising. Ruling by fear is a cheap and cowardly way of trying to gain control over a person or people.

I neither understand why you need a complex Python script. Assuming that you have some normalised data in BAM format, all that you need to do is extract the signal intensity per base position up- and down-stream of your target CTCF region. The plot itself is then easy, as it is just a line graph.

I also think this is a very simple work. But when I drew this plot, the result is different from published data. Anyway, thanks for your email and great advice. I found aaRon R package can drew the same type of this figure. I think I would change the GCH, WCG, TSS, HMEC, and '3000' to GCH, HCG, CTCF summit, IMR90, and '1000'. Do you think this would be better? Thanks a lot. And Looking forward to your reply.

``````methylationBiPlot(GCH, WCG, TSS, samples["HMEC", ], up=3000, down=3000, addN=FALSE)
``````

Thanks you so much!

I want to close this thread. It's Okay now. human genome version hg18 and hg19

The thread is 'closed' in the sense that an answer has been provided. Are you implying that you want to delete the entire thread?

Thanks for your quick reply. I don't want to delete the entire thread. I just want to close this thread because the answer is provided. Thanks a lot.

In 1 day, the thread will disappear from the main page and become 'buried' among the 1000s of old threads. By having an answer provided, it will neither be 'bumped' back to the main page by the system.

1

1
21 months ago by
Ashley60
China
Ashley60 wrote:

The reason why I didn't this figure is the human genome version. NOMe-seq hg19 CTCF ChIP-seq hg18 When I use liftover to convert the version, the figure is same.

Best wishes, Ashley

I have moved this to an answer and accepted it for you, just to close off this thread. Good luck.

1

Thank you very much! Itâ€™s okay. So kind of you. Close this thread. Best regards, Ashley