Question: Questions About How to Plot NOMe-seq Plot
0
gravatar for Ashley
17 days ago by
Ashley40
China
Ashley40 wrote:

https://genome.cshlp.org/content/22/12/2497.long paper 3A:

#!/usr/bin/python
# -*- coding: UTF-8 -*-  
import os, sys, datetime
for i in range(len(sys.argv)):
    "sys.argv[%d] = %s" % (i, sys.argv[i])
#print(len(sys.argv))
if len(sys.argv) != 6:
    print('''python PyCalculateRatio.py ctcf NOMe-seq 'm:the x-axis>0 range' 'n:the No. of bin' counts.txt''')
    exit(0)
fw= open(sys.argv[5], "w")
myDict_me = {}
fo_me = open(sys.argv[2], "r")
for line in fo_me:
    line = line.rstrip()
    s = line.split("\t")
    key_me = s[0]+"_"+s[1]
    myDict_me[key_me] = s[3]
m = int(sys.argv[3])
n = int(sys.argv[4])
delta = m / (n / 2)
result = []
time = []
pos = []
for i in range(0,n):
    pos.append((-1)*m+i*delta)

for i in range(0,n):
    result.append(0)
    time.append(0)
fo_ctcf = open(sys.argv[1], "r")
for ctcf in fo_ctcf:
    starttime = datetime.datetime.now()
    ctcf_site = ctcf.strip().split()
    for key in myDict_me.keys():
        chrx = key.split("_")[0]
        me_site = key.split("_")[1]
        ratiome = myDict_me.get(key)
        if((len(me_site) != "") and (len(ctcf_site)==3)):
            if(chrx == ctcf_site[0]):
                ratiome=float(ratiome)/100.0
                x = int(me_site)
                low = int(ctcf_site[1])
                height = int(ctcf_site[2])
                length = height - low
                while(length > delta):
                    mid = (low+height)/2 
                    if(x >= mid):
                        low = mid
                    else:
                        height = mid
                    mid = (low+height)/2
                    length = height - low
                for j in range(0,n):
                    if((low >= (pos[j-1]+(int(ctcf_site[1])+int(ctcf_site[2]))/2)) and (low<(pos[j]+(int(ctcf_site[1])+int(ctcf_site[2]))/2))):
                        result[j] += float(ratiome)
                        time[j] += 1
        else:
            ctcf = fo_ctcf.readline()
    endtime = datetime.datetime.now()
print((endtime - starttime).seconds)
for i in range(0,n):
#    if isinstance(result,list):
    if(float(time[i])==0):
        fw.write(str((((-1.0)*m+delta*(i+1)+(-1.0)*m+delta*i)/2.0))+"\t"+str(result[i])+"\t"+str(time[i])+"\n")
#        fw.write(str((-1.0)*m+delta*i)+"\t"+str(1-result[i])+"\t"+str(time[i])+"\n")
    else:
#        fw.write(str((-1.0)*m+delta*i)+"\t"+str(1-float(result[i])/float(time[i]))+"\t"+str(time[i])+"\n")
        fw.write(str((((-1.0)*m+delta*(i+1)+(-1.0)*m+delta*i)/2.0))+"\t"+str(float(result[i])/float(time[i]))+"\t"+str(time[i])+"\n")
fw.close()
fo_ctcf.close()
fo_me.close()

Could you give me some advice?

bs-seq chip-seq next-gen • 281 views
ADD COMMENTlink modified 16 days ago • written 17 days ago by Ashley40

I want to plot Fig. 3A in the paper "Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules'" Could anyone can help me? Any help would be great appreciate.

ADD REPLYlink written 17 days ago by Ashley40

Thank you for joining Biostars. I have edited your post by highlighting code and pressing the 101 010 button. With the Biostars system, if you want to add subsequent information to your question, you should use the Add comment button (I have moved your other 2 posts to become comments).

Figure 3A can easily be drawn in R. All that you require is intensity (y axis) and distance from peak (x axis). Can you paste some of your input data? Somebody else may help with a Python solution.

ADD REPLYlink modified 17 days ago • written 17 days ago by Kevin Blighe35k

So kind of you! Thank you so much! Input data for HCG methylation:

chr1    2104915 2104916 20      chr1    2104904 2106904
chr1    2104945 2104946 100     chr1    2104904 2106904
chr1    2104946 2104947 50      chr1    2104904 2106904
chr1    2104949 2104950 100     chr1    2104904 2106904
chr1    2104950 2104951 33.33   chr1    2104904 2106904
chr1    2104953 2104954 100     chr1    2104904 2106904
chr1    2104954 2104955 50      chr1    2104904 2106904
chr1    2104970 2104971 100     chr1    2104904 2106904
chr1    2104971 2104972 50      chr1    2104904 2106904
chr1    2104981 2104982 100     chr1    2104904 2106904
chr1    2104982 2104983 0       chr1    2104904 2106904
chr1    2104996 2104997 0       chr1    2104904 2106904
chr1    2105007 2105008 0       chr1    2104904 2106904
chr1    2105035 2105036 40      chr1    2104904 2106904
chr1    2105036 2105037 0       chr1    2104904 2106904
chr1    2105051 2105052 33.33   chr1    2104904 2106904
chr1    2105052 2105053 0       chr1    2104904 2106904
chr1    2105058 2105059 33.33   chr1    2104904 2106904
chr1    2105059 2105060 0       chr1    2104904 2106904
chr1    2105074 2105075 50      chr1    2104904 2106904
chr1    2105075 2105076 0       chr1    2104904 2106904
chr1    2105080 2105081 33.33   chr1    2104904 2106904
chr1    2105081 2105082 0       chr1    2104904 2106904
chr1    2105092 2105093 33.33   chr1    2104904 2106904
chr1    2105115 2105116 50      chr1    2104904 2106904
chr1    2105120 2105121 50      chr1    2104904 2106904
chr1    2105152 2105153 33.33   chr1    2104904 2106904
chr1    2105153 2105154 50      chr1    2104904 2106904
chr1    2105189 2105190 20      chr1    2104904 2106904
chr1    2105190 2105191 50      chr1    2104904 2106904
chr1    2105223 2105224 50      chr1    2104904 2106904
chr1    2105224 2105225 0       chr1    2104904 2106904
chr1    2105228 2105229 50      chr1    2104904 2106904
chr1    2105229 2105230 20      chr1    2104904 2106904
ADD REPLYlink modified 16 days ago • written 17 days ago by Ashley40

Great! From where did you obtain the code? Is it from Scipy?

ADD REPLYlink written 16 days ago by Kevin Blighe35k

I highly appreciate your kind help. I wrote the code by myself.

ADD REPLYlink written 16 days ago by Ashley40

And the following data is my output:

-990    0.0
-970    0.7154605330584559
-950    0.7154604587275448
-930    0.7154603725420188
-910    0.7154601245327105
-890    0.7154599622665291
-870    0.715460016423926
-850    0.7154600328418816
-830    0.715460122948553
-810    0.7154600694296549
-790    0.7154601793104122
-770    0.7154604874675868
-750    0.7154604381038174
-730    0.7154604585750587
-710    0.715460337675374
-690    0.7154604530500828
-670    0.7154603417560951
-650    0.7154603950752367
-630    0.7154605206410126
-610    0.7154606856893587
-590    0.7154607344842083
-570    0.7154609139750012
-550    0.7154611944122847
-530    0.7154610744060494
-510    0.7154609767048375
-490    0.7154609446312801
-470    0.7154612788209896
-450    0.7154613531160522
-430    0.7154614395256494
-410    0.7154615339744325
-390    0.7154615342128987
-370    0.715461609207974
-350    0.7154615357363073
-330    0.7154612506577616
-310    0.7154610017579309
-290    0.7154608738084999
-270    0.7154608615067723
-250    0.7154607259044948
-230    0.715460608329478
-210    0.7154606242643958
-190    0.715460781484274
-170    0.7154610061537292
-150    0.7154608138365816
-130    0.7154605750486169
-110    0.7154605716864685
-90 0.7154604129321028
-70 0.7154603255862175
-50 0.7154604887758309
-30 0.7154604117459252
-10 0.7154601891629296
10  0.7154604023299461
30  0.7154606294279887
50  0.715460503884257
70  0.715460593874396
90  0.715460606472705
110 0.715460586312209
130 0.7154605415314654
150 0.7154605865705582
170 0.715460558197969
190 0.7154606319920075
210 0.7154606179340408
230 0.7154608078230597
250 0.7154607819508018
270 0.7154606389770641
290 0.7154604161709814
310 0.7154606601749238
330 0.7154607472423898
350 0.7154605583733332
370 0.7154603133757407
390 0.7154603182697986
410 0.7154603235973366
430 0.7154604340706149
450 0.715460577760337
470 0.7154603556400093
490 0.7154604825389342
510 0.7154604975917872
530 0.7154604360009038
550 0.7154605144789659
570 0.7154603910747417
590 0.715460175844565
610 0.7154601012934135
630 0.7154602436876
650 0.7154602954457364
670 0.7154604920939924
690 0.7154608009271294
710 0.7154608056664519
730 0.7154607933149819
750 0.7154608803682699
770 0.7154609597171777
790 0.7154609352946621
810 0.7154612285746059
830 0.7154611214300414
850 0.7154611099772837
870 0.7154614036793734
890 0.7154615346065326
910 0.7154615626252193
930 0.7154615836878847
950 0.7154617997007456
970 0.7154617965938028
990 0.7154617790867362
ADD REPLYlink written 16 days ago by Ashley40

So, it looks like this (above) is the data required for the plot?

  • x-axis = first column
  • y-axis = second column

Is this the output of your Python script?

ADD REPLYlink written 16 days ago by Kevin Blighe35k

You can easily plot line graphs, like that in Fig 3A, using Matplotlib:

ADD REPLYlink written 16 days ago by Kevin Blighe35k

Thanks for your kind help and email. My R scripts for drawing this figure is as follows:

mydata1<-read.table("counts.txt")
x1<-mydata1[,1]
y1<-mydata1[,2]
plot(x1,y1,type="o",xlim=c(-1000,1000),ylim=c(0,1),col="green",main="NOMe-seq",xlab="Distance from CTCF (bp)",ylab=("HCG methylattion OR GCH protection"))
abline(v=10)
abline(v=-10)
labs <- c("me HCG", "nu 1-GCH")
legend("topright",legend = labs, cex = 0.8, lty = 1, lwd = 2, pch = c(1, 2), col = c("black", "green"), inset = 0.01, horiz = TRUE, box.col = "white")

However, this figure of my is different from published figure. Could you give me some advice, any help would be highly appreciate.

ADD REPLYlink written 16 days ago by Ashley40

my figure:enter image description here

https://www.dropbox.com/s/ffd6i4scajqp7lx/mydata.png?dl=0enter image description here

ADD REPLYlink modified 16 days ago • written 16 days ago by Ashley40

Unfortunately, your figure will not look like Fig3A because all of your values are 0.7155. Your data indicates that methylation is roughly the same from point 0 for 1000 bp both up- and down-stream. Are you confident that your Python script is functioning as expected?

ADD REPLYlink written 16 days ago by Kevin Blighe35k

Thanks for your quick email. I think maybe it's ok. I didn't have the ability to figure it out. For same test dataset: My codes:

python PyCalculateRatio_HCG.py testCTCF.bed testHCG.bed '1000' '100' counts.txt

input data for testCTCF.bed :

chr11   727182  729182

input data for testHCG.bed:

chr11   727192  727193  100 chr11   727182  729182
chr11   727256  727257  100 chr11   727182  729182
chr11   727270  727271  100 chr11   727182  729182
chr11   727412  727413  50  chr11   727182  729182
chr11   727413  727414  20  chr11   727182  729182
chr11   727471  727472  75  chr11   727182  729182
chr11   727528  727529  66.67   chr11   727182  729182
chr11   727544  727545  100 chr11   727182  729182
chr11   727553  727554  50  chr11   727182  729182
chr11   727554  727555  50  chr11   727182  729182
chr11   727568  727569  0   chr11   727182  729182
chr11   727569  727570  50  chr11   727182  729182
chr11   727594  727595  50  chr11   727182  729182
chr11   727601  727602  50  chr11   727182  729182
chr11   727602  727603  75  chr11   727182  729182
chr11   727631  727632  33.33   chr11   727182  729182
chr11   727632  727633  66.67   chr11   727182  729182
chr11   727656  727657  100 chr11   727182  729182
chr11   727744  727745  50  chr11   727182  729182
chr11   727745  727746  66.67   chr11   727182  729182
chr11   727803  727804  80  chr11   727182  729182
chr11   727866  727867  100 chr11   727182  729182
chr11   727908  727909  50  chr11   727182  729182
chr11   727909  727910  100 chr11   727182  729182
chr11   727919  727920  33.33   chr11   727182  729182
chr11   727946  727947  50  chr11   727182  729182
chr11   727947  727948  0   chr11   727182  729182
chr11   727962  727963  75  chr11   727182  729182
chr11   727963  727964  100 chr11   727182  729182
chr11   727965  727966  100 chr11   727182  729182
chr11   728028  728029  100 chr11   727182  729182
chr11   728029  728030  33.33   chr11   727182  729182
chr11   728054  728055  0   chr11   727182  729182
chr11   728055  728056  0   chr11   727182  729182
chr11   728191  728192  100 chr11   727182  729182
chr11   728192  728193  25  chr11   727182  729182
chr11   728205  728206  25  chr11   727182  729182
chr11   728504  728505  0   chr11   727182  729182
chr11   728641  728642  50  chr11   727182  729182
chr11   728802  728803  75  chr11   727182  729182
chr11   728803  728804  100 chr11   727182  729182
chr11   728903  728904  100 chr11   727182  729182
chr11   728949  728950  100 chr11   727182  729182
chr11   729113  729114  0   chr11   727182  729182
chr11   729114  729115  100 chr11   727182  729182
chr11   729145  729146  0   chr11   727182  729182
chr11   729167  729168  66.67   chr11   727182  729182

And my output:

-990.0  1.0 1
-970.0  0   0
-950.0  0   0
-930.0  1.0 1
-910.0  1.0 1
-890.0  0   0
-870.0  0   0
-850.0  0   0
-830.0  0   0
-810.0  0   0
-790.0  0   0
-770.0  0.35    2
-750.0  0   0
-730.0  0   0
-710.0  0.75    1
-690.0  0   0
-670.0  0   0
-650.0  0.6667  1
-630.0  0.666666666667  3
-610.0  0.25    2
-590.0  0.5 2
-570.0  0.75    1
-550.0  0.5 2
-530.0  1.0 1
-510.0  0   0
-490.0  0   0
-470.0  0   0
-450.0  0   0
-430.0  0.58335 2
-410.0  0   0
-390.0  0   0
-370.0  0.8 1
-350.0  0   0
-330.0  0   0
-310.0  1.0 1
-290.0  0   0
-270.0  0.6111  3
-250.0  0   0
-230.0  0.25    2
-210.0  0.916666666667  3
-190.0  0   0
-170.0  0   0
-150.0  0.66665 2
-130.0  0.0 2
-110.0  0   0
-90.0   0   0
-70.0   0   0
-50.0   0   0
-30.0   0   0
-10.0   0   0
10.0    0.625   2
-30.0   0   0
-10.0   0   0
10.0    0.625   2
30.0    0.25    1
50.0    0   0
70.0    0   0
90.0    0   0
110.0   0   0
130.0   0   0
150.0   0   0
170.0   0   0
190.0   0   0
210.0   0   0
230.0   0   0
250.0   0   0
270.0   0   0
290.0   0   0
310.0   0   0
330.0   0.0 1
350.0   0   0
370.0   0   0
390.0   0   0
410.0   0   0
430.0   0   0
450.0   0.5 1
470.0   0   0
490.0   0   0
510.0   0   0
530.0   0   0
550.0   0   0
570.0   0   0
590.0   0   0
610.0   0   0
630.0   0.875   2
650.0   0   0
670.0   0   0
690.0   0   0
710.0   0   0
730.0   1.0 1
750.0   0   0
770.0   1.0 1
790.0   0   0
810.0   0   0
830.0   0   0
850.0   0   0
870.0   0   0
890.0   0   0
910.0   0   0
930.0   0.5 2
950.0   0   0
970.0   0.0 1
990.0   0.6667  1
ADD REPLYlink written 16 days ago by Ashley40

Why is this result different from the first one that you showed? This data (above) may look more like Fig3A.

ADD REPLYlink written 16 days ago by Kevin Blighe35k

Input data for CTCF binding sites:

chr1    1306613 1308613
chr1    2104904 2106904
chr1    2312363 2314363
chr1    2360679 2362679
chr1    2737736 2739736
chr1    2837778 2839778
chr1    3235698 3237698
chr1    3340442 3342442
chr1    3406962 3408962
chr1    3640156 3642156
chr1    4711814 4713814
chr1    5805041 5807041
chr1    5975233 5977233
chr1    6100695 6102695
chr1    6112165 6114165
chr1    6245040 6247040
chr1    6305956 6307956
chr1    6402673 6404673
chr1    6407177 6409177
chr1    6473553 6475553
chr1    6497261 6499261
chr1    6534742 6536742
chr1    6582865 6584865
chr1    6705758 6707758
chr1    6779234 6781234
chr1    6853402 6855402
chr1    6949129 6951129
chr1    7101963 7103963
chr1    7706475 7708475
chr1    7728710 7730710
chr1    7812064 7814064
chr1    7962003 7964003
chr1    8090447 8092447
chr1    8373541 8375541
chr1    8408231 8410231
chr1    8785762 8787762
chr1    8908143 8910143
chr1    9047285 9049285
chr1    9169996 9171996
chr1    9348249 9350249
chr1    9843786 9845786
chr1    9926716 9928716
chr1    9934106 9936106
chr1    10553885        10555885
chr1    10554288        10556288
chr1    10566674        10568674
chr1    10969158        10971158
chr1    11001198        11003198
chr1    11300940        11302940
chr1    11701665        11703665
chr1    11779027        11781027
chr1    11953180        11955180:
ADD REPLYlink written 16 days ago by Ashley40

My boss gave me a month to repeat this picture. Any help about this figure would be great appreciate! Thank you so much !

ADD REPLYlink modified 16 days ago • written 16 days ago by Ashley40
1

If my boss gave me a warning like that, I would walk straight to the Human Resources department and / or the Head of Department (or her/his secretary) to report the incident, and provide any recorded evidence. Threats like that constitute bullying / abuse and I am aware that these things occur all across the World. However, I do not know how these processes function in your own university / institution.

ADD REPLYlink written 16 days ago by Kevin Blighe35k

I think my boss is motiving me to do things by putting pressure on me.

The reason why they are different is that for above result, I only selected some part of all data.

For HCG methylation, I just selected top 47 rows in chr11 for testing. And for CTCF binding result, I only selected 1 row in chr11 for testing.

ADD REPLYlink modified 16 days ago • written 16 days ago by Ashley40

I would not call it motivation... That is an archaic way of supervising. Ruling by fear is a cheap and cowardly way of trying to gain control over a person or people.

I neither understand why you need a complex Python script. Assuming that you have some normalised data in BAM format, all that you need to do is extract the signal intensity per base position up- and down-stream of your target CTCF region. The plot itself is then easy, as it is just a line graph.

ADD REPLYlink written 16 days ago by Kevin Blighe35k

I also think this is a very simple work. But when I drew this plot, the result is different from published data. Anyway, thanks for your email and great advice. I found aaRon R package can drew the same type of this figure. I think I would change the GCH, WCG, TSS, HMEC, and '3000' to GCH, HCG, CTCF summit, IMR90, and '1000'. Do you think this would be better? Thanks a lot. And Looking forward to your reply.

methylationBiPlot(GCH, WCG, TSS, samples["HMEC", ], up=3000, down=3000, addN=FALSE)
ADD REPLYlink written 15 days ago by Ashley40

Thanks for your kind help!!

Thanks you so much!

ADD REPLYlink modified 17 days ago • written 17 days ago by Ashley40

I want to close this thread. It's Okay now. human genome version hg18 and hg19

ADD REPLYlink written 14 days ago by Ashley40

The thread is 'closed' in the sense that an answer has been provided. Are you implying that you want to delete the entire thread?

ADD REPLYlink written 14 days ago by Kevin Blighe35k

Thanks for your quick reply. I don't want to delete the entire thread. I just want to close this thread because the answer is provided. Thanks a lot.

ADD REPLYlink written 14 days ago by Ashley40

In 1 day, the thread will disappear from the main page and become 'buried' among the 1000s of old threads. By having an answer provided, it will neither be 'bumped' back to the main page by the system.

ADD REPLYlink written 14 days ago by Kevin Blighe35k
1

Thank you very much for your kind email and reply.

ADD REPLYlink written 14 days ago by Ashley40
1
gravatar for Ashley
15 days ago by
Ashley40
China
Ashley40 wrote:

The reason why I didn't this figure is the human genome version. NOMe-seq hg19 CTCF ChIP-seq hg18 When I use liftover to convert the version, the figure is same.

Thanks for your kind help!

Best wishes, Ashley

ADD COMMENTlink written 15 days ago by Ashley40

I have moved this to an answer and accepted it for you, just to close off this thread. Good luck.

ADD REPLYlink written 15 days ago by Kevin Blighe35k
1

Thank you very much! It’s okay. So kind of you. Close this thread. Best regards, Ashley

ADD REPLYlink modified 15 days ago • written 15 days ago by Ashley40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1128 users visited in the last hour