Entering edit mode
6.4 years ago
Ashley
▴
90
https://genome.cshlp.org/content/22/12/2497.long paper 3A:
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import os, sys, datetime
for i in range(len(sys.argv)):
"sys.argv[%d] = %s" % (i, sys.argv[i])
#print(len(sys.argv))
if len(sys.argv) != 6:
print('''python PyCalculateRatio.py ctcf NOMe-seq 'm:the x-axis>0 range' 'n:the No. of bin' counts.txt''')
exit(0)
fw= open(sys.argv[5], "w")
myDict_me = {}
fo_me = open(sys.argv[2], "r")
for line in fo_me:
line = line.rstrip()
s = line.split("\t")
key_me = s[0]+"_"+s[1]
myDict_me[key_me] = s[3]
m = int(sys.argv[3])
n = int(sys.argv[4])
delta = m / (n / 2)
result = []
time = []
pos = []
for i in range(0,n):
pos.append((-1)*m+i*delta)
for i in range(0,n):
result.append(0)
time.append(0)
fo_ctcf = open(sys.argv[1], "r")
for ctcf in fo_ctcf:
starttime = datetime.datetime.now()
ctcf_site = ctcf.strip().split()
for key in myDict_me.keys():
chrx = key.split("_")[0]
me_site = key.split("_")[1]
ratiome = myDict_me.get(key)
if((len(me_site) != "") and (len(ctcf_site)==3)):
if(chrx == ctcf_site[0]):
ratiome=float(ratiome)/100.0
x = int(me_site)
low = int(ctcf_site[1])
height = int(ctcf_site[2])
length = height - low
while(length > delta):
mid = (low+height)/2
if(x >= mid):
low = mid
else:
height = mid
mid = (low+height)/2
length = height - low
for j in range(0,n):
if((low >= (pos[j-1]+(int(ctcf_site[1])+int(ctcf_site[2]))/2)) and (low<(pos[j]+(int(ctcf_site[1])+int(ctcf_site[2]))/2))):
result[j] += float(ratiome)
time[j] += 1
else:
ctcf = fo_ctcf.readline()
endtime = datetime.datetime.now()
print((endtime - starttime).seconds)
for i in range(0,n):
# if isinstance(result,list):
if(float(time[i])==0):
fw.write(str((((-1.0)*m+delta*(i+1)+(-1.0)*m+delta*i)/2.0))+"\t"+str(result[i])+"\t"+str(time[i])+"\n")
# fw.write(str((-1.0)*m+delta*i)+"\t"+str(1-result[i])+"\t"+str(time[i])+"\n")
else:
# fw.write(str((-1.0)*m+delta*i)+"\t"+str(1-float(result[i])/float(time[i]))+"\t"+str(time[i])+"\n")
fw.write(str((((-1.0)*m+delta*(i+1)+(-1.0)*m+delta*i)/2.0))+"\t"+str(float(result[i])/float(time[i]))+"\t"+str(time[i])+"\n")
fw.close()
fo_ctcf.close()
fo_me.close()
Could you give me some advice?
I want to plot Fig. 3A in the paper "Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules'" Could anyone can help me? Any help would be great appreciate.
Thank you for joining Biostars. I have edited your post by highlighting code and pressing the
101 010
button. With the Biostars system, if you want to add subsequent information to your question, you should use theAdd comment
button (I have moved your other 2 posts to become comments).Figure 3A can easily be drawn in R. All that you require is intensity (y axis) and distance from peak (x axis). Can you paste some of your input data? Somebody else may help with a Python solution.
So kind of you! Thank you so much! Input data for HCG methylation:
Great! From where did you obtain the code? Is it from Scipy?
I highly appreciate your kind help. I wrote the code by myself.
And the following data is my output:
So, it looks like this (above) is the data required for the plot?
Is this the output of your Python script?
You can easily plot line graphs, like that in Fig 3A, using Matplotlib:
Thanks for your kind help and email. My R scripts for drawing this figure is as follows:
However, this figure of my is different from published figure. Could you give me some advice, any help would be highly appreciate.
https://www.dropbox.com/s/ffd6i4scajqp7lx/mydata.png?dl=0
Unfortunately, your figure will not look like Fig3A because all of your values are 0.7155. Your data indicates that methylation is roughly the same from point 0 for 1000 bp both up- and down-stream. Are you confident that your Python script is functioning as expected?
Thanks for your quick email. I think maybe it's ok. I didn't have the ability to figure it out. For same test dataset: My codes:
input data for testCTCF.bed :
input data for testHCG.bed:
And my output:
Why is this result different from the first one that you showed? This data (above) may look more like Fig3A.
Input data for CTCF binding sites:
My boss gave me a month to repeat this picture. Any help about this figure would be great appreciate! Thank you so much !
If my boss gave me a warning like that, I would walk straight to the Human Resources department and / or the Head of Department (or her/his secretary) to report the incident, and provide any recorded evidence. Threats like that constitute bullying / abuse and I am aware that these things occur all across the World. However, I do not know how these processes function in your own university / institution.
I think my boss is motiving me to do things by putting pressure on me.
The reason why they are different is that for above result, I only selected some part of all data.
For HCG methylation, I just selected top 47 rows in chr11 for testing. And for CTCF binding result, I only selected 1 row in chr11 for testing.
I would not call it motivation... That is an archaic way of supervising. Ruling by fear is a cheap and cowardly way of trying to gain control over a person or people.
I neither understand why you need a complex Python script. Assuming that you have some normalised data in BAM format, all that you need to do is extract the signal intensity per base position up- and down-stream of your target CTCF region. The plot itself is then easy, as it is just a line graph.
I also think this is a very simple work. But when I drew this plot, the result is different from published data. Anyway, thanks for your email and great advice. I found aaRon R package can drew the same type of this figure. I think I would change the GCH, WCG, TSS, HMEC, and '3000' to GCH, HCG, CTCF summit, IMR90, and '1000'. Do you think this would be better? Thanks a lot. And Looking forward to your reply.
Thanks for your kind help!!
Thanks you so much!
I want to close this thread. It's Okay now. human genome version hg18 and hg19
The thread is 'closed' in the sense that an answer has been provided. Are you implying that you want to delete the entire thread?
Thanks for your quick reply. I don't want to delete the entire thread. I just want to close this thread because the answer is provided. Thanks a lot.
In 1 day, the thread will disappear from the main page and become 'buried' among the 1000s of old threads. By having an answer provided, it will neither be 'bumped' back to the main page by the system.
Thank you very much for your kind email and reply.