Hello everyone, could you please share your insights on how to split a bedGraph file into genomic bins of equal size? I have average log2(fold enrichment) values calculated for a ChIP over input, as follows [columns: 1) chromosome name; 2) start; 3) end; 4) log2 value]:
chr1 0 450 0
chr1 450 500 1.4033
chr1 500 650 1.79393
chr1 650 700 0.865939
chr1 700 950 0
chr1 950 1000 0.865939
Now I want to expand this file so that the values are reported for fixed 50 bp windows instead of windows of non-uniform length. As you can see, adjacent windows with the same log2 value have been merged into one larger window (for example, I want to split the 0-450 interval into nine 50 bp windows).
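A minimal sketch of this expansion in Python, assuming whitespace-separated columns and interval boundaries that are multiples of the bin size (as in the example above). The function name `split_into_bins` and the `bin_size` parameter are made up for illustration; if an interval length is not a multiple of the bin size, the last piece is simply shorter.

```python
def split_into_bins(lines, bin_size=50):
    """Yield (chrom, start, end, value) tuples, one per bin of width bin_size."""
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and bedGraph track/browser header lines.
        if len(fields) < 4 or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        chrom, start, end, value = fields[0], int(fields[1]), int(fields[2]), fields[3]
        # Walk the interval in steps of bin_size; every bin inherits the interval's value.
        for bin_start in range(start, end, bin_size):
            yield chrom, bin_start, min(bin_start + bin_size, end), value


# The six intervals from the question.
example = [
    "chr1 0 450 0",
    "chr1 450 500 1.4033",
    "chr1 500 650 1.79393",
    "chr1 650 700 0.865939",
    "chr1 700 950 0",
    "chr1 950 1000 0.865939",
]

bins = list(split_into_bins(example))
for chrom, start, end, value in bins:
    print(chrom, start, end, value, sep="\t")
```

For a real file, the same generator could be fed an open file handle instead of the `example` list, since it only iterates over lines. The value is kept as a string so that it is written back exactly as it was read.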
I want to do this so that I can then use two such log2 ratio files (corresponding to two ChIPs) to make a correlation plot. I am new to NGS data analysis, so any and all help is appreciated!
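Once both files are on the same 50 bp grid, the comparison could be sketched roughly as follows. This assumes each file has already been parsed into (chrom, start, end, value) tuples; the helper names `paired_values` and `pearson` and the two tiny data sets are made up purely for illustration and are not real ChIP values.

```python
import math


def paired_values(bins_a, bins_b):
    """Return two equally long lists of values for bins present in both inputs."""
    lookup_b = {(chrom, start, end): float(value) for chrom, start, end, value in bins_b}
    xs, ys = [], []
    for chrom, start, end, value in bins_a:
        key = (chrom, start, end)
        if key in lookup_b:  # keep only bins that both samples cover
            xs.append(float(value))
            ys.append(lookup_b[key])
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical binned values for two ChIPs (illustration only).
chip_a = [("chr1", 0, 50, "0"), ("chr1", 50, 100, "1.0"), ("chr1", 100, 150, "2.0")]
chip_b = [("chr1", 0, 50, "0.5"), ("chr1", 50, 100, "1.5"), ("chr1", 100, 150, "2.5")]

xs, ys = paired_values(chip_a, chip_b)
r = pearson(xs, ys)
print(f"Pearson r over {len(xs)} shared bins: {r:.3f}")
```

The paired lists `xs` and `ys` are also what a scatter plot would need, for example with matplotlib's `plt.scatter(xs, ys)`. Note that `pearson` divides by zero if either sample is constant across all shared bins, so a real script would want to guard against that.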
Guidance on how to do this with a Python script would be highly appreciated.