I downloaded a number of bigWig files from the ENCODE project and converted them to BED files.
I did this as follows:
bigWigToWig file.bigWig file.wig
wig2bed -x < file.wig > file.bed
However, the intervals differ between files:
ENCFF001.bed
chr1 2999998 2999999 id-1 1.000000
chr1 2999999 3000000 id-2 1.000000

ENCFF002.bed
chr1 3001400 3001500 id-1 0.140000
chr1 3001600 3001700 id-2 0.140000
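One way to see exactly how the two excerpts differ is to parse them and compare where each file starts and how wide its intervals are. This is a minimal sketch, assuming whitespace-separated BED lines with at least five columns (chrom, start, end, name, score); `BedRecord` and `parse_bed` are hypothetical helpers, not part of any BED library.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive (BED intervals are half-open)
    name: str
    score: float


def parse_bed(lines):
    """Parse BED lines with at least 5 whitespace-separated columns."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines plus comment, track and browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        records.append(BedRecord(fields[0], int(fields[1]), int(fields[2]),
                                 fields[3], float(fields[4])))
    return records


# The two excerpts from the question.
enc1 = parse_bed(["chr1 2999998 2999999 id-1 1.000000",
                  "chr1 2999999 3000000 id-2 1.000000"])
enc2 = parse_bed(["chr1 3001400 3001500 id-1 0.140000",
                  "chr1 3001600 3001700 id-2 0.140000"])

for label, recs in (("ENCFF001", enc1), ("ENCFF002", enc2)):
    widths = sorted({r.end - r.start for r in recs})
    print(label, "first listed start:", recs[0].start, "interval widths:", widths)
```

On these excerpts it reports 1-bp intervals for ENCFF001 and 100-bp intervals for ENCFF002, so the two files are at different resolutions as well as starting at different positions.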
My first question: why do they start at different positions in the genome? And why do genome-wide BED files always seem to start at around 3,000,000 rather than at 1?
I then downloaded a separate dataset from a source other than ENCODE:
HET.bed
chr1 3049360 3053345 Region_1 0 0
chr1 3136664 3138809 Region_2 0 0
What I would like to do is align the intervals of these BED files so that I can analyse them side by side.
The interval size (the distance between rows) is arbitrary; it could be 100 bp or 1000 bp. All I really need is a consistent way to process the files so that the data looks something like this:
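A common way to make files with different interval widths comparable is to put them all on one fixed grid: with a bin size of N bp, position p falls in the bin starting at (p // N) * N, whichever file it came from. The sketch below tiles a region into such bins; `make_bins` is a hypothetical helper that plays a role similar to `bedtools makewindows`, and snapping the first bin down to a multiple of the bin size is an assumption made so that every file shares identical boundaries.

```python
def make_bins(chrom, start, end, size):
    """Tile [start, end) on `chrom` into half-open bins of `size` bp.

    The first bin is snapped down to a multiple of `size`, so any two
    regions binned with the same size share the same bin boundaries.
    The last bin is cut off at `end`.
    """
    if size <= 0:
        raise ValueError("bin size must be positive")
    first = (start // size) * size
    return [(chrom, s, min(s + size, end)) for s in range(first, end, size)]


# Cover the span of both ENCODE excerpts with 500-bp bins.
for chrom, s, e in make_bins("chr1", 2999998, 3001700, 500):
    print(chrom, s, e, sep="\t")
```

Because the grid depends only on the bin size, a 1-bp interval at 2999998 and a 100-bp interval at 3001400 land in well-defined bins without either file's own interval layout mattering.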
                        ENCF001  ENCF002  HET
chr1  3000000  3000500  1        2        2
chr1  3001000  3001500  1        1        3
# column values are examples and not from real data
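Building a table like this amounts to summarising each file per fixed-size bin and then joining the files on (chrom, bin start). The sketch below is one possible approach, not a definitive one. It assumes the intervals have already been read into `(chrom, start, end, score)` tuples (file reading is omitted), it uses an overlap-weighted mean of the score column for the ENCODE signal files, and, because the score column in HET.bed is 0, it counts overlapping regions for that file instead. `bin_signal`, `bin_counts` and `build_matrix` are hypothetical names, and filling bins with no data with 0 is likewise an assumption.

```python
from collections import defaultdict


def bin_signal(intervals, size):
    """Overlap-weighted mean score per fixed-size bin.

    intervals: iterable of (chrom, start, end, score), BED-style
    (0-based, half-open). Returns {(chrom, bin_start): mean score}.
    """
    weighted = defaultdict(float)  # sum of score * overlapping bases
    covered = defaultdict(int)     # sum of overlapping bases
    for chrom, start, end, score in intervals:
        if end <= start:
            continue  # skip empty intervals
        for b in range((start // size) * size, end, size):
            overlap = min(end, b + size) - max(start, b)
            weighted[(chrom, b)] += score * overlap
            covered[(chrom, b)] += overlap
    return {key: weighted[key] / covered[key] for key in covered}


def bin_counts(intervals, size):
    """Number of intervals overlapping each fixed-size bin."""
    counts = defaultdict(int)
    for chrom, start, end, _score in intervals:
        if end <= start:
            continue
        for b in range((start // size) * size, end, size):
            counts[(chrom, b)] += 1
    return dict(counts)


def build_matrix(tracks, size, fill=0.0):
    """Join per-file bin values into rows: chrom, start, end, one value per file.

    Bins absent from a file get `fill`. Chromosome names sort as strings.
    """
    names = list(tracks)
    keys = sorted(set().union(*tracks.values()))
    rows = [["chrom", "start", "end", *names]]
    for chrom, b in keys:
        rows.append([chrom, b, b + size,
                     *(tracks[name].get((chrom, b), fill) for name in names)])
    return rows


# Intervals taken from the excerpts in the question.
enc1 = [("chr1", 2999998, 2999999, 1.0), ("chr1", 2999999, 3000000, 1.0)]
enc2 = [("chr1", 3001400, 3001500, 0.14), ("chr1", 3001600, 3001700, 0.14)]
het = [("chr1", 3049360, 3053345, 0.0), ("chr1", 3136664, 3138809, 0.0)]

SIZE = 500
rows = build_matrix({"ENCFF001": bin_signal(enc1, SIZE),
                     "ENCFF002": bin_signal(enc2, SIZE),
                     "HET": bin_counts(het, SIZE)}, SIZE)
for row in rows:
    print(*row, sep="\t")
```

Only bins that overlap at least one interval in some file appear as rows. Dedicated tools such as bedtools (`makewindows`, `map`) or BEDOPS `bedmap` cover the same ground for genome-wide files and are likely to scale better than pure Python.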
So can anyone help me convert a series of BED files to consistent intervals?