I have a txt file:
chr start end superfluous_data
chr1 3000000 3039999 0.00585524735801591
chr1 3040000 3079999 0.00462068257738901
chr1 3080000 3119999 0.00410291608104423
chr1 3120000 3159999 0.00445902789765337
I manipulated the file as I am only interested in the intervals:
awk '{print $1,'\t',$2,'\t',$3}' data/file.txt > intervals_of_interest.bed
I wanted to get the occupancy values (specified by a different bed file) of a particular protein at these intervals.
Heterochromatin.bed (genome-wide):
chr1 3049360 3053345 Region_1 0 0
chr1 3136664 3138809 Region_2 0 0
chr1 3786627 3791240 Region_4 0 0
chr1 4164204 4167731 Region_5 0 0
chr1 4599546 4604437 Region_7 0 0
chr1 5355834 5360997 Region_10 0 0
My attempt to align and assign the occupancy values to the region of interest is as follows:
bedmap --echo --echo-map-id-uniq intervals_of_interest.bed ../Heterochromatin.bed
Oddly, the output looks like this:
chr 1 0|
tart 1 0|
nd 1 0|
hr1 3000000 3039999|
chr1 3040000 3079999|
chr1 3080000 3119999|
chr1 3120000 3159999|
chr1 3160000 3199999|
chr1 3200000 3239999|
More worrying, though, is that I can't seem to assign calculated occupancy values to these intervals.
Can anyone tell me if it is possible to re-calculate occupancy values from one bed file and map to different intervals?
Thanks
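A note on the garbled output above: in the awk command, the shell appears to strip the inner single quotes, so awk receives a bare t (an empty, uninitialized variable) instead of a tab. The fields then come out space-separated, and the header line is carried into the .bed file as well. bedmap expects tab-delimited BED input without a header. Below is a minimal sketch of one way to rebuild the intervals file; the temporary directory and the two-row sample are stand-ins for data/file.txt, not the real data.

```shell
# Sketch: write a tab-delimited, header-free BED file from the original table.
# The sample below stands in for data/file.txt (space-delimited, with a header).
workdir=$(mktemp -d)
cat > "$workdir/file.txt" <<'EOF'
chr start end superfluous_data
chr1 3000000 3039999 0.00585524735801591
chr1 3040000 3079999 0.00462068257738901
EOF

# NR > 1 skips the header line; OFS = "\t" makes awk join fields with real tabs.
awk 'BEGIN { OFS = "\t" } NR > 1 { print $1, $2, $3 }' "$workdir/file.txt" \
  > "$workdir/intervals_of_interest.bed"

cat "$workdir/intervals_of_interest.bed"
```

BEDOPS tools also assume sorted input, so it is probably worth passing both files through sort-bed before running bedmap, even if they already look ordered.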
Thank you for your insightful answer. Can I ask: after having used:
I get the following:
Why do I get more than one value for some?
You have one or more elements from Heterochromatin.bed that overlap each interval of interest. Where there are two or more overlaps, the score values of all overlapping elements are separated by a semicolon character.
If you want to debug things and see exactly which elements overlap each interval of interest, use --echo-map in place of --echo-map-score.
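If a single value per interval is wanted, the semicolon-separated scores can be collapsed afterwards. Here is a rough sketch with awk; the three input lines are made-up examples of the interval|score;score layout (not real output from these files), and the function name mean_scores and the NAN placeholder for intervals with no overlaps are arbitrary choices for illustration.

```shell
# Sketch: average the semicolon-separated scores on each bedmap-style line.
# Input layout assumed: <interval fields>|<score1>;<score2>;...
mean_scores() {
  awk -F'[|]' '{
    if ($2 == "") { print $1 "|NAN"; next }   # no overlapping elements
    n = split($2, s, ";")                      # one array entry per score
    sum = 0
    for (i = 1; i <= n; i++) sum += s[i]
    print $1 "|" sum / n
  }'
}

# Hypothetical example lines: two overlaps, none, and one overlap.
result=$(printf 'chr1\t3040000\t3079999|0.5;1.5\nchr1\t3080000\t3119999|\nchr1\t3120000\t3159999|2.0\n' | mean_scores)
echo "$result"
```

bedmap also has built-in score operations such as --mean, --max and --sum, which may be simpler than post-processing if one of them matches what you need.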