I have a txt file:
chr start end superfluous_data
chr1 3000000 3039999 0.00585524735801591
chr1 3040000 3079999 0.00462068257738901
chr1 3080000 3119999 0.00410291608104423
chr1 3120000 3159999 0.00445902789765337
I extracted the first three columns, since I am only interested in the intervals:
awk '{print $1,'\t',$2,'\t',$3}' data/file.txt > intervals_of_interest.bed
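A note on the awk command above. The whole awk program is wrapped in single quotes, so each inner '\t' ends and restarts the shell string. The shell turns the unquoted \t into a plain t, and awk receives {print $1,t,$2,t,$3}. Here t is an unset (empty) variable, so the output fields are separated by spaces, not tabs. The header line is also copied through. BEDOPS tools expect tab-delimited, sorted BED. Below is a minimal sketch of the same step that sets the output field separator to a tab and skips line 1. It assumes the header is the only non-data line, and the function name make_intervals_bed is hypothetical:

```shell
# Hypothetical helper: keep only chrom/start/end as tab-delimited BED,
# skipping the header (assumed to be line 1 of the input file).
make_intervals_bed() {
    awk 'BEGIN { OFS = "\t" } NR > 1 { print $1, $2, $3 }' "$1"
}

# Usage (paths from the question); sort with BEDOPS sort-bed afterwards:
#   make_intervals_bed data/file.txt > unsorted_intervals.bed
#   sort-bed unsorted_intervals.bed > intervals_of_interest.bed
```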
I want to get the occupancy values of a particular protein, given in a different BED file, at these intervals.
Heterochromatin.bed (genome-wide):
chr1 3049360 3053345 Region_1 0 0
chr1 3136664 3138809 Region_2 0 0
chr1 3786627 3791240 Region_4 0 0
chr1 4164204 4167731 Region_5 0 0
chr1 4599546 4604437 Region_7 0 0
chr1 5355834 5360997 Region_10 0 0
My attempt to map the occupancy values onto the intervals of interest is as follows:
bedmap --echo --echo-map-id-uniq intervals_of_interest.bed ../Heterochromatin.bed
Oddly, the output looks like this:
chr 1 0|
tart 1 0|
nd 1 0|
hr1 3000000 3039999|
chr1 3040000 3079999|
chr1 3080000 3119999|
chr1 3120000 3159999|
chr1 3160000 3199999|
chr1 3200000 3239999|
More worrying than the odd first lines, I can't seem to assign any occupancy values to these intervals: nothing appears after the | on any line.
Can anyone tell me whether it is possible to recalculate occupancy values from one BED file and map them onto a different set of intervals?
Thanks
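Mapping values from one BED file onto a different set of intervals is what bedmap is for. Its operations summarise, per reference interval, the map elements that overlap it. The empty output above is consistent with malformed input. With properly formatted files, for instance, chr1 3040000 3079999 should pick up Region_1 (3049360-3053345).

Here is a minimal sketch. It assumes both inputs are tab-delimited and sorted with sort-bed, and that column 5 of Heterochromatin.bed holds the occupancy score. In the excerpt above every score is 0, so score-based summaries are only informative if real values are present. The function name map_occupancy is hypothetical. Other bedmap operations, such as --mean, or overlap-length options like --bases-uniq-f, may fit better; check the bedmap documentation.

```shell
# Hypothetical helper: $1 = sorted reference intervals (BED),
# $2 = sorted map elements (BED; column 4 = ID, column 5 = score).
# For each reference interval, bedmap prints the interval, then
# ("|"-separated) the IDs and scores of the overlapping elements
# (";"-separated when there are several) and how many there are.
map_occupancy() {
    bedmap --echo --echo-map-id --echo-map-score --count "$1" "$2"
}

# Usage, with the map file sorted by BEDOPS sort-bed first:
#   sort-bed ../Heterochromatin.bed > heterochromatin.sorted.bed
#   map_occupancy intervals_of_interest.bed heterochromatin.sorted.bed
```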
Thank you for your insightful answer. Can I ask: after using:
I get the following:
Why do I get more than one value for some intervals?
Each interval of interest can be overlapped by one or more elements from Heterochromatin.bed. Where there are two or more overlaps, the score values of all overlapping elements are reported together, separated by a semicolon.
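If one number per interval is preferred, bedmap can summarise the overlaps itself, for example with --mean, --max or --count instead of --echo-map-score. Alternatively, the semicolon-separated list can be collapsed afterwards. Below is a minimal awk sketch. It assumes lines of the form interval|score1;score2;... as produced by --echo --echo-map-score with the default delimiters. The function name collapse_scores is hypothetical. It appends the number of scores and their mean, or NA when nothing overlaps:

```shell
# Hypothetical helper: read "interval|s1;s2;..." lines (bedmap --echo
# --echo-map-score output, default delimiters) from a file or stdin and
# append the number of mapped scores and their mean, tab-separated.
collapse_scores() {
    awk -F'|' 'BEGIN { OFS = "\t" }
    {
        n = ($2 == "") ? 0 : split($2, s, ";")   # empty list -> no overlaps
        sum = 0
        for (i = 1; i <= n; i++) sum += s[i]
        mean = (n > 0) ? sum / n : "NA"
        print $1, n, mean
    }' "${1:--}"
}
```

Usage: bedmap --echo --echo-map-score intervals_of_interest.bed heterochromatin.sorted.bed | collapse_scores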
If you want to debug things and see exactly which elements are overlapping each interval of interest, use --echo-map in place of --echo-map-score.