merging bed coordinates - get depth and information about merged regions
8.7 years ago
Richard ▴ 590

Hi all,

I expect that this is doable with bedtools or similar, but I haven't figured it out yet.

If I start with a file like this:

chr1 2582250 2583750 6.5558
chr1 2582625 2584125 9.03696
chr1 2583000 2584500 13.3717
chr1 2583375 2584875 19.4317

how can I generate this result, which averages the scores across regions that are redundantly covered?

chr1 2582250 2582625 $avg(6.5558)
chr1 2582625 2583000 $avg(6.5558, 9.03696)
chr1 2583000 2583375 $avg(6.5558,  9.03696, 13.3717)
chr1 2583375 2583750 $avg(6.5558,  9.03696, 13.3717, 19.4317)
chr1 2583750 2584125 $avg(9.03696, 13.3717, 19.4317)
chr1 2584125 2584500 $avg(13.3717, 19.4317)
chr1 2584500 2584875 $avg(19.4317)
bed bedtools • 2.4k views
8.7 years ago

Given a sorted BED5 file called signal.bed, with the score in the fifth column, you can use BEDOPS: bedops --partition splits the input into a set of disjoint elements, and bedmap --echo --mean then maps the original signal file onto those disjoint elements and reports the mean signal over each one:

$ bedops --partition signal.bed | bedmap --echo --mean - signal.bed > answer.bed

BEDOPS tools read from standard input and write to standard output, so the two steps can be piped together without an intermediate file, which keeps this fast.
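To make clear what the pipeline above computes, here is a minimal Python sketch of the same idea: cut each chromosome at every start and end coordinate, then average the scores of the input intervals that cover each piece. This is not how BEDOPS implements it; the function name partition_means and the (chrom, start, end, score) tuple layout are assumptions made for illustration, and the loop is quadratic, so it only suits small inputs.

```python
from collections import defaultdict


def partition_means(intervals):
    """Split overlapping intervals into disjoint pieces and average the scores.

    `intervals` is an iterable of (chrom, start, end, score) tuples using
    half-open [start, end) coordinates, as in BED. Hypothetical helper,
    not part of BEDOPS or bedtools.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, score in intervals:
        by_chrom[chrom].append((start, end, score))

    result = []
    for chrom in sorted(by_chrom):
        ivs = by_chrom[chrom]
        # Every start and end coordinate is a potential cut point.
        cuts = sorted({p for s, e, _ in ivs for p in (s, e)})
        for lo, hi in zip(cuts, cuts[1:]):
            # Scores of all intervals that fully cover the piece [lo, hi).
            scores = [sc for s, e, sc in ivs if s <= lo and hi <= e]
            if scores:  # skip gaps that no interval covers
                result.append((chrom, lo, hi, sum(scores) / len(scores)))
    return result


# The four regions from the question.
signal = [
    ("chr1", 2582250, 2583750, 6.5558),
    ("chr1", 2582625, 2584125, 9.03696),
    ("chr1", 2583000, 2584500, 13.3717),
    ("chr1", 2583375, 2584875, 19.4317),
]

rows = partition_means(signal)
for chrom, lo, hi, mean in rows:
    print(f"{chrom}\t{lo}\t{hi}\t{mean:.5f}")
```

On the question's example this yields seven disjoint pieces, matching the layout of the desired result. For genome-scale files the streaming BEDOPS pipeline is the better choice.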

If you need to change a bedgraph file into a sorted BED5 file, you can use awk and sort-bed:

$ awk '{print $1"\t"$2"\t"$3"\tid-"NR"\t"$4}' signal.bedgraph | sort-bed - > signal.bed
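For anyone who wants to see what that conversion produces without running awk, the following Python sketch does roughly the same thing: it inserts an id-N name as the fourth column, moves the score to the fifth, and sorts the records. The function name bedgraph_to_bed5 is hypothetical, and the sort key (chromosome name as text, then numeric start and end) is an assumption about the ordering sort-bed produces. Unlike the awk one-liner, it skips lines with fewer than four fields.

```python
def bedgraph_to_bed5(lines):
    """Convert 4-column bedgraph lines to sorted, tab-separated BED5 lines.

    Hypothetical helper for illustration; the id-N label uses the input
    line number, like NR in the awk command.
    """
    records = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 4:
            continue  # assumption: ignore blank or malformed lines
        chrom, start, end, score = fields[:4]
        records.append((chrom, int(start), int(end), f"id-{n}", score))

    # Assumed ordering: chromosome as text, then start, then end.
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    return ["\t".join(str(field) for field in r) for r in records]


# Two regions from the question, deliberately out of order.
bedgraph = [
    "chr1 2582625 2584125 9.03696",
    "chr1 2582250 2583750 6.5558",
]

bed5 = bedgraph_to_bed5(bedgraph)
for row in bed5:
    print(row)
```

The score ends up in the fifth column, which is where bedmap --mean reads it from.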

Thanks Alex,

I tried your suggestion on the example input above, but got the following output:

chr1    2582250    2582625|8706578497747938881449877737314362882630863587747513275139113652758508894266166904062754451450323262076938276160844316701266975455819838009019673671515494761275802674881615046553632768.000000
chr1    2582625    2583000|8706578497747938881449877737314362882630863587747513275139113652758508894266166904062754451450323262076938276160844316701266975455819838009019673671515494761275802674881615046553632768.000000
chr1    2583000    2583375|8706578497747938881449877737314362882630863587747513275139113652758508894266166904062754451450323262076938276160844316701266975455819838009019673671515494761275802674881615046553632768.000000
chr1    2583375    2583750|8706578497747938881449877737314362882630863587747513275139113652758508894266166904062754451450323262076938276160844316701266975455819838009019673671515494761275802674881615046553632768.000000
chr1    2583750    2584125|NAN
chr1    2584125    2584500|NAN
chr1    2584500    2584875|NAN

Any ideas?


Ah. Looks like my file needed 5 columns. I added a dummy column as column 4, and the means are now calculated on the fifth column.


Sorry, I meant BED5, not BED4. I've amended my answer. Need coffee.
