Question: merging bed coordinates - get depth and information about merged regions
Richard (Canada) wrote, 3.7 years ago:

Hi all,

I expect that this is doable with bedtools or similar, but I haven't figured it out yet.

If I start with a file like this:

chr1 2582250 2583750 6.5558
chr1 2582625 2584125 9.03696
chr1 2583000 2584500 13.3717
chr1 2583375 2584875 19.4317

how can I generate this result, which averages the scores across regions that are redundantly covered?

chr1 2582250 2582625 $avg(6.5558)
chr1 2582625 2583000 $avg(6.5558, 9.03696)
chr1 2583000 2583375 $avg(6.5558, 9.03696, 13.3717)
chr1 2583375 2583750 $avg(6.5558, 9.03696, 13.3717, 19.4317)
chr1 2583750 2584125 $avg(9.03696, 13.3717, 19.4317)
chr1 2584125 2584500 $avg(13.3717, 19.4317)
chr1 2584500 2584875 $avg(19.4317)

 

Alex Reynolds (Seattle, WA, USA) wrote, 3.7 years ago:

Given a sorted input BED5 file called signal.bed, you can use the BEDOPS command bedops --partition to split the intervals into disjoint elements, and then use bedmap --echo --mean to map the original signal file onto those elements and calculate the mean signal over each one. bedmap reads the score from the fifth column, so the input must be BED5:

$ bedops --partition signal.bed | bedmap --echo --mean - signal.bed > answer.bed

BEDOPS tools read from standard input and write to standard output, so the two steps can be piped together without an intermediate file, which keeps the run fast.
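To make the partition-then-average idea concrete, here is a minimal sketch of the same calculation using only awk and sort. It is not how BEDOPS is implemented, and partition_mean is a hypothetical helper name. It assumes a non-empty, tab-delimited BED5 file and rescans every interval for each segment, so it is only suitable for small inputs.

```shell
# Sketch only: mimics `bedops --partition | bedmap --mean` with awk and sort.
# Assumes a non-empty, tab-delimited BED5 file (chrom, start, end, id, score).
# partition_mean is a hypothetical helper name, not a BEDOPS command.
#
# Step 1: emit every start and end as a breakpoint, sorted and de-duplicated.
# Step 2: pair consecutive breakpoints on the same chromosome into segments
#         and average the scores of all intervals that contain each segment.
partition_mean() {
    bed="$1"
    awk -F'\t' '{ print $1 "\t" $2; print $1 "\t" $3 }' "$bed" |
    LC_ALL=C sort -k1,1 -k2,2n -u |
    awk -F'\t' '
        NR == FNR {                       # first input: the intervals
            n++
            chrom[n] = $1; lo[n] = $2 + 0; hi[n] = $3 + 0; val[n] = $5 + 0
            next
        }
        {                                 # second input: sorted breakpoints
            pos = $2 + 0
            if ($1 == prev_chrom) {
                sum = 0; k = 0
                for (i = 1; i <= n; i++)
                    if (chrom[i] == $1 && lo[i] <= prev_pos && hi[i] >= pos) {
                        sum += val[i]; k++
                    }
                # Segments that no interval covers (gaps) are skipped.
                if (k > 0)
                    printf "%s\t%d\t%d\t%.6f\n", $1, prev_pos, pos, sum / k
            }
            prev_chrom = $1; prev_pos = pos
        }
    ' "$bed" -
}
```

Called as partition_mean signal.bed on the four intervals from the question (in BED5 form), this should print seven disjoint segments, one per line, in the layout the question asks for. For real data the BEDOPS pipeline is the better choice.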

If you need to convert a bedGraph file into a sorted BED5 file, you can use awk and sort-bed. The awk step inserts a placeholder ID in column 4 so that the score moves to column 5:

$ awk '{print $1"\t"$2"\t"$3"\tid-"NR"\t"$4}' signal.bedgraph | sort-bed - > signal.bed
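As a quick check of the conversion, the sketch below runs the four lines from the question through the same awk command. Plain sort stands in for sort-bed here only so that the example runs without BEDOPS installed; that substitution is an assumption for illustration, and sort-bed remains the appropriate tool for real data.

```shell
# The question's example in bedGraph-style layout (chrom, start, end, score).
printf '%s\n' \
    'chr1 2582250 2583750 6.5558' \
    'chr1 2582625 2584125 9.03696' \
    'chr1 2583000 2584500 13.3717' \
    'chr1 2583375 2584875 19.4317' > signal.bedgraph

# Same awk command as in the answer; plain sort replaces sort-bed (assumption).
awk '{print $1"\t"$2"\t"$3"\tid-"NR"\t"$4}' signal.bedgraph |
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n > signal.bed

# Each line now has five tab-separated fields, with the score in column 5.
cat signal.bed
```

The first output line should read chr1, 2582250, 2583750, id-1, 6.5558, separated by tabs, which is the layout bedmap expects for its score operations.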

Thanks Alex,

I tried your suggestion on the example input above, but got the following output:

chr1    2582250    2582625|8706578497747938881449877737314362882630863587747513275139113652758508894266166904062754451450323262076938276160844316701266975455819838009019673671515494761275802674881615046553632768.000000
chr1    2582625    2583000|8706578497747938881449877737314362882630863587747513275139113652758508894266166904062754451450323262076938276160844316701266975455819838009019673671515494761275802674881615046553632768.000000
chr1    2583000    2583375|8706578497747938881449877737314362882630863587747513275139113652758508894266166904062754451450323262076938276160844316701266975455819838009019673671515494761275802674881615046553632768.000000
chr1    2583375    2583750|8706578497747938881449877737314362882630863587747513275139113652758508894266166904062754451450323262076938276160844316701266975455819838009019673671515494761275802674881615046553632768.000000
chr1    2583750    2584125|NAN
chr1    2584125    2584500|NAN
chr1    2584500    2584875|NAN


Any ideas?


 

Reply by Richard, 3.7 years ago

Ah, it looks like my file needed five columns. I added a dummy column 4, and the means are now calculated from the fifth column.

Reply by Richard, 3.7 years ago

Sorry, I meant BED5, not BED4. I've amended my answer. Need coffee.

Reply by Alex Reynolds, 3.7 years ago
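The nonsensical means and NAN values in the thread came from an input with only four columns, so a column check before running bedmap can catch the problem early. The following is a minimal sketch; check_bed5 is a hypothetical helper name, not part of BEDOPS, and it assumes tab-delimited input.

```shell
# check_bed5 is a hypothetical helper, not a BEDOPS command.
# It returns non-zero if any line has fewer than five tab-separated
# fields or a fifth field that does not look like a number.
check_bed5() {
    awk -F'\t' '
        NF < 5 || $5 !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/ {
            printf "line %d: expected a numeric score in column 5\n", NR > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

One way to use it is to run check_bed5 signal.bed first and start the bedops and bedmap pipeline only if it succeeds.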