Question: Consensus of multiple BED files
bruce.moran wrote (3.9 years ago):

I am looking for a tool that can take multiple BED files as input and produce output like my example below. I'm not sure how to describe this; I suppose it builds a consensus of all regions in the BED files, then outputs the values from column 4 according to overlap. Is there a quick BEDtools command or something similar? It seems relatively trivial, and I am starting to write a Perl script, but I would appreciate anyone's thoughts.

My example inputs (two BED files):

File 1:
chr1 100 1000 0.5

File 2:
chr1 100 200 0.1
chr1 300 400 0.2

And the 'consensus' output:

chr1 100 200  0.1;0.5
chr1 200 300  NA;0.5
chr1 300 400  0.2;0.5
chr1 400 1000 NA;0.5
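For reference, the operation described above can be sketched in a few lines of Python (not Perl, and not a BEDtools or BEDOPS command): collect every start and end coordinate per chromosome, cut at those boundaries, and for each resulting piece report each file's column-4 value, or NA if that file has no region there. This is a minimal sketch assuming whitespace-separated BED with the value in column 4; the helper names parse_bed and consensus are hypothetical, and overlapping regions within a single file are simply joined with commas.

```python
from typing import Dict, List, Set, Tuple

# One BED record reduced to (chrom, start, end, value); value is column 4.
Record = Tuple[str, int, int, str]


def parse_bed(text: str) -> List[Record]:
    """Parse whitespace-separated BED lines, keeping only columns 1-4."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank lines and records without a value column
        records.append((fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return records


def consensus(files: List[List[Record]], missing: str = "NA") -> List[str]:
    """Cut all regions at every boundary; report one value per file per piece."""
    bounds: Dict[str, Set[int]] = {}
    for recs in files:
        for chrom, start, end, _ in recs:
            bounds.setdefault(chrom, set()).update((start, end))

    out = []
    for chrom in sorted(bounds):
        edges = sorted(bounds[chrom])
        # Consecutive boundaries give disjoint, half-open pieces [lo, hi).
        for lo, hi in zip(edges, edges[1:]):
            values = []
            covered = False
            for recs in files:
                # No boundary falls inside a piece, so any overlapping record covers it fully.
                hits = [v for c, s, e, v in recs if c == chrom and s <= lo and e >= hi]
                if hits:
                    covered = True
                values.append(",".join(hits) if hits else missing)
            if covered:  # skip gaps that no file covers
                out.append(f"{chrom}\t{lo}\t{hi}\t{';'.join(values)}")
    return out


if __name__ == "__main__":
    file1 = parse_bed("chr1 100 1000 0.5")
    file2 = parse_bed("chr1 100 200 0.1\nchr1 300 400 0.2")
    # Passing file2 first reproduces the column order of the example output.
    for row in consensus([file2, file1]):
        print(row)
```

Values come out in the order the files are passed, so passing the second file first reproduces the column order in the example output. Because it scans every record for every piece, this is roughly quadratic and only suitable for small inputs; for many genome-wide files a sorted, streaming tool is the better choice.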

Maybe BEDOPS could help, or Galaxy, which has many tools for working with BED files.

written 3.9 years ago by A

Use BEDOPS: bedops --everything to generate a unioned set, bedops --partition to generate your disjoint intervals, and bedmap --echo-map-id to map the IDs (column 4) of the unioned set onto the disjoint intervals. The documentation is helpful in showing how this works, but you might do:

$ bedops --everything A.bed B.bed > union.bed
$ bedops --partition union.bed > partition.bed
$ bedmap --echo --echo-map-id --delim '\t' partition.bed union.bed > answer.bed
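To see what each of the three commands contributes, here is a rough pure-Python emulation on the example data. It is an illustration under assumptions, not BEDOPS' implementation: it assumes half-open BED coordinates and that bedmap lists mapped IDs in the sorted order of the map file, joined by ';'. The function names are hypothetical. Real BEDOPS tools also expect inputs sorted with sort-bed; in this sketch union() does the sorting.

```python
from typing import List, Tuple

Record = Tuple[str, int, int, str]  # (chrom, start, end, id); id is BED column 4


def union(*files: List[Record]) -> List[Record]:
    """Rough analogue of `bedops --everything`: every record, sorted."""
    return sorted(rec for recs in files for rec in recs)


def partition(records: List[Record]) -> List[Tuple[str, int, int]]:
    """Rough analogue of `bedops --partition`: split at every start/end."""
    pieces = []
    for chrom in sorted({r[0] for r in records}):
        recs = [r for r in records if r[0] == chrom]
        edges = sorted({x for _, s, e, _ in recs for x in (s, e)})
        for lo, hi in zip(edges, edges[1:]):
            # Keep only pieces that lie inside at least one input record.
            if any(s <= lo and e >= hi for _, s, e, _ in recs):
                pieces.append((chrom, lo, hi))
    return pieces


def map_ids(pieces: List[Tuple[str, int, int]], records: List[Record]) -> List[str]:
    """Rough analogue of `bedmap --echo --echo-map-id`, tab-delimited."""
    lines = []
    for chrom, lo, hi in pieces:
        # Half-open overlap test; IDs come out in the sorted order of `records`.
        ids = [i for c, s, e, i in records if c == chrom and s < hi and e > lo]
        lines.append(f"{chrom}\t{lo}\t{hi}\t{';'.join(ids)}")
    return lines


if __name__ == "__main__":
    a = [("chr1", 100, 1000, "0.5")]
    b = [("chr1", 100, 200, "0.1"), ("chr1", 300, 400, "0.2")]
    u = union(a, b)
    for line in map_ids(partition(u), u):
        print(line)
```

On the example this sketch yields the same four pieces as the requested output, but lists only the IDs that actually overlap (for instance 0.5 rather than NA;0.5 for chr1 200 300), ordered by position rather than by input file. If per-file columns with NA placeholders are needed, one option (untested here) is to run bedmap --echo-map-id once per input file against partition.bed and combine the resulting columns.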

Once you see how each step works, it is easy to integrate this into a pipeline or script with any desired tweaks, such as windows or padding, or mapping of scores or other columns.

written 3.9 years ago by Alex Reynolds

Thanks, I checked out BEDOPS after Angel's suggestion and did it as you described. Very neat, many thanks.

written 3.9 years ago by bruce.moran

Thank you Alex, this is helpful. However, I have a problem: for some regions there are no values. I am merging 30 or so files together, for example:

chr9 121507304 140976434 2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2
chr10 0 103741188
chr10 103741188 104966662
chr10 104966662 115535087

chr22 0 21727195
chr22 21727195 26344341
chr22 26344341 51005330
chrX 0 154915660 2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2

What should I do? I followed your instructions and it seems to work for small numbers of files.

written 2.8 years ago by ashmc

Never mind. I ended up using BEDOPS and got it working fine.

written 2.8 years ago by ashmc
eromasko, United States, wrote (3.9 years ago):

Hi bruce.moran. I think bedtools intersect combined with the bedtools groupby command might be helpful here. From the groupby documentation:


Collapsing: listing all of the values in the opCol for a given group.

Now for something different. What if we wanted all of the names of the repeats listed on the same line as the variants? Use the collapse option. This “denormalizes” things. Now you have a list of all the repeats on a single line.

$ bedtools groupby -i variantsToRepeats.bed -grp 1-4 -c 9 -o collapse
chr21 9719758 9729320 variant1 ALR/Alpha,ALR/Alpha,L1PA3,ALR/Alpha,
chr21 9729310 9757478 variant2 L1PA3,L1P1,ALR/Alpha,ALR/Alpha,
chr21 9795588 9796685 variant3 (GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,
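bedtools groupby collapses consecutive lines that share the same grouping columns, which is why its input (typically the output of bedtools intersect -wa -wb, with an A record followed by each overlapping B record) should be sorted by those columns. Below is a minimal Python sketch of -g/-c/-o collapse, assuming tab-separated input and 1-based column numbers as on the command line; the function name groupby_collapse and the sample rows are hypothetical, not taken from the documentation.

```python
from itertools import groupby
from typing import Iterable, List


def groupby_collapse(lines: Iterable[str], grp_cols: List[int], op_col: int) -> List[str]:
    """Sketch of `bedtools groupby -g <grp_cols> -c <op_col> -o collapse`.

    Columns are 1-based, as on the bedtools command line. Only *consecutive*
    lines with equal grouping columns form one group.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]

    def key(row: List[str]) -> tuple:
        return tuple(row[c - 1] for c in grp_cols)

    out = []
    for k, group in groupby(rows, key=key):
        values = [row[op_col - 1] for row in group]
        # Grouping columns first, then the comma-joined collapsed column.
        out.append("\t".join(list(k) + [",".join(values)]))
    return out


if __name__ == "__main__":
    # Hypothetical intersect -wa -wb style rows: 4 variant columns + 4 repeat columns.
    rows = [
        "chr1\t10\t20\tvarA\tchr1\t5\t15\tL1",
        "chr1\t10\t20\tvarA\tchr1\t12\t30\tAlu",
        "chr1\t50\t60\tvarB\tchr1\t55\t58\tMIR",
    ]
    for line in groupby_collapse(rows, [1, 2, 3, 4], 8):
        print(line)
```

The documentation output above ends each list with a trailing comma; this sketch joins values without one.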

written 3.9 years ago by eromasko

Another nice method. I use BEDtools a lot and thought it was odd that I couldn't get what I wanted. I ended up using the BEDOPS approach above as it is slightly simpler, but thanks for your comment.

written 3.9 years ago by bruce.moran