Question

Consensus of multiple BED files

Asked 7.8 years ago by bruce.moran

I am looking for a tool that can take multiple BED files as input and produce output like my example below. I'm not sure how to describe this; I suppose it builds a consensus of all regions in the BEDs and then outputs the column 4 values based on overlap. Is there a quick BEDTools command or something similar? It seems relatively trivial, and I am starting to write a Perl script, but I would appreciate anyone's thoughts.

My example inputs:

BED_1
chr1 100 1000 0.5

BED_2
chr1 100 200 0.1
chr1 300 400 0.2

And the 'consensus' output:

chr1 100 200  0.1;0.5
chr1 200 300  NA;0.5
chr1 300 400  0.2;0.5
chr1 400 1000 NA;0.5
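A minimal Python sketch of the behaviour described above: cut the union of all regions at every start and end coordinate, then report one column 4 value per input file for each piece, or NA when that file has nothing there. It assumes each BED file is already parsed into (chrom, start, end, value) tuples, and `consensus` is a hypothetical helper name, not part of BEDTools or BEDOPS. Values appear in the order the files are passed, so passing BED_2 before BED_1 reproduces the column order of the example.

```python
from collections import defaultdict


def consensus(beds, missing="NA"):
    """Hypothetical helper: split the union of all intervals into disjoint
    pieces and, for each piece, report one column-4 value per input file
    (or `missing` if that file has no overlapping interval)."""
    # Every start and end coordinate is a potential breakpoint.
    breaks = defaultdict(set)
    for bed in beds:
        for chrom, start, end, _ in bed:
            breaks[chrom].update((start, end))

    rows = []
    for chrom in sorted(breaks):
        points = sorted(breaks[chrom])
        # Consecutive breakpoints delimit the disjoint pieces.
        for lo, hi in zip(points, points[1:]):
            values, covered = [], False
            for bed in beds:
                # Half-open overlap test, matching BED's [start, end) convention.
                hits = [v for c, s, e, v in bed if c == chrom and s < hi and e > lo]
                covered = covered or bool(hits)
                values.append(",".join(hits) if hits else missing)
            if covered:  # skip gaps that no input interval covers
                rows.append((chrom, lo, hi, ";".join(values)))
    return rows


if __name__ == "__main__":
    bed_1 = [("chr1", 100, 1000, "0.5")]
    bed_2 = [("chr1", 100, 200, "0.1"), ("chr1", 300, 400, "0.2")]
    for row in consensus([bed_2, bed_1]):
        print(*row, sep="\t")
```

The nested loops scan every interval for every piece, so this only suits small inputs; for dozens of genome-wide files, a streaming tool such as BEDOPS, as suggested in the replies, is a better fit.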

Tags: BEDTools, BED

BEDOPS might help; Galaxy also has many tools for working with BED files.

Reply by zizigolu, 7.8 years ago

Use BEDOPS: bedops --partition to generate your disjoint intervals, bedops --everything to generate a unioned set, and bedmap --echo-map-id to map the IDs of the unioned set onto the disjoint intervals. The documentation is helpful in showing how this works, but you might do:

$ bedops --everything A.bed B.bed > union.bed
$ bedops --partition union.bed > partition.bed
$ bedmap --echo --echo-map-id --delim '\t' partition.bed union.bed > answer.bed

Once you see how each step works, it is easy to integrate with a pipeline or script with any desired tweaks, like windows or padding, or mapping of score or other columns, etc.
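To make the three steps concrete, here is a small in-memory Python emulation of what they compute on the example from the question. It is a sketch, not BEDOPS itself: the function names are hypothetical, inputs are assumed to be 4-column (chrom, start, end, id) tuples, and it assumes bedmap lists mapped IDs in the sorted order of the map file, joined with ';' (bedmap's default multi-value delimiter). One consequence is worth noting: union.bed no longer records which file each element came from, so the pipeline reports only the IDs that overlap (e.g. 0.5 for chr1 200 300) rather than per-file NA placeholders, and under the ordering assumption chr1 300 400 comes out as 0.5;0.2 rather than 0.2;0.5.

```python
def bedops_union(*beds):
    """Like `bedops --everything`: all elements from all inputs, sorted."""
    return sorted((el for bed in beds for el in bed), key=lambda el: el[:3])


def bedops_partition(elements):
    """Like `bedops --partition`: cut at every endpoint and keep only the
    disjoint pieces covered by at least one element."""
    pieces = []
    for chrom in sorted({el[0] for el in elements}):
        on_chrom = [el for el in elements if el[0] == chrom]
        points = sorted({p for el in on_chrom for p in (el[1], el[2])})
        for lo, hi in zip(points, points[1:]):
            if any(s < hi and e > lo for _, s, e, _ in on_chrom):
                pieces.append((chrom, lo, hi))
    return pieces


def bedmap_echo_map_id(refs, mapped, multidelim=";"):
    """Like `bedmap --echo --echo-map-id --delim '\\t'`: each reference piece
    plus the IDs of overlapping map elements, in the map's sorted order."""
    rows = []
    for chrom, lo, hi in refs:
        ids = [el[3] for el in mapped if el[0] == chrom and el[1] < hi and el[2] > lo]
        rows.append((chrom, lo, hi, multidelim.join(ids)))
    return rows


if __name__ == "__main__":
    a = [("chr1", 100, 1000, "0.5")]                        # BED_1
    b = [("chr1", 100, 200, "0.1"), ("chr1", 300, 400, "0.2")]  # BED_2
    union = bedops_union(a, b)
    for row in bedmap_echo_map_id(bedops_partition(union), union):
        print(*row, sep="\t")
```

If per-file NA placeholders are needed, one option is to run bedmap once per input file against the same partition and paste the resulting ID columns together.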

Reply by Alex Reynolds, 7.8 years ago

Thanks, I checked out BEDOPS after Angel's suggestion and did it as you described. Very neat, many thanks.

Reply by bruce.moran, 7.8 years ago

Thank you, Alex, this is helpful. However, I have a problem: for some regions, there are no values. I am merging 30 or so files together, for example:

chr9 121507304 140976434 2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2

chr10 0 103741188

chr10 103741188 104966662

chr10 104966662 115535087

....

chr22 0 21727195

chr22 21727195 26344341

chr22 26344341 51005330

chrX 0 154915660 2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2

What should I do? I followed your instructions and it seems to work for small numbers of files.

Reply by Ashley M. Conard, 6.8 years ago

Never mind. I ended up using BEDOPS and got it working.

Reply by Ashley M. Conard, 4.9 years ago

Answer (score 3), 2016-09-23

Hi bruce.moran. I think bedtools intersect combined with bedtools groupby might be helpful here.

Groupby

Collapsing: listing all of the values in the opCol for a given group.

Now for something different. What if we wanted all of the names of the repeats listed on the same line as the variants? Use the collapse option. This “denormalizes” things. Now you have a list of all the repeats on a single line.

$ bedtools groupby -i variantsToRepeats.bed -grp 1-4 -c 9 -o collapse
chr21 9719758 9729320 variant1 ALR/Alpha,ALR/Alpha,L1PA3,ALR/Alpha,
chr21 9729310 9757478 variant2 L1PA3,L1P1,ALR/Alpha,ALR/Alpha,
chr21 9795588 9796685 variant3 (GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,
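The collapse operation can be sketched in Python to show what it does to the rows. `groupby_collapse` is a hypothetical helper, not part of bedtools, and the toy rows are made up for illustration. It mirrors groupby's requirement that input be pre-sorted by the grouping columns: only consecutive lines with equal keys are merged. Unlike the output shown above, it does not append a trailing comma.

```python
import itertools


def groupby_collapse(rows, group_cols, op_col):
    """Hypothetical emulation of
    `bedtools groupby -grp <group_cols> -c <op_col> -o collapse`.
    Column numbers are 1-based, as on the command line. Only consecutive
    rows with equal group keys are merged, so sort the input first."""
    def key(row):
        return tuple(row[c - 1] for c in group_cols)

    collapsed = []
    for group_key, group in itertools.groupby(rows, key=key):
        # List every op_col value in the group, comma-separated.
        values = ",".join(row[op_col - 1] for row in group)
        collapsed.append(list(group_key) + [values])
    return collapsed


if __name__ == "__main__":
    # Toy rows (hypothetical): columns 1-4 identify a variant, column 5 a feature name.
    rows = [
        ["chr1", "100", "200", "v1", "A"],
        ["chr1", "100", "200", "v1", "B"],
        ["chr1", "300", "400", "v2", "C"],
    ]
    for line in groupby_collapse(rows, range(1, 5), 5):
        print(*line, sep="\t")
```

Applied to the question, the same idea would group an intersect of the partitioned regions against the inputs by columns 1-3 and collapse the value column.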