Question: Consensus of multiple BED files
0
2.9 years ago, bruce.moran (Ireland) wrote:

I am looking for a tool that can take multiple BED inputs and produce output like my example below. I am not sure how to describe this: I suppose it makes a consensus of all regions in the BEDs, then outputs the values from column 4 based on overlap. Is there a quick way to do this with BEDtools or something similar? It seems relatively trivial and I am starting to write a Perl script, but I would appreciate anyone's thoughts.

My example inputs:

BED_1
chr1 100 1000 0.5

BED_2
chr1 100 200 0.1
chr1 300 400 0.2

And the 'consensus' output:

chr1 100 200  0.1;0.5
chr1 200 300  NA;0.5
chr1 300 400  0.2;0.5
chr1 400 1000 NA;0.5
bed bedtools • 1.2k views
modified 6 weeks ago by Biostar ♦♦ 20 • written 2.9 years ago by bruce.moran 620
1

BEDOPS might help, or Galaxy has many tools for working with BED files.

written 2.9 years ago by F 3.4k
3

Use the BEDOPS tools: bedops --everything to generate a unioned set, bedops --partition to generate disjoint intervals from it, and bedmap --echo-map-id to map the IDs of the unioned set onto the disjoint intervals. The documentation shows how each step works, but you might do:

$ bedops --everything A.bed B.bed > union.bed
$ bedops --partition union.bed > partition.bed
$ bedmap --echo --echo-map-id --delim '\t' partition.bed union.bed > answer.bed

Once you see how each step works, it is easy to integrate with a pipeline or script with any desired tweaks, like windows or padding, or mapping of score or other columns, etc.
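
To make the partition and map steps concrete, here is a minimal sketch in plain sh and awk. It is not BEDOPS and not a replacement for it: consensus_bed is a hypothetical helper name, it assumes four-column BED files (chrom, start, end, value), and it compares every segment against every interval, so it is only suitable for small inputs. Values are joined with ';' in the order the files are given, with NA where a file has no interval covering the segment.

```shell
# Hypothetical helper, for illustration only.
# Usage: consensus_bed FILE1.bed FILE2.bed ...
consensus_bed() {
    tagged=$(mktemp) || return 1
    i=0
    for f in "$@"; do
        i=$((i + 1))
        # Prefix each record with the index of the file it came from.
        awk -v idx="$i" 'BEGIN { OFS = "\t" }
            NF >= 4 { print idx, $1, $2, $3, $4 }' "$f"
    done > "$tagged"

    # Every start and end is a breakpoint; sort them within each chromosome.
    awk 'BEGIN { OFS = "\t" } { print $2, $3; print $2, $4 }' "$tagged" |
        LC_ALL=C sort -u -k1,1 -k2,2n |
        awk -v nfiles="$#" -v tagged="$tagged" '
            BEGIN {
                OFS = "\t"
                # Load all tagged intervals into memory.
                while ((getline line < tagged) > 0) {
                    split(line, f, "\t")
                    n++
                    src[n] = f[1]; chr[n] = f[2]
                    s[n] = f[3] + 0; e[n] = f[4] + 0
                    val[n] = f[5]
                }
            }
            {
                # Consecutive breakpoints on one chromosome bound a segment.
                if ($1 == prev_chr) emit(prev_chr, prev_pos, $2 + 0)
                prev_chr = $1; prev_pos = $2 + 0
            }
            function emit(c, a, b,    k, j, out, covered) {
                for (j = 1; j <= nfiles; j++) col[j] = "NA"
                covered = 0
                for (k = 1; k <= n; k++)
                    if (chr[k] == c && s[k] <= a && e[k] >= b) {
                        # Several hits from one file are comma-separated.
                        if (col[src[k]] == "NA") col[src[k]] = val[k]
                        else col[src[k]] = col[src[k]] "," val[k]
                        covered = 1
                    }
                if (!covered) return   # skip gaps that no file covers
                out = col[1]
                for (j = 2; j <= nfiles; j++) out = out ";" col[j]
                print c, a, b, out
            }
        '
    rm -f "$tagged"
}
```

With the two example files from the question, calling consensus_bed BED_2 BED_1 should print the four consensus lines shown there, tab-separated.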

modified 2.9 years ago • written 2.9 years ago by Alex Reynolds 28k

Thanks, I checked out BEDOPS after Angel's suggestion and did it as you described. Very neat, many thanks.

written 2.9 years ago by bruce.moran 620

Thank you Alex, this is helpful. However, for some regions I get no values. I am merging 30 or so files together. For example:

chr9 121507304 140976434 2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2

chr10 0 103741188

chr10 103741188 104966662

chr10 104966662 115535087

....

chr22 0 21727195

chr22 21727195 26344341

chr22 26344341 51005330

chrX 0 154915660 2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2

What should I do? I followed your instructions and it seems to work for small numbers of files.

modified 22 months ago • written 22 months ago by ashmc 0

This only seems to work for chromosomes 1-9, X and Y. For chromosomes 10-22 I do not get any values.

written 22 months ago by ashmc 0
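
One possible cause of the missing chr10-chr22 values, offered as a guess rather than a diagnosis: BEDOPS tools expect input sorted the way its sort-bed utility sorts, with chromosome names in lexicographic order, so chr10 comes before chr2. Files kept in natural order (chr1, chr2, ... chr9, chr10) break that assumption from chr10 onward, while chrX and chrY still happen to fall after chr9. The sketch below uses plain sort in the C locale, which I am assuming gives the same order as sort-bed; sort_bed_lexicographic is a hypothetical helper name.

```shell
# Hypothetical helper: sort a BED file lexicographically by chromosome,
# then numerically by start and end coordinate.
sort_bed_lexicographic() {
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n "$1"
}

# Usage (file names are placeholders):
#   sort_bed_lexicographic sample.bed > sample.sorted.bed
```

Each input file would be sorted this way (or with sort-bed itself) before the bedops and bedmap steps are run.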
3
2.9 years ago, eromasko (United States) wrote:

Hi bruce.moran. I think BEDtools intersect combined with the groupby command might be helpful here. From the groupby documentation:

Groupby

Collapsing: listing all of the values in the opCol for a given group.

Now for something different. What if we wanted all of the names of the repeats listed on the same line as the variants? Use the collapse option. This “denormalizes” things. Now you have a list of all the repeats on a single line.

$ bedtools groupby -i variantsToRepeats.bed -grp 1-4 -c 9 -o collapse
chr21 9719758 9729320 variant1 ALR/Alpha,ALR/Alpha,L1PA3,ALR/Alpha,
chr21 9729310 9757478 variant2 L1PA3,L1P1,ALR/Alpha,ALR/Alpha,
chr21 9795588 9796685 variant3 (GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,(GAATG)n,
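
To show what the collapse step does, here is a rough awk sketch, not bedtools itself. collapse_by_interval is a hypothetical helper name; it assumes tab- or space-separated rows that are already grouped, so rows for the same interval (columns 1-3) are adjacent, and it joins the chosen column with commas, without the trailing comma seen in the output quoted above.

```shell
# Hypothetical helper: for consecutive rows that share columns 1-3,
# list every value of column VALCOL on a single line.
# Usage: collapse_by_interval VALCOL FILE
collapse_by_interval() {
    awk -v c="$1" 'BEGIN { OFS = "\t" }
    {
        key = $1 OFS $2 OFS $3
        if (NR > 1 && key == prev) {
            vals = vals "," $c            # same interval: append the value
        } else {
            if (NR > 1) print prev, vals  # new interval: flush the old one
            prev = key
            vals = $c
        }
    }
    END { if (NR > 0) print prev, vals }' "$2"
}
```

For the question's use case, the input would be the disjoint intervals intersected with the original files, one row per overlap, with the score in the chosen column.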

written 2.9 years ago by eromasko 120
1

Another nice method. I use BEDtools a lot and thought it was odd that I couldn't get what I wanted. I ended up using the BEDOPS approach above as it is slightly simpler, but thanks for your comment.

written 2.9 years ago by bruce.moran 620