Question: How to find overlapping regions among three bed files [Solved]
Bioradical (United States) wrote, 3.1 years ago:

I have a fairly simple question that I can't seem to figure out. So far I have been using bedtools to find overlaps between two BED files with intersectBed. Example:

Bed A

Bed B

Bed C

Now I have three generated bed files that I want to overlap to find peaks/regions common among all three.

ABC

The multiintersect option doesn't seem to have any documentation besides the information found in "Bedtools Compare Multiple Bed Files?", and the function itself doesn't seem to give me the information I'm looking for. Specifically, I want to feed in three BED files and find only the regions common to ALL three, not AB, AC, BC, and ABC in one large file, which seems to be the output shown in the linked example.

I believe that intersectBed -a A -b B C does something similar to the above, but perhaps I'm simply running it wrong and the failed attempts are errors on my part.

Can this be done using bedtools? If not, what other similar software is out there that can accomplish this?

I appreciate any help,

Carlos


First find the overlap between any two of the files, then intersect that result with the third file.

Comment written 3.1 years ago by geek_y

Excellent. This achieved exactly what I wanted. Thank you!

Reply written 3.1 years ago by Bioradical
geek_y wrote, 3.1 years ago:

Answer:

First find the overlap between any two of the files, then intersect that result with the third file:

intersectBed -a 1.bed -b 2.bed | intersectBed -a - -b 3.bed
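
The chaining idea above extends to any number of files by feeding each result into the next intersection. The following is only a sketch under some assumptions: `intersect_all` is a hypothetical helper name, the three toy BED files are made up for illustration, and `bedtools intersect` is used as the newer spelling of `intersectBed`. The call is skipped when bedtools is not installed.

```shell
# Hypothetical helper (not part of bedtools): intersect the first file
# with each remaining file in turn and print what survives every round.
intersect_all() {
    first=$1
    shift
    current=$(mktemp)
    cp "$first" "$current"
    for f in "$@"; do
        next=$(mktemp)
        # Keep only the portions of the running result that overlap $f.
        bedtools intersect -a "$current" -b "$f" > "$next"
        mv "$next" "$current"
    done
    cat "$current"
    rm -f "$current"
}

# Made-up toy intervals, one per file, that all share chr1:180-200.
demo=$(mktemp -d)
printf 'chr1\t100\t200\n' > "$demo/A.bed"
printf 'chr1\t150\t250\n' > "$demo/B.bed"
printf 'chr1\t180\t300\n' > "$demo/C.bed"

# Only run the demo when bedtools is actually available.
if command -v bedtools >/dev/null 2>&1; then
    intersect_all "$demo/A.bed" "$demo/B.bed" "$demo/C.bed"
fi
```

With the toy files, the only stretch shared by all three inputs is chr1 180 to 200, so that is what the final round should leave. If any input contains overlapping intervals of its own, the result may contain duplicate or overlapping pieces that need a further sort and merge.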
Alex Reynolds (Seattle, WA USA) wrote, 3.1 years ago:
Here's a more general approach using BEDOPS bedops and bedmap --count, which extends to N input files:

$ N=`ls *.bed | wc -l`
$ bedops --everything A.bed B.bed C.bed ... N.bed \
    | bedmap --count --echo --delim '\t' - \
    | uniq \
    | awk -vN=${N} '$1==N' \
    | cut -f2- \
    > common.bed

By changing the test in the awk statement, this approach can be modified to return other subsets of the input's power set, e.g., all elements common to N-1 inputs, N-2 inputs, etc.
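
To make the point about changing the awk test concrete, here is a small sketch that needs no BEDOPS install. The `counts` variable is a made-up stand-in for what `bedmap --count --echo --delim '\t'` would emit (count first, then the element), so the numbers are illustrative assumptions, not real results.

```shell
N=3

# Hypothetical bedmap-style lines: overlap count, then chrom/start/end.
counts=$(printf '3\tchr1\t100\t200\n2\tchr1\t500\t600\n3\tchr2\t50\t80\n1\tchr3\t10\t20\n')

# Elements whose count equals N (the test used in the pipeline above).
all_n=$(printf '%s\n' "$counts" | awk -v N="$N" '$1 == N' | cut -f2-)

# Same pipeline with the test changed to pick out a count of N-1.
n_minus_1=$(printf '%s\n' "$counts" | awk -v N="$N" '$1 == N-1' | cut -f2-)

printf 'count == N:\n%s\n' "$all_n"
printf 'count == N-1:\n%s\n' "$n_minus_1"
```

Writing `-v N="$N"` with a space and quotes is a slightly more portable spelling of `-vN=${N}`; the filtering logic is unchanged.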


Dear Alex,

Does --count --echo identify the overlaps within the merged list? Can it do that? Also, could you please explain what "\" and "-" do? Finally, I tried N= `ls *bed | wc -l`, but it gives me an "N command not found" error on the command line.

Thank you for your help.

Tunc

Reply written 3.0 years ago by morovatunc

My advice is to break things down so you see how it works.

After:

$ N=`ls *.bed | wc -l`

Then run:

$ echo "${N}"
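
On the "N command not found" error mentioned earlier: a likely cause is whitespace around the `=`, since the shell only treats `N=value` as an assignment when there are no spaces. A minimal sketch, using a scratch directory and three empty placeholder files (both are assumptions for illustration):

```shell
# Scratch directory with three empty placeholder BED files.
workdir=$(mktemp -d)
touch "$workdir/A.bed" "$workdir/B.bed" "$workdir/C.bed"

# Correct: no spaces around "=". $(...) is equivalent to backticks here.
# tr strips the padding that some wc implementations add.
N=$(ls "$workdir"/*.bed | wc -l | tr -d ' ')

# Incorrect, for comparison (left commented out):
#   N = $(ls "$workdir"/*.bed | wc -l)
# With spaces, the shell looks for a command literally named "N".

echo "${N}"
```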

Likewise, insert tee between the first two steps of this pipeline, and again after the bedmap step:

$ bedops --everything A.bed B.bed C.bed ... N.bed | bedmap --count --echo --delim '\t' - | ... > common.bed

So:

$ bedops --everything A.bed B.bed C.bed ... N.bed | tee betweenSteps1and2.txt | bedmap --count --echo --delim '\t' - | ... > common.bed

And:

$ bedops --everything A.bed B.bed C.bed ... N.bed | bedmap --count --echo --delim '\t' - | tee betweenSteps2and3.txt | ... > common.bed

The \ character at the end of a line lets you continue a pipeline across multiple lines, and the - character tells a tool to read from standard input in place of a regular file. Working with standard input and output streams is an important advantage of BEDOPS and Unix tools, so it is worth a few minutes to read about.
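
The three ideas in this reply (`\` continuation, `-` for standard input, and `tee` snapshots) can be tried with ordinary Unix tools before involving BEDOPS. In this sketch `cat` and `uniq` are stand-ins for the bedops and bedmap stages, and the file names and toy lines are invented for illustration.

```shell
tmp=$(mktemp -d)

# Toy input with one duplicated line.
printf 'chr1\t100\t200\nchr1\t100\t200\nchr2\t50\t80\n' > "$tmp/in.bed"

# Each trailing backslash continues the same pipeline on the next line.
# tee writes a copy of the stream to a file and passes it along unchanged.
# The "-" argument makes the last cat read standard input, not a named file.
cat "$tmp/in.bed" \
    | tee "$tmp/after_step1.txt" \
    | uniq \
    | tee "$tmp/after_step2.txt" \
    | cat - \
    > "$tmp/final.txt"

# Compare line counts at each stage to see what every step did.
wc -l "$tmp/after_step1.txt" "$tmp/after_step2.txt" "$tmp/final.txt"
```

Inspecting the snapshot files shows exactly where lines are dropped, which is the same debugging approach suggested for the bedops and bedmap pipeline.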

Reply written 3.0 years ago by Alex Reynolds