Get nonoverlapping regions of two bed files
3.9 years ago
vctrm67 ▴ 50

I have two bed files. I want to find the regions of non-overlap between the two; for example, I have bed A and bed B, and I want to find the subregions in A that don't overlap with B.

A
1   1   10
1   15   20

B 
1   5    10
1   17   20

I want to get

1   1    5
1   15    17

I was considering bedtools intersect -v, but I think that would only give me the regions of A that do not intersect B at all. I am also interested in the non-overlapping subregions of the A regions that partially overlap B.

3.9 years ago
ATpoint 81k
bedtools multiinter -i *.bed | awk 'BEGIN { OFS = "\t" } $4 == 1 { print $1, $2, $3 }'
1   1   5
1   15  17

multiinter splits the input into the intervals covered by at least one file and reports how many files cover each one. Column 4 of the output is that count. Since you want intervals that are present in only one file, you filter for $4 == 1.

bedtools multiinter -i *.bed
1   1   5   1   1   1   0
1   5   10  2   1,2 1   1
1   15  17  1   1   1   0
1   17  20  2   1,2 1   1

Wouldn't this give me intervals of B that are only covered once? For example, if I had:

A
1   1   10
1   17   20

B 
1   5    10
1   15   20

Wouldn't this still give the answer I posted originally? I wouldn't want this because in this case the 15 - 17 interval comes from B, whereas I just want the intervals covered only by A.

$ bedops --partition A B | bedops -n 1 - B
1   1   5

It exactly matches the example result you posted. Anyway, you can pass the -names option to show which file each interval comes from, and then apply further filtering:

bedtools multiinter -i *.bed -names *.bed
1   1   5   1   A.bed   1   0
1   5   10  2   A.bed,B.bed 1   1
1   15  17  1   A.bed   1   0
1   17  20  2   A.bed,B.bed 1   1
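
The "further filtering" could look something like the sketch below. It assumes the tab-delimited column layout shown above (column 4 is the number of covering files, column 5 the comma-separated names). The helper name only_in is made up for illustration, and the input rows are written by hand for the second example in this thread rather than produced by running bedtools.

```shell
# Keep intervals covered by exactly one file, and only when that file is the
# named one. Reads `bedtools multiinter -names` output on stdin.
# only_in is a hypothetical helper name, not part of bedtools.
only_in() {
    awk -v name="$1" 'BEGIN { FS = OFS = "\t" }
        $4 == 1 && $5 == name { print $1, $2, $3 }'
}

# Hand-written rows in the format shown above, for the second example in the
# thread (A: 1-10, 17-20; B: 5-10, 15-20). Here 15-17 is covered only by B.
printf '1\t1\t5\t1\tA.bed\t1\t0\n1\t5\t10\t2\tA.bed,B.bed\t1\t1\n1\t15\t17\t1\tB.bed\t0\t1\n1\t17\t20\t2\tA.bed,B.bed\t1\t1\n' |
    only_in A.bed
# prints: 1<TAB>1<TAB>5
```

With real files this would be used as: bedtools multiinter -i A.bed B.bed -names A.bed B.bed | only_in A.bed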
3.9 years ago

You can do this with BEDOPS and a partition operation.

  1. Sort the files with sort-bed:

    $ sort-bed A.bed > A.sorted.bed
    $ sort-bed B.bed > B.sorted.bed
    

    Your examples are already sorted. However, other approaches to sorting BED files that I see posted on this site do not sort on the stop position when start positions tie, so sort-bed may be needed to get a correct answer downstream.

  2. Partition the multiset union of the files with bedops --partition:

    $ bedops --partition A.sorted.bed B.sorted.bed > AB.partition.bed
    
  3. Then do a "not-element-of" operation on the partition with bedops --not-element-of 1:

    $ bedops --not-element-of 1 AB.partition.bed B.sorted.bed > answer.bed
    

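    A threshold of one base of overlap is sufficient, because the elements of the partition are disjoint: each one either lies entirely within an element of B or does not overlap B at all.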

If you want to avoid making intermediate files (and get a faster answer), here's a one-liner that does everything in one step:

$ bedops --partition <(sort-bed A.bed) <(sort-bed B.bed) | bedops --not-element-of 1 - <(sort-bed B.bed) > answer.bed
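
For small inputs, the expected answer can be cross-checked without either toolkit. The following is only a minimal awk sketch, not how BEDOPS or bedtools work internally. It assumes that B is non-empty, sorted by start within each chromosome, and free of self-overlaps. subtract_bed is a made-up helper name, and the whole of B is held in memory.

```shell
# Print the parts of each A interval that are not covered by any B interval.
# usage: subtract_bed A.bed B.bed   (subtract_bed is a hypothetical helper)
subtract_bed() {
    awk 'BEGIN { OFS = "\t" }
    # First file on the command line (B): remember its intervals per chromosome.
    NR == FNR { n[$1]++; bs[$1, n[$1]] = $2 + 0; be[$1, n[$1]] = $3 + 0; next }
    # Second file (A): walk the B intervals and emit the uncovered gaps.
    {
        start = $2 + 0; end = $3 + 0
        for (i = 1; i <= n[$1]; i++) {
            s = bs[$1, i]; e = be[$1, i]
            if (e <= start) continue            # B ends before the uncovered part
            if (s >= end) break                 # B starts after this A interval
            if (s > start) print $1, start, s   # gap in front of this B interval
            start = e                           # skip past the covered stretch
        }
        if (start < end) print $1, start, end   # tail after the last overlap
    }' "$2" "$1"
}

# The first example from the question, written to temporary files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '1\t1\t10\n1\t15\t20\n' > "$tmp/A.bed"
printf '1\t5\t10\n1\t17\t20\n' > "$tmp/B.bed"
subtract_bed "$tmp/A.bed" "$tmp/B.bed"
# prints two lines: 1<TAB>1<TAB>5 and 1<TAB>15<TAB>17
```

On the second example from the comments (A: 1-10, 17-20; B: 5-10, 15-20) the same function prints only the 1-5 interval, which agrees with the bedops output shown there.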