Question

Get nonoverlapping regions of two bed files

3.9 years ago

vctrm67 ▴ 50

I have two bed files. I want to find the regions of non-overlap between the two; for example, I have bed A and bed B, and I want to find the subregions in A that don't overlap with B.

A
1   1   10
1   15   20

B 
1   5    10
1   17   20

I want to get

1   1    5
1   15    17

I was considering bedtools intersect -v, but I think that would just give me full regions of A that do not intersect at all with B, whereas I am also interested in the subregions that don't intersect with B, for regions that intersect partially with B.

score 0 · Answer 1 · 2020-05-29

3.9 years ago

ATpoint 81k

bedtools multiinter -i *.bed | awk 'BEGIN {OFS="\t"} $4 == 1 {print $1, $2, $3}'
1   1   5
1   15  17

multiinter reports every interval that is covered by at least one file, split wherever the coverage changes. Column 4 of the output is the number of files covering that interval. Since you want intervals that are present in only one file, filter for $4 == 1.

bedtools multiinter -i *.bed
1   1   5   1   1   1   0
1   5   10  2   1,2 1   1
1   15  17  1   1   1   0
1   17  20  2   1,2 1   1

Wouldn't this also give me intervals of B that are covered only once? For example, if I had:

A
1   1   10
1   17   20

B 
1   5    10
1   15   20

Wouldn't this still give the answer I posted originally? I wouldn't want this because in this case the 15 - 17 interval comes from B, whereas I just want the intervals covered only by A.

ADD REPLY • link 3.9 years ago by vctrm67 ▴ 50

$ bedops --partition A B | bedops -n 1 - B
1   1   5

ADD REPLY • link 3.9 years ago by Alex Reynolds 35k

It matches exactly the example result you posted. Anyway, you can pass the -names option to show which file each interval comes from, and then apply further filtering:

bedtools multiinter -i *.bed -names *.bed
1   1   5   1   A.bed   1   0
1   5   10  2   A.bed,B.bed 1   1
1   15  17  1   A.bed   1   0
1   17  20  2   A.bed,B.bed 1   1
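
As a rough sketch of that further filtering: column 5 lists the files that cover each interval, so keeping only rows where it is exactly A.bed leaves the A-only intervals. The helper name a_only is made up for illustration, and the sketch assumes the -names labels are the file names, as in the output above.

```shell
# Hypothetical helper: keep intervals whose file list (column 5) is exactly A.bed.
# Reads multiinter -names output on stdin, writes chrom/start/end tab-separated.
a_only() {
  awk 'BEGIN { OFS = "\t" } $5 == "A.bed" { print $1, $2, $3 }'
}

# With bedtools installed, this would be used as:
#   bedtools multiinter -i A.bed B.bed -names A.bed B.bed | a_only

# Demo on two of the rows shown above (no bedtools needed):
printf '1\t1\t5\t1\tA.bed\t1\t0\n1\t5\t10\t2\tA.bed,B.bed\t1\t1\n' | a_only
```

Rows covered only by B would carry B.bed in column 5 and be dropped by the same filter.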

ADD REPLY • link 3.9 years ago by ATpoint 81k

score 0 · Answer 2 · 2020-05-29

You can do this with BEDOPS and a partition operation.

Sort the files with sort-bed:
```
$ sort-bed A.bed > A.sorted.bed
$ sort-bed B.bed > B.sorted.bed
```
Your examples are already sorted, but many of the BED-sorting approaches I see posted on this site do not sort on the stop position when there are ties on the start position. Using sort-bed handles those ties, which can be necessary to get a correct answer downstream.
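
To illustrate the tie-breaking point, here is a minimal sketch of a plain sort invocation that orders on chromosome, then start, then stop. It is only an approximation under the assumption that sort-bed uses that same ordering; the function name sort_bed_sketch is made up, and sort-bed itself remains the safer choice.

```shell
# Hypothetical stand-in for sort-bed: chromosome (byte order), then start, then stop.
# LC_ALL=C keeps the chromosome ordering locale-independent.
sort_bed_sketch() {
  LC_ALL=C sort -k1,1 -k2,2n -k3,3n "$@"
}

# Two intervals tie on start (5); the stop position decides their order.
printf '1\t5\t20\n1\t5\t10\n1\t1\t3\n' | sort_bed_sketch
```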

Partition the multiset union of the files with bedops --partition:

$ bedops --partition A.sorted.bed B.sorted.bed > AB.partition.bed

Then do a "not-element-of" operation on the partition with bedops --not-element-of 1:
```
$ bedops --not-element-of 1 AB.partition.bed B.sorted.bed > answer.bed
```
One base of overlap is a sufficient threshold, because the partition elements are disjoint: each one either lies entirely inside an interval of B or does not overlap B at all.

If you want to avoid making intermediate files (and get a faster answer), here's a one-liner that does everything in one step:

$ bedops --partition <(sort-bed A.bed) <(sort-bed B.bed) | bedops --not-element-of 1 - <(sort-bed B.bed) > answer.bed
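
If you want to cross-check the result on small inputs without BEDOPS, the same subtraction can be sketched in plain awk. This is a toy-scale illustration, not what bedops does internally: the function name subtract_sketch is made up, and it assumes B is non-empty and sorted by start within each chromosome.

```shell
# Hypothetical helper: print the parts of A ($1) not covered by B ($2).
# Assumes B is non-empty and sorted by start within each chromosome.
subtract_sketch() {
  awk '
    # First file read (B): store its intervals per chromosome, in file order.
    NR == FNR { n[$1]++; bs[$1, n[$1]] = $2 + 0; be[$1, n[$1]] = $3 + 0; next }
    # Second file read (A): walk B left to right and emit the uncovered gaps.
    {
      cur = $2 + 0; stop = $3 + 0
      for (i = 1; i <= n[$1]; i++) {
        s = bs[$1, i]; e = be[$1, i]
        if (e <= cur || s >= stop) continue   # no overlap with what is left of A
        if (s > cur) print $1, cur, s         # gap before this B interval
        if (e > cur) cur = e                  # skip past the covered part
      }
      if (cur < stop) print $1, cur, stop     # tail of A after the last overlap
    }
  ' OFS='\t' "$2" "$1"
}

# Demo on the example from the question.
demo_dir=$(mktemp -d)
printf '1\t1\t10\n1\t15\t20\n' > "$demo_dir/A.bed"
printf '1\t5\t10\n1\t17\t20\n' > "$demo_dir/B.bed"
subtract_sketch "$demo_dir/A.bed" "$demo_dir/B.bed"
# prints:
# 1	1	5
# 1	15	17
```

It loops over every B interval for each A interval, so it is only meant for sanity checks on toy files.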