Question: Get nonoverlapping regions of two bed files
vctrm67 wrote 6 months ago:

I have two bed files. I want to find the regions of non-overlap between the two; for example, I have bed A and bed B, and I want to find the subregions in A that don't overlap with B.

A
1   1   10
1   15   20

B 
1   5    10
1   17   20

I want to get

1   1    5
1   15    17

I considered bedtools intersect -v, but I think that would only return the full regions of A that do not intersect B at all. I am also interested in the non-overlapping subregions of A regions that partially overlap B.

ATpoint wrote 6 months ago:
bedtools multiinter -i *.bed | awk 'BEGIN {OFS="\t"} $4 == 1 {print $1, $2, $3}'
1   1   5
1   15  17

multiinter reports every interval that is covered by at least one input file. Column 4 of the output is the number of files covering that interval. Since you want intervals that are present in only one file, filter for $4 == 1.

bedtools multiinter -i *.bed
1   1   5   1   1   1   0
1   5   10  2   1,2 1   1
1   15  17  1   1   1   0
1   17  20  2   1,2 1   1

Wouldn't this give me intervals of B that are only covered once? For example, if I had:

A
1   1   10
1   17   20

B 
1   5    10
1   15   20

Wouldn't this still give the answer I posted originally? I wouldn't want this because in this case the 15 - 17 interval comes from B, whereas I just want the intervals covered only by A.

Reply written 6 months ago by vctrm67
$ bedops --partition A B | bedops -n 1 - B
1   1   5
Reply written 6 months ago by Alex Reynolds

It matches exactly the example result you posted. Anyway, you can pass the -names option to display from which file the intervals come and then apply further filtering:

bedtools multiinter -i *.bed -names *.bed
1   1   5   1   A.bed   1   0
1   5   10  2   A.bed,B.bed 1   1
1   15  17  1   A.bed   1   0
1   17  20  2   A.bed,B.bed 1   1
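
As a sketch of that further filtering: assuming the tab-separated layout shown above (column 4 is the number of files covering the interval, column 5 the comma-separated file names), an awk filter can keep only the intervals covered by A.bed alone. The helper name `only_in` is made up for illustration, and the sample rows are hand-written for the second A/B example in this thread, not copied from a real bedtools run.

```shell
# Hypothetical helper: keep intervals covered by exactly one file, the one named in $1.
# Reads `bedtools multiinter -names ...` style rows on stdin (layout assumed as above).
only_in() {
  awk -v name="$1" 'BEGIN { FS = OFS = "\t" }
                    $4 == 1 && $5 == name { print $1, $2, $3 }'
}

# Hand-written sample rows for A = {1-10, 17-20}, B = {5-10, 15-20}.
sample_rows() {
  printf '1\t1\t5\t1\tA.bed\t1\t0\n'
  printf '1\t5\t10\t2\tA.bed,B.bed\t1\t1\n'
  printf '1\t15\t17\t1\tB.bed\t0\t1\n'
  printf '1\t17\t20\t2\tA.bed,B.bed\t1\t1\n'
}

sample_rows | only_in A.bed
# With real files, the intended use would be:
#   bedtools multiinter -i A.bed B.bed -names A.bed B.bed | only_in A.bed
```

For these rows only the 1-5 interval survives; 15-17 is dropped because it is covered by B.bed alone, which is the case the plain $4 == 1 filter cannot distinguish.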
Reply written 6 months ago by ATpoint
Alex Reynolds wrote 6 months ago:

You can do this with BEDOPS and a partition operation.

  1. Sort the files with sort-bed:

    $ sort-bed A.bed > A.sorted.bed
    $ sort-bed B.bed > B.sorted.bed
    

    Your examples are already sorted, but other BED-sorting approaches I see posted on this site do not break ties on the start position by also sorting on the stop position, so you may need sort-bed to get a correct answer downstream.

  2. Partition the multiset union of the files with bedops --partition:

    $ bedops --partition A.sorted.bed B.sorted.bed > AB.partition.bed
    
  3. Then do a "not-element-of" operation on the partition with bedops --not-element-of 1:

    $ bedops --not-element-of 1 AB.partition.bed B.sorted.bed > answer.bed
    

    A one-base overlap threshold is sufficient here because the partition elements are disjoint: each element lies either entirely inside or entirely outside an interval of B.
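
On the sorting point in step 1: if sort-bed is not at hand, a plain `sort` with explicit keys on chromosome, start, and stop does break start-position ties on the stop position. This is only a sketch. The function name `sort_bed3` is hypothetical, and it assumes (without guaranteeing) that byte-wise chromosome order plus numeric start and stop is the ordering bedops expects, so sort-bed remains the safer choice before running bedops.

```shell
# Hypothetical stand-in for sort-bed: order by chromosome (byte-wise),
# then numeric start, then numeric stop so that start ties are broken on stop.
sort_bed3() {
  LC_ALL=C sort -k1,1 -k2,2n -k3,3n "$@"
}

# Two intervals share start 5; the one with the smaller stop should come first.
printf '1\t5\t20\n1\t5\t10\n1\t1\t3\n' | sort_bed3
```

The third key is the part that sorts commonly posted with only `-k1,1 -k2,2n` leave out.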

If you want to avoid making intermediate files (and get a faster answer), here's a one-liner that does everything in one step:

$ bedops --partition <(sort-bed A.bed) <(sort-bed B.bed) | bedops --not-element-of 1 - <(sort-bed B.bed) > answer.bed
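
To sanity-check the output on small inputs without either toolkit installed, the same A-minus-B logic can be sketched in plain awk. This is not how bedops or bedtools work internally. `subtract_bed` is a hypothetical helper that assumes whitespace-separated three-column input, a non-empty B file, and B sorted by start position within each chromosome.

```shell
# Hypothetical helper: print the parts of each interval in A ($1) not covered by B ($2).
# Assumes B is non-empty and sorted by start position within each chromosome.
subtract_bed() {
  awk 'BEGIN { OFS = "\t" }
       NR == FNR {                       # first file read is B: store its intervals
         n[$1]++
         bstart[$1, n[$1]] = $2 + 0
         bend[$1, n[$1]]   = $3 + 0
         next
       }
       {                                 # second file read is A
         pos = $2 + 0; stop = $3 + 0     # pos = left edge of the not-yet-emitted part
         for (i = 1; i <= n[$1] + 0; i++) {
           s = bstart[$1, i]; e = bend[$1, i]
           if (e <= pos || s >= stop) continue   # B interval misses what is left of A
           if (s > pos) print $1, pos, s         # uncovered piece before this B interval
           pos = e                               # skip past the covered part
         }
         if (pos < stop) print $1, pos, stop     # uncovered tail
       }' "$2" "$1"
}

# The example from the question, written to temporary files.
tmp="$(mktemp -d)"
printf '1\t1\t10\n1\t15\t20\n' > "$tmp/A.bed"
printf '1\t5\t10\n1\t17\t20\n' > "$tmp/B.bed"
subtract_bed "$tmp/A.bed" "$tmp/B.bed"
```

For the A and B from the question this prints 1 1 5 and 1 15 17, matching the desired output. bedtools also ships a subtract subcommand (bedtools subtract -a A.bed -b B.bed) intended for this kind of A-minus-B query, which may be worth comparing against.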