Question: find overlapping Chip-seq peaks and return outerboundries
gravatar for m.wekking
2 days ago by
m.wekking0 wrote:

I have .bed files with different peaks and I need to find the overlap, but when two peaks overlap I want the program to return the outer boundries of the two peaks, i.e.

$ cat sample1.bed
chr1   20    40
chr1   60    80
chr2   80    120

$ cat sample2.bed
chr1   17    35
chr1   75    90
chr2   70    100

I want as output

chr1   17    40
chr1   60    90
chr2   80    120

Is this possible using bedTools or any other tool (preferably not in R)?

chip-seq overlap • 65 views
ADD COMMENTlink modified 2 days ago by Alex Reynolds26k • written 2 days ago by m.wekking0
gravatar for ATpoint
2 days ago by
ATpoint10k wrote:

A bit cumbersome, but it works:

bedtools intersect -a sample1.bed -b sample2.bed -wa -wb | \
  awk 'OFS="\t" {if ($2 < $5) print $1, $2, $3, $6; else print $1, $5, $3, $6}' | \
  awk 'OFS="\t" {if ($3 > $4) print $1, $2, $3; else print $1, $2, $4}'

First, do the intersect outputting the entire input intervals of -a and -b in case of overlap, and then use awk to select the smaller of the two start coordinates and the larger of the two end coordinates. The awk part can for sure be written more elegantly.

ADD COMMENTlink written 2 days ago by ATpoint10k

Thank you very much! will this also work if I have more than 2 .bed files?

ADD REPLYlink written 2 days ago by m.wekking0

No, because as can be seen from the awk command, this is tailored for a pairwise comparison of two files. If you have multiple files, one would probably need to do something with bedtools multiinter in a two-step process, so first using multiinter to identify regions that overlap in at least a certain number or all samples and then use the common interval to do another round of intersect against all files to get the flanks. Can you provide some more details on how many files you have, and what the criteria are?

ADD REPLYlink written 2 days ago by ATpoint10k
gravatar for Alex Reynolds
2 days ago by
Alex Reynolds26k
Seattle, WA USA
Alex Reynolds26k wrote:

Not sure about chr2 in your example, but you could do the following for 1 to N sorted BED files:

$ bedops --merge A.bed B.bed ... N.bed > merge.bed
$ bedops --everything A.bed B.bed ... N.bed > union.bed

Then use the merge and union sets to look for two or more element overlaps:

$ bedmap --count --echo --delim '\t' merge.bed union.bed | awk '($1 >= 2)' | cut -f2- > answer.bed

Replace 2 with N or any other threshold you'd like.

When you have some overlap criteria to meet between sets, pairing bedmap with awk can work well.

ADD COMMENTlink modified 2 days ago • written 2 days ago by Alex Reynolds26k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1065 users visited in the last hour