find overlapping Chip-seq peaks and return outerboundries
2
0
Entering edit mode
2.4 years ago
m.wekking ▴ 10

I have .bed files with different peaks and I need to find the overlap, but when two peaks overlap I want the program to return the outer boundries of the two peaks, i.e.

$ cat sample1.bed
chr1   20    40
chr1   60    80
chr2   80    120

$ cat sample2.bed
chr1   17    35
chr1   75    90
chr2   70    100

I want as output

chr1   17    40
chr1   60    90
chr2   70    120

Is this possible using bedTools or any other tool (preferably not in R)?

ChIP-Seq Overlap • 962 views
ADD COMMENT
1
Entering edit mode
2.4 years ago
ATpoint 48k

Edit: 10.12.18 => Modified the code snippet to work with an arbitrary number of BED files:

This is the code to work with two files. It does the intersection outputting the entire input intervals of -a and -b in case of overlap, and then use awk to select the smaller of the two start coordinates and the larger of the two end coordinates. The awk part can for sure be written more elegantly.

bedtools intersect -a sample1.bed -b sample2.bed -wa -wb | \
  awk 'OFS="\t" {if ($2 < $5) print $1, $2, $3, $6; else print $1, $5, $3, $6}' | \
  awk 'OFS="\t" {if ($3 > $4) print $1, $2, $3; else print $1, $2, $4}'

To work with an arbitrary number of files, you basically do the same thing. First, you intersect the first two files with the above code and store the result to disk. Then you loop through all other BED files and apply the same code to intersect the intermediate result with the current BED file, storing the result again to disk. Repeat until all BED files have been processed. For this, all files in the current directory that are to be processed hve the .bedsuffix:

#!/bin/bash

## Code as in my first answer, just wrapped into a function:
function MyIntersect {
  bedtools intersect -a $1 -b $2 -wa -wb | \
    awk 'OFS="\t" {if ($2 < $5) print $1, $2, $3, $6; else print $1, $5, $3, $6}' | \
    awk 'OFS="\t" {if ($3 > $4) print $1, $2, $3; else print $1, $2, $4}'
}; export -f MyIntersect

## Write names of all BED files to a file:
ls *.bed > all_beds.txt

## Get total number of BED files in current directory:
TOTALNUMBER="$(cat all_beds.txt | wc -l | sed -e 's/^[ \t]*//')"

## Start with the first two files manually (head -n1 gets the first file, head -n2 | tail -n 1 the second one from all_beds.txt):
MyIntersect "$(head -n 1 all_beds.txt)" "$(head -n 2 all_beds.txt | tail -n 1)" > intermediate.txt

## and now do the same thing with this intermediate result from the command above until all other BED files have been used:
for i in `seq 3 $TOTALNUMBER`
  do
  MyIntersect intermediate.txt "$(head -n $i all_beds.txt | tail -n 1)" > intermediate_new.txt
  mv intermediate_new.txt intermediate.txt
  done

## Output is final.bed
mv intermediate.txt final.bed
ADD COMMENT
0
Entering edit mode

Thank you very much! will this also work if I have more than 2 .bed files?

ADD REPLY
0
Entering edit mode

No, because as can be seen from the awk command, this is tailored for a pairwise comparison of two files. If you have multiple files, one would probably need to do something with bedtools multiinter in a two-step process, so first using multiinter to identify regions that overlap in at least a certain number or all samples and then use the common interval to do another round of intersect against all files to get the flanks. Can you provide some more details on how many files you have, and what the criteria are?

ADD REPLY
0
Entering edit mode

For This current project I have in total 3 different files but for other projects it is possible that I have more than five repeats .. So I need a solution that works for a varying amount of files ...

ADD REPLY
0
Entering edit mode

I edited my answer to make it work for an arbitrary number of BED files in the current directory.

ADD REPLY
0
Entering edit mode
2.4 years ago

Not sure about chr2 in your example, but you could do the following for 1 to N sorted BED files:

$ bedops --merge A.bed B.bed ... N.bed > merge.bed
$ bedops --everything A.bed B.bed ... N.bed > union.bed

Then use the merge and union sets to look for two or more element overlaps:

$ bedmap --count --echo --delim '\t' merge.bed union.bed | awk '($1 >= 2)' | cut -f2- > answer.bed

Replace 2 with N or any other threshold you'd like.

When you have some overlap criteria to meet between sets, pairing bedmap with awk can work well.

ADD COMMENT

Login before adding your answer.

Traffic: 2177 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6