Question: find overlapping Chip-seq peaks and return outerboundries
0
gravatar for m.wekking
10 weeks ago by
m.wekking10
m.wekking10 wrote:

I have .bed files with different peaks and I need to find the overlap, but when two peaks overlap I want the program to return the outer boundries of the two peaks, i.e.

$ cat sample1.bed
chr1   20    40
chr1   60    80
chr2   80    120

$ cat sample2.bed
chr1   17    35
chr1   75    90
chr2   70    100

I want as output

chr1   17    40
chr1   60    90
chr2   70    120

Is this possible using bedTools or any other tool (preferably not in R)?

chip-seq overlap • 205 views
ADD COMMENTlink modified 10 weeks ago by ATpoint13k • written 10 weeks ago by m.wekking10
1
gravatar for ATpoint
10 weeks ago by
ATpoint13k
Germany
ATpoint13k wrote:

Edit: 10.12.18 => Modified the code snippet to work with an arbitrary number of BED files:

This is the code to work with two files. It does the intersection outputting the entire input intervals of -a and -b in case of overlap, and then use awk to select the smaller of the two start coordinates and the larger of the two end coordinates. The awk part can for sure be written more elegantly.

bedtools intersect -a sample1.bed -b sample2.bed -wa -wb | \
  awk 'OFS="\t" {if ($2 < $5) print $1, $2, $3, $6; else print $1, $5, $3, $6}' | \
  awk 'OFS="\t" {if ($3 > $4) print $1, $2, $3; else print $1, $2, $4}'

To work with an arbitrary number of files, you basically do the same thing. First, you intersect the first two files with the above code and store the result to disk. Then you loop through all other BED files and apply the same code to intersect the intermediate result with the current BED file, storing the result again to disk. Repeat until all BED files have been processed. For this, all files in the current directory that are to be processed hve the .bedsuffix:

#!/bin/bash

## Code as in my first answer, just wrapped into a function:
function MyIntersect {
  bedtools intersect -a $1 -b $2 -wa -wb | \
    awk 'OFS="\t" {if ($2 < $5) print $1, $2, $3, $6; else print $1, $5, $3, $6}' | \
    awk 'OFS="\t" {if ($3 > $4) print $1, $2, $3; else print $1, $2, $4}'
}; export -f MyIntersect

## Write names of all BED files to a file:
ls *.bed > all_beds.txt

## Get total number of BED files in current directory:
TOTALNUMBER="$(cat all_beds.txt | wc -l | sed -e 's/^[ \t]*//')"

## Start with the first two files manually (head -n1 gets the first file, head -n2 | tail -n 1 the second one from all_beds.txt):
MyIntersect "$(head -n 1 all_beds.txt)" "$(head -n 2 all_beds.txt | tail -n 1)" > intermediate.txt

## and now do the same thing with this intermediate result from the command above until all other BED files have been used:
for i in `seq 3 $TOTALNUMBER`
  do
  MyIntersect intermediate.txt "$(head -n $i all_beds.txt | tail -n 1)" > intermediate_new.txt
  mv intermediate_new.txt intermediate.txt
  done

## Output is final.bed
mv intermediate.txt final.bed
ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by ATpoint13k

Thank you very much! will this also work if I have more than 2 .bed files?

ADD REPLYlink written 10 weeks ago by m.wekking10

No, because as can be seen from the awk command, this is tailored for a pairwise comparison of two files. If you have multiple files, one would probably need to do something with bedtools multiinter in a two-step process, so first using multiinter to identify regions that overlap in at least a certain number or all samples and then use the common interval to do another round of intersect against all files to get the flanks. Can you provide some more details on how many files you have, and what the criteria are?

ADD REPLYlink written 10 weeks ago by ATpoint13k

For This current project I have in total 3 different files but for other projects it is possible that I have more than five repeats .. So I need a solution that works for a varying amount of files ...

ADD REPLYlink written 10 weeks ago by m.wekking10

I edited my answer to make it work for an arbitrary number of BED files in the current directory.

ADD REPLYlink written 10 weeks ago by ATpoint13k
0
gravatar for Alex Reynolds
10 weeks ago by
Alex Reynolds27k
Seattle, WA USA
Alex Reynolds27k wrote:

Not sure about chr2 in your example, but you could do the following for 1 to N sorted BED files:

$ bedops --merge A.bed B.bed ... N.bed > merge.bed
$ bedops --everything A.bed B.bed ... N.bed > union.bed

Then use the merge and union sets to look for two or more element overlaps:

$ bedmap --count --echo --delim '\t' merge.bed union.bed | awk '($1 >= 2)' | cut -f2- > answer.bed

Replace 2 with N or any other threshold you'd like.

When you have some overlap criteria to meet between sets, pairing bedmap with awk can work well.

ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by Alex Reynolds27k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2105 users visited in the last hour