Question: Merge bed files, keep the peak count
1
gravatar for rioualen
2.6 years ago by
rioualen320
France
rioualen320 wrote:

Hello,

I'm trying to merge peak files in a way that I can't find so far.

I have 16 peak files concatenated in my_peaks.bed

I've tried to use bedtools merge following this example:

$ cat my_peaks.bed
chr1  100 300
chr1  200 500
chr1  0 700

$ bedtools merge -i my_peaks.bed -c 1 -o count
chr1  0 700 3

But actually the kind of output that I want would be the following:

$ bedtools merge -i my_peaks.bed -c 1 -o count
chr1  0  100  1
chr1  100  200  2
chr1  200  300  3
chr1  300  500  2
chr1  500  700  1

I've looked into bedops option --partition which would give :

$ bedops --partition my_peaks.bed
chr1  0  100 
chr1  100  200 
chr1  200  300 
chr1  300  500 
chr1  500  700

But then I don't know how to keep the peak count.

Thanks for help

chip-seq bedops bedtools • 1.4k views
ADD COMMENTlink modified 19 months ago by Biostar ♦♦ 20 • written 2.6 years ago by rioualen320
3
gravatar for Alex Reynolds
2.6 years ago by
Alex Reynolds27k
Seattle, WA USA
Alex Reynolds27k wrote:

You're very close; just pipe the partioned elements to BEDOPS bedmap to count peaks over partitions:

$ sort-bed my_peaks.unsorted.bed > my_peaks.bed
$ bedops --partition my_peaks.bed | bedmap --echo --count --delim '\t' - my_peaks.bed
chr1    0   100 1
chr1    100 200 2
chr1    200 300 3
chr1    300 500 2
chr1    500 700 1

Note that you first need to sort your peaks. Using (BEDOPS) sort-bed will be faster than Unix sort and will sort on the stop column, where there are matches on the start column, which sortBed does not do. You only need to sort once, at the start.

ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by Alex Reynolds27k
2

Thank you! I was close indeed, I was trying out bedmap but had a count of 1 everywhere...

For the sorting, I use bedtools sort which is probably similar to Bedops sort-bed I guess. I'm just discovering Bedops I think I'm gonna like it!

Have a nice day

ADD REPLYlink written 2.6 years ago by rioualen320

EDIT: Found out macs2 and SPP peak-callers actually produce duplicate peaks. Gonna merge them.

Actually there's an issue though... I have 16 peak files, containing non-overlapping features, so the maximum count should be 16. But in a few cases, it goes up to 40 or 50.

Here are the commands I've used:

bedfiles=$(ls GSM*.bed)

cat $bedfiles > temp
cut -f1-3 temp > all_peaks.bed

sort-bed all_peaks.bed > all_peaks_sorted.bed
bedops --partition all_peaks_sorted.bed | bedmap --echo --count --delim '\t' - all_peaks_sorted.bed

I don't understand where it comes from!

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by rioualen320

A set of peaks within one file may not overlap (or they might, until you merge peaks — your sample input above contains three overlapping regions), but once you concatenate 16 sets of peaks, peaks from different sets/files could certainly overlap a partition and you would get additional counts in that case. You may want to think about how you are merging peaks within and across sets, and what you really want to measure.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Alex Reynolds27k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1238 users visited in the last hour