Merge bed files, keep the peak count
Entering edit mode
7.3 years ago
rioualen ▴ 710


I'm trying to merge peak files in a way that I can't find so far.

I have 16 peak files concatenated in my_peaks.bed

I've tried to use bedtools merge following this example:

$ cat my_peaks.bed
chr1  100 300
chr1  200 500
chr1  0 700

$ bedtools merge -i my_peaks.bed -c 1 -o count
chr1  0 700 3

But actually the kind of output that I want would be the following:

$ bedtools merge -i my_peaks.bed -c 1 -o count
chr1  0  100  1
chr1  100  200  2
chr1  200  300  3
chr1  300  500  2
chr1  500  700  1

I've looked into bedops option --partition which would give :

$ bedops --partition my_peaks.bed
chr1  0  100 
chr1  100  200 
chr1  200  300 
chr1  300  500 
chr1  500  700

But then I don't know how to keep the peak count.

Thanks for help

bedtools bedops ChIP-Seq • 3.5k views
Entering edit mode
7.3 years ago

You're very close; just pipe the partioned elements to BEDOPS bedmap to count peaks over partitions:

$ sort-bed my_peaks.unsorted.bed > my_peaks.bed
$ bedops --partition my_peaks.bed | bedmap --echo --count --delim '\t' - my_peaks.bed
chr1    0   100 1
chr1    100 200 2
chr1    200 300 3
chr1    300 500 2
chr1    500 700 1

Note that you first need to sort your peaks. Using (BEDOPS) sort-bed will be faster than Unix sort and will sort on the stop column, where there are matches on the start column, which sortBed does not do. You only need to sort once, at the start.

Entering edit mode

Thank you! I was close indeed, I was trying out bedmap but had a count of 1 everywhere...

For the sorting, I use bedtools sort which is probably similar to Bedops sort-bed I guess. I'm just discovering Bedops I think I'm gonna like it!

Have a nice day

Entering edit mode

EDIT: Found out macs2 and SPP peak-callers actually produce duplicate peaks. Gonna merge them.

Actually there's an issue though... I have 16 peak files, containing non-overlapping features, so the maximum count should be 16. But in a few cases, it goes up to 40 or 50.

Here are the commands I've used:

bedfiles=$(ls GSM*.bed)

cat $bedfiles > temp
cut -f1-3 temp > all_peaks.bed

sort-bed all_peaks.bed > all_peaks_sorted.bed
bedops --partition all_peaks_sorted.bed | bedmap --echo --count --delim '\t' - all_peaks_sorted.bed

I don't understand where it comes from!

Entering edit mode

A set of peaks within one file may not overlap (or they might, until you merge peaks — your sample input above contains three overlapping regions), but once you concatenate 16 sets of peaks, peaks from different sets/files could certainly overlap a partition and you would get additional counts in that case. You may want to think about how you are merging peaks within and across sets, and what you really want to measure.


Login before adding your answer.

Traffic: 1038 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6