You could union the files, specify the overlap criterion for mapping, and then post-process the result to count true peaks. The ID field (fifth column) can be useful as an identifier. Using intervals alone is probably not enough, because of cases where peaks overlap.
For example, to union sorted BED files:
$ bedops -u exp1peaks1.bed exp1peaks2.bed ... exp1peaks10.bed > exp1peaks.bed
Then map the unioned set of peaks against itself, with tab-delimited output so that awk and cut can split off the count column:
$ bedmap --count --echo --echo-map-id-uniq --fraction-map 0.51 --delim '\t' exp1peaks.bed | awk '($1>=6)' | cut -f2- > candidates.bed
The --count operator returns the number of mapped peaks that overlap the reference peak.
The --echo operator returns the reference peak.
The --echo-map-id-uniq operator returns the unique IDs of mapped peaks that overlap the reference peak.
The --fraction-map 0.51 operator is the overlap criterion: more than half (at least 51%) of a mapped peak's length must overlap the reference peak for it to be called an overlap.
The awk and cut statements at the end return candidate peaks, where there are six or more mapped peaks that met the bedmap overlap criterion. The output is another sorted BED file.
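To make the overlap criterion concrete, here is a minimal Python sketch of the arithmetic as I understand it. The function names are made up for illustration, and treating the threshold as inclusive (>=) is my assumption, so check the bedmap documentation if the boundary case matters to you.

```python
def overlap_fraction_of_map(ref, mapped):
    """Fraction of the mapped interval's length that the reference covers.

    Both intervals are (start, end) pairs in half-open BED coordinates.
    """
    ref_start, ref_end = ref
    map_start, map_end = mapped
    map_len = map_end - map_start
    if map_len <= 0:
        raise ValueError("mapped interval must have positive length")
    overlap = max(0, min(ref_end, map_end) - max(ref_start, map_start))
    return overlap / map_len


def passes_fraction_map(ref, mapped, threshold=0.51):
    # Assumption: the threshold is inclusive, i.e. overlap >= threshold.
    return overlap_fraction_of_map(ref, mapped) >= threshold
```

Note that the test is asymmetric. With a reference of (0, 1000) and a mapped peak of (100, 150), the fraction is 1.0 and the mapped peak counts. Swap the roles and the fraction is 0.05, so it does not. That asymmetry is why one wide reference peak can pick up many short mapped peaks.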
One caveat that has been raised: if you have one big peak that overlaps several shorter ones, the count will be off.
To deal with this: once you have candidate peaks, you can use a Python script to look at the contents of the mapped peak IDs returned by --echo-map-id-uniq. Using peak IDs that follow a parseable pattern will help you count true overlaps among common peaks in candidates.bed, filtering out those where you run into the above scenario.
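A rough sketch of such a script follows. It assumes a hypothetical ID pattern of the form rep3_peak17 (replicate number, then peak number), that the ID list is the last tab-delimited column of each line in candidates.bed, and that the IDs in it are separated by semicolons. Adjust the pattern and delimiter to whatever your files actually use.

```python
import re

# Hypothetical ID pattern: "rep<replicate number>_peak<peak number>".
ID_PATTERN = re.compile(r"^rep(\d+)_peak\d+$")
MIN_REPLICATES = 6  # same threshold as the awk filter


def replicates_in(id_field, multidelim=";"):
    """Return the set of replicate numbers named in a delimited ID list."""
    replicates = set()
    for peak_id in id_field.split(multidelim):
        match = ID_PATTERN.match(peak_id)
        if match:
            replicates.add(int(match.group(1)))
    return replicates


def filter_candidates(lines, min_replicates=MIN_REPLICATES):
    """Yield lines whose mapped IDs come from enough distinct replicates."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumption: the mapped ID list is the last tab-delimited column.
        id_field = line.split("\t")[-1]
        if len(replicates_in(id_field)) >= min_replicates:
            yield line
```

You would pass it an open file handle for candidates.bed and print what it yields. A wide peak that picked up six short peaks from only one or two replicates is dropped, because the filter counts distinct replicates rather than mapped peaks.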
The procedure could be redone on files for the second experiment. Then you can compare intervals across two experiments and do a final count.
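For the final comparison, a naive Python sketch might look like the following. It counts intervals from the first experiment that overlap at least one interval from the second on the same chromosome. The function names are illustrative, the "any overlap" test is an assumption (you may want a stricter criterion), and for real data a bedops or bedmap call will be much faster than this quadratic loop.

```python
from collections import defaultdict


def read_bed_intervals(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}."""
    by_chrom = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))
    return by_chrom


def count_shared(exp1, exp2):
    """Count exp1 intervals overlapping at least one exp2 interval."""
    shared = 0
    for chrom, intervals in exp1.items():
        others = exp2.get(chrom, [])
        for start, end in intervals:
            # Half-open intervals overlap when each starts before the other ends.
            if any(start < o_end and o_start < end for o_start, o_end in others):
                shared += 1
    return shared
```

Because BED coordinates are half-open, two intervals that merely touch (one ends at 600, the other starts at 600) are not counted as overlapping here.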