If your peak files are in BED format, you can use BEDOPS to do set operations to count overlaps between peaks:
$ bedops --not-element-of 100% A.bed B.bed > elements_unique_to_A.bed
$ bedops --not-element-of 100% B.bed A.bed > elements_unique_to_B.bed
$ bedops --everything A.bed B.bed | bedops --not-element-of 100% - <(bedops --everything elements_unique_to_A.bed elements_unique_to_B.bed) > elements_unique_to_A_and_B.bed
In a Venn diagram, you can think of these files as representing the disjoint regions of the two sets: elements found only in A, elements found only in B, and elements common to both.
To count these subsets, use wc -l on each file, for example:
$ wc -l elements_unique_to_A.bed
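To collect the counts for several files at once, a small shell function along these lines might help. This is a minimal sketch: count_elements is a hypothetical helper name, not part of BEDOPS, and it assumes each BED file holds one element per line with no header lines.

```shell
# Hypothetical helper: print "<file><TAB><line count>" for each BED file given.
# Assumes one element per line and no header/track lines in the files.
count_elements() {
  local f
  for f in "$@"; do
    # Reading from stdin makes wc print only the number, not the filename;
    # tr strips the padding that some wc implementations add.
    printf '%s\t%s\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
  done
}

# Example call, once the three files above exist:
# count_elements elements_unique_to_A.bed elements_unique_to_B.bed elements_unique_to_A_and_B.bed
```

The tab-separated output can be pasted or read into whatever plotting tool you use for the diagram.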
Once you have overlap counts, you would put these counts into a Venn diagram or Eulergrid/UpSetR plot to visualize them.
If you have more than two sets, you would calculate overlaps between all subsets (the "powerset") of combinations of sets, and then count them with wc -l.
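To see how many combinations the powerset involves, the sketch below enumerates every non-empty combination of set names. It only lists the combinations; the BEDOPS set operations needed for each one are not shown. list_subsets is a hypothetical helper name, and the set labels A, B and C are placeholders.

```shell
# Hypothetical helper: list every non-empty combination of the given set
# names, one per line, with names joined by "&".
# For k names there are 2^k - 1 such combinations.
list_subsets() {
  local n total mask combo i name
  n=$#
  total=$(( (1 << n) - 1 ))
  mask=1
  while [ "$mask" -le "$total" ]; do
    combo=""
    i=0
    for name in "$@"; do
      # Bit i of mask decides whether the i-th name is in this combination.
      if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
        combo="${combo:+$combo&}$name"
      fi
      i=$(( i + 1 ))
    done
    printf '%s\n' "$combo"
    mask=$(( mask + 1 ))
  done
}

list_subsets A B C
```

With three sets this prints seven combinations (A, B, A&B, C, A&C, B&C, A&B&C), each of which corresponds to one region to compute and count.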
Also, if you have more than two sets, you would not want to use a Venn diagram, but should instead consider an Eulergrid-style (UpSetR) plot. This is because a Venn diagram drawn for more than two sets can lead to misinterpretation of overlaps.
Eulergrid/UpSetR plots deal with this problem by showing overlaps as visually distinct and proportionally correct elements, and by offering ways to sort or organize those elements that highlight particular subset overlaps.