Question

Finding common peaks between F-seq peak region files

5.5 years ago

a.rex ▴ 350

I have 3 biological ATAC-seq conditions, each with two replicates.

I have obtained F-seq peak region files for each of the six samples.

These are the peak counts per sample:

sample 1 condition1  = 260388
sample 2 condition1  = 259940
sample 1 condition2 = 292697
sample 2 condition2 = 290048
sample 1 condition3 = 284690
sample 2 condition3 = 303684

Is there an easy way to compare the similarities and differences between these called peak regions, for example by producing a Venn diagram of shared and unshared regions? There will also be regions that overlap only partially between samples. How should those be handled?

fseq atac • 1.2k views

updated 5.5 years ago by Alex Reynolds 35k • written 5.5 years ago by a.rex ▴ 350

score 2 · Answer 1 · 2018-11-01

If your peak files are in BED format, you can use BEDOPS to do set operations to count overlaps between peaks:

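# Peaks in A that do not meet the 100% overlap threshold against B
$ bedops --not-element-of 100% A.bed B.bed > elements_unique_to_A.bed
# Peaks in B that do not meet the 100% overlap threshold against A
$ bedops --not-element-of 100% B.bed A.bed > elements_unique_to_B.bed
# Remaining peaks from the union of A and B, after filtering against both "unique" files (the process substitution needs bash)
$ bedops --everything A.bed B.bed | bedops --not-element-of 100% - <(bedops --everything elements_unique_to_A.bed elements_unique_to_B.bed) > elements_unique_to_A_and_B.bed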

In a Venn diagram, you can think of these files as representing the disjoint regions of the two sets: the part of A outside B, the part of B outside A, and the part shared by A and B.

To count these subsets, use wc -l:

$ wc -l elements_unique_to_A.bed
1234

Once you have overlap counts, you would put these counts into a Venn diagram or Eulergrid/UpSetR plot to visualize them.
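To collect the counts in one place before plotting, a small helper can print one label and one count per file. This is only a sketch: the function name `count_peaks` is hypothetical, it assumes bash, and it assumes each file holds one region per line with no header.

```shell
# Hypothetical helper: print "<label><TAB><line count>" for each BED file given.
# Assumes one region per line and no header lines.
count_peaks() {
  local f
  for f in "$@"; do
    # Label is the file name without directory or .bed suffix.
    printf '%s\t%s\n' "$(basename "$f" .bed)" "$(awk 'END { print NR }' "$f")"
  done
}

# Example usage (file names taken from the commands above):
# count_peaks elements_unique_to_A.bed elements_unique_to_B.bed elements_unique_to_A_and_B.bed > counts.tsv
```

The resulting two-column table can then be read by whichever plotting tool you prefer.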

If you have more than two sets, you would calculate overlaps for every combination of sets (the "powerset"), and then count the elements in each with wc -l.
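To get a feel for how many combinations that means, the sketch below lists every non-empty combination of a list of set names. The function name `powerset` and the labels are hypothetical, and it assumes bash. It only enumerates the combinations; it does not compute any overlaps itself.

```shell
# Hypothetical helper: print every non-empty combination of the given names,
# one comma-separated combination per line (2^n - 1 lines for n names).
powerset() {
  local -a names=("$@")
  local n=${#names[@]} mask i combo
  # Each bit of mask decides whether the corresponding name is in the combination.
  for ((mask = 1; mask < (1 << n); mask++)); do
    combo=""
    for ((i = 0; i < n; i++)); do
      if (( mask & (1 << i) )); then
        combo+="${combo:+,}${names[i]}"
      fi
    done
    printf '%s\n' "$combo"
  done
}

# Example usage:
# powerset A B C
```

For three sets this gives seven combinations; for the six sample files in the question it would give 63, which is one reason a grid-style plot scales better than a Venn diagram.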

Also, if you have more than two sets, you would not want to use a Venn diagram, but instead consider using an Eulergrid-style (UpSetR) plot. This is because use of more than two sets with a Venn diagram can lead to false interpretation of overlaps.

Eulergrid/UpSetR plots deal with this problem by showing overlaps as visually distinct and proportionally correct elements, and by offering ways to sort or organize those elements that highlight particular subset overlaps.