Question: Finding common peaks between F-seq peak region files
6 months ago, a.rex190 wrote:

I have 3 biological ATAC-seq conditions, each with two replicates.

I have obtained F-seq region peak files for each of the six samples.

These are the peak number metrics:

```
sample 1 condition1 = 260388
sample 2 condition1 = 259940
sample 1 condition2 = 292697
sample 2 condition2 = 290048
sample 1 condition3 = 284690
sample 2 condition3 = 303684
```

Is there an easy way to compare the similarities and differences between these called peak regions, for example by producing a Venn diagram of shared and unshared regions? Also, two regions from different samples may overlap only partially. What do you do in that case?

6 months ago, Alex Reynolds (Seattle, WA, USA) wrote:

If your peak files are in BED format, you can use BEDOPS to do set operations to count overlaps between peaks:

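```
# Elements of A that are not fully (100%) overlapped by elements of B
$ bedops --not-element-of 100% A.bed B.bed > elements_unique_to_A.bed
# Elements of B that are not fully (100%) overlapped by elements of A
$ bedops --not-element-of 100% B.bed A.bed > elements_unique_to_B.bed
# Remaining elements of the union of A and B, i.e. those shared by both (needs bash for the <(...) process substitution)
$ bedops --everything A.bed B.bed | bedops --not-element-of 100% - <(bedops --everything elements_unique_to_A.bed elements_unique_to_B.bed) > elements_unique_to_A_and_B.bed
```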
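On the question of partially overlapping regions: the value after `--not-element-of` (or `--element-of`) is the overlap criterion, given either as a number of bases or as a percentage. With `100%`, an element that is only partly covered is treated as unique; with `1`, any overlap of at least one base counts as shared. A minimal sketch of wrapping this choice follows. The function names `unique_to_first` and `shared_with_second` are made up for illustration, and the sketch assumes `bedops` is on your `PATH` and that the inputs are sorted with `sort-bed`:

```bash
#!/usr/bin/env bash
# Assumes the BEDOPS `bedops` binary is on PATH and inputs are sorted with sort-bed.
# Function names are illustrative, not part of BEDOPS.

# Elements of the first file that do NOT meet the overlap criterion
# against the second file.
# overlap: a base count such as 1, or a percentage such as 50% or 100%
unique_to_first() {
    local overlap=$1 first=$2 second=$3
    bedops --not-element-of "$overlap" "$first" "$second"
}

# Elements of the first file that DO meet the overlap criterion
# against the second file.
shared_with_second() {
    local overlap=$1 first=$2 second=$3
    bedops --element-of "$overlap" "$first" "$second"
}
```

For example, `unique_to_first 1 A.bed B.bed > elements_unique_to_A.bed` would keep only the peaks in A that share no bases at all with any peak in B. Which criterion is appropriate depends on how strictly you want to define a "common" peak.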

In a Venn diagram, you can think of these files as representing the disjoint regions of the two sets: elements found only in A, elements found only in B, and elements shared by A and B.

To count these subsets, use `wc -l`:

```
$ wc -l elements_unique_to_A.bed
1234 elements_unique_to_A.bed
```
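If you have several of these files, it can be handy to collect the counts into one tab-delimited table. This is only a sketch: `count_elements` is a hypothetical helper name, and it assumes each file has one element per line with a trailing newline, which is what `bedops` writes:

```bash
#!/usr/bin/env bash
# Print "<file><TAB><number of lines>" for each file given as an argument.
# count_elements is an illustrative helper name, not a BEDOPS tool.
count_elements() {
    local f
    for f in "$@"; do
        # Reading from stdin keeps wc from echoing the filename; the
        # arithmetic expansion strips padding that some wc builds add.
        printf '%s\t%d\n' "$f" "$(( $(wc -l < "$f") ))"
    done
}
```

You could then run `count_elements elements_unique_to_A.bed elements_unique_to_B.bed elements_unique_to_A_and_B.bed` to get the three counts for a two-set diagram in one go.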

Once you have the overlap counts, you can plot them as a Venn diagram or as an Eulergrid/UpSetR plot.

If you have more than two sets, you would calculate overlaps between all subsets (the "powerset") of combinations of sets, and then count them with `wc -l`.
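To get a feel for how many combinations that involves, here is a small sketch that lists every non-empty subset of a set of labels. The function name `list_subsets` and the `&` separator are arbitrary choices for illustration (as far as I know, UpSetR's `fromExpression` uses the same `&`-joined naming for intersections). It only enumerates the combinations; computing each overlap is still up to `bedops`:

```bash
#!/usr/bin/env bash
# Print every non-empty subset of the given labels, one per line,
# with the members of each subset joined by "&".
# list_subsets is an illustrative helper name.
list_subsets() {
    local labels=("$@")
    local n=${#labels[@]}
    local mask i subset
    # Each bitmask from 1 to 2^n - 1 selects one non-empty subset.
    for (( mask = 1; mask < (1 << n); mask++ )); do
        subset=""
        for (( i = 0; i < n; i++ )); do
            if (( mask & (1 << i) )); then
                # Add "&" only when the subset already has a member.
                subset+="${subset:+&}${labels[i]}"
            fi
        done
        printf '%s\n' "$subset"
    done
}
```

For two sets, `list_subsets A B` prints `A`, `B` and `A&B`. For the six samples in the question there are 2^6 - 1 = 63 combinations, which is one reason a Venn diagram stops being practical.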

Also, if you have more than two sets, you would not want to use a Venn diagram, but instead consider using an Eulergrid-style (UpSetR) plot. This is because use of more than two sets with a Venn diagram can lead to false interpretation of overlaps.

Eulergrid/UpSetR plots deal with this problem by showing overlaps as visually distinct, proportionally correct elements, and by offering ways to sort or organize those elements that highlight particular subset overlaps.