Question: Finding common peaks between F-seq peak region files
6 months ago by
a.rex190 wrote:

I have 3 biological ATAC-seq conditions, each with two replicates.

I have obtained F-seq region peak files for each of the six samples.

These are the peak number metrics:

sample 1 condition1  = 260388
sample 2 condition1  = 259940
sample 1 condition2 = 292697
sample 2 condition2 = 290048
sample 1 condition3 = 284690
sample 2 condition3 = 303684

Is there an easy way to compare the similarities and differences between these called peak regions, for example by producing a Venn diagram of the common and non-common regions? Obviously there may also be pairs of regions that overlap only partially. What do you do in that case?

atac fseq • 196 views
6 months ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

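If your peak files are in BED format and sorted (for example with BEDOPS sort-bed, since bedops expects sorted input), you can use BEDOPS set operations to separate the peaks unique to each file from the peaks they share: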

$ bedops --not-element-of 100% A.bed B.bed > elements_unique_to_A.bed
$ bedops --not-element-of 100% B.bed A.bed > elements_unique_to_B.bed
$ bedops --everything A.bed B.bed | bedops --not-element-of 100% - <(bedops --everything elements_unique_to_A.bed elements_unique_to_B.bed) > elements_unique_to_A_and_B.bed

In a Venn diagram, you can think of these three files as the disjoint regions of a two-set diagram: elements found only in A, elements found only in B, and elements shared by A and B.

To count these subsets, use wc -l:

$ wc -l elements_unique_to_A.bed
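
If you want all three counts in one table, a small loop along these lines is one way to do it. This is only a sketch: `count_elements` is a helper name made up for illustration, and it assumes one element per line with no header lines in the BED files.

```shell
# Print "<file><TAB><number of lines>" for each BED file given.
# Each line of a BED file is one element, so the line count is the element count.
count_elements() {
    for f in "$@"; do
        printf '%s\t%s\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
    done
}

# The three files written by the bedops commands above (skipped if missing).
for f in elements_unique_to_A.bed elements_unique_to_B.bed elements_unique_to_A_and_B.bed; do
    if [ -f "$f" ]; then
        count_elements "$f"
    fi
done
```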

Once you have overlap counts, you would put these counts into a Venn diagram or Eulergrid/UpSetR plot to visualize them.

If you have more than two sets, you would calculate overlaps between all subsets (the "powerset") of combinations of sets, and then count them with wc -l.
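
To give a sense of how many subsets that is, here is a rough sketch that only lists the non-empty combinations of set names. It does not run bedops, and `combinations` and the labels A, B, C are placeholders for illustration. With six peak files there would be 2^6 - 1 = 63 combinations to work through.

```shell
# Print every non-empty combination of the given names, one per line,
# with members joined by "&" (for example "A&B").
combinations() {
    n=$#
    total=$(( (1 << n) - 1 ))
    mask=1
    while [ "$mask" -le "$total" ]; do
        combo=""
        i=0
        for name in "$@"; do
            # Bit i of the mask decides whether the i-th name is included.
            if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
                combo="${combo:+$combo&}$name"
            fi
            i=$(( i + 1 ))
        done
        printf '%s\n' "$combo"
        mask=$(( mask + 1 ))
    done
}

combinations A B C
```

Each printed line names one subset for which you would build an overlap file and count it.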

Also, if you have more than two sets, I would avoid a Venn diagram and consider an Eulergrid-style (UpSetR) plot instead, because Venn diagrams with more than two sets can lead to misinterpretation of the overlaps.

Eulergrid/UpSetR plots deal with this problem by showing overlaps as visually distinct and proportionally correct elements, and by offering ways to sort or organize those elements so that particular subset overlaps are highlighted.
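
On the partially overlapping regions: the 100% in the bedops commands above is the overlap threshold. As I read the BEDOPS documentation, it is measured against each element of the first (reference) file, and you can relax it with a lower percentage (for example `--not-element-of 50%`) or a number of bases (for example `--not-element-of 1`), depending on how much overlap you want to require before two peaks count as shared. To make the fraction concrete, here is a small sketch; `overlap_fraction` is a made-up helper for illustration only and is not part of BEDOPS.

```shell
# Fraction of interval A (startA endA) that is covered by interval B (startB endB),
# using BED-style half-open coordinates. Illustration only; not part of BEDOPS.
overlap_fraction() {
    awk -v s1="$1" -v e1="$2" -v s2="$3" -v e2="$4" 'BEGIN {
        s = (s1 > s2) ? s1 : s2      # start of the shared stretch
        e = (e1 < e2) ? e1 : e2      # end of the shared stretch
        ov = (e > s) ? e - s : 0     # shared bases, zero if the intervals are disjoint
        printf "%.2f\n", ov / (e1 - s1)
    }'
}

overlap_fraction 100 200 150 300   # B covers the second half of A
overlap_fraction 100 200 100 200   # B covers all of A
```

Under a 100% threshold only the second pair would be treated as overlapping; under 50% both would.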
