Question: Finding common peaks between F-seq peak region files
21 months ago, a.rex wrote:

I have 3 biological ATAC-seq conditions, each with two replicates.

I have obtained F-seq peak region files for each of the six samples.

These are the peak counts per sample:

sample 1, condition 1 = 260388
sample 2, condition 1 = 259940
sample 1, condition 2 = 292697
sample 2, condition 2 = 290048
sample 1, condition 3 = 284690
sample 2, condition 3 = 303684

Is there an easy way to compare the similarities and differences between these called peak regions, for example by producing a Venn diagram of the common and sample-specific regions? There will also be regions in two samples that only partially overlap. What do you do in that case?

atac fseq • 423 views
21 months ago, Alex Reynolds (Seattle, WA USA) wrote:

If your peak files are in BED format, you can use BEDOPS to do set operations to count overlaps between peaks:

$ bedops --not-element-of 100% A.bed B.bed > elements_unique_to_A.bed
$ bedops --not-element-of 100% B.bed A.bed > elements_unique_to_B.bed
$ bedops --everything A.bed B.bed | bedops --not-element-of 100% - <(bedops --everything elements_unique_to_A.bed elements_unique_to_B.bed) > elements_unique_to_A_and_B.bed

In a Venn diagram, these three files correspond to the three disjoint regions of two overlapping sets: elements found only in A, elements found only in B, and elements shared by A and B.

To count these subsets, use wc -l:

$ wc -l elements_unique_to_A.bed
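To count all three subsets in one pass, a small shell helper along the following lines could work. This is only a sketch: the function name count_bed_lines is made up for illustration, and it assumes each BED file holds one element per line and ends with a newline (wc -l counts newline characters).

```shell
# Hypothetical helper: print "<count><TAB><file>" for each BED file given.
# The body runs in a subshell ( ... ) so the loop variable does not leak.
count_bed_lines() (
    for f in "$@"; do
        # Feed the file on stdin so wc prints only the number;
        # tr strips the padding that some wc implementations add.
        n=$(wc -l < "$f" | tr -d ' ')
        printf '%s\t%s\n' "$n" "$f"
    done
)

# Example call, using the file names from the commands above:
# count_bed_lines elements_unique_to_A.bed elements_unique_to_B.bed elements_unique_to_A_and_B.bed
```

The tab-separated output can be pasted straight into whatever you use to draw the diagram.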

Once you have overlap counts, you would put these counts into a Venn diagram or Eulergrid/UpSetR plot to visualize them.

If you have more than two sets, you would calculate the overlap for every combination of sets (the "power set" of your samples), and then count the elements in each result with wc -l.
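To get a feel for how many combinations that is, the sketch below enumerates every non-empty combination of a list of set names. The function name list_combinations and the "&" separator are my own choices for illustration, not part of BEDOPS; the function only lists the combinations and does not compute any overlaps itself.

```shell
# Hypothetical helper: print every non-empty combination of the given
# names, one per line, with members joined by "&".
# The body runs in a subshell ( ... ) so its variables do not leak.
list_combinations() (
    total=$(( (1 << $#) - 1 ))   # 2^n - 1 non-empty subsets
    mask=1
    while [ "$mask" -le "$total" ]; do
        combo=""
        i=0
        for name in "$@"; do
            # Bit i of mask decides whether the i-th name is included.
            if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
                combo="${combo:+$combo&}$name"
            fi
            i=$(( i + 1 ))
        done
        printf '%s\n' "$combo"
        mask=$(( mask + 1 ))
    done
)

# Example call with three sets:
# list_combinations A B C
```

For three sets this lists 7 combinations (A, B, A&B, C, A&C, B&C, A&B&C); for six samples it lists 63, which is one reason a plain Venn diagram quickly becomes unreadable.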

Also, if you have more than two sets, you would not want to use a Venn diagram, but instead consider using an Eulergrid-style (UpSetR) plot. This is because use of more than two sets with a Venn diagram can lead to false interpretation of overlaps.

Eulergrid/UpSetR plots deal with this problem by showing overlaps as visually distinct, proportionally correct elements, and by offering ways to sort or organize those elements that highlight particular subset overlaps.
