Question: Multiple ways to make an IntersectBed on bed file after peak calling : which one is the good one ??
gravatar for anais1396
2.4 years ago by
anais139630 wrote:

Hello Biostars !!

I have .bed files resulting from Macs peak calling that are biological replicates. I have 5 replicates for healthy people and 6 for patients. I merged the 5 samples to a set for healthy represented in one file and I made the same for the patients so I have one bed file for all the merged peaks from patients and one bed file for healthy.

Now I'm looking for common genomic regions between these two files. I performed an IntersectBed with a UCSC table for the known genes on the genome but I see multiple ways to do the intersectBed :

  1. intersect condition1 with ucsc table, then intersect condition2 with ucsc and then make a 'super' intersect between the two files obtained ?
  2. As intersect now allows the use of multiple files, make an intersect between condition1+condition2+ucsc table ?
  3. Use Venn diagram but here we have the same problem : make a venn diagram 3-points with cond1+cond2+ucsc or make a venn diagram with cond1+ucsc then with cond2+ucsc and compare them ?

Which strategy is the good one ??

Thank you in advance !!


intersect ngs peak calling • 1.2k views
ADD COMMENTlink modified 2.4 years ago by ATpoint40k • written 2.4 years ago by anais139630
gravatar for ATpoint
2.4 years ago by
ATpoint40k wrote:

IMHO you should go for a differential read count analysis in the peak regions, rahter than doing intersections. Reason is that peak presence or absence is a rather binary statement, but not really informative. Example: Say, very simplyfied, a peak requires 100 reads to be significant, and your healthies have 999-556-777-599-1000 read counts for a given peak, respektively. The patients have 100-123-222-321-132. By intersection, you would see no difference at all, suggesting no difference between the conditions. Still, the counts are much higher in the patients, suggesting an effect for this genomic regions in the disease situation that may be of interest. This you'll only pick up by scanning for differential counts. Popular packages for this, assuming you have ChIP-seq or ATAC-seq data are DiffBind (which uses the DESeq2 framework internally), or csaw. Both have extensive documentation, so that you can dive into it.

ADD COMMENTlink written 2.4 years ago by ATpoint40k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1403 users visited in the last hour