Multiple ways to make an IntersectBed on bed file after peak calling : which one is the good one ??
Entering edit mode
4.1 years ago
anais1396 ▴ 30

Hello Biostars !!

I have .bed files resulting from Macs peak calling that are biological replicates. I have 5 replicates for healthy people and 6 for patients. I merged the 5 samples to a set for healthy represented in one file and I made the same for the patients so I have one bed file for all the merged peaks from patients and one bed file for healthy.

Now I'm looking for common genomic regions between these two files. I performed an IntersectBed with a UCSC table for the known genes on the genome but I see multiple ways to do the intersectBed :

  1. intersect condition1 with ucsc table, then intersect condition2 with ucsc and then make a 'super' intersect between the two files obtained ?
  2. As intersect now allows the use of multiple files, make an intersect between condition1+condition2+ucsc table ?
  3. Use Venn diagram but here we have the same problem : make a venn diagram 3-points with cond1+cond2+ucsc or make a venn diagram with cond1+ucsc then with cond2+ucsc and compare them ?

Which strategy is the good one ??

Thank you in advance !!


NGS Peak calling intersect • 1.7k views
Entering edit mode
4.1 years ago
ATpoint 62k

IMHO you should go for a differential read count analysis in the peak regions, rahter than doing intersections. Reason is that peak presence or absence is a rather binary statement, but not really informative. Example: Say, very simplyfied, a peak requires 100 reads to be significant, and your healthies have 999-556-777-599-1000 read counts for a given peak, respektively. The patients have 100-123-222-321-132. By intersection, you would see no difference at all, suggesting no difference between the conditions. Still, the counts are much higher in the patients, suggesting an effect for this genomic regions in the disease situation that may be of interest. This you'll only pick up by scanning for differential counts. Popular packages for this, assuming you have ChIP-seq or ATAC-seq data are DiffBind (which uses the DESeq2 framework internally), or csaw. Both have extensive documentation, so that you can dive into it.


Login before adding your answer.

Traffic: 1811 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6