There is no clear consensus on this problem. I currently like to use Genrich to call peaks over replicated samples.
A command line could be:
Genrich -t sample1.bam,sample2.bam,sample3.bam -j -l 200 -q 0.05 -o out.narrowPeak. You will have to sort your BAMs by read name first.
This will call peaks over every single sample and then combine the p-values of each sample using Fisher's method.
This saves you from annoying filtering like
peak must be present in all samples or
in 2 of 3 samples, or from using the IDR framework which only works for n=2. You can also use
macs to call peaks over the combined BAM files but this in my experience gives a lot of peaks from which you do not know if they are reproducible across replicates. In any case, I merge the peak lists of the replicate groups to get a template to make a count matrix from (with
featureCounts). I do it that way because I personally prefer to have a conservative list of confidence peaks that I use throughout the project. Others might have different opinions. A popular different opinion is given in the manual of the
csaw package where the author recommends to avoid peak calling at all in favor of using sliding windows for differential analysis. Still, as I said, I prefer a list of confidence peaks as one might need this peak list for other applications in the project beyond diff. analysis. Choice is yours.
Once you have the count matrix you can use packages like
edgeRfor differential analysis. If you use
edgeR I suggest to check the
csaw manual which contains example code for ChIP/ATAC-like analysis. Normalization can be done either using the peak counts or using the counts of 10kb windows across the genome (again, check
csaw manual for details). The latter I find useful if the samples are expected to be very different from each other.
modified 6 weeks ago
6 weeks ago by
ATpoint ♦ 25k