When I have several experimental groups I prefer to first call peaks with Genrich
for every group, then merge the peaks with bedtools merge
to get a consensus list from which we create a count matrix with all samples. I like featureCounts
for this. The author of the csaw
package Aarol Lun who suggests to us small bins for differential analysis would probably argue that this is not the best approach (see csaw vignette and paper for details why), but for me from a user perspective it is important to have actual peak lists for each group that are constant during a project, so that is why I use this approach.
From there on I prefer the standard edgeR
workflow, so normalization factor calculation, filterByExpr
and then model fitting and testing. See the vignettes of edgeR
and especially csaw
for code examples. csaw
also contains some helpful details on certain parameters one should use for ATAC/ChIP-seq compared to RNA-seq for which the tool has originally been developed. Finally, correct for multiple testing and set a FDR cutoff, e.g. 0.05 or 0.01. Again, see the manuals for details.
I agree with Ian Sudbery's comment on filtering low-quality samples. Be sure to always inspect samples on a genome browser. ATAC-seq should produce clear peaks with little noise. Peak numbers should be in the ten-thousands range and FRiPs, depending on how you calculate it should be > 10%. In some cases lower FRiPs might still be acceptable. Also be sure to make PCA plots to check for outlier samples and check normalization efficiency with MA-plots, again csaw
manual has details and code for that. I would not say that 35M reads is a must, that is already fairly deeply-sequenced. I actually aim for 25M per sample but have done diff. analysis with samples that had far fewer reads. Higher counts are statistically more powerful, still replication is more important than depth.
•
link
modified 10 months ago
•
written
10 months ago by
ATpoint ♦ 44k
I believe you could use diffBind strategy: merge the peaks of different samples, count the number of reads in it and perform a differential analysis on those counts. I don't however have experience on ATAC-seq myself, so I cannot guarantee that's the best approach.