When I have several experimental groups I prefer to first call peaks with
Genrich for every group, then merge the peaks with
bedtools merge to get a consensus list from which we create a count matrix with all samples. I like
featureCounts for this. The author of the
csaw package Aarol Lun who suggests to us small bins for differential analysis would probably argue that this is not the best approach (see csaw vignette and paper for details why), but for me from a user perspective it is important to have actual peak lists for each group that are constant during a project, so that is why I use this approach.
From there on I prefer the standard
edgeR workflow, so normalization factor calculation,
filterByExpr and then model fitting and testing. See the vignettes of
edgeR and especially
csaw for code examples.
csaw also contains some helpful details on certain parameters one should use for ATAC/ChIP-seq compared to RNA-seq for which the tool has originally been developed. Finally, correct for multiple testing and set a FDR cutoff, e.g. 0.05 or 0.01. Again, see the manuals for details.
I agree with Ian Sudbery's comment on filtering low-quality samples. Be sure to always inspect samples on a genome browser. ATAC-seq should produce clear peaks with little noise. Peak numbers should be in the ten-thousands range and FRiPs, depending on how you calculate it should be > 10%. In some cases lower FRiPs might still be acceptable. Also be sure to make PCA plots to check for outlier samples and check normalization efficiency with MA-plots, again
csaw manual has details and code for that. I would not say that 35M reads is a must, that is already fairly deeply-sequenced. I actually aim for 25M per sample but have done diff. analysis with samples that had far fewer reads. Higher counts are statistically more powerful, still replication is more important than depth.
modified 10 months ago
10 months ago by
ATpoint ♦ 44k