In the output file of MACS2, the fold enrichment normally refers to enrichment of summit to local lambda model. I am using --nolambda
option for ATAC-seq analysis (recommended as there is no input file like Chip-seq). Is the output file now showing enrichment with respect to global lambda? If so, does anyone have a suggestion for a reasonable cutoff?
After trying both with
--nolambda
and default, I agree with you that there are many false positive looking peaks. I forgot where I saw the suggestion first but many papers use this option. Including the paper for HMMRATAC program written by the same group who developed MACS (they compare with MACS2 of course):"MACS2 was run with the local lambda option turned off (option –nolambda) and was run with q-value cutoffs (option -q) set to 0.5, 0.1, 0.05 (default), 0.005, 0.0005 and 0.00005. It was also run with a P-value cutoff (option -p) set to 0.6 and 0.3." https://academic.oup.com/nar/article/47/16/e91/5519166
Some other examples: https://www.nature.com/articles/s41587-019-0332-7 https://www.sciencedirect.com/science/article/pii/S2211124719311519?via%3Dihub
The first link tests different parameters for benchmarking, the other two are single-cell ATAC-seq where you intrinsically have super-low coverage. This does not apply to standard ATAC-seq. I suggest you leave it on. Alternatively, if your samples have a good sequencing depth like 20mio or so reads I found the Genrich peak caller quite useful and producing good results.
Thanks, but I didn't understand your argument about single-cell ATAC-seq. The coverage per cell or per cluster can be low but peak calling is done using all reads which treats the sample like bulk. Then the reads are counted per peak per cell. So there is no reason why
--nolambda
shouldn't produce false positive results in single-cell ATAC-seq. But perhaps this is seen as a trade-off for capturing potentially true but low peaks?I do not have hands-on experience with scATAC, but I guess if certain peaks are exclusive for a small subpopulation then coverage is still super low, probably coverage is in general lower, and probably (in the case of subpopulations) then one needs to turn the background model off to even have a chance to get peaks in these populations, but this is only guessing. As said for standard ATAC-seq, I'd leave it on.