I am trying to call peaks in ATAC-seq data. Not surprisingly, MACS is a popular option. According to the MACS documentation:
To find enriched cutting sites such as some DNAse-Seq datasets. In this case, all 5' ends of sequenced reads should be extended in both direction to smooth the pileup signals. If the wanted smoothing window is 200bps, then use '--nomodel --shift -100 --extsize 200'.
For certain nucleosome-seq data, we need to pileup the centers of nucleosomes using a half-nucleosome size for wavelet analysis (e.g. NPS algorithm). Since the DNA wrapped on nucleosome is about 147bps, this option can be used: '--nomodel --shift 37 --extsize 73'.
Based on a brief literature review, people use both --shift -100 --extsize 200
and --shift 37 --extsize 73
for ATAC-seq. Is one option more appropriate? Are there maybe different sub-types of ATAC-seq where one is better than the other?
I was wondering which parameters to use too. Assuming this is the pipeline ENCODE is using (https://github.com/kundajelab/atac_dnase_pipelines), specifically the ATAC-seq pipeline protocol definition (https://docs.google.com/document/d/1f0Cm4vRyDQDu0bMehHD7P7KOMxTOP-HiNoIvL1VcBt8/edit?usp=sharing)
I am seeing that in the step 3a. Peak calling - MACS2 : function macs2()
The shift is - shiftsize
If you look down in the document:
So effectively it would be either
or
Could it be that the first one will be sensitive to peaks and valleys in signal due to nucleosomes, while the second one will extend the cutting sites to cover 1 nucleosome?
So finally what is the consensus to use for MACS2 with ATAC-SEQ paired-end ?
I read that -f BAMPE was a good pratice . But then should it be used with --nomodel option ?
For what it worth, this is what I'm using until now on my bam without duplicates:
Hi, ZheFrench, the read position should be shifted +4 and -5 bp in the BAM file before macs2 callpeak? Thank you!
Is --nomodel not required if you are using -f BAMPE?