Question: ATAC-seq peak calling with MACS
gravatar for igor
4.1 years ago by
United States
igor11k wrote:

I am trying to call peaks in ATAC-seq data. Not surprisingly, MACS is a popular option. According to the MACS documentation:

  1. To find enriched cutting sites such as some DNAse-Seq datasets. In this case, all 5' ends of sequenced reads should be extended in both direction to smooth the pileup signals. If the wanted smoothing window is 200bps, then use '--nomodel --shift -100 --extsize 200'.

  2. For certain nucleosome-seq data, we need to pileup the centers of nucleosomes using a half-nucleosome size for wavelet analysis (e.g. NPS algorithm). Since the DNA wrapped on nucleosome is about 147bps, this option can be used: '--nomodel --shift 37 --extsize 73'.

Based on a brief literature review, people use both --shift -100 --extsize 200 and --shift 37 --extsize 73 for ATAC-seq. Is one option more appropriate? Are there maybe different sub-types of ATAC-seq where one is better than the other?

atac-seq chip-seq peaks macs • 22k views
ADD COMMENTlink modified 2.5 years ago by ZheFrench300 • written 4.1 years ago by igor11k

I was wondering which parameters to use too. Assuming this is the pipeline ENCODE is using (, specifically the ATAC-seq pipeline protocol definition (

I am seeing that in the step 3a. Peak calling - MACS2 : function macs2()

shiftsize := round( smooth_window.parseReal()/2.0 )

macs2 callpeak \ -t $tag -f BED -n "$prefix" -g "$gensz" -p $pval_thresh \ --nomodel --shift -$shiftsize --extsize $smooth_window --broad --keep-dup all

The shift is - shiftsize

If you look down in the document:

function main() {

// By default, smoothing window for MACS2 is 150. IDR threshold is 0.1
// for ENCODE3, smoothing window for MACS2 is 73. IDR threshold is 0.05
if ( ENCODE3 ) {
      smooth_win = 73
      idr_thresh = 0.05
else {
      smooth_win = 150
      idr_thresh = 0.1

So effectively it would be either

macs2 callpeak ... --nomodel --shift -37 --extsize 73 --broad --keep-dup all


macs2 callpeak ... --nomodel --shift -75 --extsize 150 --broad --keep-dup all

Could it be that the first one will be sensitive to peaks and valleys in signal due to nucleosomes, while the second one will extend the cutting sites to cover 1 nucleosome?

ADD REPLYlink written 3.7 years ago by jaime.alvarez.benayas20

So finally what is the consensus to use for MACS2 with ATAC-SEQ paired-end ?

I read that -f BAMPE was a good pratice . But then should it be used with --nomodel option ?

For what it worth, this is what I'm using until now on my bam without duplicates:

macs2 callpeak --verbose 3 --treatment my.bam --name myname -g hs --bdg -q 0.05 --keep-dup all -f BAMPE --broad
ADD REPLYlink written 2.5 years ago by ZheFrench300

Hi, ZheFrench, the read position should be shifted +4 and -5 bp in the BAM file before macs2 callpeak? Thank you!

ADD REPLYlink written 2.3 years ago by afli190

Is --nomodel not required if you are using -f BAMPE?

ADD REPLYlink written 2.2 years ago by Lucy40
gravatar for igor
4.1 years ago by
United States
igor11k wrote:

I got another alternative from the MACS developer:

If you followed original protocol for ATAC-Seq, you should get Paired-End reads. If so, I would suggest you just use "--format BAMPE" to let MACS2 pileup the whole fragments in general. But if you want to focus on looking for where the 'cutting sites' are, then '--nomodel --shift -100 --extsize 200' should work.

And then more recently:

Why do so many papers use --shift -100 --extsize 200 for MACS2 rather than -f BAMPE if this is not recommended for paired-end data?

--shift -100 --extsize 200 will amplify the 'cutting sites' enrichment from ATAC-seq data. So in the end, the 'peak' is where Tn5 transposase likes to attack. The fact is that, although many information such as the insertion length and the other mate alignment is ignored, such result is still usable. Especially when the short fragment population is extremely dominant, the final output won't be off much.


ADD COMMENTlink modified 2.1 years ago • written 4.1 years ago by igor11k

That's a good suggestion concerning altering the macs2 call to use the "real" insert size for PE data. We will try it and see how it compares to past peak calling.

ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by benformatics2.0k

Although then a double nucleosome produces a peak twice the size of a single nucleosome, which I am not sure I am entirely comfortable with either.

ADD REPLYlink written 4.1 years ago by igor11k

The regions where ATAC detects nucleosomes are open chromatin regions, where I think you always have a mixture of signals, both NFR, mono, and maybe di-or trinucleosomes. Inferring these features requires more specialized software, such as NucleoATAC. The peakcalling, in my understanding, just gives you a region where transposase cutting events are enriched and in which "you can dig deeper". The NucleoATAC software then takes a bed with peaks as input and the BAM sequencing files to perform modelling of the fragment size distribution and to infer NFR and nucleosome positions. So I think MACS peaks are (by themselves) not suitable for base-pair-resolved analysis.

I think it depends what you want to use your data for. If you just want to say which regions are accessable or not, then MACS peaks should do the job. If in-depth analysis, such as NFR and nucleosome positions or TF footprints are to be obtained, you need specialized software/ custom approaches, such as NucleoATAC, but also excellent data quality and good sequencing depth.

ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by ATpoint40k
gravatar for benformatics
4.1 years ago by
ETH Zurich
benformatics2.0k wrote:

Comparison of cuts by technique

Assuming that in ATAC-Seq, we make a cut at "any" open chromatin regions, I imagine that using nucleosome specific parameters would not be optimal. Given that for things like MNAse you assume your average fragment is as long as the DNA wrapped around the nucleosome (~150 nt) which is not necessarily true for ATAC. Thus my guess would be method #1 would provide more optimal results.

Although if you check this workflow neither of those parameters are used for MACS [fixed link thanks to Dataman]:

Image linked fixed (source):

ADD COMMENTlink modified 7 days ago • written 4.1 years ago by benformatics2.0k

The link to the workflow/workshop is broken. The working link is:

ADD REPLYlink written 12 months ago by Dataman330

In that workflow, they also don't adjust for larger fragment size during Bowtie2 alignment (-X/--maxins), which is specifically discussed in the original ATAC-seq paper, so I wouldn't treat it as the gold standard.

Also, that's a really nice illustration. Where is it from?

ADD REPLYlink written 4.1 years ago by igor11k

The source of the image is here:

In regards to the larger fragment size, it does seem smart to try to adjust the --X/--maxins parameter to match your dataset. In my case it increased the number mapped reads by 1-2%.

I found this resource to be very helpful. It almost seems to suggest that increasing -X anytime you use bowtie2 is vital.

ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by benformatics2.0k

It shouldn't change the alignment rate by much, since it will just attempt to align the mates independently by default. However, it will count them as discordant. Also, the fragment size distribution will be cut off.

And I agree. Increasing that parameter is helpful in general. You don't always know what your library size is going to be. The only downside is it will take longer to run.

ADD REPLYlink written 4.1 years ago by igor11k
gravatar for Simply Bioinformatics
3.0 years ago by
Simply Bioinformatics170 wrote:

We Are Using This:

macs2 callpeak -t input -f BED -g "hs" -p 0.01 --nomodel --shift -37 --extsize 73 -B --SPMR --keep-dup all --call-summits

Currently, there is an existing issue on MACS2 which intermittent -1 as pValue which will crash the IDR as downstream analysis

as a heads up: convert all -1 to zero using sed

ADD COMMENTlink written 3.0 years ago by Simply Bioinformatics170
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1320 users visited in the last hour