Question: ATAC-seq peak calling with MACS
10
gravatar for igor
14 months ago by
igor4.7k
United States
igor4.7k wrote:

I am trying to call peaks in ATAC-seq data. Not surprisingly, MACS is a popular option. According to the MACS documentation:

  1. To find enriched cutting sites such as some DNAse-Seq datasets. In this case, all 5' ends of sequenced reads should be extended in both direction to smooth the pileup signals. If the wanted smoothing window is 200bps, then use '--nomodel --shift -100 --extsize 200'.

  2. For certain nucleosome-seq data, we need to pileup the centers of nucleosomes using a half-nucleosome size for wavelet analysis (e.g. NPS algorithm). Since the DNA wrapped on nucleosome is about 147bps, this option can be used: '--nomodel --shift 37 --extsize 73'.

Based on a brief literature review, people use both --shift -100 --extsize 200 and --shift 37 --extsize 73 for ATAC-seq. Is one option more appropriate? Are there maybe different sub-types of ATAC-seq where one is better than the other?

atac-seq chip-seq peaks macs • 8.0k views
ADD COMMENTlink modified 10 days ago by Simply Bioinformatics60 • written 14 months ago by igor4.7k

I was wondering which parameters to use too. Assuming this is the pipeline ENCODE is using (https://github.com/kundajelab/atac_dnase_pipelines), specifically the ATAC-seq pipeline protocol definition (https://docs.google.com/document/d/1f0Cm4vRyDQDu0bMehHD7P7KOMxTOP-HiNoIvL1VcBt8/edit?usp=sharing)

I am seeing that in the step 3a. Peak calling - MACS2 : function macs2()

shiftsize := round( smooth_window.parseReal()/2.0 )

macs2 callpeak \ -t $tag -f BED -n "$prefix" -g "$gensz" -p $pval_thresh \ --nomodel --shift -$shiftsize --extsize $smooth_window --broad --keep-dup all

The shift is - shiftsize

If you look down in the document:

function main() {

// By default, smoothing window for MACS2 is 150. IDR threshold is 0.1
// for ENCODE3, smoothing window for MACS2 is 73. IDR threshold is 0.05
if ( ENCODE3 ) {
      smooth_win = 73
      idr_thresh = 0.05
}
else {
      smooth_win = 150
      idr_thresh = 0.1
}
  

So effectively it would be either

macs2 callpeak ... --nomodel --shift -37 --extsize 73 --broad --keep-dup all

or

macs2 callpeak ... --nomodel --shift -75 --extsize 150 --broad --keep-dup all

Could it be that the first one will be sensitive to peaks and valleys in signal due to nucleosomes, while the second one will extend the cutting sites to cover 1 nucleosome?

ADD REPLYlink written 8 months ago by jaime.alvarez.benayas0
6
gravatar for benformatics
14 months ago by
benformatics150
benformatics150 wrote:

Comparison of cuts by technique

Assuming that in ATAC-Seq, we make a cut at "any" open chromatin regions, I imagine that using nucleosome specific parameters would not be optimal. Given that for things like MNAse you assume your average fragment is as long as the DNA wrapped around the nucleosome (~150 nt) which is not necessarily true for ATAC. Thus my guess would be method #1 would provide more optimal results.

Although if you check this workflow neither of those parameters are used for MACS: http://informatics.fas.harvard.edu/atac-workshop-capellini-lab.html

ADD COMMENTlink modified 14 months ago • written 14 months ago by benformatics150

In that workflow, they also don't adjust for larger fragment size during Bowtie2 alignment (-X/--maxins), which is specifically discussed in the original ATAC-seq paper, so I wouldn't treat it as the gold standard.

Also, that's a really nice illustration. Where is it from?

ADD REPLYlink written 14 months ago by igor4.7k

The source of the image is here: http://www.the-scientist.com/?articles.view/articleNo/44772/title/Reveling-in-the-Revealed/

In regards to the larger fragment size, it does seem smart to try to adjust the --X/--maxins parameter to match your dataset. In my case it increased the number mapped reads by 1-2%.

I found this resource to be very helpful. It almost seems to suggest that increasing -X anytime you use bowtie2 is vital. http://lab.loman.net/2013/05/02/use-x-with-bowtie2-to-set-minimum-and-maximum-insert-sizes-for-nextera-libraries/

ADD REPLYlink modified 14 months ago • written 14 months ago by benformatics150

It shouldn't change the alignment rate by much, since it will just attempt to align the mates independently by default. However, it will count them as discordant. Also, the fragment size distribution will be cut off.

And I agree. Increasing that parameter is helpful in general. You don't always know what your library size is going to be. The only downside is it will take longer to run.

ADD REPLYlink written 14 months ago by igor4.7k

I think an -X 2000 for bowtie2 makes sense, as most current sequencing platforms are not able to sequence fragments > 2000bp. So this is rather a filter against false-positive alignments and pretty much standard in most pipelines.

ADD REPLYlink written 14 months ago by ATPoint2.4k
3
gravatar for igor
14 months ago by
igor4.7k
United States
igor4.7k wrote:

I got another alternative from the MACS developer:

If you followed original protocol for ATAC-Seq, you should get Paired-End reads. If so, I would suggest you just use "--format BAMPE" to let MACS2 pileup the whole fragments in general. But if you want to focus on looking for where the 'cutting sites' are, then '--nomodel --shift -100 --extsize 200' should work.

Source: https://github.com/taoliu/MACS/issues/145

ADD COMMENTlink written 14 months ago by igor4.7k

Why specifying --shift in BAMPE? The manual says that in BAMPE, --shift can only be zero. Also, if you provide paired-end data, why using an --extsize. The fragment size is already determined by the paired-end sequcening.

I used to follow the strategy from the NucleoATAC and single-cell ATAC paper, [macs2 -g hs --nomodel]

ADD REPLYlink written 14 months ago by ATPoint2.4k

I think he meant specifying either BAMPE or shift, not both. If you don't specify BAMPE, reads are not treated as pairs.

ADD REPLYlink modified 14 months ago • written 14 months ago by igor4.7k

That's a good suggestion concerning altering the macs2 call to use the "real" insert size for PE data. We will try it and see how it compares to past peak calling.

ADD REPLYlink modified 14 months ago • written 14 months ago by benformatics150

Although then a double nucleosome produces a peak twice the size of a single nucleosome, which I am not sure I am entirely comfortable with either.

ADD REPLYlink written 14 months ago by igor4.7k
1

The regions where ATAC detects nucleosomes are open chromatin regions, where I think you always have a mixture of signals, both NFR, mono, and maybe di-or trinucleosomes. Inferring these features requires more specialized software, such as NucleoATAC. The peakcalling, in my understanding, just gives you a region where transposase cutting events are enriched and in which "you can dig deeper". The NucleoATAC software then takes a bed with peaks as input and the BAM sequencing files to perform modelling of the fragment size distribution and to infer NFR and nucleosome positions. So I think MACS peaks are (by themselves) not suitable for base-pair-resolved analysis.

I think it depends what you want to use your data for. If you just want to say which regions are accessable or not, then MACS peaks should do the job. If in-depth analysis, such as NFR and nucleosome positions or TF footprints are to be obtained, you need specialized software/ custom approaches, such as NucleoATAC, but also excellent data quality and good sequencing depth.

ADD REPLYlink modified 14 months ago • written 14 months ago by ATPoint2.4k
0
gravatar for Simply Bioinformatics
10 days ago by
WashingtonDC
Simply Bioinformatics60 wrote:

We Are Using This:

macs2 callpeak -t input -f BED -g "hs" -p 0.01 --nomodel --shift -37 --extsize 73 -B --SPMR --keep-dup all --call-summits

Currently, there is an existing issue on MACS2 which intermittent -1 as pValue which will crash the IDR as downstream analysis

as a heads up: convert all -1 to zero using sed

ADD COMMENTlink written 10 days ago by Simply Bioinformatics60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1416 users visited in the last hour