Question: ATAC-seq macs2 peak calling bampe mode extension shift
1
gravatar for nanoide
12 months ago by
nanoide30
nanoide30 wrote:

Hi

So I have been calling peaks on ATAC-seq data using --nomodel and --shift -35 --extsize 75. I have been using-f BED on filtered and shifted (+4 -5 offset recommended) bed files coming from bam files (bedtools bamtobed). As my data is paired-end, I've read it would be a good practice using -f BAMPE. So I filter and shifted my bam files but I'm not sure about the parameters to use when using -f BAMPE.

According to the documentation --shift is going to be 0 because of using BAMPE. What about --nomodel and --extsize? They are not neccesarry either? I wonder if MACS2 is reading TLEN from the bam files and then an extension to be used as fragment size is no longer neccesary. And the bam should be sorted by name or macs2 can recognize mate pairs either way?

Hope somebody can comment on this because and I find myself a little bit confused interpreting this Thank you!

atac-seq chip-seq peaks macs • 1.4k views
ADD COMMENTlink modified 12 months ago by Ricardo A.90 • written 12 months ago by nanoide30
1
gravatar for igor
12 months ago by
igor7.7k
United States
igor7.7k wrote:

According to the MACS developer:

If you followed original protocol for ATAC-Seq, you should get Paired-End reads. If so, I would suggest you just use "--format BAMPE" to let MACS2 pileup the whole fragments in general. But if you want to focus on looking for where the 'cutting sites' are, then '--nomodel --shift -100 --extsize 200' should work.

Source: https://github.com/taoliu/MACS/issues/145

You should also check this previous discussion where the issue is discussed at length: ATAC-seq peak calling with MACS

ADD COMMENTlink written 12 months ago by igor7.7k

Thank you for your answer.

So I get according to the developper the options recommended are just "--f BAMPE " or "--nomodel --f BAM --shift -100 --extsize 200" However, I have filtered my rads and I have nucleosome free reads, so I know the upper limit of my fragment size (150). Wouldn't be better in this case to use --shift -75 --extsize 150?

Thanks!

ADD REPLYlink modified 12 months ago • written 12 months ago by nanoide30
1
gravatar for Ricardo A.
12 months ago by
Ricardo A.90
University of Michigan Ann Arbor, MI - USA
Ricardo A.90 wrote:

Hi, josruirod,

The --nomodel flag will tell MACS2 to bypass the model building, which would calculate the shift/extsize parameters. Since you are asking it to use the fragment information from the paired reads, then it doesn't need to figure out a model as it already have the information from the BAM file itself. Remember that MACS2 was designed to look at ChIP-seq data, where we normally use single-end reads - there is no information on the fragment sizes readily available and this would have to be inferred them from the data.

I'm 90% sure you don't need to sort by names in order for BAMPE to work - it will use the TLEN from each fragment. You can do a quick test using by trying to call peaks on the first 10,000 lines of your BAM file using samtools view -h file.bam | head -10000 | samtools view -b > file_subsampled.bam.

I haven't extensively tested the BAMPE option, but I found similar (and possibly noisier) results compared to the running MACS2 without BAMPE. Where did you get these shift -35 / extend 75 parameters from? We've been doing a lot of ATAC-seq data generation and processing over the last years, and we've been using shift -100 --extsize 200 based on the ATAC-seq fragment size distribution. You can have a look at https://github.com/ParkerLab/ATACseq-Snakemake for details on how we process our data. Also, look at https://github.com/ParkerLab/ataqv for an interactive diagnostic tool for ATAC-seq quality control.

Hope this helps.

ADD COMMENTlink modified 12 months ago • written 12 months ago by Ricardo A.90

Thank you for your comprehensive answer

I make sure to make some tests according to your suggestions. Regarding --shift -35 --extsize 75, we used as extsize half the fragment sizes (we filtered for a maximum size of 150 bp) I have also read about macs2 predictd in Advanced peak calling. So if I got it rigth, if you have the whole fragment size distribution shift -100 --extsize 200 would be better but in case you have filtered reads per size, it would be a good idea to decrease those numbers, right?

Thanks for your time!

ADD REPLYlink modified 12 months ago • written 12 months ago by nanoide30
1

I think it makes sense to use smaller shift and extension in this case. Why are you only looking at small fragments? There is a lot of information on larger mono and dinucleosome-sized fragments.

ADD REPLYlink written 12 months ago by Ricardo A.90

It was mainly a resolution issue, according to the fragment size distribution not a good amount of fragments had higher sizes, ir order to assess nucleoomes. So we wanted to focus in nucleosome free fragments in order to predict open regions and Transcription Factors Binding Sites.

Thanks

ADD REPLYlink written 12 months ago by nanoide30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 962 users visited in the last hour