Question

MACS/MACS2 Peak calling failure with Paired End ChIP-seq data of 125bp read length

8.0 years ago

vikas ▴ 20

I have a TF ChIP-seq time-course study with 125 bp paired-end reads, around 30 to 50M read pairs per library. The analysis pipeline in short: map to mm10 with Bowtie2, remove duplicates with samtools, and call peaks with MACS2 using BAM files as input (format BAMPE). The number of peaks I get is in the two digits. But when I turn off the dynamic lambda (--nolambda option), the number of peaks reaches 4-5 digits. I looked at those peaks in a genome browser and I see really beautiful peaks in the ChIP sample and no peak in the input control. I fail to understand why MACS2 calls those peaks only when the --nolambda option is used. I have tried various other MACS parameters as well, but the peak numbers stay low unless I use the --nolambda option.

I've tried other MACS options as well, but nothing is working for me. Could it be that MACS cannot handle longer reads and paired-end data?

Any advice or suggestion is most welcome and appreciated.

ChIP-Seq • 6.0k views

8.0 years ago by vikas ▴ 20

This is a lame answer, but this information should really be included in all posts on Biostars reporting an apparent bug in software. Are you using the latest version of MACS2? If not, do you still experience the same problem after updating to the latest version of MACS2?

Also, you should post the exact command used in case there was an error in your command.

8.0 years ago by ablanchetcohen ★ 1.2k

--nolambda If True, MACS will use fixed background lambda as local lambda for every peak region. Normally, MACS calculates a dynamic local lambda to reflect the local bias due to potential chromatin structure.

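If I am not wrong, when you use --nolambda, MACS uses the whole-genome background coverage as the lambda for every peak region. Without it, MACS takes the maximum of that background and the coverage in local windows around the peak region (a few kb, e.g. 1 kb and 5 kb; in MACS2 the window sizes are set by --slocal and --llocal). For a large genome the whole-genome value is usually lower than the coverage in those local windows, so with --nolambda the ChIP peaks are tested against a smaller lambda and are able to pass the score thresholds.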
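To make that reasoning concrete, here is a minimal Python sketch of how the choice of lambda changes the Poisson p-value of a single candidate peak. The pileup and lambda values are made up for illustration and the function names are hypothetical; this is not MACS2's own code, only the "maximum of background and local windows" idea described above.

```python
import math

def poisson_sf(k, lam, max_terms=1000):
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam), lam > 0.

    Sums the probability mass directly from k upward so that very small
    p-values are not lost to rounding (as they would be with 1 - CDF).
    """
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, k + max_terms):
        log_pmf = -lam + i * math.log(lam) - math.lgamma(i + 1)
        total += math.exp(log_pmf)
    return total

def choose_lambda(lambda_bg, window_lambdas, nolambda=False):
    """Pick the lambda a peak is tested against (hypothetical helper).

    nolambda=True  -> fixed genome-wide background only.
    nolambda=False -> maximum of the background and the local windows.
    """
    if nolambda:
        return lambda_bg
    return max([lambda_bg] + list(window_lambdas))

# Made-up numbers for one candidate peak (illustration only).
pileup = 40                    # ChIP pileup at the peak summit
lambda_bg = 2.0                # genome-wide background lambda
window_lambdas = [15.0, 9.0]   # lambdas from two local control windows

p_dynamic = poisson_sf(pileup, choose_lambda(lambda_bg, window_lambdas))
p_fixed = poisson_sf(pileup, choose_lambda(lambda_bg, window_lambdas, nolambda=True))

print(f"dynamic lambda: p = {p_dynamic:.3g}")
print(f"--nolambda:     p = {p_fixed:.3g}")
```

With these made-up values the fixed background gives a much smaller p-value than the dynamic lambda, which is the direction of the effect described above. Whether that explains your data depends on what the control coverage looks like around your peaks.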

Also, on the MACS Google group and in the main paper, --nolambda is suggested for a treatment sample without a control, or for histone marks:

If no control data is available for such ChIP-Seq data, the local background estimation should be skipped via setting --nolambda in the command line.

Another way to check the problem would be to generate the bedGraph files (-B/--bdg) both with and without --nolambda, and then compare the 4th column of the control lambda file (NAME_control_lambda.bdg) from the two runs. You could follow this tutorial: https://github.com/taoliu/MACS/wiki/Build-Signal-Track
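For that comparison, a rough Python sketch along these lines could summarise the 4th column of each lambda bedGraph. The function name is hypothetical, and it only assumes the standard four-column bedGraph layout (chrom, start, end, value).

```python
def lambda_summary(path):
    """Return (length-weighted mean, maximum) of column 4 of a bedGraph file."""
    total_len = 0
    weighted_sum = 0.0
    max_lambda = 0.0
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and optional track/browser/comment lines.
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            chrom, start, end, value = line.split()[:4]
            length = int(end) - int(start)
            lam = float(value)
            total_len += length
            weighted_sum += lam * length
            max_lambda = max(max_lambda, lam)
    mean_lambda = weighted_sum / total_len if total_len else 0.0
    return mean_lambda, max_lambda
```

You would call it once per run, for example `lambda_summary("dynamic_control_lambda.bdg")` and `lambda_summary("nolambda_control_lambda.bdg")` (both file names are placeholders for your own outputs). If the reasoning above holds, the --nolambda file should show a single flat value, while the default run shows higher values in places.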

Hope that helps!

Aarthi

8.0 years ago by RT ▴ 10