Question: Why does the estimated fragment length differ between SPP and MACS2 predictd?
17 months ago, ben.kunfang wrote:


The data I am using is ENCFF424GON. When I run the ENCODE ChIP-seq pipeline on DNAnexus, SPP (xcorr) estimates the fragment length as 140 bp. However, when I run the macs2 predictd function with the parameters -g hs -m 5 50, it gives 274 bp. I tried several mfold combinations, but none came close to 140 bp. I am wondering why there is such a large difference between these two algorithms. Both seem to use a cross-correlation method to estimate the fragment length, yet the results are not even close.

Thanks in advance! Kun
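The idea the question refers to is strand cross-correlation: reads from the two ends of a fragment pile up on opposite strands roughly one fragment length apart, so shifting the plus-strand signal by the fragment length lines it up best with the minus-strand signal. Below is a minimal Python sketch of that idea, assuming reads have already been reduced to 5' end positions on a single chromosome. All function names are hypothetical and this is not SPP's or MACS2's code: real tools also handle the read-length "phantom" peak, smoothing, and genome-wide data, and MACS2 builds its model only from the regions selected by the `-m` (mfold) bounds, so the two tools are not necessarily correlating the same set of reads.

```python
# Minimal strand cross-correlation sketch. Hypothetical helper names;
# this is NOT SPP's or MACS2's implementation.
import random


def coverage(positions, length):
    """Count read 5' ends at each base of a chromosome of size `length`."""
    cov = [0] * length
    for p in positions:
        if 0 <= p < length:
            cov[p] += 1
    return cov


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return num / (vx * vy) ** 0.5 if vx and vy else 0.0


def strand_xcorr(plus, minus, length, max_shift):
    """Correlate plus-strand coverage at base i with minus-strand coverage at i + d."""
    cp, cm = coverage(plus, length), coverage(minus, length)
    return {d: pearson(cp[: length - d], cm[d:]) for d in range(max_shift + 1)}


def estimate_fragment_length(plus, minus, length, max_shift=400):
    """Return the shift d that maximises the strand cross-correlation."""
    profile = strand_xcorr(plus, minus, length, max_shift)
    return max(profile, key=profile.get)


# Synthetic single-chromosome data: 100 fragments of exactly 140 bp.
# Assumption: a minus-strand read is recorded at fragment start + 140.
random.seed(0)
CHROM_LEN, FRAG_LEN = 5000, 140
starts = [random.randrange(0, CHROM_LEN - 400) for _ in range(100)]
plus_reads = starts
minus_reads = [s + FRAG_LEN for s in starts]

print(estimate_fragment_length(plus_reads, minus_reads, CHROM_LEN, max_shift=300))  # 140
```

On real data the profile is rarely this clean, which is where different peak-selection choices start to matter.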


Difficult to answer. I would argue, though, that in the end it will barely matter which length you use for the analysis, as both values are short and acceptable fragment lengths for a normal ChIP-seq experiment. The csaw package (see its manual on Bioconductor) also has a method for fragment length estimation, plus code to plot the result, which might be worth looking at. Maybe the fragmentation did not produce a clear "summit" in terms of length and your fragments are more or less evenly distributed between 150 and 300 bp, which would make summit identification for xcorr difficult. Again, I don't think it matters a lot. If you read the library prep protocol, you could also simply use the average fragment length reported there. Typically one aims for a sonication/fragmentation length of 150 to 300 bp.

Reply written 17 months ago by ATpoint (modified 17 months ago)

Thanks for your reply! I tried csaw, and it does show two local peaks, one around 140 bp and one around 280 bp. The two algorithms might use different criteria to select which local peak to report.
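A profile with two bumps forces a choice, and different choice rules can land on different bumps. The hypothetical Python sketch below uses a made-up bimodal profile (not the ENCFF424GON data) and two invented selection rules to show this; neither rule is claimed to be what SPP, MACS2, or csaw actually does.

```python
# Hypothetical illustration: two candidate peaks in one cross-correlation profile.
import math


def local_maxima(profile):
    """Shifts whose value is strictly greater than both neighbouring shifts."""
    shifts = sorted(profile)
    return [
        cur
        for prev, cur, nxt in zip(shifts, shifts[1:], shifts[2:])
        if profile[cur] > profile[prev] and profile[cur] > profile[nxt]
    ]


def pick_highest(profile):
    """Rule A (invented): the local maximum with the largest value."""
    return max(local_maxima(profile), key=profile.get)


def pick_first(profile, min_value):
    """Rule B (invented): the smallest shift whose local maximum reaches min_value."""
    return next(d for d in local_maxima(profile) if profile[d] >= min_value)


# Made-up bimodal profile with bumps near 140 bp and 280 bp (not real data).
profile = {
    d: 0.50 * math.exp(-(((d - 140) / 20) ** 2))
    + 0.55 * math.exp(-(((d - 280) / 30) ** 2))
    for d in range(401)
}

print(local_maxima(profile))     # [140, 280]
print(pick_highest(profile))     # 280
print(pick_first(profile, 0.3))  # 140
```

With a slightly taller second bump, "take the highest" and "take the first good enough" disagree by a factor of two, much like the 140 bp vs 274 bp discrepancy in the question.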

Reply written 17 months ago by ben.kunfang

You could also take the mean of the two sub-peaks. As I said, I really don't think it matters for either peak calling or differential analysis.
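As a worked example of that suggestion, using the approximate sub-peak positions from the csaw plot (about 140 bp and about 280 bp); the symbols are notation introduced here:

```latex
\hat{d} = \frac{d_1 + d_2}{2} \approx \frac{140 + 280}{2} = 210\ \text{bp}
```

This lies inside the 150 to 300 bp range usually targeted during fragmentation.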

Reply written 17 months ago by ATpoint (modified 17 months ago)

Good idea, but I think the estimated fragment length does affect peak positions. I called peaks with MACS2 separately using each fragment length and intersected the resulting narrowPeak files: 88,627 of 112,435 peaks overlap, which means about 21% of the peaks fall in different regions. In this case, I would not say it doesn't matter. Thanks!
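One simple way to get an overlap figure like this is to count how many peaks in one narrowPeak file overlap at least one peak in the other. The Python sketch below is a hypothetical stand-in, similar in spirit to `bedtools intersect -u -a A -b B`; the function names are illustrative. It uses only the first three columns (chrom, start, end) and treats intervals as half-open, as BED does.

```python
# Hypothetical sketch: fraction of peaks in A overlapping any peak in B.
from bisect import bisect_left
from collections import defaultdict


def parse_bed(lines):
    """Group (start, end) by chromosome; narrowPeak extra columns are ignored."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    for ivs in by_chrom.values():
        ivs.sort()
    return by_chrom


def overlaps_any(start, end, sorted_ivs, max_len):
    """True if [start, end) shares >= 1 bp with an interval in sorted_ivs."""
    # Intervals starting before start - max_len must end before `start`.
    lo = bisect_left(sorted_ivs, (start - max_len,))
    for s, e in sorted_ivs[lo:]:
        if s >= end:  # sorted by start: nothing further can overlap
            break
        if e > start:
            return True
    return False


def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of A intervals with at least one overlapping B interval."""
    a, b = parse_bed(peaks_a), parse_bed(peaks_b)
    total = hits = 0
    for chrom, ivs in a.items():
        other = b.get(chrom, [])
        max_len = max((e - s for s, e in other), default=0)
        for start, end in ivs:
            total += 1
            if other and overlaps_any(start, end, other, max_len):
                hits += 1
    return hits / total if total else 0.0


a = ["chr1\t100\t200", "chr1\t500\t600", "chr2\t0\t50"]
b = ["chr1\t150\t250", "chr2\t50\t100"]
print(fraction_overlapping(a, b))  # 0.3333333333333333
```

The measure is asymmetric, so it is worth computing it in both directions. For the counts quoted above, 88627 / 112435 ≈ 0.788, so roughly 21% of the 112,435 peaks have no overlapping counterpart.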

Reply written 17 months ago by ben.kunfang