As I already described in your previous post on this issue ("peak calling for cancer and healthy data?"), I believe the best solution is to use the input sample (GSM947411) as the control and do the peak calling twice, once for each cell line. That's what I would do.
EDIT: Just to reiterate this quickly, you need to distinguish the biological control and signal (normal cell line vs. cancer cells) from the technical control and signal (background noise levels vs. supposed signal levels).
I think you are slightly confusing what peak callers like MACS do. In essence, peak callers are just tools for distinguishing signal regions from noise regions. Their input is just a set of x versus y measurements, and their output is the x coordinates that have significantly large y values. They have no biological assumptions built into them, only assumptions about how the background noise levels are distributed. Since this noise is not uniformly distributed across the x axis (i.e. the genome), MACS needs the input sample (which it calls 'control') to estimate the per-base-pair noise levels. Once these levels are estimated, one can proceed to look for x coordinates where y is significantly above the noise.
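To make the x-y analogy concrete, here is a toy sketch in Python. It is not how MACS is implemented; it only assumes that the control count at each position (plus a hypothetical pseudocount) can serve as the expected noise level, and it uses a plain Poisson tail test with an arbitrary threshold. The names `poisson_sf`, `call_peaks`, `alpha` and `pseudocount` are made up for this illustration, and the counts are invented.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam); adequate for a toy example,
    although precision is limited for very small tail probabilities."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def call_peaks(treatment, control, alpha=1e-5, pseudocount=1.0):
    """Return the x positions whose treatment count y is significantly
    larger than the local noise level estimated from the control."""
    peaks = []
    for x, (t, c) in enumerate(zip(treatment, control)):
        lam = c + pseudocount          # local background estimate (assumption)
        if poisson_sf(t, lam) < alpha:
            peaks.append(x)
    return peaks

# Invented read counts per genomic bin
treatment = [2, 3, 40, 4, 30, 2]
control   = [2, 2, 3, 3, 28, 1]

peaks = call_peaks(treatment, control)
print(peaks)
```

Note that bin 4 has a high count in the treatment but also in the control, so it is not reported. That is the whole point of supplying the input sample: the noise level differs from position to position.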
Now your question is: "do the peaks differ between normal and cancer cell lines?". Translated to the x-y analogy above, it would be "do the x coordinates where y is significantly large for the normal cell line differ from the x coordinates where y is significantly large for the cancer cell line?". In this language it is perhaps clearer why MACS needs to be run twice, and that it does not answer that question for you*. Hope this helps.
* Well, the standard options don't. As I mentioned in your previous post ("peak calling for cancer and healthy data?"), there are bdgcmp and diffpeaks options in MACS that might be helpful.
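Once you have the two peak sets, the comparison itself is a separate step. A minimal sketch in Python, assuming each run gave you a list of half-open `(start, end)` intervals on one chromosome; the coordinates and the helper names `overlaps` and `split_by_overlap` are hypothetical. Keep in mind that a plain presence/absence overlap like this is crude and is not a statistical test of differential binding.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def split_by_overlap(peaks_a, peaks_b):
    """Split peaks_a into those overlapping any peak in peaks_b and the rest."""
    shared, only_a = [], []
    for p in peaks_a:
        if any(overlaps(p, q) for q in peaks_b):
            shared.append(p)
        else:
            only_a.append(p)
    return shared, only_a

# Invented peak calls from the two separate runs (each against the input)
normal_peaks = [(100, 200), (500, 650), (900, 1000)]
cancer_peaks = [(150, 260), (700, 800), (950, 1100)]

shared_normal, normal_only = split_by_overlap(normal_peaks, cancer_peaks)
shared_cancer, cancer_only = split_by_overlap(cancer_peaks, normal_peaks)

print("normal only:", normal_only)
print("cancer only:", cancer_only)
```

For real data you would do the same thing with an interval tool on the peak files rather than with nested loops.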