How is p-value calculated in MACS2
1
2
Entering edit mode
5 months ago
IIIIIIIIII ▴ 40

I have a hard time understanding how the p-value is calculated for each peak in MACS2. Can someone give a detailed example?

MACS2 • 632 views
1
Entering edit mode
5 months ago
cmdcolin ★ 3.0k

check out the section "Peak detection" here https://hbctraining.github.io/Intro-to-ChIPseq/lessons/05_peak_calling_macs.html

I cannot speak to broad peak detection but for the classic (e.g. MACS 1.4 codebase) basically it estimates the "background noise" using

λlocal = max(λBG, λ1k, λ5k, λ10k)

this tells you e.g. what is the statistical distribution of reads in "whole genome"(BG), 1k (reads in a 1kbp window), 5k (reads in a 5kbp window), and 10k (reads in a 10kb window). it takes the maximum of these which is basically like saying "i want to be robust to the maximum level of noise in these parameters". then, you scan using a sliding window across the whole genome, and you ask, what is the likelihood of seeing a peak with e.g. 1000 reads in your window given this poisson parameter λlocal

1
Entering edit mode

Thanks for your response. I guess my question is, once you calculated λlocal, and you know there are x number of reads at a peak region, what statistical test is used to calculate whether the number of reads is enriched compared to the expected background (what statistical test is used to calculate p-value)?

0
Entering edit mode

I don't know how you'd refer to the test (maybe a poisson test or something, i'm not super statistically literate) but that calculation is summarized very nicely by this wikipedia plot, and shows what the poisson distribution is for different parameters of λ, and different values of K (which would be the number of reads) https://en.wikipedia.org/wiki/Poisson_distribution#/media/File:Poisson_pmf.svg

so e.g. if your lambda paramter is 4, and your K is 15, then that is on the far tail of the purple line, giving you a very small P value, and so you would conclude that observing K=15 reads in this window is very unlikely to correspond to the background distribution, so you'd conclude it's a peak (the p-value is the Y-axis). in a sillier example, if you had a normal distribution of peoples heights and you were looking for giants, you'd look at each person in your set of people, compare each person to your normal distribution, and if someone was like 20 feet tall, they'd be like 5+ standard deviations away from everyone else, so you'd say "oh, theres a peak" :)

1
Entering edit mode

Thanks. I found out the test is called the exact poisson test https://stats.stackexchange.com/questions/350745/help-me-understand-poisson-test