Question: Enrichment determination from random sampling
Jonathan Crowther (Leuven/Dublin) wrote, 4.8 years ago:

Hi all,

I would like to know if my approach makes sense or not.

I have a count of histone mark peaks (from ENCODE BigWig files) in an 80 Kb window of hg19.

I have sampled 1000 random 80 Kb windows from hg19 and built an empirical null distribution from the peak count in each window. Everything looks good, and the null distribution looks approximately normal.
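A minimal Python sketch of the sampling step, assuming (hypothetically) that the peaks have already been reduced to sorted start coordinates on a single chromosome and that windows are placed uniformly at random. A real hg19 analysis would also need to sample across chromosomes and avoid assembly gaps. The function names `count_peaks_in_window` and `sample_null_counts` are illustrative, not from the original post.

```python
import bisect
import random


def count_peaks_in_window(peak_starts, start, width):
    """Count peaks whose start coordinate lies in [start, start + width).

    Assumes peak_starts is sorted in ascending order.
    """
    lo = bisect.bisect_left(peak_starts, start)
    hi = bisect.bisect_left(peak_starts, start + width)
    return hi - lo


def sample_null_counts(peak_starts, chrom_length, width=80_000, n_samples=1000, seed=0):
    """Build an empirical null: peak counts in n_samples randomly placed windows.

    Simplification: one chromosome, uniform window placement, no gap masking.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        # Choose a start so that the whole window fits on the chromosome.
        start = rng.randrange(0, chrom_length - width + 1)
        counts.append(count_peaks_in_window(peak_starts, start, width))
    return counts
```

For example, with toy peaks every 10 Kb on a 1 Mb chromosome, every 80 Kb window contains exactly 8 peak starts, so all 1000 null counts equal 8.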

Now, to check whether my original 80 Kb window is significantly enriched, I can simply check whether my observed count falls inside or outside the central 95% of the null distribution, using its mean and standard deviation (roughly mean ± 2 SD, by the 68-95-99.7 rule). I would also like an approximate p-value, and this is where I am slightly unsure. I think I can do one of the following, depending on whether I want a one-tailed or a two-tailed test:
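Both ways of checking the central 95% can be sketched in Python: the normal-approximation version described in the question (mean ± 1.96 SD), and, as an alternative that does not rely on the null being normal, the empirical 2.5th and 97.5th percentiles of the null counts. The function names are hypothetical.

```python
import statistics


def outside_central_95(null_counts, observed, z=1.96):
    """True if observed lies outside mean +/- z * SD of the null counts.

    Assumes the null is approximately normal; z = 1.96 covers the central ~95%.
    """
    mu = statistics.mean(null_counts)
    sd = statistics.stdev(null_counts)
    return abs(observed - mu) > z * sd


def outside_empirical_95(null_counts, observed):
    """True if observed lies outside the empirical 2.5th-97.5th percentile range."""
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%.
    qs = statistics.quantiles(null_counts, n=40)
    return observed < qs[0] or observed > qs[-1]
```

The percentile version is usually the safer choice when the null counts are skewed or bounded at zero, which is common for sparse marks.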

One-tailed test: count the null observations that are greater than or equal to my observed count and divide by 1000.

Two-tailed test: take the absolute difference between my observed count and the mean of the null distribution, count the null observations whose absolute distance from that mean is greater than or equal to this difference, and divide by 1000.
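The two p-value recipes can be sketched in Python as follows (hypothetical function names; `null_counts` stands for the list of 1000 sampled counts). For the two-tailed version, distance is measured from the null mean, as described above.

```python
import statistics


def empirical_p_one_tailed(null_counts, observed):
    """Fraction of null counts >= observed (enrichment direction only)."""
    return sum(c >= observed for c in null_counts) / len(null_counts)


def empirical_p_two_tailed(null_counts, observed):
    """Fraction of null counts at least as far from the null mean as observed."""
    mu = statistics.mean(null_counts)
    d_obs = abs(observed - mu)
    return sum(abs(c - mu) >= d_obs for c in null_counts) / len(null_counts)
```

With 1000 null samples the smallest non-zero p-value is 0.001; if no sampled window is as extreme as the observation, reporting p < 0.001 is safer than p = 0. A commonly used variant, (r + 1) / (n + 1), where r is the number of null counts at least as extreme as the observation and n the number of samples, avoids returning exactly zero.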

Confirmation or suggestions will be greatly appreciated!

(Post modified 4.8 years ago by Devon Ryan.)
Devon Ryan (Freiburg, Germany) wrote, 4.8 years ago:

Yes, that's how it should be done. In both cases you're computing the fraction of the null distribution that is at least as extreme as what you observed. In one case you only care about "more extreme in one direction" (a one-tailed test), while in the other you care about "more extreme in either direction" (a two-tailed test).

Great, thanks for the confirmation!

I have a similar issue and would like to use the two-tailed test, but shouldn't the second approach also divide by the standard deviation before testing?

```
t = (mean.null - my.observation.count) / (sd.null / sqrt(1000))
```
(Written 4.1 years ago by User 7754; modified 16 days ago by RamRS.)

No, we're not computing a t-statistic here; we're computing the p-value directly from an empirical background distribution. A t-statistic uses the standard deviation to model how spread out the null distribution is, and here that spread has already been observed empirically in the 1000 sampled windows.

(Written 4.1 years ago by Devon Ryan; modified 16 days ago by RamRS.)