Question: Enrichment determination from random sampling
4.2 years ago, Jonathan Crowther wrote:

Hi all,

I would like to know if my approach makes sense or not.

I have a count of histone mark peaks (from ENCODE bigWig files) in an 80 kb window of hg19.

I have sampled 1000 positions in hg19 (each 80 kb in length) and generated an empirical null distribution from the peak counts at those 1000 positions. All looks good, and the distribution looks approximately normal.

Now, to check whether my original 80 kb window is significantly enriched, I can simply check whether my observed count falls inside or outside the central 95% of the null distribution, using the mean and standard deviation of the null distribution (the 68-95-99.7 rule). I would also like to get an approximate p-value, and this is where I am slightly unsure. I think I can do one of the following, depending on whether I want a one-tailed or a two-tailed test:

One-tailed test: Count the observations in the null distribution that are greater than or equal to my observed count, and divide by 1000.

Two-tailed test: Take the absolute difference between my observed count and the mean of the null distribution, then count the null observations whose absolute difference from the null mean is greater than or equal to that value, and divide by 1000.
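
The two procedures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `empirical_p_values`, `null_counts`, and `observed` are hypothetical names, and the null counts here are simulated rather than read from real ENCODE data.

```python
import random
import statistics

def empirical_p_values(null_counts, observed):
    """Return (one_tailed, two_tailed) empirical p-values.

    one_tailed: fraction of null counts >= observed (tests for enrichment).
    two_tailed: fraction of null counts at least as far from the null mean
                as the observed count, in either direction.
    """
    n = len(null_counts)
    null_mean = statistics.mean(null_counts)
    observed_dev = abs(observed - null_mean)
    one_tailed = sum(c >= observed for c in null_counts) / n
    two_tailed = sum(abs(c - null_mean) >= observed_dev for c in null_counts) / n
    return one_tailed, two_tailed

# Simulated stand-in for peak counts in 1000 sampled 80 kb windows (assumption).
rng = random.Random(0)
null_counts = [round(rng.gauss(50, 10)) for _ in range(1000)]

one_tailed, two_tailed = empirical_p_values(null_counts, observed=72)
print(f"one-tailed p = {one_tailed}, two-tailed p = {two_tailed}")
```

With 1000 sampled windows the smallest non-zero p-value this can return is 0.001, so a result of 0 is better read as p < 0.001 than as an exact zero.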


Confirmation or suggestions would be greatly appreciated!

4.2 years ago, Devon Ryan (Freiburg, Germany) wrote:

Yes, that's how it should be done. In both cases you're testing for the fraction more extreme than what you observed. In one case, you only care about "more extreme in one direction" (a 1-tailed test) while in the other you care about "more extreme in either direction" (a 2-tailed test).


Great, thanks for the confirmation!

Reply written 4.2 years ago by Jonathan Crowther

I have a similar issue and I would like to go with the two-tailed test, but shouldn't the difference in the second point also be divided by the standard deviation before testing?

t = [ mean.null - my.observation.count ] / [ sd.null / sqrt( 1000 ) ]

Reply written 3.5 years ago by User 7754230

No, we're not computing a statistic for a t-test. We're computing the p-value directly from an empirical background distribution. When computing something like a t-statistic, one needs to incorporate the standard deviation in order to estimate what, in this case, has already been empirically observed.
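
One way to see the distinction is to compute the quantities side by side. The sketch below is an illustration, not part of the original answer: `compare_p_values` and the simulated `null_counts` are hypothetical, and the normal-approximation p-value assumes the null distribution really is close to normal. Note that dividing by `sd / sqrt(n)` uses the standard error of the null mean, which describes uncertainty in the mean rather than the spread of individual windows.

```python
import random
from statistics import NormalDist, mean, stdev

def compare_p_values(null_counts, observed):
    """Return (p_empirical, p_normal, z, z_sem) for a two-tailed comparison."""
    n = len(null_counts)
    mu, sd = mean(null_counts), stdev(null_counts)

    # Empirical two-tailed p-value: the tail is counted directly, no SD needed.
    dev = abs(observed - mu)
    p_empirical = sum(abs(c - mu) >= dev for c in null_counts) / n

    # Normal approximation for a single observation: scale by sd.
    z = (observed - mu) / sd
    p_normal = 2 * (1 - NormalDist().cdf(abs(z)))

    # Scaling by sd / sqrt(n) instead measures distance in standard errors
    # of the null mean, which inflates the statistic by a factor of sqrt(n).
    z_sem = (observed - mu) / (sd / n ** 0.5)
    return p_empirical, p_normal, z, z_sem

# Simulated stand-in for 1000 null window counts (assumption, not real data).
rng = random.Random(1)
null_counts = [round(rng.gauss(50, 10)) for _ in range(1000)]

p_emp, p_norm, z, z_sem = compare_p_values(null_counts, observed=72)
print(f"empirical p = {p_emp}, normal-approximation p = {p_norm:.4f}")
print(f"z using sd = {z:.2f}, z using sd/sqrt(n) = {z_sem:.2f}")
```

When the null distribution is close to normal, the empirical and normal-approximation p-values should land near each other, while the `sd / sqrt(n)` version answers a different question.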

Reply written 3.5 years ago by Devon Ryan