Question: How to get a p-value for a set of FDR values
Gyan Prakash Mishra wrote (18 months ago, India):

Hi all,

I am comparing two BED files to find the correlation between them. To examine whether the correlation occurs by chance or is meaningful, I have randomized the BED file by shifting it 1 kb upstream and downstream, and I am computing the correlation against this randomized file. Now I want to find the p-value for this.

A and B are the two BED files, and C is B randomized by 1 kb. A corr B is 0.9 and A corr C is 0.2. How should the p-value be calculated here?

I would really appreciate any help.

 


The topic is misleading: an FDR (false discovery rate) is calculated for a particular p-value, e.g., using a permutation test. Did you perhaps mean something else?

How did you calculate the correlation between the two BED files? What are you actually comparing, i.e., what kind of entities are in the BED files? Exons? Genes? SNPs?

How exactly did you do the randomization? For each entry in the BED files, did you randomly select a value from the interval (-1000, 1000) and add it to the start and end?
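
For illustration, such a shift could look like this in R (a sketch only; the file name and the assumption of a three-column BED file are mine, not from your post):

# load a three-column BED file as a data frame (names are assumptions)
bed <- read.table("B.bed", col.names = c("chrom", "start", "end"))

# one random shift from (-1000, 1000) per interval, applied to start and end
shift <- sample(-1000:1000, nrow(bed), replace = TRUE)
bed$start <- bed$start + shift
bed$end   <- bed$end + shift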

(Manuel Landesfeind, 18 months ago)

Actually, I am comparing two ChIP-seq peak files, so I am just comparing coordinates by checking overlap. For example, if all coordinates of A overlap with B, we can say the correlation is 1; the comparison has been done like that. To assess whether the correlation is significant or not, I have randomized the coordinates by shifting them, e.g., 1 kb upstream or downstream (C). My question is how I can say that the correlation between A and B is significant compared to A and C, and for that I think a p-value is needed. I hope I made it clearer.

(Gyan Prakash Mishra, 18 months ago)

I'm a bit confused... Do you want to get a p-value assessing the significance of a correlation?

(Carlo Yague, 18 months ago)
Carlo Yague wrote (18 months ago, Belgium):

If I understand correctly, you want to get a p-value assessing the significance of a correlation.

Using random permutation is a good idea. However, you should do the randomization many times in order to obtain an empirical distribution of the (A, C) correlation. From that distribution, you can then get the significance of your (A, B) correlation.

Good luck!

Edit: The statistical test to use will depend on your distribution. If it is normal (you can use a normality test to make sure of that), then you could compute the mean, standard deviation and p-value like this (in R):

# rand_corr = vector of random (A,C) correlations
# here, for testing: 10000 random numbers with mean 0.2 and sd 0.2
rand_corr <- rnorm(10000, mean = 0.2, sd = 0.2)

# p-value calculation:
m <- mean(rand_corr)  # mean of the random (A,C) correlations
s <- sd(rand_corr)    # sd of the random correlations
x <- 0.9              # observed (A,B) correlation
z <- (x - m) / s      # center and scale (z-score)
2 * pnorm(-abs(z))    # two-sided p-value

# visual representation:
hist(rand_corr, breaks = 30)
abline(v = x, col = "red")  # observed correlation against the null distribution

However, if normality is not respected (it might not be, since you have correlation values that cannot go below 0 or above 1), you might need to resort to other tests.
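
For the normality check mentioned above, one option is the Shapiro-Wilk test; a minimal sketch, reusing rand_corr from the example (note that shapiro.test() accepts at most 5000 values, hence the subsampling):

# Shapiro-Wilk normality test on a subsample of the permuted correlations
shapiro.test(sample(rand_corr, 5000))
# a small p-value here suggests the distribution is not normal, in which
# case a distribution-free approach is preferable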


Hi Carlo

Yes, this is exactly what I meant: I want a p-value for assessing the significance of the correlation.

As you said, I have randomized the BED files several times. But my question is how the p-value (significance) is calculated from that empirical distribution. I am not that good at statistics, so I would be very happy if you could elaborate.

(Gyan Prakash Mishra, 18 months ago)

Just count how often you obtained a correlation coefficient larger than the one you want to test (0.9) and divide by the number of permutations (but ideally do something like 1000 or 10000 permutations).
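
As a minimal sketch in R, reusing the rand_corr vector from Carlo's answer:

# empirical one-sided p-value: fraction of permutations with a
# correlation at least as large as the observed one (0.9)
sum(rand_corr >= 0.9) / length(rand_corr)

# a common adjustment that avoids reporting exactly 0:
(sum(rand_corr >= 0.9) + 1) / (length(rand_corr) + 1)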

(Manuel Landesfeind, 18 months ago)

I edited my answer to elaborate as you suggested :)

(Carlo Yague, 18 months ago)

@Manuel Landesfeind: I see why you would do that, but I don't think you could call that a "p-value".

(Carlo Yague, 18 months ago)

Hmm... the fraction `#(corr >= 0.9) / #permutations` should converge (with a sufficient number of permutations) toward the probability of observing a correlation of 0.9 or higher from the given sample values just by chance... how does this differ from a p-value?

Given an infinite number of permutations, and given that the correlation coefficient truly follows a Gaussian distribution, we should get the same resulting value, I guess. [EDIT] Probably not exactly, because from your R code "2*pnorm(...)" I think you get a p-value for observing a correlation that is more extreme in either direction (i.e., x <= -0.9 or x >= 0.9), right? [/EDIT]

In fact, people use statistical distributions to circumvent a computationally expensive permutation test. For example, this allows a direct calculation of a p-value from the correlation coefficient (see http://vassarstats.net/rsig.html). But if you have already calculated correlation coefficients (or any other statistic) for a sufficient number of permutations, you can get your p-value directly from those values. [EDIT2] However, I think that permutation tests are far more robust than p-values estimated from a distribution. [/EDIT2]
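
In R, for instance, cor.test() does this kind of direct calculation for a Pearson correlation; a sketch with made-up vectors x and y:

# parametric p-value for a Pearson correlation, based on the
# t-distribution with n - 2 degrees of freedom
set.seed(42)
x <- rnorm(50)
y <- x + rnorm(50)  # made-up correlated data
cor.test(x, y, method = "pearson")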

PS: I really like statistics, but I would not call myself an expert. Perhaps somebody with more expertise can comment.

(Manuel Landesfeind, 17 months ago)

Oh, nice point!

(Carlo Yague, 17 months ago)

Thanks Carlo and Manuel for the discussion!

I am going to try the method Carlo mentioned, since I already have the correlation coefficient values and several permutations; 1000 or 10000 permutations would be really computationally expensive.

I have used 6 permutations. Would that be sufficient, or should I increase it a little more?

I would appreciate any further suggestions.

(Gyan Prakash Mishra, 17 months ago)

While I agree with Manuel's method for a high number of permutations, I don't think you should use it with only 6 permutations. Imagine that all 6 of your permutations give correlations below 0.9: you would then get a p-value of 0. That may be a (relatively) close approximation of your true p-value, but you obviously can't report it like this.

A similar method that accounts for the uncertainty in the p-value approximation is the Wilcoxon-Mann-Whitney test. This test doesn't assume normality.

EDIT: code in R, with 6 permutations around 0.2.

> wilcox.test(c(0.9),c(0.22,0.3,0.15,0.21,0.4,0.2),alternative="greater")

    Wilcoxon rank sum test

data:  c(0.9) and c(0.22, 0.3, 0.15, 0.21, 0.4, 0.2)
W = 6, p-value = 0.1429
alternative hypothesis: true location shift is greater than 0

EDIT2: If you increase the number of permutations, the p-value will decrease as your confidence increases.
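
For instance (with hypothetical numbers), if 100 permutations all gave correlations below 0.9, the same one-sided test would return a much smaller p-value:

set.seed(1)
rand_corr <- rnorm(100, mean = 0.2, sd = 0.05)  # 100 hypothetical permutations
wilcox.test(c(0.9), rand_corr, alternative = "greater", exact = TRUE)
# the single observation ranks above all 100 values, so p = 1/101 ~ 0.0099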

(Carlo Yague, 17 months ago)

From my point of view, six permutations are far too few for a decent estimation of a p-value! Did you see that Carlo used 10,000 permutations in his example?

Probably, you should rather use bedtools as suggested by Pierre (see below), or check papers in your research area to get a feeling for how they do it. To be honest, I do not know how good your method for creating the permutations is... but I am also not into ChIP-seq analyses...

(Manuel Landesfeind, 17 months ago)
Alternative wrote (18 months ago):

Bedtools can do such statistics using Fisher's exact test. Check http://bedtools.readthedocs.org/en/latest/content/tools/fisher.html
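
As an illustrative sketch only (the file names are placeholders, and bedtools fisher expects sorted inputs plus a genome file), the call could be wrapped from R like this:

# hypothetical invocation; A.bed, B.bed and genome.txt are placeholders
system("bedtools fisher -a A.bed -b B.bed -g genome.txt")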

 
