Question: Meta-analysis of p-values from DESeq2 output
tonja.r470 wrote:

I have several RNA-seq studies in which the same null hypothesis was tested. I analyzed each study with DESeq2 and have p-values and FDR values as output. I would like to do a meta-analysis. By default, DESeq2 produces two-sided p-values, but the combine.test() function in R needs one-sided p-values. So my idea is simply to divide the FDR values by 2 and use them as combine.test(FDR/2), and then to get back to a two-sided test by multiplying the combined p-values by two. Is this theoretically the right approach?

rna-seq • 2.3k views
modified 4.8 years ago • written 4.8 years ago by tonja.r470

Hint: The one-sided value isn't half the two-sided value. For example, if a two-sided value is 0.01 then one of the one-sided values will be ~1 and the other significant. So you'd need to decide which side to take.

I guess I am either misunderstanding you or I was totally wrong from the start. Assume z-scores are given (so, a normal distribution). To calculate two-sided p-values one would do `two.sided.p = 2*pnorm(-abs(z))` and then apply `combine.test(two.sided.p/2)`, right?

From DESeq2 paper:

For significance testing, DESeq2 uses a Wald test: the shrunken estimate of LFC is divided by its standard error, resulting in a z-statistic, which is compared to a standard normal distribution.

So, I could divide the FDR values by 2 to get one-sided p-values, couldn't I?

The one-sided p-value is half the two-sided one if the test statistic's distribution is symmetric around 0. For example, assuming a Gaussian distribution, P(|x|>5) = P(x<-5) + P(x>5), and because of symmetry, P(x<-5) = P(x>5) = 0.5*P(|x|>5). What I think Devon is referring to is that the one-sided value for the other alternative hypothesis is then 1 - 0.5*P(|x|>5), e.g. P(x<5) = 1 - P(x>5).
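A quick numerical check of the symmetry argument, as a minimal base-R sketch. The z value is chosen only for illustration so that the two-sided p-value comes out at 0.01, and the variable names are made up:

```r
# Hypothetical Wald z-statistic whose two-sided p-value is 0.01
z <- qnorm(0.995)

p_two   <- 2 * pnorm(-abs(z))            # two-sided p-value
p_upper <- pnorm(z, lower.tail = FALSE)  # one-sided, H1: effect > 0
p_lower <- pnorm(z)                      # one-sided, H1: effect < 0

# Halving the two-sided value gives the one-sided value in the observed
# direction; the one-sided value for the other direction is its complement.
print(c(two_sided = p_two, upper = p_upper, lower = p_lower))
```

Here `p_upper` equals `p_two / 2` and `p_lower` equals `1 - p_two / 2`, so which one-sided value you get depends on which alternative you pick.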

What I think Devon is referring to is that the one-sided value for the other alternative hypothesis is then 1 - 0.5*P(|x|>5), e.g. P(x<5) = 1 - P(x>5).

That refers to the second part of the question, namely "And to get back to two-sided test, I multiply by two the combined p values.", doesn't it? In this paper I found the following:

After combining the P-values, if desired the resulting combined P can be again converted to a two-tailed test by multiplying it by two.

Or do you mean I need to use the log2FC to account for the direction, i.e. whether the gene is up- or down-regulated?

combine.test() implements Fisher's method and Stouffer's method for combining p-values. Fisher's statistic follows a chi-square distribution, which is not symmetric, so you can't multiply the resulting p-value by 2 in that case. With Stouffer's method you can multiply the resulting p-value by 2, because you're dealing with a symmetric distribution (the Z-transform statistic follows a normal distribution).
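To make the difference concrete, here is a rough base-R sketch of the two textbook formulas. It is written from scratch rather than by calling combine.test(), so it should not be read as that function's internals; the p-values are invented and the Stouffer version is the unweighted one:

```r
# One-sided p-values for the same gene in three studies (made-up numbers)
p_one <- c(0.02, 0.04, 0.10)
k <- length(p_one)

# Fisher: -2 * sum(log(p)) is chi-square with 2k degrees of freedom under H0.
# Only the upper tail is meaningful, so there is no symmetric "other side".
fisher_stat <- -2 * sum(log(p_one))
p_fisher <- pchisq(fisher_stat, df = 2 * k, lower.tail = FALSE)

# Stouffer (unweighted): turn each p into a z-score, sum, rescale by sqrt(k).
z <- qnorm(p_one, lower.tail = FALSE)
z_comb <- sum(z) / sqrt(k)
p_stouffer_one <- pnorm(z_comb, lower.tail = FALSE)

# The combined z is standard normal under H0, so a two-sided p-value is
# twice the smaller tail.
p_stouffer_two <- 2 * pnorm(-abs(z_comb))

print(c(fisher = p_fisher, stouffer_one = p_stouffer_one,
        stouffer_two = p_stouffer_two))
```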

Yes, that's exactly what I'm referring to, since there are two one-sided p-values, depending on the alternative hypothesis in question.

So you'd need to decide which side to take.

If we divide a two-tailed p-value from DESeq2 by two, are we thereby selecting the one-tailed p-value that corresponds to the alternative hypothesis that gene expression changed in the direction it did? Is it appropriate to select the alternative hypothesis based on the observed direction of change?

I am also trying to combine p-values from multiple independent RNA-seq datasets and would like to use Stouffer's method, but I want to be sure I am using the correct source of p-values.
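One possible way to handle the direction, sketched in base R under several assumptions: the alternative ("up-regulated") is fixed in advance for all studies rather than picked per study from the data, the raw `pvalue` column (not `padj`) is the input, and the test statistic is symmetric around 0 as with the Wald test. The data frame is an invented stand-in for one gene's rows from several DESeq2 result tables:

```r
# Toy stand-in for one gene across three studies (values are invented;
# column names mirror the log2FoldChange and pvalue columns of DESeq2 results)
res <- data.frame(
  log2FoldChange = c(1.2, 0.8, -0.3),
  pvalue         = c(0.01, 0.04, 0.60)
)

# One-sided p-values for H1: "up-regulated".
# Observed change positive -> p/2; observed change negative -> 1 - p/2.
p_up <- ifelse(res$log2FoldChange > 0, res$pvalue / 2, 1 - res$pvalue / 2)

# Unweighted Stouffer combination of the directional p-values
z <- qnorm(p_up, lower.tail = FALSE)
z_comb <- sum(z) / sqrt(length(z))

p_comb_up  <- pnorm(z_comb, lower.tail = FALSE)  # combined, H1: up
p_comb_two <- 2 * pnorm(-abs(z_comb))            # combined, two-sided

print(c(up = p_comb_up, two_sided = p_comb_two))
```

With this construction a study that goes in the opposite direction (the third row) pulls the combined z-score back towards zero, which is not the case if every study's p-value is simply halved.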