Question: Volcano plot: why do some genes with a big FC have big p-values?
i.am.filippov wrote (15 months ago):

I'm looking at a tutorial about analysing differential expression in microarray data. `limma` is used to detect differentially expressed genes.

Now if you look at the volcano plot:

a bigger log fold change generally corresponds to a smaller p-value, i.e. a bigger FC tends to be more significant. But why would different genes at the same FC level have different p-values? What explains the large spread? Does this question make sense?

Thanks!


Two simple explanations: larger within-treatment variance (e.g. the counts for four treatment 1 samples are 2, 2, 2, 2, while the counts for four treatment 2 samples are 8, 0, 0, 0), or differences in count magnitude (e.g. 1 vs 2 or 100 vs 200, both a two-fold change).

ATpoint wrote (15 months ago):

The smaller the counts of a gene (or whatever you measure), the less reliable they are and the more prone they are to showing large fold changes.

Let's take an example:

A gene has 10 counts in sample A and 2 counts in sample B. That makes a fold change of 5, right? Say another gene has 1000 counts in A and 200 in B, also FC = 5. Which is more reliable? I would say the second one. Imagine there are small fluctuations in the counts because of the inherent uncertainty / error rate of sequencing and of the quantification method. Say the first gene now has only 5 counts in A and 4 in B: the FC is now 1.25 instead of 5. If the second gene had the same absolute fluctuation, i.e. 995 in A and 202 in B, its FC would be 4.93 (995/202 = 4.9257...), still very close to 5. High counts are more resistant to small fluctuations. So if the mean (the average count of the gene) is low, fold changes can be large purely by chance, and they are unreliable. As far as I know this holds for every kind of experiment in which quantities are measured.
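The arithmetic above can be checked with a few lines of Python. This is only a sketch: `fold_change` is a hypothetical helper that takes the plain ratio of two counts (no pseudocount, no normalisation), and the "fluctuation" is the same one used in the example (-5 counts in A, +2 counts in B) applied to both genes.

```python
def fold_change(count_a: float, count_b: float) -> float:
    """Plain ratio of counts in sample A to sample B (no pseudocount, no normalisation)."""
    return count_a / count_b

# Low-count gene: 10 vs 2, then the same absolute noise (-5 in A, +2 in B).
low_before = fold_change(10, 2)
low_after = fold_change(10 - 5, 2 + 2)

# High-count gene: 1000 vs 200, perturbed by exactly the same absolute noise.
high_before = fold_change(1000, 200)
high_after = fold_change(1000 - 5, 200 + 2)

for label, before, after in [("low", low_before, low_after),
                             ("high", high_before, high_after)]:
    print(f"{label}-count gene: FC {before:.2f} -> {after:.2f}")
```

The low-count gene drops from FC 5.00 to 1.25, while the high-count gene only moves from 5.00 to 4.93.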

Long story short: low counts tend to show artificially large (and often false) fold changes, so the confidence in them is low and their p-values tend to be large. You would need more replicates to have the power to detect differential expression for genes with low counts than for genes with high counts. That is why statistical power is inherently greater for highly expressed than for lowly expressed genes.
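To see why the same fold change gives very different p-values, here is a deliberately simplified sketch. It assumes Poisson-distributed counts and uses a Wald test on the log fold change (delta method: the variance of the log of a Poisson sample mean over n replicates is roughly 1 / (n * mu)). This is a swapped-in textbook approximation for illustration, not what `limma` (or any particular RNA-seq package) actually does, and `wald_log_fc` is a hypothetical helper name.

```python
import math

def wald_log_fc(mean_a: float, mean_b: float, n_reps: int = 1) -> tuple[float, float]:
    """Approximate Wald z and two-sided p-value for log(mean_a / mean_b).

    Assumes Poisson counts with n_reps replicates per group, so that
    Var(log of a sample mean) is roughly 1 / (n_reps * mu) (delta method).
    """
    log_fc = math.log(mean_a / mean_b)
    se = math.sqrt(1.0 / (n_reps * mean_a) + 1.0 / (n_reps * mean_b))
    z = log_fc / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail
    return z, p

# Same FC = 5 at low and high counts, plus the low-count gene with 10 replicates.
for a, b, n in [(10, 2, 1), (1000, 200, 1), (10, 2, 10)]:
    z, p = wald_log_fc(a, b, n)
    print(f"{a} vs {b}, n={n}: z = {z:.2f}, p = {p:.2g}")
```

Under these assumptions the high-count gene's z statistic is ten times larger than the low-count gene's for the same FC, and adding replicates shrinks the standard error by a factor of sqrt(n), which is the power argument made above.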

Makplus wrote (15 months ago):

It seems you expect a bigger fold change to always give a smaller p-value.
But the p-value and the fold change are not necessarily related: the fold change only reflects the change in the means, whereas the p-value depends not only on the means but also on the variance (for example, if you perform a two-sample Student's t-test).
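A small sketch of that point, using a pooled-variance two-sample Student's t statistic. The expression values are made up (hypothetical log2 intensities, four replicates per condition): both genes have exactly the same difference in means (log2 FC = 2), but very different replicate spread. Both tests have 6 degrees of freedom, so the larger |t| corresponds to the smaller p-value. Note that `limma` itself uses a moderated t-statistic that shrinks gene-wise variances towards a common value, but the dependence on variance is the same in spirit.

```python
import math
from statistics import mean, variance

def student_t(group1: list[float], group2: list[float]) -> float:
    """Two-sample Student's t statistic with pooled variance (equal-variance assumption)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group2) - mean(group1)) / se

# Hypothetical log2 expression values: (control, treatment), 4 replicates each.
gene_a = ([4.0, 4.1, 3.9, 4.0], [6.0, 6.1, 5.9, 6.0])  # tight replicates
gene_b = ([2.0, 6.0, 3.0, 5.0], [4.0, 8.0, 5.0, 7.0])  # noisy replicates

for name, (ctrl, treat) in [("gene_a", gene_a), ("gene_b", gene_b)]:
    log2_fc = mean(treat) - mean(ctrl)
    print(f"{name}: log2FC = {log2_fc:.2f}, t = {student_t(ctrl, treat):.2f}")
```

Same log2 FC, but t is about 34.6 for the tight gene and about 1.5 for the noisy one, so on a volcano plot they would sit at the same x position with very different heights.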