Question: Which statistical test to use for differential expression with qPCR?
alvarocentron9110 wrote:

Hello everyone, this question probably will sound dumb, but here it goes.

I have performed qPCR for a gene in wt and si samples, with 3 biological replicates of 3 technical replicates each. I have already calculated the ddCT and the errors for both technical and biological replicates, but now I'm struggling to figure out how to test whether wt and si differ.

I think my data are paired (same cell line and gene, but different treatment), and with only 3 biological replicates I suppose I cannot rely on a parametric test, so I decided to use a Wilcoxon test. And here comes my question:

Should I use all 9 values (3 technical x 3 biological)? If I only use the biological replicates, should I take the technical error into account (and if yes, how do I do that)? And finally, with such a small n, can I use this statistical test at all?

Thank you!

modified 10 months ago by ATpoint29k • written 10 months ago by alvarocentron9110
ATpoint29k wrote:

First of all, I am not a statistician, and my comment reflects my amateur understanding of statistics. For the technical replicates I would take the mean, as technical replicates are intended to check for pipetting and measurement errors. If the standard deviation is reasonable, average them and use that value for the further statistics.

As for the sample size, the smallest p-value a Wilcoxon test can return is limited by the number of replicates. Even if all values of group 1 were smaller than all values of group 2, with n=3 per group there is a minimal p-value you can get with this experimental setup, no matter how large the difference is.

E.g. in R:

```r
wilcox.test(x = c(1, 2, 3), y = c(10, 11, 12))
#>
#>  Wilcoxon rank sum test
#>
#> data:  c(1, 2, 3) and c(10, 11, 12)
#> W = 0, p-value = 0.1
#> alternative hypothesis: true location shift is not equal to 0
```

gives the same result as

```r
wilcox.test(x = c(1, 2, 3), y = c(100, 111, 122))
#>
#>  Wilcoxon rank sum test
#>
#> data:  c(1, 2, 3) and c(100, 111, 122)
#> W = 0, p-value = 0.1
#> alternative hypothesis: true location shift is not equal to 0
```

If you want a smaller `p`, you have to increase the number of replicates. Use a power analysis to calculate the number of replicates needed at a given variance. With n=4 in one condition, for example, it would be

```r
wilcox.test(x = c(1, 2, 3), y = c(100, 111, 122, 133))
#>
#>  Wilcoxon rank sum test
#>
#> data:  c(1, 2, 3) and c(100, 111, 122, 133)
#> W = 0, p-value = 0.05714
#> alternative hypothesis: true location shift is not equal to 0
```

...and so on.
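The floor on the p-value follows from counting: with no ties, the two-sided exact rank-sum test cannot do better than `2 / choose(n1 + n2, n1)`, which is what you get when the two groups are completely separated. Below is a minimal sketch of that, plus a rough replicate estimate. `min_p` is a helper name made up here, and the `delta` and `sd` values are placeholders, not numbers from this thread. Note also that `power.t.test` is a power calculation for a t-test, not for a Wilcoxon test, so treat its result only as a rough guide.

```r
# Smallest two-sided p-value of the exact Wilcoxon rank-sum test
# (no ties, groups completely separated). `min_p` is a hypothetical helper.
min_p <- function(n1, n2) 2 / choose(n1 + n2, n1)

min_p(3, 3)  # 3 vs 3 replicates
min_p(3, 4)  # 3 vs 4 replicates
min_p(4, 4)  # 4 vs 4 replicates

# Cross-check against wilcox.test for the 3 vs 3 case
p_obs <- wilcox.test(c(1, 2, 3), c(10, 11, 12))$p.value

# Rough replicate estimate via a t-test power calculation (an approximation,
# NOT a power calculation for the Wilcoxon test). delta and sd are placeholder
# values on the Ct scale; replace them with estimates from your own data.
pw <- power.t.test(delta = 1, sd = 0.5, sig.level = 0.05, power = 0.8)
n_needed <- ceiling(pw$n)  # replicates per group
n_needed
```

The first two values reproduce the `wilcox.test` outputs above (0.1 and 0.05714), and 4 vs 4 is the smallest balanced design where the test can drop below 0.05 at all.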

In your case I would use an unpaired Wilcoxon (Mann-Whitney U) test because, from what I understand, your samples are independent of each other. A paired design would be, for example, to measure the gene in an aliquot of an inducible cell line at 0h and again after stimulating the remaining aliquot of the same cell line for 24h. In your case you have independent cells, so the measurements do not influence each other.
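To make the averaging of technical replicates concrete, here is a minimal sketch that collapses them to one value per biological replicate and then runs the unpaired test on those means. The data frame, its column names (`condition`, `bio_rep`, `dct`) and all the numbers are made up for illustration; they are not your data.

```r
# Hypothetical dCt values: 2 conditions x 3 biological x 3 technical replicates
qpcr <- data.frame(
  condition = rep(c("wt", "si"), each = 9),
  bio_rep   = rep(rep(1:3, each = 3), times = 2),
  dct       = c(5.0, 5.1, 4.9,  5.3, 5.2, 5.4,  4.8, 4.9, 4.7,
                7.0, 7.1, 6.9,  7.4, 7.3, 7.5,  6.8, 6.7, 6.9)
)

# Spread of the technical replicates, to check that it is reasonably small
tech_sd <- aggregate(dct ~ condition + bio_rep, data = qpcr, FUN = sd)

# One mean per biological replicate; only these go into the test
bio_means <- aggregate(dct ~ condition + bio_rep, data = qpcr, FUN = mean)

wt <- bio_means$dct[bio_means$condition == "wt"]
si <- bio_means$dct[bio_means$condition == "si"]

# Unpaired Wilcoxon (Mann-Whitney U) test on the 3 vs 3 biological means
res <- wilcox.test(wt, si, paired = FALSE)
res$p.value
```

Feeding all 9 values per group into the test instead would treat technical replicates as if they were independent samples, which they are not.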

Thank you very much. One last question: when I plot my data I see differences, but the nonparametric test you suggested does not detect them, whereas a parametric test does. I'm not sure I will be allowed to do a 4th or 5th replicate, since I have several targets and samples. Is it correct/feasible to assume normality and homogeneity of variance and go for the parametric test?


I do not know if this would be statistically correct, but in the literature people use t-tests all the time for qPCR data, so you can probably get away with it.


I've seen Welch's t-test used pretty consistently in the literature for qPCR. My understanding is that it is more robust than Student's t-test because it does not assume equal variances in the two groups. Specifically, my PhD lab would perform this test on the ddCT values rather than the fold changes, the idea being that these are closer to the directly measured values and already exist in log space (the ddCT is equivalent to the log2FC up to sign, since fold change = 2^-ddCT).
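As a rough sketch of what that looks like in R: `t.test` runs Welch's version by default (`var.equal = FALSE`). The dCt numbers below are invented, one value per biological replicate after averaging technical replicates, and the variable names are just for illustration.

```r
# Hypothetical dCt values (target Ct minus reference Ct), one per biological replicate
dct_wt <- c(5.1, 4.9, 5.3)
dct_si <- c(7.0, 7.4, 6.8)

# ddCt relative to the mean wt dCt; this is still on the log2 scale
ddct_wt <- dct_wt - mean(dct_wt)
ddct_si <- dct_si - mean(dct_wt)

# Welch's t-test on the ddCt values, not on the fold changes
res <- t.test(ddct_si, ddct_wt)  # var.equal = FALSE is the default
res$method
res$p.value

# Fold change is only computed for reporting: log2FC = -ddCt
log2fc      <- -(mean(ddct_si) - mean(ddct_wt))
fold_change <- 2^log2fc
fold_change
```

Because the ddCt values are just the dCt values shifted by a constant, testing on dCt gives the same p-value; the point is to test in log space and convert to a fold change only at the end.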

I'll also include the disclaimer that I'm not a statistician and took that advice from other lab members at the time.