Question

p-value calculation when comparing ratios of means (not means of ratios)

1

Entering edit mode

7.2 years ago

wildtype ▴ 60

Hello, I apologize in advance for the dumb question but I am no expert in statistics, my course only covered how to do student's t.test pretty much. I could not find information in different forums on a similar problem specifically. I am not sure what to do in the following situation. Suppose we measure the expression of geneX by RT-qPCR in two different cell lines, each of which is treated with some drug or not treated. They are independent, so it is not the same cells before an after treatment; it is different cells that are either treated or not. Suppose we have 3 biological replicates for each condition, and the following values

Cell line A, no treatment: c(0.9,1,1.1) => mean expression 1 +/- 0.1 standard deviation (let's call this A1) Cell line A, treatment: c(9,10,11) => mean expression 10 +/- 1 sd (A2) Cell line B: no treatment: c(2.9,3,3.1) => mean expression 3 +-/ 0.1 sd (B1) Cell line B: treatment: c(290,300,310) => mean expression 300 +-/ 10 sd (B2)

Obviously, the drug causes an increase in both cell lines, but in line A there is a 10-fold increase (A2/A1=10/1) while in line B there is a 100 fold increase (B2/B1=300/3). I want to claim that the effect in B is much stronger.

What statistical test should we use to tell whether the RATIOS (gene expression upon treatment normalized to basal gene expression without treatment) are significantly different i.e. A2/A1 +/- propagated error != B2/B1 +/- error

Thanks in advance

statistics • 3.2k views

ADD COMMENT • link updated 7.0 years ago by ejm32 ▴ 450 • written 7.2 years ago by wildtype ▴ 60

score 1 · Answer 1 · 2018-06-29

Hi

Since you are interested in asking the question I want to claim that the effect in B is much stronger, your

Null hypothesis can be: Fold change in cell line A is <= B
Alternate hypothesis can be: Fold change in cell line A is > B

Now you will do the testing. Since there are 3 replicates, n=3, you will have to go for t-test with 2 degree of freedom (mark that you have less power). Also since before and after are not paired, you should go for unpaired t test.

Before doing you have to calculate fold change e.g. A1 with A2 and B1 with B2, so you will have 3 fold change values

a1 = c(0.9,1,1.1)
a2 = c(9,10,11)
b1 = c(2.9,3,3.1)
b2 = c(290,300,310)

a_diff = a2/a1
b_diff = b2/b1

t.test(a_diff, b_diff, alternative = "greater")

In the example above, there is 10 fold change in every comparison, so t test would give error. Why it will give error, refer this link

Hope this helps

score 1 · Answer 2 · 2018-06-29

In this case I think the best thing to is make use of linear models, which will allow you to perform contrasts based on the comparisons you care about. Have a look at the limma manual for a bit more information.

library(limma)
a1 <-  c(0.9,1,1.1)
a2 <-  c(9,10,11)
b1 <-  c(2.9,3,3.1)
b2 <-  c(290,300,310)

m <- data.frame(a1,a2,b1,b2, stringsAsFactors = F)
m <- reshape2::melt(m)

mm <- model.matrix( ~ 0 + m$variable)
colnames(mm) <- levels(m$variable)
res <- lmFit(t(log2(m[2])), mm) #If your values are already log transformed remove the log2 call

cm <- makeContrasts(A=a2-a1,
                    B=b2-b1,
                    Diff = (b2-b1)-(a2-a1), levels = levels(m$variable))

res2 <- contrasts.fit(res, cm)
res3 <- eBayes(res2)
topTable(res3, coef = 3 ,number = 1)

#              logFC  AveExpr           t      P.Value    adj.P.Val         B
# value   3.32192809 3.281243 27.82837605 6.974347e-10 2.789739e-09 13.450186

score 0 · Answer 3 · 2018-05-22

Hi, Here I showed the crude way to calculate the probability from normal distribution using given information. I am not very much familiar with various statistical tests. There must be sophisticated ways to get p-values than what i mentioned here.

## generate populations from given information
a1 <- rnorm(n =1000 , mean = 1, 0.1)
a2 <- rnorm(n =1000 , mean = 10, 1)
b1 <- rnorm(n =1000 , mean = 3, 0.1)
b2 <- rnorm(n =1000 , mean = 300, 10)

## get distribution of ratios. You can change number of iterations 
n_iter <- 1000
ratios <- sapply(seq(n_iter), function(i){

        ## select 3 random from each populations  

        samp_a1  <- mean(sample(a1, 3))
        samp_a2  <- mean(sample(a1, 3))
        samp_b1  <- mean(sample(b1, 3))
        samp_b2 <- mean(sample(b2, 3))


        r1 <- sampa2 /sampa1  ## A2/A1

        r2 <- sampb2 /sampb1 ## B2/B1 

        r_ratios <- r2/r1  ## Treatment / WO Treatment

        return(r_diff)        
})

## distribution of ratios 
plot(density(ratios))

## zscore 
zscore <-  (1 -  mean(ratios)) / sd(ratios)

## probability of normal dist   
pnorm(zscore)