What statistical test should I use on proportion data?
3.6 years ago
star ▴ 320

Hi,

I would like to perform a statistical test to see whether there are any significant differences between the proportions in three different groups (G1, G2, G3) across different runs. The ID refers to different subjects. G1, G2 and G3 always sum to 1, since they are proportions.

I'm interested in comparing the samples (the rows) with respect to their proportions in the different groups.

data <- data.frame(ID = rep(paste0("ID", 1:3), 3),
                   runs = rep(c("run1", "run2", "run3"), 3),
                   G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
                   G2 = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
                   G3 = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33))
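A quick sanity check of the example data, as a minimal base-R sketch (`row.totals` and `design` are illustrative names, not part of the question): it confirms that each row sums to 1 and cross-tabulates subjects against runs. Because both columns are built with `rep(..., 3)`, each subject appears in only one run (ID1 with run1, and so on), so in this toy data frame a subject effect cannot be separated from a run effect.

```r
# Rebuild the example data from the question
data <- data.frame(ID = rep(paste0("ID", 1:3), 3),
                   runs = rep(c("run1", "run2", "run3"), 3),
                   G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
                   G2 = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
                   G3 = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33))

# Each row is a composition, so G1 + G2 + G3 should be 1 (up to floating-point error)
row.totals <- rowSums(data[, c("G1", "G2", "G3")])
print(all(abs(row.totals - 1) < 1e-8))

# Cross-tabulate subjects against runs to see which combinations are observed
design <- table(data$ID, data$runs)
print(design)
```

If every subject was meant to be measured in every run, the non-zero cells of `design` would be spread across the whole table rather than sitting on the diagonal.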

I would really appreciate it if you could help me find a proper statistical test.

Tags: R, lme, glmer, statistics

What did you measure?


It's not my own data. I only know it's a kind of classification measurement that shows the proportion of cells that clustered together in each sample.


Deleting a post after getting a satisfactory answer is grounds for suspension!


The reason I deleted it was that I had asked the wrong question! I was not interested in the differences between the groups (G1, G2 and G3), so I deleted it to post the correct question.


OK, but look: someone took the effort to answer your question. You should thank them and leave it be. It is still the correct answer to the question as it was asked, and we need to honor the effort that goes into answering questions.

3.6 years ago

Because you have animal groups, tissues and multiple experiments, I would recommend modelling this with a linear model and treating the animals and the tissue as fixed effects. There is some background you'll have to pick up, but here's a stub to get you started thinking about how to analyze this:

library(reshape2)

## Create the data frame (one row per sample; G1 + G2 + G3 = 1)
df <- data.frame(ID = rep(paste0("ID", 1:3), 3),
                 tissue = rep(c("liver", "brain", "heart"), 3),
                 G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
                 G2 = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
                 G3 = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33))

## Turn into a molten (long) data frame with columns ID, tissue, variable, value
df.molten <- melt(df, id.vars = c("ID", "tissue"))

## Model the data set
model.lm <- value ~ variable + ID + tissue
df.lm <- lm(model.lm, data = df.molten)

## Explore results (G1 is the reference level of 'variable')
summary(df.lm)

## Make G3 the reference level to compare G3 with G1 and with G2
df.molten$variable <- factor(df.molten$variable, levels = c("G3", "G1", "G2"))
df.lm <- lm(model.lm, data = df.molten)
summary(df.lm)
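If you'd rather avoid the reshape2 dependency, base R's `stack()` can build the same long table. This is a minimal sketch, assuming the `df` layout used above; `long`, `df.long` and `fit` are illustrative names, and `stack()` is a swapped-in alternative to `melt()`, not what the answer's code uses.

```r
# Example data from the answer (one row per sample; G1 + G2 + G3 = 1)
df <- data.frame(ID = rep(paste0("ID", 1:3), 3),
                 tissue = rep(c("liver", "brain", "heart"), 3),
                 G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
                 G2 = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
                 G3 = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33))

# stack() puts the G1, G2 and G3 columns one after another:
# 'values' holds the proportions, 'ind' is a factor naming the source column
long <- stack(df[, c("G1", "G2", "G3")])

# Repeat the sample labels once per group so every value keeps its ID and tissue
df.long <- data.frame(ID = rep(df$ID, times = 3),
                      tissue = rep(df$tissue, times = 3),
                      variable = long$ind,
                      value = long$values)
str(df.long)

# Same model formula as in the answer
fit <- lm(value ~ variable + ID + tissue, data = df.long)
print(round(coef(fit)[c("variableG2", "variableG3")], 4))
```

The `variableG2` and `variableG3` coefficients printed at the end should agree with the first summary below (about -0.2022 and -0.1978).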


And the results of the two summary(df.lm) calls:

Call:
lm(formula = model.lm, data = df.molten)

Residuals:
     Min       1Q   Median       3Q      Max
-0.13667 -0.04667 -0.02444  0.07333  0.16111

Coefficients: (2 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.667e-01  3.609e-02  12.932 9.33e-12 ***
variableG2  -2.022e-01  3.953e-02  -5.115 3.99e-05 ***
variableG3  -1.978e-01  3.953e-02  -5.003 5.23e-05 ***
IDID2        3.036e-18  3.953e-02   0.000        1
IDID3       -3.925e-17  3.953e-02   0.000        1
tissueheart         NA         NA      NA       NA
tissueliver         NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.08386 on 22 degrees of freedom
Multiple R-squared:  0.6081,    Adjusted R-squared:  0.5369
F-statistic: 8.535 on 4 and 22 DF,  p-value: 0.0002573


and

Call:
lm(formula = model.lm, data = df.molten)

Residuals:
     Min       1Q   Median       3Q      Max
-0.13667 -0.04667 -0.02444  0.07333  0.16111

Coefficients: (2 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.689e-01  3.609e-02   7.451 1.88e-07 ***
variableG1   1.978e-01  3.953e-02   5.003 5.23e-05 ***
variableG2  -4.444e-03  3.953e-02  -0.112    0.912
IDID2       -4.626e-17  3.953e-02   0.000    1.000
IDID3       -2.453e-17  3.953e-02   0.000    1.000
tissueheart         NA         NA      NA       NA
tissueliver         NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.08386 on 22 degrees of freedom
Multiple R-squared:  0.6081,    Adjusted R-squared:  0.5369
F-statistic: 8.535 on 4 and 22 DF,  p-value: 0.0002573
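Both summaries report "(2 not defined because of singularities)". One way to see where that comes from, using base R only, is to cross-tabulate ID against tissue and compare the number of columns in the ID + tissue design matrix with its rank. This is a sketch for this toy data frame; `tab`, `X` and `id.means` are illustrative names.

```r
# Example data from the answer
df <- data.frame(ID = rep(paste0("ID", 1:3), 3),
                 tissue = rep(c("liver", "brain", "heart"), 3),
                 G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
                 G2 = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
                 G3 = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33))

# Which tissue does each subject occur with?
tab <- table(df$ID, df$tissue)
print(tab)

# Intercept + 2 ID dummies + 2 tissue dummies = 5 columns
X <- model.matrix(~ ID + tissue, data = df)
print(c(columns = ncol(X), rank = qr(X)$rank))

# Each sample's proportions sum to 1, so every ID's nine long-format values
# (three samples x three groups) average to 1/3
id.means <- tapply(rowSums(df[, c("G1", "G2", "G3")]), df$ID, sum) / 9
print(id.means)
```

With 5 columns but rank 3, two columns carry no extra information; since the tissue dummies come last, lm() reports those two as NA. In the long-format model these rows are just repeated once per group, so the deficiency is still two columns.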


In the first comparison, G1 is the reference level, so the coefficients variableG2 and variableG3 compare G2 and G3 with G1. Their p-values (3.99e-05 and 5.23e-05) seem to indicate that the differences between G1 and G2 and between G1 and G3 are both significant.

In the second comparison, you make G3 the reference level instead. The coefficient variableG1 is the previous comparison's variableG3 with its sign flipped (0.1978 vs. -0.1978), so it makes sense that you get the same p-value. However, here you can see that G3 vs. G2 is not significantly different (p = 0.912).
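The G2 vs. G3 comparison can also be read off the first fit without refitting, by testing the linear combination variableG2 - variableG3 with the coefficients' covariance matrix. This is a minimal base-R sketch, not the answer's method: the names `fit.g1`, `nm`, `w`, `est`, `se`, `t.val` and `p.val` are illustrative, and `stack()` stands in for `melt()`. It should reproduce the second summary's variableG2 row (about -0.0044, t = -0.112, p = 0.912).

```r
# Example data from the answer
df <- data.frame(ID = rep(paste0("ID", 1:3), 3),
                 tissue = rep(c("liver", "brain", "heart"), 3),
                 G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
                 G2 = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
                 G3 = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33))

# Long format: one row per (sample, group) pair
long <- stack(df[, c("G1", "G2", "G3")])
df.long <- data.frame(ID = rep(df$ID, 3), tissue = rep(df$tissue, 3),
                      variable = long$ind, value = long$values)

# First model from the answer: G1 is the reference level
fit.g1 <- lm(value ~ variable + ID + tissue, data = df.long)

# Contrast weights for (G2 - G1) - (G3 - G1) = G2 - G3
nm <- c("variableG2", "variableG3")
w <- c(1, -1)
est <- sum(w * coef(fit.g1)[nm])
se <- sqrt(drop(t(w) %*% vcov(fit.g1)[nm, nm] %*% w))

# Two-sided t test on the residual degrees of freedom of the fit
t.val <- est / se
p.val <- 2 * pt(abs(t.val), df = df.residual(fit.g1), lower.tail = FALSE)
print(round(c(estimate = est, se = se, t = t.val, p = p.val), 4))
```

Refitting with a different reference level, as the answer does, gives the same numbers; the contrast just makes explicit that both fits are the same model with a different parameterization.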

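Try to understand what's going on (starting with the basics of linear regression if you're not familiar with the method), and make sure you understand the model's output before you use this in any serious analysis. For example, the two tissue coefficients are NA because in this example data frame each ID always occurs with the same tissue (ID1 with liver, ID2 with brain, ID3 with heart), so tissue is completely confounded with ID; and the ID coefficients are essentially zero because each sample's proportions sum to 1, so every ID has the same mean value of 1/3.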


Please remember to up-vote and/or accept answers that have helped.