Question: Which statistical test should I use on proportion data?
4 months ago, star190 (Netherlands) wrote:

Hi,

I would like to perform a statistical test to see whether there are any significant differences between the proportions in three groups (G1, G2, G3) across different runs. The ID refers to different subjects. G1, G2 and G3 always sum to 1 because they are proportions.

I’m interested in comparing the samples (the rows) with respect to their proportions in the different groups.

data <- data.frame(ID = rep(paste0("ID", 1:3), 3),
                   runs = rep(c("run1", "run2", "run3"), 3),
                   G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
                   G2 = c(0.22, 0.33, 0.35, 0.3, 0.2, 0.24, 0.15, 0.35, 0.24),
                   G3 = c(0.2, 0.24, 0.22, 0.15, 0.35, 0.43, 0.3, 0.2, 0.33))
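Before choosing a test, it may help to sanity-check the data as posted. The base-R sketch below confirms that every row sums to 1 and cross-tabulates subjects against runs. Note that, as written, `rep(..., 3)` on both columns pairs ID1 only with run1, ID2 only with run2 and ID3 only with run3, so subject and run effects cannot be told apart. If every subject was actually measured in every run, something like `rep(paste0("ID", 1:3), each = 3)` may have been intended; that is an assumption, not something stated in the question.

```r
# Example data exactly as posted in the question
data <- data.frame(
  ID   = rep(paste0("ID", 1:3), 3),
  runs = rep(c("run1", "run2", "run3"), 3),
  G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
  G2 = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
  G3 = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33)
)

# Check 1: G1 + G2 + G3 should be 1 in every row (up to floating-point error)
row_totals <- rowSums(data[, c("G1", "G2", "G3")])
print(all.equal(unname(row_totals), rep(1, nrow(data))))  # TRUE

# Check 2: how subjects (ID) and runs are crossed in this layout;
# with rep(..., 3) on both columns each ID occurs in a single run only
id_by_run <- table(data$ID, data$runs)
print(id_by_run)
```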

I would really appreciate it if you could help me find a proper statistical test.

Tags: statistics, glmer, R, lme
modified 4 months ago, written 4 months ago by star190

What did you measure?

written 4 months ago by ATpoint

It's not my own data. I only know it's a kind of classification measurement that shows the proportion of cells that clustered together in each sample.

written 4 months ago by star190

Deleting a post after getting a satisfactory answer is grounds for suspension!

written 4 months ago by Istvan Albert ♦♦

I deleted it because the question was wrong! I was not interested in the differences between the groups (G1, G2 and G3), so I deleted it in order to post the correct question.

written 4 months ago by star190

OK, but look: someone took the effort to answer your question. You should thank them and leave it be. It is still the correct answer to the question, and we need to honor the effort that goes into answering questions.

written 4 months ago by Istvan Albert ♦♦

4 months ago, manuel.belmadani (Canada) wrote:

Because you have animal groups, tissues and multiple experiments, I would recommend modelling this with a linear model, treating the animals and the tissue as fixed effects. There is some background you'll have to pick up, but here's a stub to get you started thinking about how to analyze this:

library(reshape2)  # provides melt()

## Create data frame
df <- data.frame(ID = rep(paste0("ID", 1:3), 3),
                 tissue = rep(c("liver", "brain", "heart"), 3),
                 G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
                 G2 = c(0.22, 0.33, 0.35, 0.3, 0.2, 0.24, 0.15, 0.35, 0.24),
                 G3 = c(0.2, 0.24, 0.22, 0.15, 0.35, 0.43, 0.3, 0.2, 0.33))

## Turn into a molten (long) data frame: one row per sample and group,
## with the group name in "variable" and the proportion in "value"
df.molten <- melt(df, id.vars = c("ID", "tissue"))

## Model the data set
model.lm <- value ~ variable + ID + tissue
df.lm <- lm(model.lm, data = df.molten)

## Explore results (G1, the first factor level, is the reference level)
summary(df.lm)

## Make G3 the reference level and refit
df.molten$variable <- relevel(df.molten$variable, ref = "G3")
df.lm <- lm(model.lm, data = df.molten)
summary(df.lm)

And here are the results of the two summary(df.lm) calls:

Call:
 lm(formula = model.lm, data = df.molten)

 Residuals:
      Min       1Q   Median       3Q      Max
 -0.13667 -0.04667 -0.02444  0.07333  0.16111

 Coefficients: (2 not defined because of singularities)
               Estimate Std. Error t value Pr(>|t|)
 (Intercept)  4.667e-01  3.609e-02  12.932 9.33e-12 ***
 variableG2  -2.022e-01  3.953e-02  -5.115 3.99e-05 ***
 variableG3  -1.978e-01  3.953e-02  -5.003 5.23e-05 ***
 IDID2        3.036e-18  3.953e-02   0.000        1
 IDID3       -3.925e-17  3.953e-02   0.000        1
 tissueheart         NA         NA      NA       NA
 tissueliver         NA         NA      NA       NA
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 0.08386 on 22 degrees of freedom
 Multiple R-squared:  0.6081,    Adjusted R-squared:  0.5369
 F-statistic: 8.535 on 4 and 22 DF,  p-value: 0.0002573

and

 Call:
 lm(formula = model.lm, data = df.molten)

 Residuals:
      Min       1Q   Median       3Q      Max
 -0.13667 -0.04667 -0.02444  0.07333  0.16111

 Coefficients: (2 not defined because of singularities)
               Estimate Std. Error t value Pr(>|t|)
 (Intercept)  2.689e-01  3.609e-02   7.451 1.88e-07 ***
 variableG1   1.978e-01  3.953e-02   5.003 5.23e-05 ***
 variableG2  -4.444e-03  3.953e-02  -0.112    0.912
 IDID2       -4.626e-17  3.953e-02   0.000    1.000
 IDID3       -2.453e-17  3.953e-02   0.000    1.000
 tissueheart         NA         NA      NA       NA
 tissueliver         NA         NA      NA       NA
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 0.08386 on 22 degrees of freedom
 Multiple R-squared:  0.6081,    Adjusted R-squared:  0.5369
 F-statistic: 8.535 on 4 and 22 DF,  p-value: 0.0002573
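Both summaries report "(2 not defined because of singularities)" and ID estimates of order 1e-17. A minimal base-R sketch, assuming the same data (the long table is rebuilt without reshape2 so the block runs on its own, in the same row order melt() produces), points to two reasons. In the data as constructed, each ID occurs with exactly one tissue, so the tissue columns are linear combinations of the ID columns and lm() sets them to NA. And because G1 + G2 + G3 = 1 in every sample, each sample's mean over the three groups is exactly 1/3, leaving nothing for ID to explain.

```r
# Rebuild the answer's wide data frame (values copied from the post)
df <- data.frame(
  ID     = rep(paste0("ID", 1:3), 3),
  tissue = rep(c("liver", "brain", "heart"), 3),
  G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
  G2 = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
  G3 = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33)
)

# Base-R stand-in for reshape2::melt(): stack G1, G2, G3 into one column
long <- data.frame(
  ID       = rep(df$ID, 3),
  tissue   = rep(df$tissue, 3),
  variable = factor(rep(c("G1", "G2", "G3"), each = nrow(df))),
  value    = c(df$G1, df$G2, df$G3)
)
fit <- lm(value ~ variable + ID + tissue, data = long)

# 1) Each ID occurs with exactly one tissue, so tissue is aliased with ID
print(table(df$ID, df$tissue))
print(alias(fit)$Complete)  # rows are the coefficients lm() reports as NA

# 2) Each sample's proportions sum to 1, so its mean over G1-G3 is 1/3;
#    the ID effects are therefore zero up to rounding error
print(tapply(long$value, long$ID, mean))
```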

In the first model, G1 is the reference level, so variableG2 and variableG3 compare G2 and G3 with G1. The p-values for variableG2 and variableG3 (both below 0.001) suggest that G1 differs significantly from both G2 and G3.
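In this balanced layout (and with the ID estimates essentially zero), the estimates are plain differences of group means: the intercept is the mean of G1, and each variable coefficient is that group's mean minus the mean of G1. A small base-R sketch using the values from the post reproduces the numbers in the first summary by hand:

```r
# Proportions from the example data, one vector per group
G1 <- c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43)
G2 <- c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24)
G3 <- c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33)

# Group means; mean(G1) is the intercept of the first model
group_means <- c(G1 = mean(G1), G2 = mean(G2), G3 = mean(G3))
print(round(group_means, 4))   # G1 0.4667, G2 0.2644, G3 0.2689

# With G1 as reference, each coefficient is that group's mean minus mean(G1)
diff_from_G1 <- group_means[c("G2", "G3")] - group_means[["G1"]]
print(round(diff_from_G1, 4))  # G2 -0.2022 (variableG2), G3 -0.1978 (variableG3)
```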

In the second model, G3 is the reference level instead. The coefficient variableG1 is the previous model's variableG3 with the sign flipped (0.1978 vs. -0.1978), so it makes sense that you get the same p-value. However, you can now see that G2 and G3 are not significantly different (p = 0.912).
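The G2 vs. G3 comparison is also contained in the first fit: with R's default treatment contrasts it is the difference of the two non-reference coefficients, -0.2022 - (-0.1978) = -0.0044, which matches variableG2 in the second summary. The sketch below, assuming the same data rebuilt without reshape2 and using base R `relevel()` to change the reference level, checks this and computes the standard error of that difference from `vcov()` of the first fit:

```r
G1 <- c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43)
G2 <- c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24)
G3 <- c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33)

# Long layout in the same row order as melt() would give
long <- data.frame(
  ID       = rep(paste0("ID", 1:3), 9),
  tissue   = rep(c("liver", "brain", "heart"), 9),
  variable = factor(rep(c("G1", "G2", "G3"), each = 9)),
  value    = c(G1, G2, G3)
)

fit_g1 <- lm(value ~ variable + ID + tissue, data = long)  # reference level: G1
long$variable <- relevel(long$variable, ref = "G3")
fit_g3 <- lm(value ~ variable + ID + tissue, data = long)  # reference level: G3

# G2 - G3 from the first fit: difference of the two non-reference coefficients
b <- coef(fit_g1)
g2_vs_g3 <- b[["variableG2"]] - b[["variableG3"]]
print(round(g2_vs_g3, 4))                      # -0.0044
print(round(coef(fit_g3)[["variableG2"]], 4))  # -0.0044, same estimate

# Var(b2 - b3) = Var(b2) + Var(b3) - 2 Cov(b2, b3), from fit_g1's covariance matrix
V <- vcov(fit_g1)
se_g2_vs_g3 <- sqrt(V["variableG2", "variableG2"] + V["variableG3", "variableG3"] -
                    2 * V["variableG2", "variableG3"])
print(se_g2_vs_g3)  # equals the Std. Error of variableG2 in fit_g3
```

Since both fits are the same model with a different reference level, the residuals and the F-statistic are identical; only the parameterization of the group effects changes.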

Try to understand what's going on (starting with the basics of linear regression if you're not familiar with the method), and make sure you understand the model's output before you use this in any serious analysis.

written 4 months ago by manuel.belmadani

Thanks for your answer!

written 4 months ago by star190

Please remember to up-vote and / or accept answers that have helped.

written 3 months ago by Kevin Blighe
Powered by Biostar version 2.3.0