Question: Which statistical test is appropriate for proportion data?
4 months ago by
star190
Netherlands
star190 wrote:

Hi,

I would like to perform a statistical test to see whether there are any significant differences between the proportions in three groups (G1, G2, G3) across different runs. The ID refers to different subjects. G1, G2 and G3 always sum to 1 because they are proportions.

I’m interested in comparing the samples (the rows) with respect to their proportions in the different groups.

```{r}
data <- data.frame(
  ID   = rep(paste0("ID", 1:3), 3),
  runs = rep(c("run1", "run2", "run3"), 3),
  G1   = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
  G2   = c(0.22, 0.33, 0.35, 0.3, 0.2, 0.24, 0.15, 0.35, 0.24),
  G3   = c(0.2, 0.24, 0.22, 0.15, 0.35, 0.43, 0.3, 0.2, 0.33)
)
```
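The sum-to-one property mentioned above can be checked directly. This is only a minimal sketch on the same toy data; the names `row_totals` and `sums_to_one` are made up for illustration.

```r
# Same toy data as in the question
data <- data.frame(
  ID   = rep(paste0("ID", 1:3), 3),
  runs = rep(c("run1", "run2", "run3"), 3),
  G1   = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
  G2   = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
  G3   = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33)
)

# Sum the three proportions within each row (hypothetical helper names)
row_totals <- rowSums(data[, c("G1", "G2", "G3")])

# Compare with 1 using a floating-point tolerance rather than `==`
sums_to_one <- isTRUE(all.equal(unname(row_totals), rep(1, nrow(data))))
print(sums_to_one)
```

Because the rows are constrained to sum to 1, the three columns are not independent of each other, which is worth keeping in mind when choosing a test.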

I would really appreciate it if you could help me find a proper statistical test.

statistics glmer R lme • 267 views

What did you measure?

It's not my own data. I only know it's a kind of classification measurement that shows the proportion of cells that clustered together in each sample.

Deleting a post after getting a satisfactory answer is grounds for suspension!

written 4 months ago by Istvan Albert

I deleted it because the question was incorrect! I was not interested in the differences between the groups (G1, G2 and G3), so I deleted it in order to post the correct question.


OK, but look: someone took the effort to answer your question. You should thank them and leave it be. It is still the correct answer to the question as asked, and we need to honor the effort that goes into answering questions.

written 4 months ago by Istvan Albert
2
4 months ago by

Because you have animal groups, tissues and multiple experiments, I would recommend modelling this with a linear model, treating the animals and the tissue as fixed effects. There is some background you'll have to pick up, but here's a stub to get you started thinking about how to analyze this:

``````
library(reshape2)

## Create the data frame
df <- data.frame(
  ID     = rep(paste0("ID", 1:3), 3),
  tissue = rep(c("liver", "brain", "heart"), 3),
  G1     = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
  G2     = c(0.22, 0.33, 0.35, 0.3, 0.2, 0.24, 0.15, 0.35, 0.24),
  G3     = c(0.2, 0.24, 0.22, 0.15, 0.35, 0.43, 0.3, 0.2, 0.33)
)

## Turn into a molten (long-format) data frame, keeping ID and tissue as id columns
df.molten <- melt(df, id.vars = c("ID", "tissue"))

## Model the data set
model.lm <- as.formula("value ~ variable + ID + tissue")
df.lm <- lm(model.lm, data = df.molten)

## Explore results
summary(df.lm)

## Move G3 to the front of the factor levels so that it becomes the reference level
df.molten$variable <- factor(df.molten$variable, c("G3", setdiff(levels(df.molten$variable), "G3")))
df.lm <- lm(model.lm, data = df.molten)
summary(df.lm)
``````

The results of the two `summary(df.lm)` calls are:

``````
Call:
lm(formula = model.lm, data = df.molten)

Residuals:
     Min       1Q   Median       3Q      Max
-0.13667 -0.04667 -0.02444  0.07333  0.16111

Coefficients: (2 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.667e-01  3.609e-02  12.932 9.33e-12 ***
variableG2  -2.022e-01  3.953e-02  -5.115 3.99e-05 ***
variableG3  -1.978e-01  3.953e-02  -5.003 5.23e-05 ***
IDID2        3.036e-18  3.953e-02   0.000        1
IDID3       -3.925e-17  3.953e-02   0.000        1
tissueheart         NA         NA      NA       NA
tissueliver         NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.08386 on 22 degrees of freedom
Multiple R-squared:  0.6081,    Adjusted R-squared:  0.5369
F-statistic: 8.535 on 4 and 22 DF,  p-value: 0.0002573
``````

and

``````
Call:
lm(formula = model.lm, data = df.molten)

Residuals:
     Min       1Q   Median       3Q      Max
-0.13667 -0.04667 -0.02444  0.07333  0.16111

Coefficients: (2 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.689e-01  3.609e-02   7.451 1.88e-07 ***
variableG1   1.978e-01  3.953e-02   5.003 5.23e-05 ***
variableG2  -4.444e-03  3.953e-02  -0.112    0.912
IDID2       -4.626e-17  3.953e-02   0.000    1.000
IDID3       -2.453e-17  3.953e-02   0.000    1.000
tissueheart         NA         NA      NA       NA
tissueliver         NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.08386 on 22 degrees of freedom
Multiple R-squared:  0.6081,    Adjusted R-squared:  0.5369
F-statistic: 8.535 on 4 and 22 DF,  p-value: 0.0002573
``````

In the first model, G1 is the reference level, so G2 and G3 are each compared to G1. The p-values for the coefficients variableG2 and variableG3 indicate that the differences between G1 and G2 and between G1 and G3 are significant.

In the second model, G3 becomes the reference level. The coefficient variableG1 is the previous model's variableG3 with the sign flipped, so it makes sense that you get the same p-value. However, you can see here that G3 vs. G2 is not significantly different (p = 0.912).
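The change of reference level can also be sketched with base R's `relevel()`, which avoids rebuilding the factor by hand. This is only an illustration on the same toy data: the long table is built without reshape2, the names `long`, `fit_g1` and `fit_g3` are made up, and tissue is left out of the formula because in this data it coincides with ID.

```r
# Same toy data as above
df <- data.frame(
  ID     = rep(paste0("ID", 1:3), 3),
  tissue = rep(c("liver", "brain", "heart"), 3),
  G1     = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
  G2     = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
  G3     = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33)
)

# Long format built with base R only (hypothetical name `long`)
long <- data.frame(
  ID       = rep(df$ID, times = 3),
  variable = factor(rep(c("G1", "G2", "G3"), each = nrow(df))),
  value    = c(df$G1, df$G2, df$G3)
)

# Reference level G1 (the default, first level alphabetically)
fit_g1 <- lm(value ~ variable + ID, data = long)

# Reference level G3, set with relevel()
long$variable <- relevel(long$variable, ref = "G3")
fit_g3 <- lm(value ~ variable + ID, data = long)

# The G1-vs-G3 contrast appears in both fits with opposite signs
print(coef(fit_g1)["variableG3"])
print(coef(fit_g3)["variableG1"])
```

Only the parameterization changes between the two fits; the fitted values and residuals are the same, which is why the residual standard error and R-squared are identical in both summaries.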

Try to understand what's going on (starting with the basics of linear regression if you're not familiar with the method), and make sure you understand the model's output before you use this in any serious analysis.
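As one example of reading the output, the note "2 not defined because of singularities" and the essentially zero ID coefficients can both be traced back to the toy data itself. The sketch below only inspects the data frame; the names `one_tissue_per_id` and `id_means` are made up for illustration.

```r
# Same toy data as above
df <- data.frame(
  ID     = rep(paste0("ID", 1:3), 3),
  tissue = rep(c("liver", "brain", "heart"), 3),
  G1     = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
  G2     = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
  G3     = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33)
)

# Cross-tabulate ID against tissue: each ID occurs with exactly one tissue,
# so once ID is in the model, tissue adds no new information (NA coefficients)
print(table(df$ID, df$tissue))
one_tissue_per_id <- all(
  tapply(df$tissue, df$ID, function(x) length(unique(x))) == 1
)

# Each row sums to 1, so the mean over G1..G3 is 1/3 for every ID;
# the ID coefficients are therefore zero up to rounding error
id_means <- tapply(rowMeans(df[, c("G1", "G2", "G3")]), df$ID, mean)
print(id_means)
```

In a real data set where each animal contributes several tissues, ID and tissue would not be aliased like this and both sets of coefficients could be estimated.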


Please remember to up-vote and / or accept answers that have helped.