Question: regression model to use to show the difference
2.1 years ago, krushnach80 (690) wrote:

I have a wild-type and a knockout condition. After the knockout, the level of a certain metabolite goes up; there is a difference, which is also seen in the phenotype. My question is: what kind of regression model, or any other method, should I use to show the difference? Any suggestion or help would be appreciated.

R • 725 views
modified 2.1 years ago by Kevin Blighe (56k) • written 2.1 years ago by krushnach80 (690)

If you have two groups, have you considered a t-test?

written 2.1 years ago by Sean Davis (26k)
WT      Amo         GS      Amo
6.92    461.333     6.12    408.000
6.9     460.000     6.98    465.333
18.8    1253.333    12.69   846.000
18.75   1250.000    10.8    720.000
33.36   2224.000    11.2    746.667
21.55   1436.667    11.82   788.000
21.95   1463.333    22.96   1530.667
11.54   769.333     28.41   1894.000
5.22    348.000     47.7    3180.000
16.1    1073.333    3.28    218.667
13.41   894.000     14.2    946.667
31      2066.667    17      1133.333
55      3666.667    25      1666.667
53.4    3560.000    40.2    2680.000
                    53      3533.333
                    41      2733.333

My data look something like this. The number of observations differs between WT and the knockout. Would you still suggest that I go for a t-test?

modified 2.1 years ago • written 2.1 years ago by krushnach80 (690)

There is no need for the two groups to be the same size for a t-test.
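As a minimal sketch of this point, the WT and GS values from the table posted above can be passed directly to R's `t.test()`, which accepts groups of different sizes. The vector names `wt` and `gs` are made up for illustration, and treating GS as the knockout group is an assumption.

```r
# Values copied from the WT and GS columns of the table posted above.
# Assumption: GS is the knockout group.
wt <- c(6.92, 6.9, 18.8, 18.75, 33.36, 21.55, 21.95, 11.54,
        5.22, 16.1, 13.41, 31, 55, 53.4)
gs <- c(6.12, 6.98, 12.69, 10.8, 11.2, 11.82, 22.96, 28.41,
        47.7, 3.28, 14.2, 17, 25, 40.2, 53, 41)

# The two groups have different numbers of observations
length(wt)
length(gs)

# Two-sample t-test for independent groups. By default R runs
# Welch's version (var.equal = FALSE), which does not assume
# equal variances or equal group sizes.
res <- t.test(wt, gs)
print(res)
```

Note that `t.test(wt, gs, paired = TRUE)` would only apply if each WT measurement were matched to one specific knockout measurement, which also requires equal group sizes.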

written 2.1 years ago by Sean Davis (26k)

Okay, but as far as I understand, these are not independent, because I am knocking out the same gene that I am studying. Am I correct? If so, should I go for a paired t-test instead?

written 2.1 years ago by krushnach80 (690)
2.1 years ago, Kevin Blighe (56k) wrote:

Hello friend,

Assuming that your metabolites have been normalised to the Z-scale and/or are logged (and thus follow a normal distribution), you can just run a binary logistic regression model:

First, get your data in this format:

           Group   Metab1  Metab2  Metab3  Metab4
Sample 1   WT       11.39   10.62    9.75   10.34
Sample 2   WT       10.16    8.63    8.68    9.08
Sample 3   WT        9.29   10.24    9.89   10.11
Sample 4   KO       11.53    9.22    9.35    9.13
Sample 5   KO        8.35   10.62   10.25   10.01
Sample 6   KO       11.71   10.43    8.87    9.44

Then, encode your Group variable as a factor and specify WT as the reference level:

MyData$Group <- factor(MyData$Group, levels=c("WT","KO"))
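A small sketch of why the explicit `levels` argument matters: without it, `factor()` orders levels alphabetically, so KO would become the reference. The data frame below reuses the first two metabolite columns from the example table; everything else is illustrative.

```r
# Toy data frame in the layout shown above (first two metabolites only)
MyData <- data.frame(
  Group  = c("WT", "WT", "WT", "KO", "KO", "KO"),
  Metab1 = c(11.39, 10.16, 9.29, 11.53, 8.35, 11.71),
  Metab2 = c(10.62, 8.63, 10.24, 9.22, 10.62, 10.43),
  row.names = paste("Sample", 1:6),
  stringsAsFactors = FALSE
)

# Default ordering is alphabetical: KO would be the reference level
default_levels <- levels(factor(MyData$Group))
default_levels   # "KO" "WT"

# Explicit ordering: the first level listed (WT) becomes the reference
MyData$Group <- factor(MyData$Group, levels = c("WT", "KO"))
levels(MyData$Group)   # "WT" "KO"
```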

Then, I would test each metabolite independently, each in its own logistic regression model:

glm(Group ~ Metab1, data=MyData, family="binomial")
glm(Group ~ Metab2, data=MyData, family="binomial")
et cetera
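For a version that runs end to end, here is a hedged sketch on simulated data. The group sizes, means, and the seed are invented purely for illustration and are not taken from the original poster's experiment.

```r
# Simulated example data: 20 WT and 20 KO samples, with Metab1
# shifted upwards in KO (all numbers are made up for illustration)
set.seed(1)
n <- 20
MyData <- data.frame(
  Group  = factor(rep(c("WT", "KO"), each = n), levels = c("WT", "KO")),
  Metab1 = c(rnorm(n, mean = 10, sd = 1), rnorm(n, mean = 11, sd = 1))
)

# Binary logistic regression. With a factor response, glm() treats the
# first level (WT) as "failure" and the other level (KO) as "success",
# so the model describes the probability of a sample being KO.
fit <- glm(Group ~ Metab1, data = MyData, family = binomial)
print(fit)
```

A positive coefficient for `Metab1` would mean that higher metabolite levels are associated with higher odds of a sample being KO.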

Model p-values and estimates / coefficients (whose sign indicates which way the metabolite level goes in KO vs WT) can be extracted via the summary() function applied to the model object. You can also perform a Chi-squared ANOVA via anova(MyModel, test="Chisq").
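A sketch of the extraction step, again on simulated data (the numbers and the seed are assumptions made for illustration). `coef(summary(...))` returns the coefficient table as a matrix, which is easier to index than the printed summary.

```r
# Simulated data, for illustration only
set.seed(2)
n <- 20
MyData <- data.frame(
  Group  = factor(rep(c("WT", "KO"), each = n), levels = c("WT", "KO")),
  Metab1 = c(rnorm(n, mean = 10, sd = 1), rnorm(n, mean = 11, sd = 1))
)
MyModel <- glm(Group ~ Metab1, data = MyData, family = binomial)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
coefs <- coef(summary(MyModel))
estimate <- coefs["Metab1", "Estimate"]   # log odds ratio per unit of Metab1
p_wald   <- coefs["Metab1", "Pr(>|z|)"]   # Wald test p-value
odds_ratio <- exp(estimate)               # > 1 means higher in KO

# Chi-squared test on the deviance: does adding Metab1 improve on
# the intercept-only model?
lrt <- anova(MyModel, test = "Chisq")
p_lrt <- lrt["Metab1", "Pr(>Chi)"]

print(coefs)
print(lrt)
```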

You can set this up as a loop: Question about generalized linear model fitting
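One possible way to write such a loop is sketched below. This is not the code from the linked thread; the simulated data, the `results` layout, and the added Benjamini-Hochberg adjustment are all assumptions made for illustration.

```r
# Simulated data with three metabolites (illustration only)
set.seed(3)
n <- 20
MyData <- data.frame(
  Group  = factor(rep(c("WT", "KO"), each = n), levels = c("WT", "KO")),
  Metab1 = c(rnorm(n, mean = 10), rnorm(n, mean = 11)),
  Metab2 = rnorm(2 * n, mean = 10),
  Metab3 = c(rnorm(n, mean = 10), rnorm(n, mean = 9.5))
)

# Every column except Group is treated as a metabolite
metabolites <- setdiff(colnames(MyData), "Group")

# Fit one logistic regression per metabolite and collect the results
results <- do.call(rbind, lapply(metabolites, function(m) {
  # reformulate() builds the formula Group ~ <metabolite name>
  fit <- glm(reformulate(m, response = "Group"),
             data = MyData, family = binomial)
  co <- coef(summary(fit))[m, ]
  data.frame(Metabolite = m,
             Estimate   = co[["Estimate"]],
             P          = co[["Pr(>|z|)"]],
             stringsAsFactors = FALSE)
}))

# With many metabolites, adjusting for multiple testing is advisable
results$P.adj <- p.adjust(results$P, method = "BH")
print(results)
```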


If your aim is to identify a panel of predictors, then, from the results of the above, select the metabolites that are statistically significant; you will then have to run further tests on these to gauge their 'predictive' strength. For example, see:
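One common way to gauge predictive strength, though not necessarily the one the author had in mind, is the area under the ROC curve (AUC). The sketch below computes it in base R on simulated data; the data and the seed are invented for illustration.

```r
# Simulated data, for illustration only
set.seed(4)
n <- 20
MyData <- data.frame(
  Group  = factor(rep(c("WT", "KO"), each = n), levels = c("WT", "KO")),
  Metab1 = c(rnorm(n, mean = 10, sd = 1), rnorm(n, mean = 11, sd = 1))
)
fit <- glm(Group ~ Metab1, data = MyData, family = binomial)

# Predicted probability of being KO for every sample
p_ko  <- predict(fit, type = "response")
is_ko <- MyData$Group == "KO"

# AUC = probability that a randomly chosen KO sample receives a higher
# predicted probability than a randomly chosen WT sample (ties count 0.5)
higher <- outer(p_ko[is_ko], p_ko[!is_ko], ">")
tied   <- outer(p_ko[is_ko], p_ko[!is_ko], "==")
auc <- mean(higher + 0.5 * tied)
auc
```

An AUC of 0.5 corresponds to chance and 1 to perfect separation. Because the AUC here is computed on the same samples used to fit the model, it is optimistic; cross-validation or an independent test set gives a fairer estimate.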

You can also just do a penalised regression with all metabolites at the same time using the lasso, elastic-net, or ridge penalty: A: How to exclude some of breast cancer subtypes just by looking at gene expressio
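A hedged sketch of the penalised approach, assuming the glmnet package is installed (the answer does not name a package, so this choice is an assumption). The simulated matrix, the number of folds, and the seed are illustrative only.

```r
library(glmnet)  # assumption: glmnet is installed

# Simulated data: 40 samples, 10 metabolites, only Metab1 differs in KO
set.seed(5)
n <- 40
p <- 10
x <- matrix(rnorm(n * p, mean = 10), nrow = n,
            dimnames = list(NULL, paste0("Metab", 1:p)))
y <- rep(c(0, 1), each = n / 2)          # 0 = WT, 1 = KO
x[y == 1, "Metab1"] <- x[y == 1, "Metab1"] + 1.5

# alpha = 1 is the lasso, alpha = 0 is ridge, and values in between
# give the elastic net. cv.glmnet() picks lambda by cross-validation.
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 5)

# Coefficients at the lambda with the lowest cross-validated deviance;
# metabolites shrunk to exactly zero are dropped from the model
cf <- coef(cvfit, s = "lambda.min")
print(cf)
```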


modified 14 months ago • written 2.1 years ago by Kevin Blighe (56k)

Kevin, thank you very much. I was also looking for this for something else, so I am really glad that you posted this.

written 2.1 years ago by krushnach80 (690)

@Kevin, can I use your method for gene expression data or differentially expressed genes? Is the only thing I need to do to arrange my data in the format you have described?

written 2.1 years ago by krushnach80 (690)

Yes, you can use this same approach for the genes that are differentially expressed, so that you can further reduce the number of genes in your final model.

written 2.1 years ago by Kevin Blighe (56k)