Question: Best Way To Compare Impact Of Mutation On *Relative* Difference In Gene Expression Between Two Groups?
Charles Warden (Duarte, CA) wrote, 6.0 years ago:

I have a gene expression data set with the following features:

1) mutation status (binary variable: mutant vs. WT)

2) Grouping (Group1, Group2)

The question is whether the mutation increases the difference in gene expression between patients in Group2 and Group1.

I have thought about doing t-tests. In a clear example, mutant-Group1 vs. mutant-Group2 would be significant and WT-Group1 vs. WT-Group2 would not be significant. However, I can think of examples that would be problematic (marginally non-significant results, two significant results with very different p-values, etc.).

I have thought about simulating a null distribution for the difference between 2 pairs of random data sets, but is there a more straightforward method for analysis? For example, does the difference between t-test statistics also follow a normal distribution (it seems like that could prioritize genes of interest)? Likewise, is there a single test that can be used (instead of two separate tests)?
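A minimal sketch of the single-test idea, on simulated data: rather than differencing the two t statistics (each t also scales with its sample size, so equal effects in groups of different size give different t values), difference the two estimated Group2 - Group1 effects and divide by their combined standard error. This gives a Wald-type statistic that is approximately standard normal under the null for reasonably large samples. The column names (`expr`, `mutation`, `group`), effect sizes and sample sizes are made up for illustration.

```r
set.seed(42)

# Hypothetical unbalanced design: 20 WT and 50 mutant samples, split across Group1/Group2
dat <- data.frame(
  mutation = factor(rep(c("WT", "mut"), times = c(20, 50)), levels = c("WT", "mut")),
  group    = factor(c(rep(c("G1", "G2"), each = 10), rep(c("G1", "G2"), each = 25)),
                    levels = c("G1", "G2"))
)
# Group2 is up by 0.5 in WT and by 2 in mutants (made-up effect sizes)
shift <- ifelse(dat$group == "G2", ifelse(dat$mutation == "mut", 2, 0.5), 0)
dat$expr <- rnorm(nrow(dat), mean = 5 + shift)

# Group2 - Group1 difference in means and its (Welch-style) standard error
group_diff <- function(x, g) {
  x1 <- x[g == "G1"]
  x2 <- x[g == "G2"]
  c(est = mean(x2) - mean(x1),
    se  = sqrt(var(x1) / length(x1) + var(x2) / length(x2)))
}
wt  <- with(subset(dat, mutation == "WT"),  group_diff(expr, group))
mut <- with(subset(dat, mutation == "mut"), group_diff(expr, group))

# One test of "is the group difference different in mutants than in WT?"
delta <- unname(mut["est"] - wt["est"])
z     <- delta / sqrt(unname(mut["se"]^2 + wt["se"]^2))
p     <- 2 * pnorm(-abs(z))

# The same difference of differences is the interaction coefficient of a linear model
fit <- lm(expr ~ group * mutation, data = dat)
c(delta = delta, interaction = unname(coef(fit)["groupG2:mutationmut"]), z = z, p = p)
```

The point estimate matches the `lm()` interaction exactly; the standard errors differ slightly because `lm()` pools the residual variance across all four cells, while this sketch uses a separate variance per cell.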

Written 6.0 years ago by Charles Warden; modified 5.6 years ago by Biostar.
David W (New Zealand) wrote, 6.0 years ago:

Sounds like you want a two-way ANOVA, including an interaction between group and genotype. (Don't let the fact that you only have two levels per factor throw you; t-tests are really just a special case of ANOVA anyway.)
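To see the "t-test is a special case of ANOVA" point concretely: with a single two-level factor, the pooled-variance (Student) t-test and a one-way ANOVA from `lm()` give F = t^2 and identical p-values. This needs `var.equal = TRUE`; R's default Welch t-test is not the same model. The data here are simulated.

```r
set.seed(3)
y   <- c(rnorm(15, mean = 1), rnorm(15, mean = 2))
grp <- factor(rep(c("A", "B"), each = 15))

tt <- t.test(y ~ grp, var.equal = TRUE)   # Student's two-sample t-test
av <- anova(lm(y ~ grp))                  # one-way ANOVA via a linear model

# For a two-level factor, F equals t squared and the p-values coincide
c(t_squared = unname(tt$statistic)^2, F = av[["F value"]][1],
  p_t = tt$p.value, p_F = av[["Pr(>F)"]][1])
```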

EDIT

When I think about these sorts of problems I like to make up some fake data to help me understand the hypothesis I'm testing and what the test is likely to return. Is this about what you are looking for?

Set up our categorical values:

set.seed(1)  # make the fake data reproducible
df <- data.frame(genotype = rep(c("+", "-"), 100),
                 grp = rep(c("A", "B"), each = 100))

Fake data such that there is an effect for genotype only in group "B":

fake_data <- function(genotype, grp){
  if(genotype == "+" && grp == "B"){
    y <- rnorm(1, mean = 2, sd = 1)  # genotype "+" shifts the mean, but only in group B
  } else {
    y <- rnorm(1, mean = 1, sd = 1)
  }
  return(y)
}

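# one simulated value per row (mapply avoids apply()'s coercion of df to a character matrix)
df$y <- mapply(fake_data, df$genotype, df$grp, USE.NAMES = FALSE)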

Now, run models with and without an interaction, and compare them with your favorite framework:

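modInteract <- lm(y ~ grp*genotype, data=df)  # main effects + grp:genotype interaction
modEven <- lm(y ~ grp+genotype, data=df)      # main effects only (no interaction)
anova(modInteract, modEven)                   # F-test for the interaction term
AIC(modInteract, modEven)                     # lower AIC = better fit for its complexity
summary(modInteract)                          # the grp:genotype row is the interaction estimate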

Since the effect of genotype "+" is only apparent in grp "B", the model with the interaction should be the better-fitting one (significant F-test, lower AIC).
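For a whole expression data set, the same interaction test can be run gene by gene and the p-values corrected for multiple testing, which also ranks genes by evidence for a genotype-by-group interaction. A minimal sketch on a made-up matrix (`expr`, genes in rows and samples in columns, with 10 of 100 genes given an effect only in "B"/"+"); for real data, moderated versions of the same per-gene linear model (e.g. the limma package) are the usual choice.

```r
set.seed(1)
n_genes <- 100

# Hypothetical sample annotation: 10 samples per grp x genotype cell
samples <- data.frame(
  grp      = factor(rep(c("A", "B"), each = 20)),
  genotype = factor(rep(c("-", "+"), times = 20), levels = c("-", "+"))
)

# Hypothetical expression matrix: genes in rows, samples in columns
expr <- matrix(rnorm(n_genes * nrow(samples)), nrow = n_genes)
in_Bplus <- samples$grp == "B" & samples$genotype == "+"
expr[1:10, in_Bplus] <- expr[1:10, in_Bplus] + 2   # genes 1-10: effect only in B/+

# Interaction p-value for each gene from the grp*genotype model
interaction_p <- apply(expr, 1, function(y) {
  fit <- lm(y ~ grp * genotype, data = samples)
  summary(fit)$coefficients["grpB:genotype+", "Pr(>|t|)"]
})

# Benjamini-Hochberg adjustment across genes, then list the top-ranked genes
fdr <- p.adjust(interaction_p, method = "BH")
head(order(fdr), 10)
```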

Written 6.0 years ago by David W; modified 6.0 years ago.

Thanks for your suggestion.

I thought about doing a 2-way ANOVA, but I didn't think it was quite the right test. More specifically, I would use 2-way ANOVA to try and factor out co-dependence between variables (like group + technical batch, or group + sample pairing).

So, if I wanted to test if expression varied with group in a way that is independent of mutation status, I think 2-way ANOVA would be the right way to go. However, I want to ask if I can observe a greater difference in expression between groups if I consider mutation status (which I don't think is ideal for 2-way ANOVA). Would you agree?

Written 6.0 years ago by Charles Warden.

Just to clarify, what you want to do is what David W suggested (though use "modEven <- lm(y~grp+grp:genotype, data=df)" instead). The ANOVA (or, more simply, linear model) can give you the genotype effect while controlling for the group effect.
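A minimal sketch of what the `y ~ grp + grp:genotype` parameterization reports, on simulated data like the toy example in this thread (the explicit factor levels are an added assumption so the coefficient names are predictable): its genotype coefficients are the "+" minus "-" difference within group A and within group B.

```r
set.seed(2)
df <- data.frame(genotype = factor(rep(c("-", "+"), 100), levels = c("-", "+")),
                 grp      = factor(rep(c("A", "B"), each = 100)))
# Genotype "+" raises the mean from 1 to 2, but only in group B
df$y <- rnorm(nrow(df), mean = ifelse(df$genotype == "+" & df$grp == "B", 2, 1))

modNested   <- lm(y ~ grp + grp:genotype, data = df)
modInteract <- lm(y ~ grp * genotype, data = df)
coef(modNested)   # (Intercept), grpB, grpA:genotype+, grpB:genotype+

# Cell means, for comparison with the nested coefficients
cell <- tapply(df$y, list(df$grp, df$genotype), mean)
cell[, "+"] - cell[, "-"]   # genotype effect within A and within B
```

Because this model has the same fitted values as `grp * genotype` (it is a reparameterization of it), comparing the two with `anova()` leaves zero degrees of freedom; it is most useful for reading off the per-group genotype effects, while the interaction test itself comes from the comparison against `grp + genotype`.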

Written 6.0 years ago by Devon Ryan.

So, if I'm following you, it seems you want to run a 2-way ANOVA that contains an interaction, and you are most interested in the significance/magnitude of that interaction. Does the toy dataset I've now added to my answer fit with what you are trying to do?

Written 6.0 years ago by David W; modified 6.0 years ago.

Thank you both for your help - I hadn't thought about analyzing the data this way.

I apologize if I am missing something, but I'm still not 100% certain if this addresses my specific question.

For example, how can I use the result to determine the nature of the interaction? How can I tell if the mutation enhances up-regulation, enhances down-regulation, antagonizes up-regulation, or antagonizes down-regulation? Are there no other possible causes for the interaction term to improve the model fit?

For example, if the fold-change for WT-Group1 vs. WT-Group2 is 2.5, the biological interpretation would be different if the fold-change for mutant-Group1 vs. mutant-Group2 is 1.0 (no difference --> mutant nullifies up-regulation in Group2) versus 4.0 (greater up-regulation --> mutant enhances up-regulation in Group2).
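A small worked example of why the interaction term alone does not settle this, using the fold changes above and assuming they are Group2/Group1 ratios with the model fit on log2-scale expression: the interaction is log2(FC_mut) - log2(FC_WT), so it gives the size of the change but not, by itself, the direction of the underlying group effect. Reading its sign together with the WT effect (from the pairwise Group2 vs. Group1 comparison within each genotype) separates the "nullifies" and "enhances" scenarios.

```r
# Group2 vs. Group1 fold changes from the example above
fc_wt  <- 2.5
fc_mut <- c(nullified = 1.0, enhanced = 4.0)

# On the log2 scale, the grp x genotype interaction is the difference of log fold changes
interaction_log2 <- log2(fc_mut) - log2(fc_wt)
round(interaction_log2, 2)   # nullified: -1.32, enhanced: 0.68

# Same sign as the WT effect: the mutant group difference is larger in that direction;
# opposite sign: it is weaker (or reversed, if the interaction outweighs the WT effect)
ifelse(sign(interaction_log2) == sign(log2(fc_wt)), "enhanced", "weakened or reversed")
```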

EDIT

If I use this as a follow-up to the initial pairwise analysis, I think that should provide a complete analysis that I would be satisfied with. If anyone else has any suggestions, feel free to provide them. However, this is the best answer that I have seen so far.

Written 5.9 years ago by Charles Warden; modified 5.9 years ago.
ewre (United States) wrote, 6.0 years ago:

MANCOVA? It can be used to detect whether there is an interaction or relationship between group and mutation. In your case you have just two variables, so an analysis of covariance may be enough. Sorry for my mistake: MANCOVA, not MANOVA.

Written 6.0 years ago by ewre; modified 5.9 years ago.

MANOVA is _Multivariate_ analysis of variance - it's for cases in which you have multiple response variables. I don't think that's what the OP is looking for?

Written 6.0 years ago by David W.

Thanks for the suggestion.

I had forgotten about MANOVA - I will have to take a look at it more closely, but I would initially agree with David W's response.

My main concern after briefly reviewing the method was the interpretation of the results. For example, I know how to define an interaction term in a linear regression model (which would define the 4 groups formed by group x mutation), but I don't want to consider each interaction group equally. Instead, I have a specific hypothesis about how two pairs of interaction terms (mutant-Group1 vs. mutant-Group2, and WT-Group1 vs. WT-Group2) compare to one another. I don't currently see how the MANOVA result will answer this specific hypothesis.

Written 6.0 years ago by Charles Warden.

Also, I think MANOVA assumes some sort of pairing between the dependent variables - there is no pairing between WT and mutant samples. In fact, there are more mutants than WTs (in my specific case), so the dependent arrays would be different lengths.

Written 6.0 years ago by Charles Warden.

Have a look at analysis of covariance instead; sorry for my mistake.

Written 5.9 years ago by ewre.