Question

Best Way To Compare Impact Of Mutation On *Relative* Difference In Gene Expression Between Two Groups?

10.9 years ago

Charles Warden 8.2k

I have a gene expression data set with the following features:

1) mutation status (binary variable: mutant vs. WT)

2) Grouping (Group1, Group 2)

The question is whether the mutation increases the difference in gene expression between patients in Group 2 and Group 1.

I have thought about doing t-tests. In a clear example, mutant-Group1 vs. mutant-Group2 would be significant and WT-Group1 vs. WT-Group2 would not be significant. However, I can think of examples that would be problematic (marginally non-significant results, two significant results with very different p-values, etc.).
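A minimal sketch of the two-separate-t-tests approach described above, for a single gene. It assumes log-scale expression in a data frame with hypothetical columns `expr`, `mutation` and `group`; the data, sample sizes and effect size are simulated purely for illustration.

```r
set.seed(42)
# Hypothetical simulated data for one gene: 20 samples per mutation-by-group cell
n <- 20
d <- data.frame(
  mutation = rep(c("WT", "mutant"), each = 2 * n),
  group    = rep(rep(c("Group1", "Group2"), each = n), times = 2),
  expr     = rnorm(4 * n, mean = 5, sd = 1)
)
# Add a Group2 shift only in the mutant samples
shift <- d$mutation == "mutant" & d$group == "Group2"
d$expr[shift] <- d$expr[shift] + 1

# One Welch t-test (Group2 vs. Group1) within each mutation status
p_wt  <- t.test(expr ~ group, data = subset(d, mutation == "WT"))$p.value
p_mut <- t.test(expr ~ group, data = subset(d, mutation == "mutant"))$p.value
c(WT = p_wt, mutant = p_mut)
```

This yields two p-values but no test of whether the two group differences differ from each other, which is where the problematic cases above come from.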

I have thought about simulating a null distribution for the difference between 2 pairs of random data sets, but is there a more straightforward method for analysis? For example, does the difference between t-test statistics also follow a normal distribution (it seems like that could prioritize genes of interest)? Likewise, is there a single test that can be used (instead of two separate tests)?
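One way to get a single test, sketched here as a swapped-in alternative to differencing t statistics (a t statistic also grows with sample size, so two t statistics can differ even when the underlying group differences are equal): estimate the Group2 minus Group1 difference within each mutation status, then test whether the two differences are equal using their combined standard error. This is a large-sample normal approximation; the helper `diff_of_diffs` and the simulated inputs are hypothetical.

```r
# Hypothetical helper: z-test for (mutant difference) - (WT difference),
# where each difference is mean(Group2) - mean(Group1).
diff_of_diffs <- function(wt1, wt2, mut1, mut2) {
  d_wt  <- mean(wt2) - mean(wt1)
  d_mut <- mean(mut2) - mean(mut1)
  # Standard error of each difference (unequal variances allowed)
  se_wt  <- sqrt(var(wt1) / length(wt1) + var(wt2) / length(wt2))
  se_mut <- sqrt(var(mut1) / length(mut1) + var(mut2) / length(mut2))
  # The four groups are independent, so the variances add
  z <- (d_mut - d_wt) / sqrt(se_wt^2 + se_mut^2)
  c(estimate = d_mut - d_wt, z = z, p = 2 * pnorm(-abs(z)))
}

set.seed(1)
res <- diff_of_diffs(wt1  = rnorm(15, 5), wt2  = rnorm(15, 5),
                     mut1 = rnorm(30, 5), mut2 = rnorm(30, 6))
res
```

With small samples, the interaction term in a linear model (discussed in the answer below) is the usual t-based version of the same comparison.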

gene expression statistics • 5.2k views

Updated 10.6 years ago by Biostar 20 • written 10.9 years ago by Charles Warden 8.2k

score 3 · Answer 1 · 2013-06-13

10.9 years ago

David W 4.9k

It sounds like you want a two-way ANOVA that includes an interaction between group and genotype. (Don't let the fact that you only have two levels per factor throw you; a t-test is really just a special case of ANOVA anyway.)
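A quick check of the point that a t-test is a special case of ANOVA: with two groups, a pooled-variance (Student) t-test and a one-way ANOVA give F equal to t squared and identical p-values. The data are simulated and hypothetical; only base R is used.

```r
set.seed(7)
# Hypothetical two-group data
d <- data.frame(grp = rep(c("A", "B"), each = 10),
                y   = c(rnorm(10, 0), rnorm(10, 1)))

# Pooled-variance t-test (the classical Student t-test)
tt <- t.test(y ~ grp, data = d, var.equal = TRUE)

# One-way ANOVA on the same data
av <- anova(lm(y ~ grp, data = d))

# The ANOVA F statistic is the square of the t statistic, and the p-values agree
c(t_squared = unname(tt$statistic)^2, F = av[["F value"]][1])
c(p_t = tt$p.value, p_F = av[["Pr(>F)"]][1])
```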

EDIT

When I think about these sorts of problems, I like to make up some fake data to help me understand the hypothesis I'm testing and what the test is likely to return. Is this about what you are looking for?

Set up the categorical variables:

df <- data.frame(genotype = rep(c("+", "-"), 100),
                 grp      = rep(c("A", "B"), each = 100))

Simulate fake data so that genotype has an effect only in group "B":

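fake_data <- function(genotype, grp) {
  # Mean 2 for genotype "+" in group "B"; mean 1 everywhere else (sd = 1)
  if (genotype == "+" && grp == "B") {
    y <- rnorm(1, mean = 2, sd = 1)
  } else {
    y <- rnorm(1, mean = 1, sd = 1)
  }
  return(y)
}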

set.seed(1)  # make the simulated data reproducible
df$y <- mapply(fake_data, df$genotype, df$grp, USE.NAMES = FALSE)

Now, run models with and without an interaction, and compare them with your favorite framework:

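modInteract <- lm(y ~ grp * genotype, data = df)   # main effects plus interaction
modEven     <- lm(y ~ grp + genotype, data = df)   # main effects only
anova(modEven, modInteract)   # F-test for adding the interaction term
AIC(modInteract, modEven)     # lower AIC = better trade-off of fit and complexity
summary(modInteract)          # coefficients, including the interaction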

Since the effect of genotype "+" is only apparent in grp "B", the model with the interaction is the best-fitting.
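To read off the interaction itself, not just the model comparison, something like the following sketch may help. It re-simulates data in the same spirit but sets the factor levels explicitly (assumed here: `A` and `-` as reference levels), so the interaction coefficient gets a predictable name; the `grpB:genotype+` label depends on those level choices.

```r
set.seed(1)
df <- data.frame(genotype = factor(rep(c("+", "-"), 100), levels = c("-", "+")),
                 grp      = factor(rep(c("A", "B"), each = 100), levels = c("A", "B")))
# Mean 2 for genotype "+" in group "B", mean 1 otherwise (sd = 1)
df$y <- rnorm(200, mean = ifelse(df$genotype == "+" & df$grp == "B", 2, 1), sd = 1)

modInteract <- lm(y ~ grp * genotype, data = df)
# Interaction estimate: how much larger the "+" vs. "-" difference is in B than in A
coef(summary(modInteract))["grpB:genotype+", ]
```

In this saturated 2 x 2 model, the interaction estimate equals the difference of differences of the four cell means.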

Written 10.9 years ago by David W 4.9k

Thanks for your suggestion.

I thought about doing a 2-way ANOVA, but I didn't think it was quite the right test. More specifically, I would use 2-way ANOVA to try and factor out co-dependence between variables (like group + technical batch, or group + sample pairing).

So, if I wanted to test whether expression varied with group in a way that is independent of mutation status, I think a 2-way ANOVA would be the right way to go. However, I want to ask whether I can observe a greater difference in expression between groups when I take mutation status into account (which I don't think a 2-way ANOVA is ideal for). Would you agree?

Written 10.9 years ago by Charles Warden 8.2k

Just to clarify, what you want to do is what David W suggested (though use "modEven <- lm(y~grp+grp:genotype, data=df)" instead). The ANOVA (or, more simply, linear model) can give you the genotype effect while controlling for a mutation effect.
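For reference, a sketch of what the `grp + grp:genotype` formula returns, on hypothetical simulated data with explicitly set factor levels: instead of a main effect plus an interaction, it reports a separate genotype effect within each group. It spans the same model space as `grp * genotype`, so the fitted values are identical; only the meaning of the coefficients changes.

```r
set.seed(2)
df <- data.frame(genotype = factor(rep(c("+", "-"), 100), levels = c("-", "+")),
                 grp      = factor(rep(c("A", "B"), each = 100), levels = c("A", "B")))
df$y <- rnorm(200, mean = ifelse(df$genotype == "+" & df$grp == "B", 2, 1))

modNested  <- lm(y ~ grp + grp:genotype, data = df)
modCrossed <- lm(y ~ grp * genotype, data = df)

# Genotype ("+" minus "-") effect estimated separately in group A and in group B
coef(summary(modNested))[c("grpA:genotype+", "grpB:genotype+"), ]

# Same model space: identical fitted values
all.equal(fitted(modNested), fitted(modCrossed))
```

The difference between the two within-group genotype effects is the crossed model's interaction coefficient.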

Written 10.9 years ago by Devon Ryan 104k

So, if I'm following you, it seems you want to run a 2-way ANOVA that contains an interaction, and you are most interested in the significance/magnitude of that interaction. Does the toy dataset I've now added to my answer fit with what you are trying to do?

Written 10.9 years ago by David W 4.9k

Thank you both for your help - I hadn't thought about analyzing the data this way.

I apologize if I am missing something, but I'm still not 100% certain whether this addresses my specific question.

For example, how can I use the result to determine the nature of the interaction? How can I tell whether the mutation enhances up-regulation, enhances down-regulation, antagonizes up-regulation, or antagonizes down-regulation? Could anything else cause the interaction term to improve the model fit?

For example, if the fold-change for WT-Group1 vs. WT-Group2 is 2.5, the biological interpretation would be different if the fold-change for mutant-Group1 vs. mutant-Group2 is 1.0 (no difference --> the mutation nullifies the up-regulation in Group2) versus 4.0 (stronger up-regulation --> the mutation enhances the up-regulation in Group2).
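On a log scale the interaction is simply the difference between the two fold changes, so its sign separates the two scenarios above. A worked version of that example, assuming the fold changes are Group2 vs. Group1 and the model is fit on log2 expression:

```r
# Fold changes (Group2 vs. Group1) from the example in the text
fc_wt         <- 2.5
fc_mut_null   <- 1.0  # mutant nullifies the up-regulation
fc_mut_higher <- 4.0  # mutant enhances the up-regulation

# Interaction on the log2 scale = log2 FC(mutant) - log2 FC(WT)
interaction_null   <- log2(fc_mut_null)   - log2(fc_wt)
interaction_higher <- log2(fc_mut_higher) - log2(fc_wt)
round(c(null = interaction_null, higher = interaction_higher), 3)
# -> null -1.322, higher 0.678
```

With a positive WT fold change, a negative interaction points to antagonized up-regulation and a positive one to enhanced up-regulation. The sign alone cannot tell up- from down-regulation, so the within-genotype fold changes are still needed alongside it.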

EDIT

If I use this as a follow-up to the initial pairwise analysis, I think that should provide a complete analysis that I would be satisfied with. If anyone else has any suggestions, feel free to provide them. However, this is the best answer that I have seen so far.

Written 10.9 years ago by Charles Warden 8.2k

score 0 · Answer 2 · 2013-06-12

10.9 years ago

ewre ▴ 250

MANCOVA? It can be used to detect an interaction or relationship between group and mutation. In your case you have just 2 variables, so a covariance analysis may be enough. Very sorry for my mistake: MANCOVA, not MANOVA.

Written 10.9 years ago by ewre ▴ 250

MANOVA is _Multivariate_ analysis of variance: it's for cases in which you have multiple response variables. I don't think that's what the OP is looking for.

Written 10.9 years ago by David W 4.9k

Thanks for the suggestion.

I had forgotten about MANOVA - I will have to take a look at it more closely, but I would initially agree with David W's response.

My main concern after briefly reviewing the method was the interpretation of the results. For example, I know how to define an interaction term in a linear regression model (which would define the 4 groups formed by group x mutation), but I don't want to consider each interaction group equally. Instead, I have a specific hypothesis about how two pairs of interaction terms (mutant-Group1 vs. mutant-Group2, and WT-Group1 vs. WT-Group2) compare to one another. I don't currently see how the MANOVA result will answer this specific hypothesis.
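The specific hypothesis described here, (mutant-Group2 minus mutant-Group1) versus (WT-Group2 minus WT-Group1), can also be written as an explicit contrast on a four-cell ("cell means") linear model. A minimal base-R sketch on hypothetical simulated data; the contrast weights assume the coefficient order shown in the comment, so check `names(coef(fit))` before reusing it.

```r
set.seed(3)
d <- data.frame(
  cell = factor(rep(c("WT.G1", "WT.G2", "mut.G1", "mut.G2"), each = 12),
                levels = c("WT.G1", "WT.G2", "mut.G1", "mut.G2"))
)
d$expr <- rnorm(48, mean = c(5, 5.5, 5, 6.5)[as.integer(d$cell)])

fit <- lm(expr ~ 0 + cell, data = d)  # one mean per cell, no intercept
names(coef(fit))  # "cellWT.G1" "cellWT.G2" "cellmut.G1" "cellmut.G2"

# (mut.G2 - mut.G1) - (WT.G2 - WT.G1)
w      <- c(1, -1, -1, 1)
est    <- sum(w * coef(fit))
se     <- sqrt(drop(t(w) %*% vcov(fit) %*% w))
t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), df = df.residual(fit))
c(estimate = est, t = t_stat, p = p_val)
```

This contrast is the same quantity as the interaction coefficient of `group * mutation`, with the same standard error; writing it out only makes the hypothesis explicit.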

Written 10.9 years ago by Charles Warden 8.2k

Also, I think MANOVA assumes some sort of pairing between the dependent variables, but there is no pairing between WT and mutant samples. In fact, there are more mutants than WTs (in my specific case), so the dependent arrays would be different lengths.
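A linear model with an interaction term does not need the WT and mutant samples to be paired or equally numerous: each sample is simply one row with its own mutation status and group. A sketch with hypothetical unequal sizes (40 mutants, 12 WT) and simulated expression:

```r
set.seed(4)
# Hypothetical unbalanced design: more mutants than WT, no pairing between them
n_mut <- 40
n_wt  <- 12
d <- data.frame(
  mutation = factor(c(rep("mutant", n_mut), rep("WT", n_wt)), levels = c("WT", "mutant")),
  group    = factor(c(rep(c("Group1", "Group2"), length.out = n_mut),
                      rep(c("Group1", "Group2"), length.out = n_wt)),
                    levels = c("Group1", "Group2"))
)
d$expr <- rnorm(nrow(d), mean = 5 + 1 * (d$mutation == "mutant" & d$group == "Group2"))

fit <- lm(expr ~ group * mutation, data = d)
table(d$mutation, d$group)  # cell sizes differ; lm handles this
coef(summary(fit))["groupGroup2:mutationmutant", ]
```

With unequal cell sizes, the order of terms affects the sequential main-effect tests reported by `anova(fit)`, but the test of the interaction, as the last term, matches the coefficient's t-test (F equals t squared).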