Question: Statistical technique for mutation to gene expression link
5.2 years ago, RBee wrote:

I have the following data from a set of samples (healthy and diseased tissue from approximately 100 individuals). For each sample:

  • gene expression in healthy and diseased tissues (coded as 1 = healthy only, 2 = diseased only, 3 = healthy and diseased)
  • a set of mutations (M1, M2, M3, ...): for each mutation, whether the sample has that mutation or not (coded as 1 or 0).

I want to analyze the link between the mutation and the expression of the gene:

  1. for each mutation, is there a relation between that particular mutation and gene-expression in healthy/diseased tissue (positive or negative)? For instance, is the mutation leading to increased (or decreased) expression in the diseased tissue?
  2. does a group of mutations together have an impact on gene expression?

I am wondering what the appropriate statistical tests are for analyzing the two cases. I was considering a Wilcoxon test or a paired t-test for 1. Is that the right approach?

For 2, I was considering logistic regression. Would that work?

Any advice or pointers would be greatly appreciated.

Thanks in advance.

5.2 years ago, Devon Ryan (Freiburg, Germany) wrote:

McNemar's test is telling you almost the exact opposite of what you want to know (namely, whether the marginal probabilities differ). You can just use logistic regression for each of those questions. You could also just use Fisher's exact test for question 1.
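
As a rough illustration of the Fisher's exact test suggestion for question 1, here is a minimal sketch using only the Python standard library. It assumes the three-level expression code is collapsed to a yes/no variable ("expressed in diseased tissue", i.e. code 2 or 3), which is one possible way to get a 2x2 table and is not something stated in the thread. The function names and the data are made up for the example. In practice `scipy.stats.fisher_exact` does the same job for a 2x2 table.

```python
from math import comb


def mutation_table(mutated, expression):
    """Cross-tabulate mutation status (0/1) against 'expressed in diseased
    tissue' (expression code 2 or 3). Returns the cells (a, b, c, d) of
    [[a, b], [c, d]] with rows = mutated / not mutated."""
    a = b = c = d = 0
    for m, e in zip(mutated, expression):
        in_diseased = e in (2, 3)
        if m == 1 and in_diseased:
            a += 1
        elif m == 1:
            b += 1
        elif in_diseased:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probability of every table no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)


# Made-up illustration data (not from the thread): 8 samples, one mutation.
mutated = [1, 1, 1, 1, 0, 0, 0, 0]
expression = [2, 3, 2, 1, 1, 1, 1, 3]

table = mutation_table(mutated, expression)
p_value = fisher_exact_two_sided(*table)
print(table, round(p_value, 4))
```

Run once per mutation, this gives one p-value per mutation, so some multiple-testing correction would be needed if many mutations are screened.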


Thanks Devon. Appreciate your response. I actually thought about it and re-coded the dataset. I edited the original post to reflect this, but perhaps you replied around the same time.

Essentially, the change is that I want to understand the 'effect of a mutation on gene expression in healthy vs. diseased tissue (tissue samples from the same individual)'. Hence, I changed the gene expression encoding to 1-healthy only, 2-diseased only, 3-healthy and diseased. Does logistic regression still make sense for this dataset?

Does a Wilcoxon test or a paired t-test make sense for looking at the link between a mutation and increased/decreased expression in healthy vs. diseased tissue?

How safe is it to assume that the population is Gaussian? Are there other tests I should look at to establish that?


Comment written 5.2 years ago by RBee

Looks like I replied around the time you updated :)

Neither a Wilcoxon test nor a paired t-test would make sense for your question. Both of these tests expect a distribution of values as a function of group or a covariate, which isn't the case for your question. What you're asking is whether a multinomial outcome varies according to a binary predictor (or the other way around). Thus, it makes more sense to use either multinomial or logistic regression, depending on which way you want to frame the question. BTW, if 1 is healthy then it would make more sense for 2 to be healthy and diseased.

A Gaussian assumption won't hold for this sort of dataset.
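
To make the logistic framing concrete (mutation status as the binary outcome, expression category as the predictor), here is a small standard-library sketch. It relies on the fact that a logistic regression with one categorical predictor is saturated, so the maximum-likelihood fit simply reproduces the observed mutation rate in each category and no iterative solver is needed. The function name, the choice of category 1 as the reference, and the data are all assumptions made for the example. With several mutations as predictors, or for the multinomial direction, a library such as statsmodels (`Logit`, `MNLogit`) would be the more usual route.

```python
from collections import Counter
from math import exp, log


def logit(p):
    return log(p / (1 - p))


def fit_mutation_on_expression(mutated, expression, reference=1):
    """Logistic regression of mutation status (0/1) on expression category.

    Assumes every category contains both mutated and unmutated samples,
    otherwise a log odds is infinite and this closed form breaks down.
    """
    totals = Counter(expression)
    hits = Counter(e for m, e in zip(mutated, expression) if m == 1)
    rate = {cat: hits[cat] / totals[cat] for cat in totals}

    intercept = logit(rate[reference])
    # Each coefficient is a log odds ratio relative to the reference category.
    coefs = {cat: logit(rate[cat]) - intercept for cat in rate if cat != reference}

    def loglik(p, k, n):
        # Binomial log-likelihood (constant term dropped) for k hits out of n.
        return k * log(p) + (n - k) * log(1 - p)

    n, k = len(mutated), sum(mutated)
    ll_null = loglik(k / n, k, n)
    ll_full = sum(loglik(rate[c], hits[c], totals[c]) for c in totals)
    g_stat = 2 * (ll_full - ll_null)  # likelihood-ratio statistic
    df = len(totals) - 1
    # For df == 2 the chi-square survival function is exactly exp(-x / 2).
    p_value = exp(-g_stat / 2) if df == 2 else None
    return intercept, coefs, g_stat, p_value


# Made-up illustration data (not from the thread): 10 samples per category.
expression = [1] * 10 + [2] * 10 + [3] * 10
mutated = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2 + [1] * 5 + [0] * 5

intercept, coefs, g_stat, p_value = fit_mutation_on_expression(mutated, expression)
print(intercept, coefs, g_stat, p_value)
```

The likelihood-ratio test here asks whether the mutation rate differs across the three expression categories at all; the individual coefficients then show which categories drive the difference.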

Comment written 5.2 years ago by Devon Ryan