Question

Statistical technique for mutation to gene expression link

9.8 years ago

RBee • 0

I have the following data from a set of samples (healthy and diseased tissue from approximately 100 individuals). For each sample:

gene expression in healthy and diseased tissue (coded as 1 = healthy only, 2 = diseased only, 3 = healthy and diseased)
for a set of mutations (M1, M2, M3, ...), whether the sample carries each mutation (coded as 1 or 0).

I want to analyze the link between the mutation and the expression of the gene:

1. For each mutation, is there a relation (positive or negative) between that particular mutation and gene expression in healthy/diseased tissue? For instance, does the mutation lead to increased (or decreased) expression in the diseased tissue?
2. Does a group of mutations together have an impact on gene expression?

I am wondering what the appropriate statistical tests are for these two cases. For question 1, I was considering the Wilcoxon test or a paired t-test. Is that the right approach?

For question 2, I was considering logistic regression. Would that work?

Any advice or pointers would be greatly appreciated.

Thanks in advance.

mutation gene-expression statistics • 2.3k views

updated 2.5 years ago by Ram 43k • written 9.8 years ago by RBee • 0

Ram · Answer 1 · 2014-07-06

9.8 years ago

Devon Ryan 104k

McNemar's test tells you whether the marginal probabilities differ, which is almost the exact opposite of what you want to know. You can just use logistic regression for each of those questions. You could also use Fisher's exact test for question 1.
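To make the Fisher's exact test suggestion for question 1 concrete, here is a minimal sketch using only the Python standard library. The counts are hypothetical and the function name `fisher_exact_two_sided` is made up for illustration; it assumes the expression codes have been collapsed into a yes/no column (for example, "expressed in diseased tissue" = codes 2 or 3). In practice a library routine such as `scipy.stats.fisher_exact` would normally be used instead.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum over every table that is no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, for illustration only:
# rows = mutation M1 present / absent,
# columns = gene expressed in diseased tissue (codes 2 or 3) / not (code 1).
p_value = fisher_exact_two_sided(18, 7, 30, 45)
print(f"two-sided p = {p_value:.4f}")
```

Running this once per mutation means many tests, so some multiple-testing correction would be needed before interpreting the p-values.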

updated 2.5 years ago by Ram 43k • written 9.8 years ago by Devon Ryan 104k

Thanks Devon. Appreciate your response. I actually thought about and re-coded the dataset. I edited the original post to reflect this, but perhaps you replied around the same time.

Essentially, the change is that I want to understand the effect of a mutation on gene expression in healthy vs. diseased tissue (tissue samples from the same individual). Hence, I changed the gene expression encoding to 1 = healthy only, 2 = diseased only, 3 = healthy and diseased. Does logistic regression still make sense for this dataset?

Does the Wilcoxon test or a paired t-test make sense for looking at the link between a mutation and increased/decreased expression in healthy vs. diseased tissue?

How safe is it to assume that the population is Gaussian? Are there other tests I should look at to establish that?

Thanks.

updated 2.5 years ago by Ram 43k • written 9.8 years ago by RBee • 0

Looks like I replied around the time you updated :)

Neither a Wilcoxon test nor a paired t-test would make sense for your question. Both of these tests expect a distribution of values as a function of a group or covariate, which isn't what you have. What you're asking is whether a multinomial outcome varies according to a binary predictor (or the other way around). Thus, it makes more sense to use either multinomial or logistic regression, depending on which way you want to frame the question. BTW, if 1 is healthy only, then it would make more sense for 2 to be healthy and diseased.
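As a rough sketch of the "multinomial outcome vs. binary predictor" framing: with a single 0/1 predictor, a multinomial logit model is saturated, so its likelihood-ratio test against the intercept-only model reduces to a G-test on the 2x3 table of counts. The code below assumes that special case only; the counts and the function name `multinomial_lr_test` are hypothetical, and the p-value relies on the asymptotic chi-square approximation. With several mutations as predictors, a fitted model (for example `MNLogit` in statsmodels) would be needed instead.

```python
from math import exp, log


def multinomial_lr_test(table):
    """Likelihood-ratio (G) test for a 2x3 table.

    table[g][k] = number of individuals in mutation group g (0 = absent,
    1 = present) whose expression code is category k.
    Returns (G statistic, degrees of freedom, approximate p-value).
    """
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("this sketch only handles a 2x3 table")

    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[k] for row in table) for k in range(3)]
    n = sum(row_totals)

    g_stat = 0.0
    for i, row in enumerate(table):
        for k, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing
                expected = row_totals[i] * col_totals[k] / n
                g_stat += 2.0 * observed * log(observed / expected)

    df = 2  # (2 - 1) * (3 - 1)
    # For exactly 2 degrees of freedom, the chi-square survival
    # function has the closed form exp(-x / 2).
    p_value = exp(-g_stat / 2.0)
    return g_stat, df, p_value


# Hypothetical counts, for illustration only:
# columns = expression code 1 (healthy only), 2 (diseased only), 3 (both).
counts = [
    [30, 15, 25],  # mutation absent
    [5, 14, 11],   # mutation present
]
g_stat, df, p_value = multinomial_lr_test(counts)
print(f"G = {g_stat:.2f}, df = {df}, p = {p_value:.4f}")
```

Each individual contributes one row here, because the 1/2/3 coding already summarises that individual's healthy and diseased tissue in a single category.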

A Gaussian assumption won't hold for this sort of dataset.

updated 2.5 years ago by Ram 43k • written 9.8 years ago by Devon Ryan 104k