Question: multivariate analysis of RNA-seq
2.1 years ago, moxu (430) wrote:

When you have only one variable with two categories (e.g. disease vs. control) to compare in RNA-seq, you assume the expression level follows a negative binomial distribution and you can use software such as DESeq or edgeR to do differential gene expression analysis. What if your variable is not binary but continuous, such as treatment with a compound at different concentrations (e.g. 0.1 nM, 0.2 nM, 0.5 nM)? Or, even more complicated, what if you have time points in addition to the different compound concentrations? If you have more than just one binary variable to consider, what do you do for differential gene expression analysis?

What I have in mind is to use:

-log(expression) = case/control + [compound concentration] + time_of_treatment

And then check the p-value of the theta (slope) of each variable for significance.

Thank you!
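
For a single gene, the model proposed above could be sketched in base R roughly as follows. The data are simulated, and the variable names (`status`, `conc`, `time`) and effect sizes are assumptions made only for illustration. Note that this fits an ordinary linear model to log-expression, not the negative binomial model used by DESeq or edgeR.

```r
# Simulated data for ONE gene; every value here is made up for illustration.
set.seed(1)
n <- 48
status <- factor(rep(c("control", "case"), each = n / 2),
                 levels = c("control", "case"))
conc <- rep(c(0.1, 0.2, 0.5), times = n / 3)        # compound concentration (nM)
time <- rep(c(2, 6, 12, 24), each = 3, times = 4)   # time of treatment (hours)

# Hypothetical log-expression: baseline + case effect + slopes + noise
log_expr <- 5 + 1.0 * (status == "case") + 2.0 * conc + 0.05 * time +
  rnorm(n, sd = 0.3)

# One coefficient per variable: status is a factor, conc and time are slopes
fit <- lm(log_expr ~ status + conc + time)

# Estimate, standard error, t value and p-value for each term
coefs <- summary(fit)$coefficients
print(coefs)
```

Each row of `coefs` corresponds to one term of the model, so the p-value of a given slope can be read from the last column.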

rna-seq next-gen R • 1.2k views

You can use edgeR, DESeq, etc. for two-way (or higher-order) factorial analysis. You'll have to define your linear model correctly. There are some examples in the user's guides.

written 2.1 years ago by Benn (6.6k)
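
As a rough sketch of what such a linear model could look like, the snippet below builds a design matrix with one factor and two continuous covariates and hands it to edgeR's GLM functions. The counts are simulated without any real signal, the sample annotation (`disease`, `conc`, `time`) is hypothetical, and the edgeR part only runs if the package is installed.

```r
set.seed(42)
n_genes <- 200
n_samples <- 12

# Hypothetical sample annotation (one entry per sample)
disease <- factor(rep(c("control", "disease"), each = 6),
                  levels = c("control", "disease"))
conc <- rep(c(0.1, 0.2, 0.5), times = 4)      # compound concentration (nM)
time <- rep(c(6, 24), each = 3, times = 2)    # time of treatment (hours)

# Additive model: intercept + disease effect + slope for conc + slope for time
design <- model.matrix(~ disease + conc + time)
print(colnames(design))

# Simulated raw counts, genes in rows and samples in columns
counts <- matrix(rnbinom(n_genes * n_samples, mu = 100, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

if (requireNamespace("edgeR", quietly = TRUE)) {
  y <- edgeR::DGEList(counts = counts)
  y <- edgeR::calcNormFactors(y)
  y <- edgeR::estimateDisp(y, design)
  fit <- edgeR::glmQLFit(y, design)
  # coef = 3 refers to the third column of the design, here the conc slope
  res <- edgeR::glmQLFTest(fit, coef = 3)
  print(edgeR::topTags(res, n = 5))
}
```

The design matrix is what defines the model here; changing the formula (for example, adding an interaction term) changes which coefficients are available for testing.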

That's great! Glad to know edgeR can do this. However, I have some difficulty understanding "group", "coef", "contrast", etc. in edgeR. Given the following data table:

       GENE  EXPRESSION  DISEASE  A  B  DISEASExA  DISEASExB
1      A1BG   0.4785665        1  0  0          0          0
2      A1BG  -2.0000000        1  0  0          0          0
...
610683 ZZZ3   1.903144         0  0  1          0          0
610684 ZZZ3   1.959089         0  0  1          0          0

A: concentration of compound A

B: concentration of compound B

DISEASE: 0/1 whether it's a disease or normal sample

Ignore the last two columns (DISEASExA and DISEASExB); they are simply the products of the corresponding variables.

If what I want is to fit the following "glm":

EXPRESSION ~ intercept + DISEASE + A + B

Then,

1) How should I define "group" in edgeR?

2) coef = 3?

3) Should I use contrast of c(0, 1, 1, 1)?
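
One way to see what `coef` and `contrast` would refer to is to build the design matrix for this formula in base R. The sample values below are hypothetical. As far as I understand the edgeR GLM functions, `coef` indexes columns of the design matrix, `contrast` gives one weight per column for a linear combination of coefficients that is tested against zero, and a separate `group` is not used once a design matrix is supplied.

```r
# Hypothetical per-sample annotation (one row per sample, not per gene)
samples <- data.frame(
  DISEASE = c(1, 1, 1, 1, 0, 0, 0, 0),
  A       = c(0, 1, 0, 2, 0, 1, 0, 2),
  B       = c(0, 0, 1, 2, 0, 0, 1, 2)
)

# Columns of the design matrix are the coefficients of the model
design <- model.matrix(~ DISEASE + A + B, data = samples)
print(colnames(design))

# coef = 3 would pick the third column of the design
coef_3 <- colnames(design)[3]

# c(0, 1, 1, 1) puts weight 1 on DISEASE, A and B: it tests DISEASE + A + B = 0
contrast_sum <- setNames(c(0, 1, 1, 1), colnames(design))

# Testing a single coefficient corresponds to a unit vector such as this one
contrast_disease <- setNames(c(0, 1, 0, 0), colnames(design))
print(rbind(contrast_sum, contrast_disease))
```

Under these assumptions, `coef = 3` would test compound A alone, and `c(0, 1, 1, 1)` would test whether the three coefficients sum to zero, which is rarely the question of interest.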

If we can assume EXPRESSION or log(EXPRESSION) is normally distributed, then in R we can simply do

glm(EXPRESSION ~ DISEASE + A + B)

I don't know why edgeR is so (unnecessarily?) complicated.

Thanks much!

modified 2.1 years ago • written 2.1 years ago by moxu (430)

This question has been moved to its own post: "[edgeR Usage] How does edgeR handle multivariate gene expression analysis"

Thanks.

written 2.1 years ago by moxu (430)