Correlating gene expression with qualitative variables
4.3 years ago
fr ▴ 170

I have a gene expression dataset that I want to investigate. In particular, I would like to understand whether each gene's expression is correlated with quantitative or qualitative variables (say, the association between the expression of gene 'XPTO' and body mass index or race).

One possible way to test this would be through logistic regression, but is this a good approach or are there caveats that I should know about using such a statistic?

My question is the following: which methods would you advise to measure such correlations, and why?

(This question was crossposted on Stackexchange)

RNA-Seq R correlation • 1.6k views
4.3 years ago

Based on this paper ( https://www.nature.com/articles/srep24375 ), you should use robust linear regression (the rlm() function in the R MASS package) on log2-transformed, TMM-normalized expression values (TMM normalization is available in edgeR).
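A minimal sketch of what that could look like in R, in case it helps. The count matrix, the BMI values, and the gene names are all simulated placeholders (not from the paper), and the p-value at the end is only a rough t-based approximation, since rlm() itself does not report p-values. If edgeR is not installed, the sketch falls back to plain log2 CPM without TMM scaling, which is not the same normalization.

```r
library(MASS)  # provides rlm()

set.seed(1)
n_samples <- 12
n_genes <- 200

# Hypothetical data: simulated raw counts and a simulated BMI covariate
counts <- matrix(rnbinom(n_genes * n_samples, mu = 200, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("S", seq_len(n_samples))))
bmi <- rnorm(n_samples, mean = 25, sd = 4)

if (requireNamespace("edgeR", quietly = TRUE)) {
  # TMM normalization followed by log2 counts per million
  dge <- edgeR::DGEList(counts = counts)
  dge <- edgeR::calcNormFactors(dge, method = "TMM")
  log_expr <- edgeR::cpm(dge, log = TRUE)
} else {
  # Fallback (assumption): plain log2 CPM, no TMM scaling
  log_expr <- log2(t(t(counts) / colSums(counts)) * 1e6 + 1)
}

# Robust linear regression of one gene's expression on BMI
d <- data.frame(expr = log_expr["gene1", ], bmi = bmi)
fit <- rlm(expr ~ bmi, data = d, maxit = 50)
coefs <- summary(fit)$coefficients  # columns: Value, Std. Error, t value

# Approximate two-sided p-value for the BMI slope (t approximation)
t_bmi <- coefs["bmi", "t value"]
p_bmi <- 2 * pt(-abs(t_bmi), df = n_samples - 2)
```

To screen every gene, the same fit can be wrapped in a function and applied to each row of log_expr, followed by a multiple-testing correction such as p.adjust(..., method = "BH").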


Thank you so much, this is very interesting and useful! But if I understood correctly, the authors only consider quantitative variables for the association (e.g. BMI), not qualitative ones, correct? How would you proceed if you had qualitative data instead?


Then you can choose DESeq2, edgeR or limma, all of which allow multi-factor designs. For mixing quantitative and qualitative data, I'm not sure how easily it can be done. You could try adding the qualitative variable to the model as an extra term.

Example

Sample  Age  Group
A       20   Treatment_1
B       25   Treatment_1
C       24   Treatment_1
D       45   Treatment_2
E       36   Treatment_2
F       80   Treatment_2
G       52   Treatment_3
H       23   Treatment_3
I       47   Treatment_3

Model: GeneExpression ~ Age + Group


You fit this model to each gene separately and extract the p-values and correlation metrics. I have never tried this, but it should work.
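Here is a rough sketch of that per-gene loop in R, using the sample table above. The expression matrix is simulated (50 made-up genes on a log2-like scale), and I use ordinary lm() only to keep the example in base R; rlm(), limma or edgeR could be swapped in. R's model formula accepts a numeric covariate (Age) and a factor (Group) in the same model, so the mixed design works as written. The function name test_gene is just a placeholder.

```r
# Sample annotation from the example table
samples <- data.frame(
  Sample = LETTERS[1:9],
  Age    = c(20, 25, 24, 45, 36, 80, 52, 23, 47),
  Group  = factor(rep(c("Treatment_1", "Treatment_2", "Treatment_3"), each = 3))
)

# Hypothetical data: simulated log2 expression, genes in rows, samples in columns
set.seed(42)
n_genes <- 50
expr <- matrix(rnorm(n_genes * nrow(samples), mean = 8, sd = 1),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), samples$Sample))

# Fit GeneExpression ~ Age + Group for one gene and return one p-value per term
test_gene <- function(y) {
  d <- cbind(samples, GeneExpression = y)
  fit <- lm(GeneExpression ~ Age + Group, data = d)
  # F-test for dropping each term while keeping the other in the model
  tab <- drop1(fit, test = "F")
  c(p_age = tab["Age", "Pr(>F)"], p_group = tab["Group", "Pr(>F)"])
}

# Apply to every gene, then correct for multiple testing (Benjamini-Hochberg)
res <- as.data.frame(t(apply(expr, 1, test_gene)))
res$padj_age   <- p.adjust(res$p_age, method = "BH")
res$padj_group <- p.adjust(res$p_group, method = "BH")
head(res)
```

The Group p-value here tests all three treatment levels at once; pairwise contrasts between specific treatments would need an extra step. With only nine samples and four fitted coefficients, the per-gene tests have little power, so treat the output as illustrative.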