
Account for Clinical Variables in RNA-Seq Analysis

8 months ago

fabian ▴ 10

Dear Biostars community,

I am analyzing RNA-Seq data from peripheral blood samples of patients who suffered a myocardial infarction. Some of these patients subsequently developed shock, and I want to compare gene expression at the time of hospital admission between patients who went on to develop shock and those who did not (differential gene expression, DGE).

This is a clinically oriented analysis, and I also have rich metadata, including:

- Patient demographics (age, sex, diabetes status)
- Time until hospital arrival
- Lactate levels (a key biomarker for shock prediction)
- Troponin levels
- Other variables

Given that many of these parameters influence the likelihood of developing shock, I believe they must be accounted for in my analysis. Lactate, in particular, is a critical variable as it is the strongest biomarker for predicting shock in this cohort.

I have two main questions:

1. Should I include all relevant variables in the design matrix, for example:

design <- model.matrix( ~ 0 + shock + age + Male + diabetes + time.until.arrival + lactate.at.admission + troponin.at.admission, data = targets)

2. How should I handle batch effects in this context? For example, I created an MDS plot, which showed that most of the batch effect was driven by sex. After batch correction for sex, the sex effect disappeared. However, I suspect that variables like age and lactate levels are also biologically relevant (I have also attached an MDS plot after batch correcting for all variables).
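To make the first question concrete, here is a minimal base-R sketch of what this design looks like on toy data. The column names follow the formula in my question, but the values are simulated, the factor levels ("no"/"yes") are an assumption about how the variables are coded, and the contrast vector is only one possible way to express the shock vs. no-shock comparison.

```r
# Toy metadata: column names follow the formula in the question,
# but every value below is simulated purely for illustration.
set.seed(1)
n <- 12
targets <- data.frame(
  shock                 = factor(rep(c("no", "yes"), each = n / 2)),
  age                   = rnorm(n, mean = 65, sd = 10),
  Male                  = factor(rep(c("no", "yes"), times = n / 2)),
  diabetes              = factor(rep(c("no", "no", "yes"), length.out = n)),
  time.until.arrival    = rexp(n, rate = 1 / 3),
  lactate.at.admission  = rlnorm(n, meanlog = 0.7, sdlog = 0.4),
  troponin.at.admission = rlnorm(n, meanlog = 3, sdlog = 1)
)

# Same formula as in the question. With "~ 0 + shock", the factor `shock`
# gets one column per level (shockno, shockyes) and there is no intercept.
design <- model.matrix(
  ~ 0 + shock + age + Male + diabetes + time.until.arrival +
    lactate.at.admission + troponin.at.admission,
  data = targets
)
print(colnames(design))

# The comparison of interest is then a contrast between the two shock
# columns; all covariate columns get weight 0.
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast[c("shockno", "shockyes")] <- c(-1, 1)

# Sanity check: too many covariates for too few samples, or strongly
# collinear covariates, would make the design rank deficient.
is_full_rank <- qr(design)$rank == ncol(design)
print(is_full_rank)

# With edgeR, a vector like `contrast` could be supplied as the `contrast`
# argument of glmQLFTest() after fitting with glmQLFit(y, design).
```

With this coding, the adjusted shock effect is the difference between the `shockyes` and `shockno` coefficients, and every additional covariate uses up one residual degree of freedom.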

[Figure: MDS Plot 2, after batch correction for sex]
[Figure: MDS Plot 3, after batch correction for age, sex, lactate, etc.]
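For context on the second question, the following sketch illustrates the general idea behind such a correction: regress one covariate (here sex) out of a log-expression matrix and redo the MDS. It is a base-R stand-in using `lm()` and `cmdscale()` on simulated data, not the limma/edgeR calls behind the plots above, and all object names are hypothetical.

```r
# Simulated log-expression matrix (genes x samples); all values are made up.
set.seed(42)
n_genes   <- 200
n_samples <- 12
sex <- factor(rep(c("F", "M"), times = n_samples / 2))
logexpr <- matrix(
  rnorm(n_genes * n_samples),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)
# Add an artificial sex effect to the first 20 genes.
logexpr[1:20, sex == "M"] <- logexpr[1:20, sex == "M"] + 3

# Regress sex out of every gene: lm() accepts a matrix response
# (samples in rows, one column per gene) and returns per-gene residuals.
fit <- lm(t(logexpr) ~ sex)
# Add the gene-wise mean back so values stay on the original scale.
corrected <- t(residuals(fit)) + rowMeans(logexpr)

# Classical MDS on sample-to-sample Euclidean distances, before and after.
mds_before <- cmdscale(dist(t(logexpr)), k = 2)
mds_after  <- cmdscale(dist(t(corrected)), k = 2)

# Largest remaining difference in mean expression between the sexes.
sex_gap <- max(abs(rowMeans(corrected[, sex == "M"]) -
                   rowMeans(corrected[, sex == "F"])))
print(sex_gap)
```

The limma documentation describes `removeBatchEffect()` as intended for visualization such as MDS plots rather than as a step before linear modelling, which is part of why I am unsure whether to correct the data or to keep the covariates in the design.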

edgeR Batch effect • 436 views
