Looking for a Viable Design Matrix for Differential Gene Expression Using Paired Samples
19 months ago
bdighera • 0

First, I am relatively new to differential expression analysis, so please bear with me. I have read the limma user guide many times and looked at previous posts on similar questions, but I am not convinced that my design matrix captures the intention of my study.

My exploratory study uses an Affymetrix U133A microarray dataset of eight subjects measured pre and post sleep deprivation (16 data points in total). Two psychological evaluations (SSS and PVT) were administered at these time points, and I want to use them as continuous response variables representing each patient's sleep deprivation. My end goal is to determine which genes are differentially expressed in response to sleep deprivation in my paired samples.

Does anyone have thoughts on whether my design matrix will yield genes found only in responders (those with higher PVT/SSS scores)? What would be my primary coefficient of interest? Additionally, is there any way to remove the effect of gender in my design matrix?

This is the data frame I am using to construct the design matrix:

patient <- factor(rep(c(1,2,3,4,5,6,7,8), each=2)) #patient ID
condition <- factor(rep(c('Post', 'Pre'), 8)) #Pre or Post Treatment
gender <- factor(c(rep('F', 8), rep('M', 8))) #gender
PVT <- c(339.67,254.56,423.33,...) #Response Variable 1
SSS <- c(6,2,3,1,3,2,5,2,5,1,3,2,2,1,5,3) #Response Variable 2
data.frame(patient, condition, gender, PVT, SSS)

   patient condition gender    PVT SSS
1        1      Post      F 339.67   6
2        1       Pre      F 254.56   2
3        2      Post      F 423.33   3
4        2       Pre      F 316.09   1
5        3      Post      F 640.13   3
6        3       Pre      F 358.82   2
7        4      Post      F 321.15   5
8        4       Pre      F 491.67   2
9        5      Post      M 338.99   5
10       5       Pre      M 288.09   1
11       6      Post      M 261.96   3
12       6       Pre      M 246.69   2
13       7      Post      M 276.48   2
14       7       Pre      M 250.11   1
15       8      Post      M 267.14   5
16       8       Pre      M 249.67   3


This is my proposed design matrix:

design <- model.matrix(~patient + condition*PVT+SSS)
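To see which coefficients this formula produces, it can help to build the matrix and inspect its column names. The sketch below uses the values from the table above and base R only. Two things are assumptions rather than part of the original post: `condition` is releveled so that `Pre` is the reference (by default R orders factor levels alphabetically, which would make `Post` the baseline), and `y` is a simulated vector standing in for one gene's log-expression values.

```r
# Sample annotation copied from the table above
patient   <- factor(rep(1:8, each = 2))
# Assumption: make "Pre" the reference level so coefficients read as Post vs Pre
condition <- factor(rep(c("Post", "Pre"), 8), levels = c("Pre", "Post"))
PVT <- c(339.67, 254.56, 423.33, 316.09, 640.13, 358.82, 321.15, 491.67,
         338.99, 288.09, 261.96, 246.69, 276.48, 250.11, 267.14, 249.67)
SSS <- c(6, 2, 3, 1, 3, 2, 5, 2, 5, 1, 3, 2, 2, 1, 5, 3)

# The proposed design: patient blocking, condition-by-PVT interaction, SSS as covariate
design <- model.matrix(~ patient + condition * PVT + SSS)
print(colnames(design))

# Hypothetical single-gene illustration: y is simulated, not real expression data
set.seed(1)
y <- rnorm(16)
fit <- lm(y ~ patient + condition * PVT + SSS)

# Row of the coefficient table for the interaction term
print(coef(summary(fit))["conditionPost:PVT", ])
```

With this coding, `conditionPost` is the Post versus Pre difference at PVT = 0, `PVT` is the within-patient slope of expression on PVT in the Pre samples, and `conditionPost:PVT` is the change in that slope after sleep deprivation. Which of these counts as the primary coefficient depends on the question being asked; in limma the same column name could be passed to `topTable(fit, coef = ...)` after `lmFit()` and `eBayes()`.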


Any input would be greatly appreciated. Thank you in advance.

R limma bioconductor • 788 views
19 months ago

Given the low sample numbers, I would aim to keep this as simple as possible, and I think that there are multiple ways to do this. I would start with, for example:

 - ~ patient + gender + condition * PVT
 - ~ patient + gender + condition * SSS


This assumes that PVT (Psychomotor Vigilance Task) and SSS (Stanford Sleepiness Scale) are independent evaluations (response variables) and can be tested independently - correct? As such, they are not quite covariates and are actually the variables under study?

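These formulas account for the patient pairing through the patient term. Note that each patient has a single gender, so gender is nested within patient and its main effect is already absorbed by the patient blocking.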
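One thing worth checking before fitting is whether `gender` can be estimated alongside `patient`. Because every subject has exactly one gender, the gender indicator is a sum of patient indicators. The sketch below, using the sample annotation from the question and base R only, compares the number of columns with the numerical rank of the matrix; the releveling of `condition` to a `Pre` reference is an assumption.

```r
# Sample annotation from the question
patient   <- factor(rep(1:8, each = 2))
condition <- factor(rep(c("Post", "Pre"), 8), levels = c("Pre", "Post"))  # assumed reference: Pre
gender    <- factor(rep(c("F", "M"), each = 8))
PVT <- c(339.67, 254.56, 423.33, 316.09, 640.13, 358.82, 321.15, 491.67,
         338.99, 288.09, 261.96, 246.69, 276.48, 250.11, 267.14, 249.67)

design <- model.matrix(~ patient + gender + condition * PVT)

n_cols   <- ncol(design)      # number of coefficients requested
mat_rank <- qr(design)$rank   # number that can actually be estimated
print(c(columns = n_cols, rank = mat_rank))

# The male indicator equals the sum of the indicators for patients 5 to 8
aliased <- all(design[, "genderM"] ==
               rowSums(design[, paste0("patient", 5:8)]))
print(aliased)
```

If the rank is lower than the number of columns, `lmFit()` in limma reports the affected coefficients as not estimable. In that case the patient term is already removing baseline gender differences. If gender itself needs to be modelled, one option described in the limma user guide for multi-level experiments is to treat patient as a random effect with `duplicateCorrelation()` instead of a fixed blocking term.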

## -------------------

You could also stratify by gender and conduct separate analyses for the male and female groups:

## Female

 - ~ patient + condition * PVT
 - ~ patient + condition * SSS


## Male

 - ~ patient + condition * PVT
 - ~ patient + condition * SSS
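A rough way to gauge what stratification costs is to count the residual degrees of freedom left in each stratum. The sketch below builds the PVT design separately for each gender from the annotation in the question, using base R only; the helper name `resid_df_by_gender` and the `Pre` reference level are assumptions made for illustration.

```r
# Sample annotation from the question
samples <- data.frame(
  patient   = factor(rep(1:8, each = 2)),
  condition = factor(rep(c("Post", "Pre"), 8), levels = c("Pre", "Post")),
  gender    = factor(rep(c("F", "M"), each = 8)),
  PVT = c(339.67, 254.56, 423.33, 316.09, 640.13, 358.82, 321.15, 491.67,
          338.99, 288.09, 261.96, 246.69, 276.48, 250.11, 267.14, 249.67)
)

# Hypothetical helper: residual degrees of freedom for one gender stratum
resid_df_by_gender <- function(g) {
  sub <- droplevels(samples[samples$gender == g, ])   # drop unused patient levels
  design <- model.matrix(~ patient + condition * PVT, data = sub)
  nrow(design) - qr(design)$rank
}

resid_df <- sapply(c(F = "F", M = "M"), resid_df_by_gender)
print(resid_df)
```

Each stratum has 8 samples and the design has 7 columns (intercept, three patient terms, condition, PVT, and their interaction), so a full-rank fit leaves a single residual degree of freedom per gender. With so little information for variance estimation, the stratified models are likely to have much less power than a combined analysis.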


Also take a look through the Bioconductor support forum threads for other ideas.

Kevin


Kevin, thank you so much for your response. It is exactly what I need to move forward in my analysis. You are correct: PVT and SSS are independent evaluations and can therefore be tested independently. Do you think that stratifying by gender would significantly reduce the power of the analysis, compared with the designs that include both genders, because of the smaller sample size?


I am not sure that stratifying by gender will be problematic in itself, but the total sample number is already low. Is sex / gender a known confounding factor in these types of studies?