Hello everyone, I'm trying to find the best way to set up my analysis, specifically my contrasts. My samples are nested within patients and grouped by phenotype, so I'm afraid I have made some errors.
My experiment is as follows: I have RNA-seq data with the following features for each patient: ID, age, sex, and whether he/she has each of N phenotypes (Phenotype1 ... PhenotypeN; any number of them can be active). Two samples were taken from each patient: one lesional (L) and one non-lesional (N).
Here is an example table. For simplicity, let's assume only two phenotypes are measured:
```
PatientID  Age  Sex  SampleType  Phenotype1  Phenotype2
1          10   M    L           FALSE       TRUE
1          10   M    N           FALSE       TRUE
2          20   F    L           TRUE        TRUE
2          20   F    N           TRUE        TRUE
3          30   M    L           TRUE        FALSE
3          30   M    N           TRUE        FALSE
...
```
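To make the layout concrete, here is a minimal sketch of the example table as an R data frame. It covers only the three patients shown; making N the first level of SampleType is my assumption, chosen so that N is the reference level and the design gets a "SampleTypeL" column.

```r
# Hypothetical reconstruction of the example table (three patients only).
# Each patient contributes one lesional (L) and one non-lesional (N) sample,
# so the patient-level columns (Age, Sex, Phenotype1, Phenotype2) are repeated.
sample_data <- data.frame(
  PatientID  = factor(rep(1:3, each = 2)),
  Age        = rep(c(10, 20, 30), each = 2),
  Sex        = factor(rep(c("M", "F", "M"), each = 2)),
  # Assumed level order: N first, so N is the reference level
  SampleType = factor(rep(c("L", "N"), times = 3), levels = c("N", "L")),
  Phenotype1 = rep(c(FALSE, TRUE, TRUE), each = 2),
  Phenotype2 = rep(c(TRUE, TRUE, FALSE), each = 2)
)
print(sample_data)
```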
My research questions can be split into these four "prototypes":
- What are the differentially expressed genes (DEGs) between lesional and non-lesional samples in patients with Phenotype1 = FALSE and Phenotype2 = TRUE, or any other combination of the two (with or without adjustment for age and sex)?
- What is the "difference of differences", i.e. the interaction between Phenotype1 (TRUE vs FALSE) and SampleType (L vs N)?
- What are the DEGs between lesional and non-lesional samples in patients with Phenotype1 = FALSE regardless of Phenotype2, and vice versa (with or without adjustment for age and sex)?
- What are the DEGs between Phenotype1 = TRUE and Phenotype1 = FALSE, within the lesional group and within the non-lesional group?
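One way to check contrasts for questions like these is to write each group's fitted mean as a row of the design matrix (with age and sex held fixed) and subtract rows: the difference of two rows is the contrast vector for the difference of those two group means. Below is a minimal base-R sketch of that idea; the grid `groups`, the row labels, and the fixed values Age = 30 and Sex = "M" are illustrative assumptions, not part of my data.

```r
# Sketch: derive contrast vectors as differences of design-matrix rows.
# Assumption (hypothetical): Age = 30 and Sex = "M" in every group,
# so the intercept, Age and SexM columns cancel in every difference.
groups <- expand.grid(SampleType = factor(c("N", "L"), levels = c("N", "L")),
                      Phenotype1 = c(FALSE, TRUE),
                      Phenotype2 = c(FALSE, TRUE))
groups$Age <- 30
groups$Sex <- factor("M", levels = c("F", "M"))

X <- model.matrix(~ Age + Sex + SampleType * Phenotype1 + SampleType * Phenotype2,
                  data = groups)
colnames(X) <- make.names(colnames(X))
# Label rows as SampleType_Phenotype1_Phenotype2, e.g. "L_FALSE_TRUE"
rownames(X) <- with(groups, paste(SampleType, Phenotype1, Phenotype2, sep = "_"))

# Q1: L vs N within Phenotype1 = FALSE, Phenotype2 = TRUE
q1 <- X["L_FALSE_TRUE", ] - X["N_FALSE_TRUE", ]
# Q2: difference of differences (SampleType x Phenotype1), at Phenotype2 = FALSE;
# the same vector results at Phenotype2 = TRUE because the model has no
# Phenotype1:Phenotype2 term
q2 <- (X["L_TRUE_FALSE", ] - X["N_TRUE_FALSE", ]) -
      (X["L_FALSE_FALSE", ] - X["N_FALSE_FALSE", ])
# Q4: Phenotype1 TRUE vs FALSE within each sample type (Phenotype2 = FALSE)
q4_L <- X["L_TRUE_FALSE", ] - X["L_FALSE_FALSE", ]
q4_N <- X["N_TRUE_FALSE", ] - X["N_FALSE_FALSE", ]

print(rbind(q1, q2, q4_L, q4_N))
```

Reading off the non-zero entries gives SampleTypeL + SampleTypeL.Phenotype2TRUE for the first question, SampleTypeL.Phenotype1TRUE for the interaction, Phenotype1TRUE + SampleTypeL.Phenotype1TRUE within the lesional group, and Phenotype1TRUE alone within the non-lesional group.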
I have read some posts on these questions, but I'm still uncertain. Here is what I have so far:
```r
library(limma)
# N (non-lesional) must be the reference level to get a "SampleTypeL" column
sample_data$SampleType <- factor(sample_data$SampleType, levels = c("N", "L"))
mydesign <- model.matrix(~ Age + Sex + SampleType * Phenotype1 + SampleType * Phenotype2,
                         data = sample_data)
colnames(mydesign) <- make.names(colnames(mydesign))
# colnames(mydesign): "X.Intercept." "Age" "SexM" "SampleTypeL"
#   "Phenotype1TRUE" "Phenotype2TRUE" "SampleTypeL.Phenotype1TRUE" "SampleTypeL.Phenotype2TRUE"
mycontrasts <- makeContrasts(
  LvsN_P1FALSE_P2TRUE       = SampleTypeL + SampleTypeL.Phenotype2TRUE,
  LvsN_vs_P1TRUEvsFALSE     = SampleTypeL.Phenotype1TRUE,
  LvsN_P1TRUE_P2dontcare    = SampleTypeL + (SampleTypeL.Phenotype2TRUE/2) + SampleTypeL.Phenotype1TRUE,
  P1TRUEvsFALSE_SampleTypeL = Phenotype1TRUE + SampleTypeL.Phenotype1TRUE,
  levels = mydesign)
```
Edit: here is my rationale for "LvsN_P1TRUE_P2dontcare". In each L - N difference the intercept, Age, Sex, Phenotype1TRUE and Phenotype2TRUE terms cancel, and I give the two Phenotype2 subgroups equal weight (which matches the pooled mean only when both subgroups have the same number of patients):
LvsN_P1TRUE_P2dontcare =
(Mean of [SampleType=L], [Phenotype1 = TRUE]) - (Mean of [SampleType=N], [Phenotype1 = TRUE]) =
((Mean of [SampleType=L], [Phenotype1 = TRUE], [Phenotype2 = FALSE]) + (Mean of [SampleType=L], [Phenotype1 = TRUE], [Phenotype2 = TRUE])) / 2 -
((Mean of [SampleType=N], [Phenotype1 = TRUE], [Phenotype2 = FALSE]) + (Mean of [SampleType=N], [Phenotype1 = TRUE], [Phenotype2 = TRUE])) / 2 =
((SampleTypeL + SampleTypeL.Phenotype1TRUE) + (SampleTypeL + SampleTypeL.Phenotype1TRUE + SampleTypeL.Phenotype2TRUE)) / 2 =
SampleTypeL + (SampleTypeL.Phenotype2TRUE/2) + SampleTypeL.Phenotype1TRUE
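The same row-differencing idea can check this derivation and show what the equal weights imply. A minimal base-R sketch; the fixed Age/Sex values and the subgroup sizes `n_P2F` and `n_P2T` are hypothetical. Because each patient contributes both an L and an N sample, weighting each Phenotype2 subgroup by its number of patients corresponds, under this model, to the pooled "mean of all samples" difference.

```r
# Sketch: LvsN for Phenotype1 = TRUE, averaging over Phenotype2.
# Assumption (hypothetical): Age = 30 and Sex = "M" in every group.
groups <- expand.grid(SampleType = factor(c("N", "L"), levels = c("N", "L")),
                      Phenotype1 = c(FALSE, TRUE),
                      Phenotype2 = c(FALSE, TRUE))
groups$Age <- 30
groups$Sex <- factor("M", levels = c("F", "M"))
X <- model.matrix(~ Age + Sex + SampleType * Phenotype1 + SampleType * Phenotype2,
                  data = groups)
colnames(X) <- make.names(colnames(X))
rownames(X) <- with(groups, paste(SampleType, Phenotype1, Phenotype2, sep = "_"))

# L - N difference within each Phenotype2 subgroup (Phenotype1 = TRUE)
d_P2F <- X["L_TRUE_FALSE", ] - X["N_TRUE_FALSE", ]
d_P2T <- X["L_TRUE_TRUE", ] - X["N_TRUE_TRUE", ]

# Equal weights, as in LvsN_P1TRUE_P2dontcare
equal_w <- (d_P2F + d_P2T) / 2

# Hypothetical patient counts per Phenotype2 subgroup (Phenotype1 = TRUE)
n_P2F <- 4
n_P2T <- 12
# Patient-count weights: the pooled-mean version of the same contrast
pooled_w <- (n_P2F * d_P2F + n_P2T * d_P2T) / (n_P2F + n_P2T)

print(rbind(equal_w, pooled_w))
```

With equal weights the result is SampleTypeL + SampleTypeL.Phenotype1TRUE + 0.5 * SampleTypeL.Phenotype2TRUE, matching the contrast above; with the hypothetical 4 vs 12 patients, the SampleTypeL.Phenotype2TRUE coefficient becomes 0.75 instead.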
Am I correct here? Is this the right way to create the contrasts?
Thank you! -Jonathan