I am trying to build a contrast matrix in order to fit a linear model. It is a basic comparison between different histologic types of tumors: benign or BL, early stage, and late stage. The goal here is to investigate whether FFPE (formalin-fixed) material differs from fresh-frozen material in terms of methylation pattern (we're using Illumina's EPIC array). To that end, we collected FFPE and fresh-frozen samples from the same patient.
The basic experiment looks something like this:
    > clindata
       Subject Material_Source  Tumor_stage        ID2
    1     P235            FFPE Benign_or_BL  P235_FFPE
    2     P432            FFPE Benign_or_BL  P432_FFPE
    3     P421            FFPE        Early  P421_FFPE
    4      P93            FFPE        Early   P93_FFPE
    5     P876            FFPE        Early  P876_FFPE
    6     P543            FFPE         Late  P543_FFPE
    7     P532            FFPE         Late  P532_FFPE
    8     P152            FFPE         Late  P152_FFPE
    9     P235           Fresh Benign_or_BL P235_Fresh
    10    P432           Fresh Benign_or_BL P432_Fresh
    15    P421           Fresh        Early P421_Fresh
    16     P93           Fresh        Early  P93_Fresh
    17    P876           Fresh        Early P876_Fresh
    24    P543           Fresh         Late P543_Fresh
    25    P532           Fresh         Late P532_Fresh
    26    P152           Fresh         Late P152_Fresh
clindata$Subject refers to the patient ID, and the following two columns refer to the source of material and the tumor stage, respectively.
clindata$ID2 is a merge of the values in clindata$Subject and clindata$Material_Source.
So, now comes my question: how do I build the contrast matrix for a comparison between the different tumor stages, while accounting for the patient and material source variables?
My idea is the following:
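    # Preparing data (TS holds the tumor stage levels, so that the column
    # names assigned below match the Tumor_stage columns of the design):
    TS <- factor(clindata$Tumor_stage)
    SubMS <- factor(clindata$ID2)
    # Designing the matrix:
    design <- model.matrix(~0+Tumor_stage+ID2, data=clindata)
    colnames(design) <- c(levels(TS), levels(SubMS)[-1])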
After that, I can run the makeContrasts() function, together with the array data. Now, of course the n for each group is rather small, but this is just an example (more samples will be added to each group in the final experiment). But my question is:
Does that design make sense?
Would you suggest anything different (e.g. (A) treat all 3 classes separately, instead of merging the 2 co-variates as one co-variate; or (B) consider only the "Subject" group as a co-variate, since the pairwise comparison would already account for one sample being FFPE and the other fresh-frozen)?
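To make options (A) and (B) concrete, this is roughly how I imagine the two design matrices would be written. It is only a sketch using base R's model.matrix() on a rebuilt copy of the clindata table above; the names designA and designB are just placeholders I made up, and whether either design is appropriate is exactly what I am unsure about.

```r
# Rebuild the example sample sheet shown above (same 8 patients, 2 sources)
subjects <- c("P235", "P432", "P421", "P93", "P876", "P543", "P532", "P152")
stages   <- c("Benign_or_BL", "Benign_or_BL", "Early", "Early", "Early",
              "Late", "Late", "Late")
clindata <- data.frame(
  Subject         = factor(rep(subjects, times = 2)),
  Material_Source = factor(rep(c("FFPE", "Fresh"), each = 8)),
  Tumor_stage     = factor(rep(stages, times = 2))
)
clindata$ID2 <- paste(clindata$Subject, clindata$Material_Source, sep = "_")

# Option (A): the three variables as separate terms (hypothetical name designA)
designA <- model.matrix(~ 0 + Tumor_stage + Material_Source + Subject,
                        data = clindata)

# Option (B): tumor stage plus Subject only (hypothetical name designB)
designB <- model.matrix(~ 0 + Tumor_stage + Subject, data = clindata)

# Compare the number of columns with the rank of each design matrix
c(columns = ncol(designA), rank = qr(designA)$rank)
c(columns = ncol(designB), rank = qr(designB)$rank)
```

Comparing the rank with the number of columns is a quick way to check whether every coefficient in a given design can actually be estimated from these samples.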
Any help is greatly appreciated here. Thanks!