Hi, I am trying to create an appropriate design matrix to determine genes that are differentially expressed between cancer and normal samples. The table below is the sample information file for my dataset.
Sample | Subtype | Cancer |
A | Normal | Normal |
B | Normal | Normal |
C | Normal | Normal |
D | stage 1 | Cancer |
E | stage 1 | Cancer |
F | stage 2 | Cancer |
G | stage 2 | Cancer |
H | stage 2 | Cancer |
I | NA | Biopsy |
At the moment I create the design matrix using these commands:
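library(limma)
# group factor from the Cancer column of the table (R column names are case-sensitive)
f = factor(information$Cancer)
design = model.matrix(~f)
# exp_sample: expression matrix, columns assumed to be in the same order as the rows of information
fit = lmFit(exp_sample, design)
fit = eBayes(fit)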
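For reference, the sketch below rebuilds the sample table by hand in base R and prints the columns that `model.matrix(~f)` produces. The data frame is a hand-typed copy of the table above, so it is an assumption that the real `information` object matches it. With three levels in the Cancer column, the design gets an intercept plus one column per non-reference level, and the alphabetically first level (Biopsy) becomes the reference.

```r
# Hand-typed copy of the sample table above (assumption: the real
# `information` data frame has the same columns and values).
information = data.frame(
  Sample  = LETTERS[1:9],
  Subtype = c(rep("Normal", 3), rep("stage 1", 2), rep("stage 2", 3), NA),
  Cancer  = c(rep("Normal", 3), rep("Cancer", 5), "Biopsy")
)

f = factor(information$Cancer)
print(levels(f))          # factor levels are sorted alphabetically by default

design = model.matrix(~f)
print(colnames(design))   # intercept plus one column per non-reference level
print(table(f))           # number of samples in each group
```

In this parameterisation the fCancer and fNormal coefficients are each measured relative to the single biopsy sample, not relative to each other.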
However, I am not sure how to construct a contrast matrix that compares only two levels (cancer versus normal). Also, do I need to discard the biopsy sample or not? It would be great if you could give me some advice. Many thanks!