I am going to profile a clinical RNA-seq study with 51 samples for differentially expressed genes. As described in the limma-voom vignettes, I have created a DGEList object and computed normalization factors:
y1 <- DGEList(counts = assays(summarizedExperiment1)$counts, genes = annotations1)
y2 <- calcNormFactors(y1)
Then, to explore how the samples cluster, I created a PCA-style plot with plotMDS:
plotMDS(y2, labels=resp, top=50, col=ifelse(resp=="N", "red", "blue"), gene.selection="common", prior.count=5)
There is a clear separation of the samples along PC1 (the first dimension of the plot), but I don't know which attribute it correlates with. Should I create an attribute, e.g. batch_1, labelling the two groups on either side of PC1, and include it in the model.matrix as:
Or should I just model the comparison I am interested in:
Any suggestions would be appreciated.
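As a rough illustration of the two options, here is a minimal sketch on toy data. Everything in it is a placeholder rather than my real design: the six toy samples, the response level "Y", the batch labels "A"/"B", and the object names design_batch and design_simple are all assumed for the example.

```r
# Toy stand-ins (hypothetical): 6 samples instead of the real 51.
# "N" is a level from my data; "Y" is an assumed name for the other level.
resp <- factor(c("N", "N", "N", "Y", "Y", "Y"))

# batch_1: hypothetical label assigned by which side of PC1 a sample falls on.
batch_1 <- factor(c("A", "B", "A", "B", "A", "B"))

# Option 1: include the unexplained PC1 grouping as an extra covariate.
design_batch <- model.matrix(~ 0 + resp + batch_1)

# Option 2: model only the comparison of interest.
design_simple <- model.matrix(~ 0 + resp)

# Inspect which coefficients each design would estimate.
print(colnames(design_batch))
print(colnames(design_simple))
```

In both cases the response comparison would be the contrast between the two resp columns; the first design additionally estimates one coefficient for the PC1-derived grouping.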