I performed a differential expression (DE) analysis on TCGA datasets and used edgeR to identify genes that are DE between cancer and normal samples. During peer review of my paper, one of the reviewers asked how I dealt with batch effects.
I checked the edgeR manual. Section 3.4.3 shows that a batch effect can be handled by adding the batch to the design matrix, e.g. "design <- model.matrix(~Batch+Treatment)". However, the batch has to be specified by the user, e.g. "Batch <- factor(c(1,3,4,1,3,4))". To do this I need to know which batch each sample belongs to, and I do not know this for the TCGA datasets. edgeR also provides a function called "plotMDS" that can be used to check for batch effects in a dataset, but I don't know how to interpret this plot properly.
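To make the design-matrix approach from the manual concrete, here is a minimal sketch in base R. The Batch labels are the ones quoted from the manual. The Treatment labels are hypothetical (they are not from my data) and only stand in for the cancer/normal grouping.

```r
# Batch labels as quoted from the edgeR manual example.
Batch <- factor(c(1, 3, 4, 1, 3, 4))

# Hypothetical grouping (an assumption for illustration only):
# one normal and one cancer sample per batch, with "normal" as reference.
Treatment <- factor(
  c("normal", "normal", "normal", "cancer", "cancer", "cancer"),
  levels = c("normal", "cancer")
)

# Additive model: the Batch columns absorb batch-to-batch differences,
# and the last column is the cancer-vs-normal effect adjusted for batch.
design <- model.matrix(~ Batch + Treatment)
print(colnames(design))
print(dim(design))
```

As far as I understand, this matrix would then be passed to edgeR's estimateDisp() and glmQLFit(), and the cancer-vs-normal test would use the last coefficient. The sketch only works if the Batch factor is known for every sample, which is exactly the part I am missing for TCGA.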
Does anyone know how to deal with batch effects in TCGA RNA-seq datasets? And could you explain how to identify a batch effect in an MDS plot?
Thanks in advance.