Essentially, my experimental design is control vs treatment. Cells were sorted based on fluorescence, so there are 4 different "colors" of treated cells, i.e. red, green, green+red, and blue+green+red. I am interested in how the colors differ from one another. And, I have duplicates for all colors and the control.
> colData(dds_no) DataFrame with 10 rows and 3 columns sample color sizeFactor <factor> <factor> <numeric> CTRL_1 control 0.730176128359336 CTRL_2 control 1.12370593310441 GFP_1 green 1.62229333835717 GFP_3 green 0.733077520973604 GFPRFP_1 greenred 1.24575808750345 GFPRFP_2 greenred 1.27612350159403 RFP_2 red 1.57975191196927 RFP_3 red 0.518878115991023 TRIPLE_1 rgb 0.833793868046399 TRIPLE_3 rgb 1.01334467700869
I am wondering if it is appropriate to remove my control from downstream differential expression analysis when I want to only look at variation between treated cells? When I include the control, the PCA shows that the control samples are well separated from all treated cell types.
What I want (and have done) is removed control from the raw counts, built the deseq2 object with ~color as my design, produced results with LRT instead of Wald, and selected significant DEGs by the adj. pvalue. Then from this, I created a heatmap with my samples (still without control) to show the variation they have with certain significant genes and gene sets.
dds_no = DESeqDataSetFromMatrix(countData=countData_no, colData=colData_no, design=~color) dds_LRT = DESeq(dds_no,test="LRT", reduced=~1) res_LRT <- results(dds_LRT) rld_LRT<- rlogTransformation(dds_LRT) pathsLRT<-assay(rld_LRT) df_pathLRT <- cbind(rownames(res_LRT), data.frame(res_LRT, row.names=NULL)) topTableLRT <- as.data.frame(df_pathLRT) sigGeneListLRT <- subset(topTableLRT, padj<=0.05)[,1] topMatrixLRT <- pathsLRT[which(rownames(pathsLRT) %in% sigGeneListLRT),] topMatrixPATHSLRT <- gsva(data.matrix(topMatrixLRT), stress_Cao_etal_2017, method="gsva", min.sz=1, max.sz=Inf, kcdf="Gaussian", mx.diff=TRUE, verbose=TRUE) heat_pathsLRT <- t(scale(t(topMatrixPATHSLRT))) pheatmap(heat_pathsLRT, annotation_col=dfdds, fontsize_row = 8, cluster_rows = FALSE, cluster_cols = TRUE, main="Stress Paths", annotation_legend=FALSE)`
Please let me know if this approach is acceptable or if it truly requires I include the control.
Ok, that makes a lot of sense. Thanks for helpful advice!