Can I remove the control in differential expression analysis?
17 months ago
Morgan S. ▴ 80

Hi there,

Essentially, my experimental design is control vs. treatment. Cells were sorted based on fluorescence, so there are 4 different "colors" of treated cells: red, green, green+red, and blue+green+red. I am interested in how the colors differ from one another. I have duplicates for every color and for the control.

> colData(dds_no)
DataFrame with 10 rows and 3 columns
sample      color      sizeFactor
<factor>    <factor>   <numeric>
CTRL_1      control    0.730176128359336
CTRL_2      control    1.12370593310441
GFP_1       green      1.62229333835717
GFP_3       green      0.733077520973604
GFPRFP_1    greenred   1.24575808750345
GFPRFP_2    greenred   1.27612350159403
RFP_2       red        1.57975191196927
RFP_3       red        0.518878115991023
TRIPLE_1    rgb        0.833793868046399
TRIPLE_3    rgb        1.01334467700869


I am wondering if it is appropriate to remove my control from downstream differential expression analysis when I want to only look at variation between treated cells? When I include the control, the PCA shows that the control samples are well separated from all treated cell types.

What I want to do (and have done) is remove the control from the raw counts, build the DESeq2 object with ~color as my design, produce results with the LRT instead of the Wald test, and select significant DEGs by adjusted p-value. From this, I created a heatmap of my samples (still without the control) to show how they vary across certain significant genes and gene sets.

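library(DESeq2)
library(GSVA)
library(pheatmap)

# Build the DESeq2 object from the counts with the control samples removed
dds_no <- DESeqDataSetFromMatrix(countData = countData_no,
                                 colData = colData_no,
                                 design = ~color)
# Likelihood ratio test: full model ~color against the intercept-only model ~1
dds_LRT <- DESeq(dds_no, test = "LRT", reduced = ~1)
res_LRT <- results(dds_LRT)
# Regularized log transform of the counts for visualization
rld_LRT <- rlogTransformation(dds_LRT)
pathsLRT <- assay(rld_LRT)
# Results table with the gene IDs as the first column
df_pathLRT <- cbind(rownames(res_LRT), data.frame(res_LRT, row.names = NULL))
topTableLRT <- as.data.frame(df_pathLRT)
# sigGeneListLRT (definition not shown): genes selected by adjusted p-value
topMatrixLRT <- pathsLRT[which(rownames(pathsLRT) %in% sigGeneListLRT), ]
# Gene set scores computed on the significant genes only
topMatrixPATHSLRT <- gsva(data.matrix(topMatrixLRT),
                          stress_Cao_etal_2017,
                          method = "gsva",
                          min.sz = 1,
                          max.sz = Inf,
                          kcdf = "Gaussian",
                          mx.diff = TRUE,
                          verbose = TRUE)
# Row-wise z-scores for the heatmap
heat_pathsLRT <- t(scale(t(topMatrixPATHSLRT)))
# dfdds (definition not shown): column annotation data frame
pheatmap(heat_pathsLRT, annotation_col = dfdds, fontsize_row = 8,
         cluster_rows = FALSE, cluster_cols = TRUE,
         main = "Stress Paths", annotation_legend = FALSE)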


Please let me know if this approach is acceptable or if it truly requires I include the control.

Thanks, Morgan

17 months ago

If you are interested in differences between the colours, then yes, this design is OK. Indeed, if you ask "Is the change between green and control different from the change between red and control?", you are implicitly ignoring the control in any design, as (control - green) - (control - red) == red - green. However, you may have difficulties in the downstream interpretation of the results. Let's say that your effect size (log2 fold change) for red - green is 2; that means that expression is 4 times as high in red as it is in green. But this could mean that red upregulates gene expression 4 times more than green, or that it downregulates gene expression 4 times less than green, or that red upregulates expression 2-fold and green downregulates expression 2-fold. The control would allow you to distinguish these.
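
To make the ambiguity concrete, here is a small base R sketch with made-up log2 expression values for a single gene. The numbers and the column names are purely illustrative assumptions, not taken from the data in the question. All three scenarios give the same red - green value of 2, and only the comparison against control tells them apart.

```r
# Hypothetical log2 expression values for a single gene (made up for illustration).
scenarios <- data.frame(
  scenario = c("both up, red more", "both down, red less", "red up, green down"),
  control  = c(5, 5, 5),
  green    = c(6, 2, 4),
  red      = c(8, 4, 6)
)

# Direct contrast between the two colors (log2 scale).
scenarios$red_vs_green <- scenarios$red - scenarios$green

# The difference of differences against control gives the same number,
# because the control term cancels.
scenarios$diff_of_diffs <- (scenarios$control - scenarios$green) -
                           (scenarios$control - scenarios$red)

# Changes relative to control, which is what distinguishes the scenarios.
scenarios$green_vs_control <- scenarios$green - scenarios$control
scenarios$red_vs_control   <- scenarios$red - scenarios$control

print(scenarios)
```

The red_vs_green and diff_of_diffs columns are identical in every row, while green_vs_control and red_vs_control differ between rows.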

My advice would be to calculate significant genes as you have done, but calculate all your log fold changes compared to control in order to interpret the results.
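
A minimal sketch of how the fold changes against control could be obtained with DESeq2, assuming a second object is fitted with the control samples kept in. The counts here are simulated with makeExampleDESeqDataSet so the snippet runs on its own; countData_all, colData_all, dds_all and lfc_vs_control are hypothetical names, and in practice you would use your real count matrix with the controls included.

```r
library(DESeq2)

# Simulated stand-in for the full count matrix (controls included); illustrative only.
set.seed(1)
sim <- makeExampleDESeqDataSet(n = 500, m = 10)
countData_all <- counts(sim)
colData_all <- data.frame(
  color = factor(rep(c("control", "green", "greenred", "red", "rgb"), each = 2)),
  row.names = colnames(countData_all)
)

dds_all <- DESeqDataSetFromMatrix(countData = countData_all,
                                  colData = colData_all,
                                  design = ~ color)
# Make control the reference level so each coefficient is "color vs control".
dds_all$color <- relevel(dds_all$color, ref = "control")
dds_all <- DESeq(dds_all)  # default Wald test

# log2 fold change of each treated color relative to control, one column per color.
treated <- setdiff(levels(dds_all$color), "control")
lfc_vs_control <- sapply(treated, function(lv) {
  results(dds_all, contrast = c("color", lv, "control"))$log2FoldChange
})
rownames(lfc_vs_control) <- rownames(dds_all)

# With real data, restrict to the genes selected by the control-free LRT.
# sigGeneListLRT is assumed to exist from the analysis in the question:
# lfc_sig <- lfc_vs_control[rownames(lfc_vs_control) %in% sigGeneListLRT, ]
head(lfc_vs_control)
```

The gene selection still comes from the LRT without control; this second fit is only used to read off the direction and size of each color's change relative to control.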
