I have single-cell RNA-seq data from 2 samples: one control and one treated. I am analyzing the data with Seurat. I did the QC, normalized each sample, and then integrated them. I obtained clusters and assigned cell types. Now I would like to do a DE analysis between the control and treated CD4 cells. Can I do DE analysis if I only have one sample per condition? Does it make sense statistically? I am planning to do it anyway as exploratory data analysis, but I was wondering how much I can trust the p-values that come out of that test.
A DE test on these data will tell you how confident you can be that the two samples differ. It will not tell you anything about the difference between the two populations (conditions) the samples are drawn from. There is not enough information here to estimate the population variance of the cluster means, that is, how much a cluster's mean expression varies from sample to sample within a condition.
Specifically, you need enough information to infer the parameters of a hierarchical model in which cells are nested within samples and samples are nested within conditions. Minimally, this means > 1 cell per sample and > 1 sample per condition.
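To make the point concrete, here is a small simulation sketch. It is not Seurat code and not necessarily the model the answer has in mind: it assumes Gaussian expression values, a per-sample shift with standard deviation `sample_sd`, unit cell-level noise, and no true condition effect at all. The function name and all parameter values are made up for illustration. It tests control cells against treated cells for each gene, treating cells as independent replicates, and counts how often the test is "significant" at roughly the 5% level.

```python
import random
from math import sqrt
from statistics import mean, variance


def cell_level_false_positive_rate(n_genes=200, n_cells=200, sample_sd=0.5, seed=0):
    """Fraction of null genes called significant by a cell-level test.

    Assumed toy model (hypothetical, for illustration only):
      sample mean ~ Normal(0, sample_sd)    # same distribution in both conditions
      cell value  ~ Normal(sample mean, 1)
    There is one sample per condition and no condition effect.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        # Each sample gets its own random shift; this is sample-to-sample
        # variation, not a treatment effect.
        mu_ctrl = rng.gauss(0.0, sample_sd)
        mu_trt = rng.gauss(0.0, sample_sd)
        ctrl = [rng.gauss(mu_ctrl, 1.0) for _ in range(n_cells)]
        trt = [rng.gauss(mu_trt, 1.0) for _ in range(n_cells)]

        # Welch-style statistic using cells as replicates. With hundreds of
        # cells, |z| > 1.96 is approximately a two-sided 5% test.
        se = sqrt(variance(ctrl) / n_cells + variance(trt) / n_cells)
        z = (mean(ctrl) - mean(trt)) / se
        if abs(z) > 1.96:
            hits += 1
    return hits / n_genes


if __name__ == "__main__":
    print("no sample-level variation:  ", cell_level_false_positive_rate(sample_sd=0.0))
    print("with sample-level variation:", cell_level_false_positive_rate(sample_sd=0.5))
```

Under these assumptions, the rate stays near the nominal level when samples do not vary, but most null genes are flagged once samples differ from each other for reasons unrelated to treatment. With one sample per condition, the data cannot distinguish those two situations, which is why the p-values only speak to the two samples and not to the conditions.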
(+1) You say that, minimally, one needs more than one cell per sample and more than one sample per condition.
This is correct if you are interested in the differences between samples. However, if one is interested only in the difference between conditions, it is fine, in theory, to have 1 cell per sample and many samples per condition. Effectively, differential expression on bulk RNA-seq works this way. Am I understanding this correctly?
True, as far as it goes. Practically speaking you'll always have some replication at the cell level.
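The bulk-like view discussed in these comments can be sketched as follows: collapse each sample to a single value per gene, then compare conditions using samples as the replicates. This is only an illustration under simplifying assumptions. It uses per-sample means and a Welch t statistic, whereas real pseudobulk workflows typically sum counts and use count-based models. The function name, sample names, and numbers are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pseudobulk_welch_t(cells_by_sample, condition_of, group_a, group_b):
    """Welch t statistic for one gene, with samples (not cells) as replicates.

    cells_by_sample: dict mapping sample name -> list of per-cell values
    condition_of:    dict mapping sample name -> condition label
                     (every label must be group_a or group_b)
    """
    # Collapse each sample to one number: the mean over its cells.
    per_condition = {group_a: [], group_b: []}
    for sample, values in cells_by_sample.items():
        per_condition[condition_of[sample]].append(mean(values))

    a, b = per_condition[group_a], per_condition[group_b]
    if len(a) < 2 or len(b) < 2:
        # With one sample per condition there is no way to estimate how much
        # samples vary within a condition, so no test statistic is defined.
        raise ValueError("need at least 2 samples per condition")

    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Made-up values for a single gene in four hypothetical samples.
cells = {
    "ctrl_1": [1.0, 1.2, 0.8],
    "ctrl_2": [1.5, 1.4, 1.6],
    "trt_1": [2.0, 2.2, 1.8],
    "trt_2": [2.6, 2.4, 2.5],
}
conditions = {"ctrl_1": "control", "ctrl_2": "control",
              "trt_1": "treated", "trt_2": "treated"}

t_stat = pseudobulk_welch_t(cells, conditions, "treated", "control")
print(round(t_stat, 2))
```

Dropping `ctrl_2` and `trt_2` from the inputs makes the function raise, which mirrors the situation in the question: one sample per condition leaves the between-sample variance unidentifiable, however many cells each sample contains.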