Dear All,
I have been given a single-cell RNA-seq dataset with two treatment conditions. Each condition had three biological replicates, but due to a mistake the three replicates within each condition were merged during sequencing, so the sample of origin of each cell is not available.
After identifying several cell-type clusters from this dataset, I am now looking for a method to determine the differentially expressed genes (DEGs) between the treatment conditions for each cluster. One of my goals is to identify cell-type clusters that were the most strongly affected by treatment.
Since the sample IDs are lost, the usual pseudobulk and mixed-model strategies do not seem to be possible. So far I have tried the following:
1. Various DE tests (GLM, Wilcoxon, etc.) run on each cluster independently. With a fixed FDR threshold for all clusters, the number of DEGs depends strongly on the number of cells in the cluster (verified by a series of downsampling tests).
2. The Augur classification test. Unfortunately, the clusters with high AUCs in Augur did not coincide with those that had more DEGs in approach 1 (some of them had no DEGs at all).
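For anyone who wants to reproduce the downsampling check mentioned above, it could look roughly like the sketch below. This is a dependency-free Python illustration, not the exact pipeline used: the function names, the data layout (a list of cells, each a list of per-gene values), and the choice of a Wilcoxon rank-sum test (normal approximation) with Benjamini-Hochberg correction are all assumptions made for the example.

```python
import math
import random
import statistics


def mann_whitney_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i                              # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:                               # all values identical
        return 1.0
    z = max((abs(u - mu) - 0.5) / math.sqrt(var), 0.0)  # continuity correction
    return math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def median_degs_downsampled(ctrl, treat, n_cells, alpha=0.05, n_repeats=10, seed=0):
    """Median DEG count over repeated draws of n_cells cells per condition."""
    if n_cells > min(len(ctrl), len(treat)):
        raise ValueError("n_cells exceeds the smaller condition in this cluster")
    rng = random.Random(seed)
    n_genes = len(ctrl[0])
    counts = []
    for _ in range(n_repeats):
        sub_c = rng.sample(ctrl, n_cells)
        sub_t = rng.sample(treat, n_cells)
        pvals = [
            mann_whitney_p([c[g] for c in sub_c], [c[g] for c in sub_t])
            for g in range(n_genes)
        ]
        counts.append(sum(p < alpha for p in benjamini_hochberg(pvals)))
    return statistics.median(counts)
```

Calling `median_degs_downsampled` on every cluster with the same `n_cells` (for example, the size of the smallest cluster) puts the DEG counts on an equal-power footing. It does not, of course, restore the lost replicate structure, so cells are still treated as independent observations.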
Could anyone recommend (a) tests or algorithms that detect DEGs robustly, independent of cluster size; (b) a pseudobulking or pseudo-replication strategy for single-sample data; or (c) methods that do not rely on DEGs to show the effect of a condition on the clusters?
Any input would be appreciated.
Hi, I just wanted to ask whether you have received any answers?