I was asked to perform exploratory analysis for scRNA-seq. I am new to this kind of analysis and I’m not sure how to decide on a couple of things. As I said in the title, I have only one sample per condition.
I made a PCA plot to see whether I should merge or integrate; based on that, I decided on merge. I created volcano plots to determine what kind of cut-off I should use in QC. I also made an elbow plot to choose the dims.
I am now looking at the UMAP (I used SCT normalization) and trying to choose the resolution. Do you have any advice on what I should pay special attention to?
I used SCT for normalization and then ran FindAllMarkers + FindMarkers, as well as NormalizeData and bulkDE. I’m looking mainly at the log2FC to check if the trends are similar.
Has anyone ever done such an analysis? It’s only exploratory and meant to observe trends, but I still want to do it as well as possible.
I’d appreciate any advice or thoughts on this. I think it will also be a valuable lesson for the future when we decide to sequence more samples.
Do you have any advice on what I should pay special attention to?
No, this is extremely dataset-dependent. Just follow the vignette, or consider using https://bioconductor.org/books/release/OSCA/ and then see what you get. There must be some sort of hypothesis why you even did the experiment, no? What do you mean by "trends"?
First off, welcome to scRNA-seq analysis. It's a fascinating (and sometimes overwhelming) world, especially when you're starting out with limited samples. Since this is purely exploratory and focused on trends, you're already on the right track by prioritizing visualization and sanity checks: PCA for batch effects, plotting QC metrics to pick thresholds, and the elbow plot for dimensionality. That methodical approach will serve you well as you scale up to more replicates.
On choosing the resolution for clustering in Seurat (assuming that's your workflow, given SCT and UMAP): this is more art than science, but here's what I always pay attention to:
Cluster granularity vs. biology: Run FindClusters at a range of resolutions (e.g., 0.1, 0.3, 0.5, 0.8, 1.2) and inspect the UMAPs side-by-side. Aim for clusters that align with expected cell types or states. Too low (e.g., <0.2) might lump everything into 2-3 broad groups; too high (>1.0) can split real populations into clusters driven by noise, doublets, or other artifacts. For exploratory work, 0.4-0.6 is often a reasonable starting point, but the right value depends on the dataset and the number of cells.
Silhouette scores or stability: After clustering, use clustree (if installed) to visualize how clusters split and merge across resolutions, and look for clusters that stay stable instead of fragmenting. You can also check the average silhouette width with cluster::silhouette(), computed on distances in PCA space and not on UMAP coordinates; higher values (>0.5 ideally) indicate tighter, better-separated groups. On large datasets the full distance matrix gets expensive, so subsample cells first.
Marker gene expression: Post-clustering, plot top markers from FindAllMarkers on the UMAP. Do they make biological sense? E.g., are immune cells co-clustering? With one sample per condition, watch for condition-specific shifts in cluster proportions rather than de novo markers.
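A rough sketch of two of these checks in base R: how many clusters each resolution produces, and how each condition's cells are distributed across clusters. It assumes the cluster assignments sit in metadata columns prefixed `SCT_snn_res.` (what Seurat typically writes when SCT is the default assay; check `colnames(seu@meta.data)`) and that there is a `condition` column. The object name `seu`, the column names, and the toy values are all illustrative assumptions.

```r
# Helper 1: number of clusters found at each resolution.
# `meta` is a data frame such as seu@meta.data; `prefix` is the column prefix
# written by FindClusters() (assumed here to be "SCT_snn_res.").
clusters_per_resolution <- function(meta, prefix = "SCT_snn_res.") {
  cols <- colnames(meta)[startsWith(colnames(meta), prefix)]
  n <- vapply(cols, function(cn) length(unique(meta[[cn]])), integer(1))
  names(n) <- sub(prefix, "", cols, fixed = TRUE)
  n
}

# Helper 2: fraction of each condition's cells that falls in each cluster.
# Columns sum to 1, so conditions with different cell counts are comparable.
cluster_fractions <- function(clusters, condition) {
  prop.table(table(cluster = clusters, condition = condition), margin = 2)
}

# Toy metadata standing in for seu@meta.data (values are made up).
meta <- data.frame(
  condition       = rep(c("ctrl", "treated"), each = 4),
  SCT_snn_res.0.2 = factor(c(0, 0, 1, 1, 0, 1, 1, 1)),
  SCT_snn_res.0.8 = factor(c(0, 0, 1, 2, 0, 1, 2, 2))
)

print(clusters_per_resolution(meta))
print(cluster_fractions(meta$SCT_snn_res.0.2, meta$condition))

# With a real object the columns would come from something like:
#   seu  <- Seurat::FindClusters(seu, resolution = c(0.1, 0.3, 0.5, 0.8, 1.2))
#   meta <- seu@meta.data
#   clustree::clustree(seu, prefix = "SCT_snn_res.")  # split/merge tree
```

A cluster that is nearly absent in one condition is worth a closer look, but with one sample per condition a shift in proportions is an observation, not a tested result.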
A few extra tips tailored to your single-sample setup:
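Pseudobulk for DE: Good that you're comparing FindMarkers (per-cell) with NormalizeData + bulk DE; the log2FC trends should roughly agree. Keep in mind that per-cell tests treat every cell as an independent replicate, which inflates significance, because cells from the same sample are not independent. Pseudobulk (summing raw counts per sample and cluster, then testing with e.g. DESeq2) is the usual remedy, but with one sample per condition it leaves a single profile per condition, so there is no replicate variance to estimate and no meaningful p-values. Use it to compare log2FC, and treat per-cell p-values as a ranking at most. Summing counts also dampens dropout noise.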
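QC cutoffs: Volcano plots show differential-expression results, so for QC you probably mean violin plots of per-cell metrics (nFeature_RNA, nCount_RNA, mitochondrial %), which are the usual tool for picking cutoffs. Beyond those, also gate on mitochondrial % (often somewhere below 10-20%, depending on tissue), check ribosomal gene content, and score doublets (e.g., via DoubletFinder) to avoid biasing clusters.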
Limitations to note: With n=1 per condition, any "differential" trends are hypothesis-generating only. Condition and sample (batch) are completely confounded, so a batch effect cannot be told apart from biology, and there is no way to estimate between-sample variability. When you get more samples, integration (Harmony or fastMNN) will shine.
If you share a snippet of your Seurat object summary or UMAP screenshots, I (or the community) could offer more targeted feedback. Keep at it; this hands-on experience will make multi-sample runs much easier.
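To make the pseudobulk comparison concrete, here is a minimal base-R sketch. It sums raw counts over the cells of each condition within one cluster, converts to counts per million, and returns log2 fold changes with a pseudocount. The function name, the toy matrix, the `condition` labels, and the pseudocount of 1 are illustrative assumptions; with real data the raw counts would come from the RNA assay (Seurat's AggregateExpression can do the summing). It gives effect sizes only, because n = 1 per condition leaves nothing to test against.

```r
# Pseudobulk log2 fold change between two conditions (effect size only).
# counts:    genes x cells matrix of raw counts for ONE cluster
# condition: character vector with one label per cell (column of `counts`)
pseudobulk_log2fc <- function(counts, condition, ref, test, pseudocount = 1) {
  stopifnot(ncol(counts) == length(condition))
  # Sum counts across the cells of each condition: one profile per condition
  ref_sum  <- rowSums(counts[, condition == ref,  drop = FALSE])
  test_sum <- rowSums(counts[, condition == test, drop = FALSE])
  # Counts per million, so cell number and depth do not drive the ratio
  ref_cpm  <- ref_sum  / sum(ref_sum)  * 1e6
  test_cpm <- test_sum / sum(test_sum) * 1e6
  log2((test_cpm + pseudocount) / (ref_cpm + pseudocount))
}

# Toy data: 3 genes x 4 cells (made-up numbers); each triple is one cell
counts <- matrix(
  c(10, 0, 90,   # cell1 (ctrl)
    10, 0, 90,   # cell2 (ctrl)
    40, 0, 60,   # cell3 (treated)
    40, 0, 60),  # cell4 (treated)
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)
condition <- c("ctrl", "ctrl", "treated", "treated")

lfc <- pseudobulk_log2fc(counts, condition, ref = "ctrl", test = "treated")
print(round(lfc, 2))
```

Plotting these values against the avg_log2FC column that FindMarkers returns in recent Seurat versions, for the same cluster, is a quick way to see whether the two approaches agree on direction and rough magnitude.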