Hi,
After annotation all my major cell types, I subsetted each subset of interest further to tease out more nuanced annotations. So for example, my T cell subset had 11 subclusters. I then found patient wise DEGs (condition vs control), in each of these subsets. Patient wise since we were interested in looking at the samples itself, but each patient contributed only one sample:
subset_object <- readRDS("subset.RDS")
DefaultAssay(subset_object) <- "RNA"
Idents(subset_object) <- "seurat_clusters"
subset_object$subcluster <- Idents(subset_object)
subset_object <- JoinLayers(subset_object)
markers_list <- list()
# Loop over each subcluster
for (cluster in unique(subset_object$subcluster)) {
message(paste0("Processing cluster: ", cluster))
subcluster_data <- subset(subset_object, subset = subcluster == cluster)
Idents(subcluster_data) <- "category"
nv_cells <- WhichCells(subcluster_data, idents = "NV")
for (patient in setdiff(unique(subcluster_data$category), "NV")) {
message(paste0(" Comparing ", patient, " vs NV"))
patient_cells <- WhichCells(subcluster_data, idents = patient)
if (length(patient_cells) < 5 | length(nv_cells) < 5) {
message(paste0(" Skipping ", patient, " due to low cell count"))
next
}
markers <- FindMarkers(
subcluster_data,
ident.1 = patient_cells,
ident.2 = nv_cells,
logfc.threshold = 0.25,
min.pct = 0.25,
only.pos = FALSE
)
markers$gene <- rownames(markers)
markers$cluster <- cluster
key <- paste0("subset_cluster_", cluster, "_", patient, "_vs_NV")
markers_list[[key]] <- markers
}
}
# Save results to CSV
for (name in names(markers_list)) {
write.csv(markers_list[[name]], file = paste0("csvs/", name, ".csv"))
}
The code works perfectly okay, but my issue is that the results from these DEGs are not unique to that specific subcluster. So for almost all 12 subclusters in the T cell subset, Patient1 will have almost the same top few DEGs. I understand overclustering can add to this problem, but we are quite confident in our annotations, and regardless at least some clusters should have different DEGs then.
I am unsure how to tackle this problem because it seems like these redundant DEGs are not at all related to which subcluster the analysis is being done in. Would appreciate any insight thanks!