Hi all, I have used the function AddModuleScore_UCell from the 'UCell' package to calculate scores for different gene sets on my Seurat object. My metadata now has one score column per gene set. (I suppose this would be similar when using Seurat's function AddModuleScore.)
Are the scores calculated independently of each other? I mean, does it make a difference whether I use 20 gene sets to calculate scores or just 5? Would the scores be different if I used only five sets?
I would also like to know whether it is considered good practice to use the calculated scores to assign annotations to the cells, for example by taking the highest score per cell and adding the corresponding gene-set name as the annotation in a new column of the object's metadata.
How can I take for each row the highest value of these columns and assign this as the cell annotation?
This is how the metadata looks at the moment:
head(C7_G7_sctransform_ucell[[]])
orig.ident nCount_RNA nFeature_RNA percent.mt nCount_SCT
C7_AAACCCAAGGGTGGGA-1 C7 7305 2590 4.9555099 12430
C7_AAACCCAAGTAGCATA-1 C7 31421 5135 1.5053627 13648
C7_AAACCCACAAGAGGTC-1 C7 11609 2294 0.2239642 12649
C7_AAACCCACAGTCTCTC-1 C7 5980 1460 0.5852843 11949
C7_AAACCCACATGAATCC-1 C7 10781 2813 0.1576848 12446
C7_AAACCCAGTAGAGGAA-1 C7 26241 4248 2.0235509 13717
nFeature_SCT SCT_snn_res.0 SCT_snn_res.0.1 SCT_snn_res.0.2
C7_AAACCCAAGGGTGGGA-1 2645 0 0 3
C7_AAACCCAAGTAGCATA-1 3728 0 5 8
C7_AAACCCACAAGAGGTC-1 2290 0 6 9
C7_AAACCCACAGTCTCTC-1 1826 1 7 10
C7_AAACCCACATGAATCC-1 2813 0 2 5
C7_AAACCCAGTAGAGGAA-1 3531 0 1 2
SCT_snn_res.0.3 SCT_snn_res.0.4 SCT_snn_res.0.5 SCT_snn_res.0.6
C7_AAACCCAAGGGTGGGA-1 3 3 3 4
C7_AAACCCAAGTAGCATA-1 8 9 10 12
C7_AAACCCACAAGAGGTC-1 9 10 12 13
C7_AAACCCACAGTCTCTC-1 10 11 13 15
C7_AAACCCACATGAATCC-1 6 7 8 8
C7_AAACCCAGTAGAGGAA-1 1 5 7 6
SCT_snn_res.0.7 SCT_snn_res.0.8 SCT_snn_res.0.9 seurat_clusters
C7_AAACCCAAGGGTGGGA-1 4 4 5 5
C7_AAACCCAAGTAGCATA-1 13 15 15 15
C7_AAACCCACAAGAGGTC-1 14 16 16 16
C7_AAACCCACAGTCTCTC-1 16 18 18 18
C7_AAACCCACATGAATCC-1 8 9 9 9
C7_AAACCCAGTAGAGGAA-1 6 7 7 7
T_NK_genes_ucell Bcell_genes_ucell Neutrophil_genes_ucell
C7_AAACCCAAGGGTGGGA-1 0.006393929 0 0.12896503
C7_AAACCCAAGTAGCATA-1 0.078317016 0 0.07633345
C7_AAACCCACAAGAGGTC-1 0.002827889 0 0.00000000
C7_AAACCCACAGTCTCTC-1 0.050577542 0 0.12983045
C7_AAACCCACATGAATCC-1 0.029087004 0 0.04888732
C7_AAACCCAGTAGAGGAA-1 0.015069232 0 0.12741081
Macrophage_genes_ucell DC1_genes_ucell DC2_genes_ucell
C7_AAACCCAAGGGTGGGA-1 0.15504068 0.04992586 0.21424227
C7_AAACCCAAGTAGCATA-1 0.03789460 0.04074476 0.13055956
C7_AAACCCACAAGAGGTC-1 0.03843868 0.01946457 0.05147748
C7_AAACCCACAGTCTCTC-1 0.11904681 0.04764305 0.16518249
C7_AAACCCACATGAATCC-1 0.02623341 0.07395279 0.14404241
C7_AAACCCAGTAGAGGAA-1 0.05140013 0.04725391 0.19036151
mregDC_genes_ucell Modc_ucell Ccr7+_Activated_DC_bulkRNAseq_ucell
C7_AAACCCAAGGGTGGGA-1 0.10208759 0.16014112 0.17001401
C7_AAACCCAAGTAGCATA-1 0.09409623 0.06489818 0.28722689
C7_AAACCCACAAGAGGTC-1 0.04511789 0.03876742 0.01963585
C7_AAACCCACAGTCTCTC-1 0.11730289 0.11931761 0.12873950
C7_AAACCCACATGAATCC-1 0.35031817 0.08091461 0.52435574
C7_AAACCCAGTAGAGGAA-1 0.11448165 0.10368703 0.41158263
Effector_Cd8Tcell_bulkRNAseq_ucell Ccr7+_Activated_DC_scRNAseq_ucell
C7_AAACCCAAGGGTGGGA-1 0.06456557 0.25998593
C7_AAACCCAAGTAGCATA-1 0.17755845 0.27021971
C7_AAACCCACAAGAGGTC-1 0.03033249 0.09164028
C7_AAACCCACAGTCTCTC-1 0.08172574 0.14018677
C7_AAACCCACATGAATCC-1 0.07802453 0.46288656
C7_AAACCCAGTAGAGGAA-1 0.08499078 0.34697304
Effector_Cd8_Tcell_scRNAseq_ucell Ccr7_Downstream_Genes_ucell
C7_AAACCCAAGGGTGGGA-1 0.10367573 0.12475424
C7_AAACCCAAGTAGCATA-1 0.21309164 0.22006256
C7_AAACCCACAAGAGGTC-1 0.04085264 0.08961126
C7_AAACCCACAGTCTCTC-1 0.11007049 0.17004021
C7_AAACCCACATGAATCC-1 0.05501846 0.26731457
C7_AAACCCAGTAGAGGAA-1 0.06435045 0.17500000
pDC_genes_ucell RBC_genes_ucell Mast_cell_genes_ucell
C7_AAACCCAAGGGTGGGA-1 0.10218172 0 0
C7_AAACCCAAGTAGCATA-1 0.01264449 0 0
C7_AAACCCACAAGAGGTC-1 0.05391171 0 0
C7_AAACCCACAGTCTCTC-1 0.06992938 0 0
C7_AAACCCACATGAATCC-1 0.05345800 0 0
C7_AAACCCAGTAGAGGAA-1 0.06280822 0 0
Plasmocytoid_DC_ucell
C7_AAACCCAAGGGTGGGA-1 0.2123539
C7_AAACCCAAGTAGCATA-1 0.1831386
C7_AAACCCACAAGAGGTC-1 0.0000000
C7_AAACCCACAGTCTCTC-1 0.2313022
C7_AAACCCACATGAATCC-1 0.3580968
C7_AAACCCAGTAGAGGAA-1 0.2657763
I would like to add a new column containing the name of the *_ucell column with the highest value in each row, possibly with a threshold below which the cell is considered unclassified.
What I have at the moment is as such:
# Define a threshold for classification
threshold <- 0.1

# Select only the columns that contain UCell scores
ucell_columns <- grep("_ucell$", colnames(C7_G7_sctransform_ucell[[]]), value = TRUE)

# Find the column name with the highest score per row and apply the threshold
C7_G7_sctransform_ucell$Top_Annotation <- apply(C7_G7_sctransform_ucell[[ucell_columns]], 1, function(x) {
  max_value <- max(x)
  if (max_value < threshold) {
    return("Uncertain")
  } else {
    return(ucell_columns[which.max(x)])
  }
})
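A vectorized variant of the same idea, shown only as a minimal sketch on a toy data frame: the three cells and their score values below are made up for illustration, and the names scores, score_mat, top_idx, top_val and top_annotation are hypothetical. It uses base R's max.col instead of a row-wise apply; ties go to the first column.

```r
# Toy stand-in for the *_ucell metadata columns (values invented for illustration)
scores <- data.frame(
  T_NK_genes_ucell       = c(0.01, 0.30, 0.02),
  Macrophage_genes_ucell = c(0.16, 0.04, 0.03),
  pDC_genes_ucell        = c(0.10, 0.01, 0.05),
  row.names = c("cell1", "cell2", "cell3")
)

threshold <- 0.1
score_mat <- as.matrix(scores)

# Column index of the highest score in each row; ties resolve to the first column
top_idx <- max.col(score_mat, ties.method = "first")

# The highest score itself, picked out by (row, column) index pairs
top_val <- score_mat[cbind(seq_len(nrow(score_mat)), top_idx)]

# Name of the top-scoring column, or "Uncertain" when the top score is below the threshold
top_annotation <- ifelse(top_val < threshold,
                         "Uncertain",
                         colnames(score_mat)[top_idx])
names(top_annotation) <- rownames(score_mat)

print(top_annotation)
```

With the real object, scores would be C7_G7_sctransform_ucell[[ucell_columns]], and the named vector could then be assigned to a metadata column as in the code above.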
But I'm not sure if this is the best way to do it. More importantly, I am not sure whether this is a good way to annotate my clusters at all, and I would appreciate your input on that.
Thanks
From the same authors as UCell, have a look at scGate.
Yes, I have found it as well. Thanks.