Question

Assigning cell annotation based on calculated scores.

4 months ago

Assa Yeroslaviz ★ 1.9k

Hi all, I have used the function AddModuleScore_UCell from the 'UCell' package to score the cells of my Seurat object against different gene sets. My metadata now has one column of scores per gene set. (I suppose this would be similar with Seurat's own function AddModuleScore.)

Are the scores calculated independently of each other? I mean, does it make a difference whether I use 20 gene sets to calculate scores or just 5? Would the scores be different if I used just five sets?

I would also like to know whether it is considered good practice to use the calculated scores to assign annotations to the cells, for example by taking the highest score per cell and adding it as the cell's annotation in a new column of the object's metadata.

How can I take, for each row, the highest value of these columns and assign it as the cell annotation?

This is how it looks at the moment:

head(C7_G7_sctransform_ucell[[]])
                      orig.ident nCount_RNA nFeature_RNA percent.mt nCount_SCT
C7_AAACCCAAGGGTGGGA-1         C7       7305         2590  4.9555099      12430
C7_AAACCCAAGTAGCATA-1         C7      31421         5135  1.5053627      13648
C7_AAACCCACAAGAGGTC-1         C7      11609         2294  0.2239642      12649
C7_AAACCCACAGTCTCTC-1         C7       5980         1460  0.5852843      11949
C7_AAACCCACATGAATCC-1         C7      10781         2813  0.1576848      12446
C7_AAACCCAGTAGAGGAA-1         C7      26241         4248  2.0235509      13717
                      nFeature_SCT SCT_snn_res.0 SCT_snn_res.0.1 SCT_snn_res.0.2
C7_AAACCCAAGGGTGGGA-1         2645             0               0               3
C7_AAACCCAAGTAGCATA-1         3728             0               5               8
C7_AAACCCACAAGAGGTC-1         2290             0               6               9
C7_AAACCCACAGTCTCTC-1         1826             1               7              10
C7_AAACCCACATGAATCC-1         2813             0               2               5
C7_AAACCCAGTAGAGGAA-1         3531             0               1               2
                      SCT_snn_res.0.3 SCT_snn_res.0.4 SCT_snn_res.0.5 SCT_snn_res.0.6
C7_AAACCCAAGGGTGGGA-1               3               3               3               4
C7_AAACCCAAGTAGCATA-1               8               9              10              12
C7_AAACCCACAAGAGGTC-1               9              10              12              13
C7_AAACCCACAGTCTCTC-1              10              11              13              15
C7_AAACCCACATGAATCC-1               6               7               8               8
C7_AAACCCAGTAGAGGAA-1               1               5               7               6
                      SCT_snn_res.0.7 SCT_snn_res.0.8 SCT_snn_res.0.9 seurat_clusters
C7_AAACCCAAGGGTGGGA-1               4               4               5               5
C7_AAACCCAAGTAGCATA-1              13              15              15              15
C7_AAACCCACAAGAGGTC-1              14              16              16              16
C7_AAACCCACAGTCTCTC-1              16              18              18              18
C7_AAACCCACATGAATCC-1               8               9               9               9
C7_AAACCCAGTAGAGGAA-1               6               7               7               7
                      T_NK_genes_ucell Bcell_genes_ucell Neutrophil_genes_ucell
C7_AAACCCAAGGGTGGGA-1      0.006393929                 0             0.12896503
C7_AAACCCAAGTAGCATA-1      0.078317016                 0             0.07633345
C7_AAACCCACAAGAGGTC-1      0.002827889                 0             0.00000000
C7_AAACCCACAGTCTCTC-1      0.050577542                 0             0.12983045
C7_AAACCCACATGAATCC-1      0.029087004                 0             0.04888732
C7_AAACCCAGTAGAGGAA-1      0.015069232                 0             0.12741081
                      Macrophage_genes_ucell DC1_genes_ucell DC2_genes_ucell
C7_AAACCCAAGGGTGGGA-1             0.15504068      0.04992586      0.21424227
C7_AAACCCAAGTAGCATA-1             0.03789460      0.04074476      0.13055956
C7_AAACCCACAAGAGGTC-1             0.03843868      0.01946457      0.05147748
C7_AAACCCACAGTCTCTC-1             0.11904681      0.04764305      0.16518249
C7_AAACCCACATGAATCC-1             0.02623341      0.07395279      0.14404241
C7_AAACCCAGTAGAGGAA-1             0.05140013      0.04725391      0.19036151
                      mregDC_genes_ucell Modc_ucell Ccr7+_Activated_DC_bulkRNAseq_ucell
C7_AAACCCAAGGGTGGGA-1         0.10208759 0.16014112                          0.17001401
C7_AAACCCAAGTAGCATA-1         0.09409623 0.06489818                          0.28722689
C7_AAACCCACAAGAGGTC-1         0.04511789 0.03876742                          0.01963585
C7_AAACCCACAGTCTCTC-1         0.11730289 0.11931761                          0.12873950
C7_AAACCCACATGAATCC-1         0.35031817 0.08091461                          0.52435574
C7_AAACCCAGTAGAGGAA-1         0.11448165 0.10368703                          0.41158263
                      Effector_Cd8Tcell_bulkRNAseq_ucell Ccr7+_Activated_DC_scRNAseq_ucell
C7_AAACCCAAGGGTGGGA-1                         0.06456557                        0.25998593
C7_AAACCCAAGTAGCATA-1                         0.17755845                        0.27021971
C7_AAACCCACAAGAGGTC-1                         0.03033249                        0.09164028
C7_AAACCCACAGTCTCTC-1                         0.08172574                        0.14018677
C7_AAACCCACATGAATCC-1                         0.07802453                        0.46288656
C7_AAACCCAGTAGAGGAA-1                         0.08499078                        0.34697304
                      Effector_Cd8_Tcell_scRNAseq_ucell Ccr7_Downstream_Genes_ucell
C7_AAACCCAAGGGTGGGA-1                        0.10367573                  0.12475424
C7_AAACCCAAGTAGCATA-1                        0.21309164                  0.22006256
C7_AAACCCACAAGAGGTC-1                        0.04085264                  0.08961126
C7_AAACCCACAGTCTCTC-1                        0.11007049                  0.17004021
C7_AAACCCACATGAATCC-1                        0.05501846                  0.26731457
C7_AAACCCAGTAGAGGAA-1                        0.06435045                  0.17500000
                      pDC_genes_ucell RBC_genes_ucell Mast_cell_genes_ucell
C7_AAACCCAAGGGTGGGA-1      0.10218172               0                     0
C7_AAACCCAAGTAGCATA-1      0.01264449               0                     0
C7_AAACCCACAAGAGGTC-1      0.05391171               0                     0
C7_AAACCCACAGTCTCTC-1      0.06992938               0                     0
C7_AAACCCACATGAATCC-1      0.05345800               0                     0
C7_AAACCCAGTAGAGGAA-1      0.06280822               0                     0
                      Plasmocytoid_DC_ucell
C7_AAACCCAAGGGTGGGA-1             0.2123539
C7_AAACCCAAGTAGCATA-1             0.1831386
C7_AAACCCACAAGAGGTC-1             0.0000000
C7_AAACCCACAGTCTCTC-1             0.2313022
C7_AAACCCACATGAATCC-1             0.3580968
C7_AAACCCAGTAGAGGAA-1             0.2657763

I would like to add a new column holding, for each row, the name of the column with the highest *_ucell value, perhaps combined with a threshold below which the cell is considered unclassified.

What I have at the moment is this:

# Define a threshold for classification
threshold <- 0.1

# Select only the columns that contain UCell scores
ucell_columns <- grep("_ucell$", colnames(C7_G7_sctransform_ucell[[]]), value = TRUE)

# Find the column name with the highest score per row and apply the threshold
C7_G7_sctransform_ucell$Top_Annotation <- apply(C7_G7_sctransform_ucell[[ucell_columns]], 1, function(x) {
  max_value <- max(x)
  if (max_value < threshold) {
    return("Uncertain")
  } else {
    return(ucell_columns[which.max(x)])
  }
})

But I'm not sure whether this is the best way to do it. More importantly, I am not sure whether this is a good way to annotate my clusters at all, and I would appreciate your input on that.

Thanks

Clustering Seurat UCell cell-annotations Score • 792 views

updated 4 months ago by ATpoint 88k • written 4 months ago by Assa Yeroslaviz ★ 1.9k

From the same authors as UCell, have a look at scGate.

4 months ago by fracarb8 ★ 1.7k

Yes, I have found it as well. Thanks.

4 months ago by Assa Yeroslaviz ★ 1.9k

score 2 · Accepted Answer · 2025-02-28

I mean, does it make a difference, if I use 20 gene-sets to calculate scores, or just 5? Would the scores be different, if I use just five sets?

This can be tested in a couple of seconds by scoring against the full set of 20 and then against a subset of it. My guess is that the scores are independent; at least that is how the likes of AUCell/UCell do it.
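The check described above could be sketched roughly as follows. To keep the example runnable without any packages, it uses a toy matrix and a simplified rank-based score (`toy_rank_score`, a hypothetical stand-in that is not UCell's implementation). The gene names, set names and data are made up for illustration.

```r
# Toy expression matrix: 50 genes x 8 cells (hypothetical data, fixed seed)
set.seed(1)
expr <- matrix(rpois(50 * 8, lambda = 2), nrow = 50,
               dimnames = list(paste0("g", 1:50), paste0("cell", 1:8)))

# Simplified rank-based score (a stand-in, NOT UCell's implementation):
# rank the genes within each cell, then average the scaled ranks of the
# signature genes. Each signature only ever looks at its own genes.
toy_rank_score <- function(expr, gene_sets) {
  ranks <- apply(expr, 2, rank, ties.method = "average") / nrow(expr)
  dimnames(ranks) <- dimnames(expr)
  sapply(gene_sets, function(genes) colMeans(ranks[genes, , drop = FALSE]))
}

gene_sets <- list(setA = paste0("g", 1:5),
                  setB = paste0("g", 6:12),
                  setC = paste0("g", 20:25),
                  setD = paste0("g", 30:40))

# Score once with all sets and once with a subset of them
scores_all    <- toy_rank_score(expr, gene_sets)
scores_subset <- toy_rank_score(expr, gene_sets[c("setA", "setC")])

# Compare the columns that both runs share
same <- all.equal(scores_all[, colnames(scores_subset)], scores_subset)
print(same)
```

With real data the same comparison would apply: run AddModuleScore_UCell once with all signatures and once with a subset, then compare the shared metadata columns with `all.equal()`.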

I would like to know if it is considered good practice to use the calculated scores to assign annotations to the cells? For example by taking the highest score per cell and adding it as its annotation in a new column in the object's metadata.

Yes, that is often done. Be aware, though, that sanity checks should be performed. Highest doesn't mean "good". If you score blood against signatures from parenchyma, you will still get some "highest" score even though the result is nonsense. Check that the score is reasonably high.

How can I take for each row the highest value of these columns and assign this as the cell annotation?

I leave the implementation to you as an experienced user, since it is a relatively simple task. It is not entirely trivial, though, because:

Ties: how do you want to resolve them?
What if the scores are all zero?
What if the highest score is below a certain threshold? (I see you discuss this a bit.)

I personally give all-zero cells an "NA" or "all_zero" annotation, label a cell "ambiguous" if several signatures share the same highest value (and the cell is not all-zero), and otherwise set the cell identity to the signature with the highest score. It comes down to a simple "which.max" operation with some checks for the exceptions above.
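As a rough illustration of those rules, here is a minimal vectorised sketch in base R. The function name `assign_top_label`, the label strings and the default threshold of 0.1 are illustrative assumptions, and the toy scores are made up. It assumes a numeric cells-by-signatures table without missing values.

```r
# Assign one label per cell from a cells x signatures score table.
# `threshold` and the label strings are illustrative choices, not fixed rules.
assign_top_label <- function(scores, threshold = 0.1) {
  scores <- as.matrix(scores)
  top    <- apply(scores, 1, max)          # highest score per cell
  n_top  <- rowSums(scores == top)         # number of columns sharing that score
  label  <- colnames(scores)[max.col(scores, ties.method = "first")]
  label[n_top > 1]                 <- "ambiguous"  # tied maximum
  label[top < threshold]           <- "uncertain"  # best score is too low
  label[rowSums(scores != 0) == 0] <- "all_zero"   # nothing scored at all
  setNames(label, rownames(scores))
}

# Toy scores (hypothetical values) covering each case
toy <- matrix(c(0.02, 0.00, 0.21,
                0.00, 0.00, 0.00,
                0.30, 0.30, 0.05,
                0.04, 0.01, 0.03),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("cell", 1:4),
                              c("T_NK_genes_ucell", "Bcell_genes_ucell",
                                "DC2_genes_ucell")))

toy_labels <- assign_top_label(toy)
print(toy_labels)
# cell1 -> "DC2_genes_ucell", cell2 -> "all_zero",
# cell3 -> "ambiguous",       cell4 -> "uncertain"
```

The order of the checks matters: here a tie below the threshold ends up as "uncertain" and an all-zero row always ends up as "all_zero". A different priority would only need the three assignments reordered.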

score 1 · Accepted Answer · 2025-02-28

Thanks for the comments. I hadn't thought about the ties.

I modified my code to take these cases into account as well, in case anyone wants to use something similar.

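# Relies on `threshold` and `ucell_columns` as defined in the question above
C7_G7_sctransform_ucell$Top_Annotation <- apply(C7_G7_sctransform_ucell[[ucell_columns]], 1, function(x) {
  max_value <- max(x)

  # No signature scored at all
  if (all(x == 0)) {
    return("all_zero")
  }
  # Best score is too low to trust
  if (max_value < threshold) {
    return("Uncertain")
  }
  # Several signatures share the highest score
  max_indices <- which(x == max_value)
  if (length(max_indices) > 1) {
    return("Ambiguous")
  }

  return(ucell_columns[max_indices])
})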

I'm still not sure where to set the threshold, though.
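One possible way to explore the threshold question is to tabulate how many cells would keep a label at a few candidate cut-offs. This does not recommend any particular value. The `top_score` vector below is hypothetical; with the real object it would be the row-wise maximum of the *_ucell columns. The candidate values are arbitrary.

```r
# Hypothetical per-cell top scores; with real data this would be the
# row-wise maximum over the *_ucell columns.
top_score <- c(0.02, 0.05, 0.08, 0.12, 0.15, 0.21, 0.26, 0.35, 0.46, 0.52)

# Candidate cut-offs to compare (arbitrary illustrative values)
candidates <- c(0.05, 0.10, 0.20, 0.30)

# Fraction of cells whose top score reaches each cut-off
kept <- sapply(candidates, function(th) mean(top_score >= th))

threshold_table <- data.frame(threshold = candidates,
                              fraction_labelled = kept)
print(threshold_table)
```

Checking a table like this alongside a histogram of the top scores, and against known marker genes per cluster, may help judge whether a cut-off separates plausible calls from noise.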