Question

How to run GSVA on scRNA-seq data with multiple clusters and cell groups, compared per tumor sample?

12 days ago

Minh-Anh

Hello,

I’m working on a single-cell RNA-seq dataset where I want to compare pathway activity across different cell groups (e.g., tumor, immune, other) within each tumor sample. Each cell group includes multiple clusters (e.g., macrophages and NK cells in the immune group). I would like to use GSVA or ssGSEA for the pathway analysis, but I am not sure which algorithm is more suitable.

I’ve tried two approaches:

1. Run GSVA on the full matrix obtained with GetAssayData(seurat_obj, layer = "scale.data"), then group the cells afterwards, when visualizing the scores in a heatmap.
2. Aggregate expression with AggregateExpression() by orig.ident and cell_group, then run GSVA on the resulting pseudobulk matrix; I’m unsure whether this is a correct approach.

Edit: I have not been able to perform the first method, but I have been able to perform the second. I am still not sure which one is more appropriate. Doesn't pseudobulking defeat the purpose of scRNA-seq, since it returns the data to a bulk-like state?
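To make the difference between the two approaches concrete, here is a minimal base-R sketch on hypothetical toy data (`counts`, `sample_id`, `cell_group`, `cell_scores` and `average_by_group()` are illustrative names, not Seurat objects). Approach 2 collapses the cells of each sample x cell group into one summed column before any scoring, which is the bulk-like step; approach 1 keeps one score per cell and only averages the scores per group afterwards.

```r
# Hypothetical toy data: 4 genes x 6 cells, with per-cell sample and cell-group labels.
set.seed(1)
counts <- matrix(rpois(24, lambda = 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))
sample_id  <- c("S1", "S1", "S1", "S2", "S2", "S2")
cell_group <- c("tumor", "tumor", "immune", "tumor", "immune", "immune")
group_key  <- paste(sample_id, cell_group, sep = "_")

# Approach 2 (pseudobulk): sum counts over the cells of each sample x cell group.
# rowsum() sums rows by group, so transpose cells into rows and back again.
pseudobulk <- t(rowsum(t(counts), group = group_key))

# Approach 1 (score first, group later): average a gene-set x cell score matrix
# within each sample x cell group.
average_by_group <- function(scores, groups) {
  sums <- t(rowsum(t(scores), group = groups))
  n_cells <- as.vector(table(groups)[colnames(sums)])
  sweep(sums, 2, n_cells, "/")
}

# Hypothetical per-cell scores for two gene sets (e.g. from a per-cell GSVA run).
cell_scores <- matrix(runif(12), nrow = 2,
                      dimnames = list(c("SET_A", "SET_B"), colnames(counts)))
group_scores <- average_by_group(cell_scores, group_key)
```

Both routes end with one column per sample x cell group; the difference is whether cells are pooled before scoring (approach 2) or after it (approach 1).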

Previously, I used ssgseaParam() on the whole matrix obtained from GetAssayData(); as I understand it, the algorithm then scores every single cell in the matrix. I terminated the run because it had not finished after nearly a week. I suspect I am doing something wrong, but I cannot tell what. This approach was inspired by a GitHub response suggesting the GetAssayData() option.
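The runtime follows from how ssGSEA works: each cell is ranked and scored on its own, once per gene set. Below is a simplified base-R re-implementation of the ssGSEA running-sum score, for illustration only. It is not GSVA's code, and it omits the final rescaling of scores that GSVA's ssGSEA applies by default (as far as I know, controlled by ssgseaParam's normalize argument). `ssgsea_score()`, `score_matrix()` and the toy data are hypothetical.

```r
# Simplified ssGSEA-style score for one cell/sample (illustration, not GSVA's code).
ssgsea_score <- function(expr, gene_set, alpha = 0.25) {
  # Rank genes within this cell: the highest expression gets the largest rank.
  r <- rank(expr, ties.method = "average")
  ord <- order(r, decreasing = TRUE)
  in_set <- names(expr)[ord] %in% gene_set
  # Rank-weighted step CDF for genes in the set, plain step CDF for the rest.
  w <- r[ord]^alpha * in_set
  p_in  <- cumsum(w) / sum(w)
  p_out <- cumsum(!in_set) / sum(!in_set)
  # ssGSEA sums the running difference rather than taking its maximum.
  sum(p_in - p_out)
}

# Every cell is scored separately for every gene set, so the work grows
# roughly with (cells x gene sets x genes).
score_matrix <- function(mat, gene_sets, alpha = 0.25) {
  t(sapply(gene_sets, function(gs)
    apply(mat, 2, ssgsea_score, gene_set = gs, alpha = alpha)))
}

# Toy usage: in c1 and c2 g1 is the highest gene; in c3 the order is reversed.
mat <- matrix(c(10:1, 10:1, 1:10), nrow = 10,
              dimnames = list(paste0("g", 1:10), c("c1", "c2", "c3")))
sets <- list(TOP = c("g1", "g2", "g3"), BOTTOM = c("g8", "g9", "g10"))
scores <- score_matrix(mat, sets)
```

With tens of thousands of cells and many gene sets, this per-cell loop is what makes a full-matrix run slow.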

The biological question is this: which pathways are active in each cell group (tumor, immune, other) of each tumor sample, where each cell group contains multiple cell-type clusters (for example, macrophages and NK cells in the immune group)? The cell groups are defined from the earlier cluster annotation.

Because online guides on GSVA for single-cell data are scarce, I am not sure whether to form the cell groups before or after the GSVA analysis, or whether to input the pseudobulk matrix obtained with AggregateExpression() or the scale.data layer (normalized and scaled in an earlier Seurat step) obtained with GetAssayData(layer = "scale.data").

Here is the relevant code:

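library(Seurat)
library(GSVA)

# scale.data: centered/scaled values, by default only for the variable features
object <- GetAssayData(object = seurat_obj, assay = "RNA", layer = "scale.data")
# ssgsea_object <- ssgseaParam(object, geneSets = gene_sets)

# Per-cell scoring: one score per gene set and per cell
gsva_object <- gsvaParam(object, geneSets = gene_sets)
gsva_results <- GSVA::gsva(param = gsva_object)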
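Seurat's ScaleData() scales only the variable features by default, so the scale.data layer may be missing many genes of the hallmark sets. A quick base-R check of gene-set coverage, assuming a matrix with gene symbols as row names; `gene_set_coverage()`, the toy matrix and the toy gene sets are hypothetical.

```r
# Hypothetical helper: how many genes of each set are present in a matrix.
gene_set_coverage <- function(mat, gene_sets) {
  present <- vapply(gene_sets,
                    function(gs) sum(unique(gs) %in% rownames(mat)),
                    integer(1))
  total <- lengths(lapply(gene_sets, unique))
  data.frame(gene_set = names(gene_sets), present = present, total = total,
             fraction = present / total, row.names = NULL,
             stringsAsFactors = FALSE)
}

# Toy stand-in for a scale.data matrix that only contains a few genes.
scaled <- matrix(0, nrow = 3, ncol = 2,
                 dimnames = list(c("VEGFA", "LDHA", "CD3E"), NULL))
toy_sets <- list(HYPOXIA = c("VEGFA", "LDHA", "PGK1"),
                 GLYCOLYSIS = c("LDHA", "PKM", "ENO1", "HK2"))
coverage <- gene_set_coverage(scaled, toy_sets)
```

Running the same check on the real `object` and `gene_sets` would show how much of each hallmark set actually reaches gsvaParam().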

The alternative uses the pseudobulk matrix from AggregateExpression():

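# Pseudobulk: one column per orig.ident x cell_group combination (list of matrices, one per assay)
avg_exp_group <- AggregateExpression(seurat_obj, group.by = c("orig.ident", "cell_group"))
gsva_object <- gsvaParam(avg_exp_group$RNA, geneSets = gene_sets)
# Pass only the parameter object; expr/gset.idx.list belong to the old interface
gsva_results <- GSVA::gsva(param = gsva_object)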

I am not sure which way is correct, or whether either contains a methodological misstep, since the official GSVA vignette does not really mention scRNA-seq. Ironically, asking ChatGPT on two different computers gave two different answers, one recommending the first approach and the other the second. Also, just to be sure: should I use the counts layer or the scale.data layer?
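On counts versus normalized values: according to the GSVA documentation, the default kernel (kcdf = "Gaussian") is meant for continuous, log-scale values such as log-CPM, while kcdf = "Poisson" is meant for integer counts. For summed pseudobulk counts, one option is a log-CPM transform before using the default kernel. A base-R sketch with a hypothetical toy count matrix; `log_cpm()` is a hypothetical helper, and edgeR::cpm(log = TRUE) is an established alternative that handles the pseudocount differently.

```r
# Hypothetical pseudobulk counts (genes x sample_cellgroup).
pb <- matrix(c(10, 0, 90, 100, 50, 50), nrow = 3,
             dimnames = list(c("g1", "g2", "g3"), c("S1_tumor", "S1_immune")))

# Simple log-CPM: scale each column to counts per million, then log2 with a
# pseudocount (prior) of 1.
log_cpm <- function(counts, prior = 1) {
  lib_size <- colSums(counts)
  log2(sweep(counts, 2, lib_size, "/") * 1e6 + prior)
}

pb_logcpm <- log_cpm(pb)
# pb_logcpm could then go to gsvaParam() with its default Gaussian kernel;
# alternatively, keep the raw summed counts and set kcdf = "Poisson".
```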

For the list of genes:

library(msigdbr)
library(dplyr)

# HALLMARK: HYPOXIA (species is defined earlier in the script)
hallmark <- msigdbr(species = species, collection = "H")
hallmark_hypoxia <- hallmark %>% filter(gs_name == "HALLMARK_HYPOXIA")
[...]
# Assuming I have multiple gene sets from msigdbr
gene_sets <- list(
  HALLMARK_HYPOXIA = unique(hallmark_hypoxia$gene_symbol),
  HALLMARK_GLYCOLYSIS = unique(hallmark_glycolysis$gene_symbol)
)
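Instead of filtering each hallmark set by hand, the msigdbr table can be split into a named list in one step. A base-R sketch, with a small hypothetical data frame standing in for the msigdbr output (only the gs_name and gene_symbol columns are mocked):

```r
# Hypothetical stand-in for the msigdbr(..., collection = "H") table.
hallmark <- data.frame(
  gs_name     = c("HALLMARK_HYPOXIA", "HALLMARK_HYPOXIA", "HALLMARK_GLYCOLYSIS",
                  "HALLMARK_GLYCOLYSIS", "HALLMARK_GLYCOLYSIS"),
  gene_symbol = c("VEGFA", "LDHA", "LDHA", "PKM", "PKM"),
  stringsAsFactors = FALSE
)

# One named list element per gene set, with duplicate symbols removed.
gene_sets <- lapply(split(hallmark$gene_symbol, hallmark$gs_name), unique)

# Keep only the sets of interest, in a fixed order.
wanted <- c("HALLMARK_HYPOXIA", "HALLMARK_GLYCOLYSIS")
gene_sets <- gene_sets[wanted]
```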

Many thanks in advance, Minh-Anh

Edit 2: It turns out I had taken the wrong inspiration from this GitHub code:

 GSVA::gsva(expr = object, gset.idx.list = geneset, ...)

The arguments of gsva() have changed in newer GSVA versions, so these old-style arguments were inappropriate and the call never completed. I have now been able to run both methods. However, I would still like to ask which input is more appropriate for gsva(): the full per-cell matrix (from the data or scale.data layer) or the pseudobulk matrix, and whether GSVA or ssGSEA is the better choice.
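For reference, a minimal sketch of the parameter-object interface in recent GSVA releases (1.50 and later, as far as I know), run on random toy data and skipped entirely if GSVA is not installed. The old-style call appears only as a comment; `toy`, `toy_sets` and the result names are hypothetical.

```r
# Random toy expression matrix: 200 genes x 6 samples.
set.seed(42)
toy <- matrix(rnorm(200 * 6), nrow = 200,
              dimnames = list(paste0("g", 1:200), paste0("S", 1:6)))
toy_sets <- list(SET_A = paste0("g", 1:20), SET_B = paste0("g", 101:130))

scores_gsva <- NULL
scores_ssgsea <- NULL
if (requireNamespace("GSVA", quietly = TRUE)) {
  # Old style: gsva(expr = toy, gset.idx.list = toy_sets, method = "ssgsea")
  # New style: build a parameter object per method, then pass it to gsva().
  scores_gsva   <- GSVA::gsva(GSVA::gsvaParam(toy, toy_sets), verbose = FALSE)
  scores_ssgsea <- GSVA::gsva(GSVA::ssgseaParam(toy, toy_sets), verbose = FALSE)
}
```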

Tags: GSVA, scRNA-Seq

11 days ago by Minh-Anh