How do I use FindAllMarkers after making additional cluster assignments?
21 months ago
Pratik ▴ 840

Hey y'all,

This question is a follow-up question after the solution from here: A: How to use FindSubCluster in Seurat?

After subclustering with FindSubCluster, how do I run FindAllMarkers on the whole Seurat object using the new cluster assignments? For some reason the cluster I subclustered (cluster 6) is skipped during FindAllMarkers. I think FindSubCluster is new in Seurat v4.0.

Any help would be appreciated.

> scfp <- FindNeighbors(scfp, graph.name = "test", dims = 1:100)
Computing nearest neighbor graph
Computing SNN
> scfp <- FindClusters(scfp, graph.name = "test", resolution = 2, algorithm = 1, verbose = TRUE)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1836
Number of edges: 16978

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.7368
Number of communities: 24
Elapsed time: 0 seconds
4 singletons identified. 20 final clusters.
> #scfp <- RunUMAP(scfp, dims = 1:100)
> scfp <- RunTSNE(scfp, dims = 1:100)
> #DimPlot(scfp, reduction = "umap", label = TRUE)
> DimPlot(scfp, reduction = "tsne", label = TRUE, label.size = 6 )
> scfp <- FindSubCluster(scfp, "6", "test", subcluster.name = "blood",  resolution = .3, algorithm = 1)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 104
Number of edges: 819

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.7301
Number of communities: 3
Elapsed time: 0 seconds
> DimPlot(scfp, reduction = "tsne", group.by = "blood", label = TRUE, label.size = 6)
> scfp.markers <- FindAllMarkers(scfp, graph.name = "test", group.by = "blood", only.pos = TRUE, min.pct = 0.1, logfc.threshold = 0.25)
Calculating cluster 0
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=08s
Calculating cluster 1
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=05s
Calculating cluster 2
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=07s
Calculating cluster 3
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=18s
Calculating cluster 4
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=16s
Calculating cluster 5
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=07s
Calculating cluster 6
Calculating cluster 7
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13s
Calculating cluster 8
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=17s
Calculating cluster 9
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=17s
Calculating cluster 10
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=24s
Calculating cluster 11
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13s
Calculating cluster 12
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=03s
Calculating cluster 13
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=18s
Calculating cluster 14
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=12s
Calculating cluster 15
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13s
Calculating cluster 16
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=13s
Calculating cluster 17
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=14s
Calculating cluster 18
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=17s
Calculating cluster 19
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=14s
> View(scfp.markers)
Seurat R single cell-RNA seq scRNA-seq • 1.3k views
21 months ago
Pratik ▴ 840

This was solved with the help of @nbpeterson3 on the Seurat GitHub Discussions page.

When FindSubCluster runs, it stores the subcluster labels in the object's metadata (object@meta.data), not in the active identity. You have to use the SetIdent function to switch the active cluster assignments to your new ones.

Here's a brief tutorial using the example above:

##So in the FindNeighbors you have to set the graph.name argument to something of your choice
> scfp <- FindNeighbors(scfp, graph.name = "test", dims = 1:100)

##You have to do the same for FindClusters as well (set the graph.name argument to the same as above)
> scfp <- FindClusters(scfp, graph.name = "test", resolution = 2, algorithm = 1, verbose = TRUE)
> scfp <- RunTSNE(scfp, dims = 1:100)

##Visualize your plot to see which cluster you want to subcluster
> DimPlot(scfp, reduction = "tsne", label = TRUE, label.size = 6 )

##Now pass that cluster number to FindSubCluster. I chose cluster 6. After it, put your graph.name in quotes, then choose a name for subcluster.name. Play around with resolution.
> scfp <- FindSubCluster(scfp, "6", "test", subcluster.name = "blood",  resolution = .3, algorithm = 1)

##Your subcluster is now saved in metadata. In this example its location is "scfp@meta.data$blood". Now use SetIdent to save your new cluster assignment to the main Ident in your object.
> scfp <- SetIdent(scfp, value = scfp@meta.data$blood)

##Visualize the main object without choosing meta data to see if it worked.
> DimPlot(scfp, reduction = "tsne", label = TRUE, label.size = 6)

##Use FindAllMarkers on your main Seurat Object
> scfp.markers <- FindAllMarkers(scfp, only.pos = TRUE, min.pct = 0.1, logfc.threshold = 0.25)

##Continue on with analysis.
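
For anyone who wants to try the same flow without their own data, here is a minimal sketch on the small demo object `pbmc_small` that ships with SeuratObject. The object name `obj`, the metadata column name `sub`, and the choice of which cluster to split are illustrative assumptions, not part of the original analysis. It relies on the default graph name `RNA_snn` rather than a custom `graph.name`, and it sets the identity by column name with `Idents(obj) <- "sub"`, which should be equivalent to the SetIdent call above.

```r
library(Seurat)

# Demo object bundled with SeuratObject; stands in for the thread's `scfp`
data("pbmc_small")
obj <- pbmc_small

# Build the neighbor graphs and cluster; with default names the SNN graph is "RNA_snn"
obj <- FindNeighbors(obj, dims = 1:10, verbose = FALSE)
obj <- FindClusters(obj, resolution = 1, verbose = FALSE)

# Subcluster the first cluster level; the new labels are written to obj$sub only
target <- levels(Idents(obj))[1]
obj <- FindSubCluster(obj, cluster = target, graph.name = "RNA_snn",
                      subcluster.name = "sub", resolution = 0.5, algorithm = 1)

# Promote the metadata column to the active identity by name
Idents(obj) <- "sub"

# Sanity check: the subcluster labels should now appear among the identities
print(table(Idents(obj)))

# FindAllMarkers now iterates over the new identities
markers <- FindAllMarkers(obj, only.pos = TRUE, min.pct = 0.1,
                          logfc.threshold = 0.25, verbose = FALSE)
```

Checking `table(Idents(obj))` before FindAllMarkers is a cheap way to confirm the new assignments are the ones actually being tested.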

I don't see it in your code, but you should also set the default assay to RNA, as the DE analysis is done on the raw counts: DefaultAssay(scfp) <- "RNA". Or, if you don't want to change the assay globally, FindAllMarkers(..., assay = "RNA", ...).
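
A rough sketch of the two options, using the bundled `pbmc_small` demo object as a stand-in (the name `obj` is just for illustration, and `pbmc_small` only has an RNA assay, so here both options end up doing the same thing):

```r
library(Seurat)

# Demo object bundled with SeuratObject; stands in for the thread's `scfp`
data("pbmc_small")
obj <- pbmc_small

# Check which assay is used when a call does not name one
print(DefaultAssay(obj))

# Option 1: switch the default assay for the whole object
DefaultAssay(obj) <- "RNA"

# Option 2: leave the default alone and name the assay in the call itself
markers <- FindAllMarkers(obj, assay = "RNA", only.pos = TRUE, verbose = FALSE)
```

On an integrated or SCT-normalised object the default assay is usually not RNA, which is when the choice between the two options starts to matter.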


Thank you for responding @fracarb8.

Is this a standard in the field to do DE analysis on raw counts? I saw this mentioned on the biostars slack channel as well, where someone suggested using counts for DESeq2 analysis, not sure if it's the same. Could you share, why, if you know? In the Seurat tutorial, it shows doing the analysis on the log normalized and scaled data. However, it does touch on how to use "counts" in the analysis, if one desires. I would appreciate your insight.

Kindly, Pratik


You use counts mainly to ensure that the data you are using is independent between samples. Packages like DESeq2 implement their own normalisation, so you don't want to feed them normalised/transformed counts. On top of that, your normalisation depends on the condition you are testing. Imagine you have 20 samples but are only interested in the differences between sample4 (id1) and sample18 (id2). You don't care about the effects of the other samples, so a "global" normalisation can be misleading. Regarding Seurat, things are a bit confusing (at least to me :D), but the general consensus would be to 1) integrate, 2) set DefaultAssay(scfp) <- "RNA", 3) normalise your counts if you used SCT, and then 4) perform DE analysis. You should have a look at the issues section of the official GitHub page, as it is filled with such questions.
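
Steps 2 to 4 could look roughly like the sketch below. It uses the bundled `pbmc_small` demo object as a stand-in for an integrated object, so the integration step itself is not shown, and the two identities being compared are simply the first two levels, picked for illustration.

```r
library(Seurat)

# Demo object bundled with SeuratObject; stands in for an integrated object
data("pbmc_small")
obj <- pbmc_small

# Step 2: make RNA the default assay
DefaultAssay(obj) <- "RNA"

# Step 3: log-normalise the RNA counts (needed if only SCT was normalised before)
obj <- NormalizeData(obj, assay = "RNA", verbose = FALSE)

# Step 4: DE between two identities only, rather than each group against all others
ids <- levels(Idents(obj))
de <- FindMarkers(obj, ident.1 = ids[1], ident.2 = ids[2],
                  assay = "RNA", verbose = FALSE)
print(head(de))
```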