scRNA-seq: Which is the best method for computing the % of similarity between two clusters using R?
4 days ago
Ondina ▴ 30

Hello everyone, I'm working on a single-cell RNA-seq analysis using Seurat. I've generated the UMAP and PCA plots and run differential expression between the clusters. Looking at the PCA, which gives a graphical view of how similar the clusters are, I was wondering: is there a simple way to compute a similarity score between two clusters, to get an answer like "cluster 1 is 80% similar to cluster 2"?

I was thinking of two ways of doing this:

1. Using the PCA data, i.e. the coordinates of the cells
2. Using the differential expression results (from FindMarkers)
RNAseq singlecell clustering • 482 views
4 days ago

I'd probably just compute a distance matrix on the variable features after making a pseudobulk for each cluster. You could do it for individual cells, but it'd take a lot longer and would likely give a similar result. You could then get distance/correlation values between the clusters if you really wanted a numerical output, though the interpretation may still be rather tedious to explain.
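A minimal sketch of that idea, in plain Python on a made-up toy matrix so the steps are visible: average the cells of each cluster into one pseudobulk profile, then compare the profiles by correlation or distance. The data, the cluster labels, and the helper names (`pseudobulk`, `pearson`, `euclidean`) are all hypothetical. In Seurat, the per-cluster averages would presumably come from something like `AverageExpression` restricted to the variable features, but that mapping is an assumption, not something stated in the answer.

```python
import math
from collections import defaultdict


def pseudobulk(expr, clusters):
    """Average each feature over the cells of each cluster.

    expr     -- list of per-cell expression vectors (one value per feature)
    clusters -- cluster label for each cell, same order as expr
    Returns a dict mapping cluster label -> mean expression vector.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(expr, clusters):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Toy data (hypothetical): 6 cells x 4 variable features, 3 clusters.
cells = [
    [5.0, 1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0, 2.0],
    [5.0, 2.0, 1.0, 2.0],
    [4.0, 1.0, 1.0, 3.0],
    [0.0, 4.0, 5.0, 1.0],
    [1.0, 5.0, 4.0, 0.0],
]
labels = ["1", "1", "2", "2", "3", "3"]

profiles = pseudobulk(cells, labels)
names = sorted(profiles)

# Pairwise comparison of cluster profiles: one value per pair of clusters.
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson(profiles[a], profiles[b])
        d = euclidean(profiles[a], profiles[b])
        print(f"cluster {a} vs {b}: correlation={r:.3f}, distance={d:.3f}")
```

On this toy input, clusters 1 and 2 come out strongly correlated and close together, while cluster 3 is far from both. Note that a correlation of 0.8 is not the same thing as "80% similar", which is part of why the numbers can be tedious to explain.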


Hi, is the point of the pseudobulk approach just to reduce the number of distances that have to be calculated?

I've been hearing a lot about this pseudobulk idea, but it's still not super clear to me when it should be used


Mostly, yes. It will also rein in outliers. You can take this approach with all cells, and it'll probably work fine, but it may take a good while to run and will require a lot more memory. In addition, the output may not be as clean, since some cells will cluster with cells from other, similar clusters.
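To put rough numbers on the run-time and memory point: a full distance matrix has one entry per unordered pair of items, so its size grows with the square of the number of items compared. A small sketch, with hypothetical dataset sizes:

```python
def n_pairs(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


n_cells = 10_000   # hypothetical number of cells
n_clusters = 12    # hypothetical number of clusters

print(n_pairs(n_cells))     # 49995000 cell-to-cell distances
print(n_pairs(n_clusters))  # 66 cluster-to-cluster distances
```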

Point being, if you want to compare the clusters, you might as well compare the clusters rather than their constituent elements. Pseudobulking is also useful for DE in certain single-cell analyses, as the single-cell-specific approaches tend to return a lot of false positives due to the sparsity of the data. Pseudobulking makes bulk RNA-seq testing methods appropriate, and they tend to return more robust results for inter-condition contrasts. You can read more about it in this OSCA chapter.


Okay, thank you for your response. I'll try using the distance matrix from the pseudobulking process; I hadn't thought of using it like this.
