Question

scRNA-seq: What is the best method for computing the percent similarity between two clusters in R?

0

2.3 years ago

Ondina ▴ 100

Hello everyone, I'm working on a single-cell RNA-seq analysis using Seurat. I've generated the UMAP and PCA plots and run differential expression between the clusters. Looking at the PCA plot, which shows graphically how similar the clusters are, I asked myself the following question: is there a simple way to compute a similarity score between two clusters, to get an answer like "cluster 1 is 80% similar to cluster 2"?

I was thinking of two ways of doing this:

1. Using the PCA coordinates
2. Using the differential expression results (from FindMarkers)

RNAseq singlecell clustering • 2.2k views

Updated 20 months ago by ahmad mousavi ▴ 800 • written 2.3 years ago by Ondina ▴ 100

score 6 · Answer 1 · 2022-01-14

2.3 years ago

jared.andrews07 ★ 16k

I'd probably just compute a distance matrix on the variable features after making a pseudobulk for each cluster. You could do it for individual cells, but it'd take a lot longer and likely give a similar result. You could then get distance/correlation values between clusters if you really wanted a numerical output, though the interpretation may still be rather tedious to explain.
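As a rough illustration of the pseudobulk-then-compare idea above, here is a minimal base-R sketch. It is not the answerer's code: the count matrix is simulated, the cluster labels c1 to c3 and the helper simulate_cells() are hypothetical, and taking the 50 genes with the highest variance across pseudobulks is only a crude stand-in for Seurat's variable features. With a real Seurat object you would start from its count matrix and cluster identities instead.

```r
set.seed(1)
n_genes <- 200
cells_per_cluster <- 30

# Assumption for the toy data: c1 and c2 share one expression program,
# while c3 gets different mean expression for the first 40 genes.
base_mean <- rgamma(n_genes, shape = 2, rate = 0.5)
c3_mean <- base_mean
c3_mean[1:40] <- rgamma(40, shape = 2, rate = 0.5)

# Hypothetical helper: Poisson counts, genes in rows and cells in columns.
simulate_cells <- function(gene_mean, n_cells) {
  matrix(rpois(length(gene_mean) * n_cells, lambda = gene_mean),
         nrow = length(gene_mean))
}

counts <- cbind(simulate_cells(base_mean, cells_per_cluster),
                simulate_cells(base_mean, cells_per_cluster),
                simulate_cells(c3_mean, cells_per_cluster))
rownames(counts) <- paste0("gene", seq_len(n_genes))
clusters <- factor(rep(c("c1", "c2", "c3"), each = cells_per_cluster))

# Pseudobulk: sum raw counts over all cells of each cluster (genes x clusters).
pseudobulk <- t(rowsum(t(counts), group = clusters))

# Library-size normalise each pseudobulk to counts per million, then log-transform.
logcpm <- log1p(t(t(pseudobulk) / colSums(pseudobulk)) * 1e6)

# Keep the 50 genes that vary most across the pseudobulks.
gene_var <- apply(logcpm, 1, var)
top <- head(names(sort(gene_var, decreasing = TRUE)), 50)

# Pairwise similarity (Pearson correlation) and distance (Euclidean) between clusters.
cluster_cor <- cor(logcpm[top, ])
cluster_dist <- as.matrix(dist(t(logcpm[top, ])))

print(round(cluster_cor, 3))
print(round(cluster_dist, 3))
```

In this toy setup c1 and c2 are drawn from the same program, so they should come out closer to each other than either is to c3. Keep in mind that a correlation of 0.8 is not the same thing as "80% similar", which is part of why the interpretation takes some explaining.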

0

Hi, would the pseudobulk approach be just to reduce the distances that one has to calculate?

I've been hearing a lot about this pseudobulk idea, but it's still not super clear to me when it should be used.

Reply • 2.3 years ago by hamarillo ▴ 70

5

Mostly, yes. It will also rein in outliers. You can take this approach with all cells, and it'll probably work fine, but it may take a good while to run and will require a lot more memory. In addition, the output may not be as clean, since some cells will cluster with cells from other, similar clusters.

Point being, if you want to compare the clusters, you might as well compare the clusters rather than their constituent elements. Pseudobulking is useful in DE for certain single-cell analyses, as the single-cell-specific approaches tend to return a lot of false positives due to the sparsity of the data. Pseudobulking makes bulk RNA-seq testing methods appropriate, and they tend to return more robust results for inter-condition contrasts. You can read more about it in this OSCA chapter.

Reply • 2.3 years ago by jared.andrews07 ★ 16k

0

Okay, thank you for your response. I'll try using the distance matrix derived from the pseudobulks; I hadn't thought of using it like this.

Reply • 2.2 years ago by Ondina ▴ 100

score 1 · Answer 2 · 2022-08-24

20 months ago

ahmad mousavi ▴ 800

Hi

I think PCA, or dimensionality reduction in general, should work for this scenario, but arriving at a single similarity ratio will be challenging.
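One way to make the PCA idea concrete is to compare cluster centroids in PC space. The base-R sketch below is only an illustration under stated assumptions: the counts are simulated, the cluster labels and the helper simulate_cells() are hypothetical, and keeping the first 10 PCs is an arbitrary choice. With Seurat you would use the PCA embeddings already stored in the object rather than calling prcomp() yourself.

```r
set.seed(2)
n_genes <- 100
cells_per_cluster <- 40

# Assumption for the toy data: c1 and c2 share one expression program,
# while c3 differs in its mean expression for the first 30 genes.
base_mean <- rgamma(n_genes, shape = 2, rate = 0.5)
shifted_mean <- base_mean
shifted_mean[1:30] <- rgamma(30, shape = 2, rate = 0.5)

# Hypothetical helper: Poisson counts, genes in rows and cells in columns.
simulate_cells <- function(gene_mean, n_cells) {
  matrix(rpois(length(gene_mean) * n_cells, lambda = gene_mean),
         nrow = length(gene_mean))
}

counts <- cbind(simulate_cells(base_mean, cells_per_cluster),
                simulate_cells(base_mean, cells_per_cluster),
                simulate_cells(shifted_mean, cells_per_cluster))
clusters <- factor(rep(c("c1", "c2", "c3"), each = cells_per_cluster))

# Per-cell library-size normalisation followed by a log transform.
lognorm <- log1p(t(t(counts) / colSums(counts)) * 1e4)

# PCA with cells as rows; keep the first 10 principal components.
pca <- prcomp(t(lognorm), center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:10]

# Cluster centroids: mean PC coordinates of the cells in each cluster.
centroids <- rowsum(pcs, group = clusters) / as.vector(table(clusters))

# Euclidean distance between centroids; smaller means more similar.
centroid_dist <- as.matrix(dist(centroids))
print(round(centroid_dist, 3))
```

The result is a distance, not a percentage. Turning it into a "similarity ratio" requires choosing a scaling (for example, relative to the largest pairwise distance), and that choice is arbitrary, which is the challenge mentioned above.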

You can also try integration methods, which look for shared patterns across datasets using a reference and a query.
