Question

Cluster annotation in single cell

7 months ago

synat.keam ▴ 100

Dear Fellows,

In single-cell analysis, once we perform clustering (for example, "umap"), we get X number of clusters. The next step is to annotate the clusters, which can be done by looking at the differentially expressed genes (DEGs) within each cluster. If we get DEGs within each cluster, are these DEGs the result of comparing multiple groups of cells? I remember that in bulk RNA-seq we can only compare two groups at a time using a contrast. How are DEGs obtained among several hundred cells in a cluster in a single-cell experiment?
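On the two-group question: marker-finding functions such as Seurat's `FindAllMarkers` compare each cluster against all remaining cells pooled together (one-vs-rest), so every test is still a two-group comparison, with individual cells as the observations. The Python sketch below shows only that grouping logic on a toy expression table. The function name `one_vs_rest_log2fc`, the pseudocount of 1 and the fold-change formula are illustrative assumptions, not Seurat's exact computation, and a real analysis pairs the fold change with a statistical test.

```python
import math
from statistics import mean


def one_vs_rest_log2fc(expr, labels, pseudocount=1.0):
    """Hypothetical helper: log2 fold change of each cluster vs all other cells.

    expr:   dict mapping gene -> list of per-cell expression values
            (same cell order as `labels`)
    labels: list with one cluster label per cell
    Returns dict mapping cluster -> {gene: log2((mean_in + pc) / (mean_out + pc))}.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("one-vs-rest needs at least two clusters")
    result = {}
    for cluster in clusters:
        # Group 1: cells in this cluster; group 2: every other cell.
        in_idx = [i for i, lab in enumerate(labels) if lab == cluster]
        out_idx = [i for i, lab in enumerate(labels) if lab != cluster]
        per_gene = {}
        for gene, values in expr.items():
            m_in = mean(values[i] for i in in_idx)
            m_out = mean(values[i] for i in out_idx)
            per_gene[gene] = math.log2((m_in + pseudocount) / (m_out + pseudocount))
        result[cluster] = per_gene
    return result


if __name__ == "__main__":
    # Toy data: 4 cells, 2 genes, 2 clusters (values are made up).
    expr = {"CD3E": [5, 6, 0, 0], "COL1A1": [0, 0, 7, 9]}
    labels = ["T", "T", "Fib", "Fib"]
    for cluster, fcs in one_vs_rest_log2fc(expr, labels).items():
        print(cluster, {g: round(v, 2) for g, v in fcs.items()})
```

With more clusters the loop is unchanged: each cluster in turn becomes "group 1" and all other cells become "group 2".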

Also, when integrating a large dataset, the main purpose is batch correction etc. In the end, we get a single UMAP plot that combines all samples and conditions (control/treatment etc.) from all groups. Does a single UMAP mean that these cell clusters are found across all samples and conditions? How could I tell from a single UMAP that one group has fewer, for instance, fibroblasts or T cells, given that I have clusters of fibroblasts or T cells? What is the point of displaying a single UMAP of the whole dataset (I often see this in publications)? Sorry, I am just very confused... Looking forward to hearing from you all.
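A single integrated UMAP places all cells on shared axes, but on its own it does not show whether each cluster contains cells from every sample or condition. A common check is to tabulate what fraction of each sample's cells falls in each cluster, or to draw one UMAP panel per group (Seurat's `DimPlot` has a `split.by` argument for this). The Python sketch below assumes you have already exported one (sample, cluster) label pair per cell; the function name `cluster_proportions` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_proportions(cells):
    """Hypothetical helper: per-sample cluster composition.

    cells: iterable of (sample, cluster) pairs, one pair per cell.
    Returns dict mapping sample -> {cluster: fraction of that sample's cells}.
    """
    counts = defaultdict(Counter)
    for sample, cluster in cells:
        counts[sample][cluster] += 1
    proportions = {}
    for sample, counter in counts.items():
        total = sum(counter.values())
        proportions[sample] = {cl: n / total for cl, n in counter.items()}
    return proportions


if __name__ == "__main__":
    # Made-up labels: 100 control cells and 100 treated cells.
    cells = (
        [("ctrl", "Fibroblast")] * 30
        + [("ctrl", "T cell")] * 70
        + [("treat", "Fibroblast")] * 10
        + [("treat", "T cell")] * 90
    )
    for sample, props in cluster_proportions(cells).items():
        print(sample, {cl: f"{p:.0%}" for cl, p in props.items()})
```

Using fractions rather than raw counts matters because samples usually yield different numbers of cells; a formal comparison of proportions between conditions also needs biological replicates.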

Thanks,

Single-cell • 1.0k views

updated 6 months ago by e.r.zakiev ▴ 200 • written 7 months ago by synat.keam ▴ 100

You need to go through these tutorials, which will help you a lot.

Single-cell best practices

OSCA

7 months ago by bk11 ★ 2.4k

I'd really recommend finding a local scRNA-seq expert to talk to at your institution if available. These questions are really beyond the scope of this site and will require lengthy and detailed answers.

7 months ago by jared.andrews07 ★ 16k

Are you using Seurat?

7 months ago by Ram 43k

Thanks, I'm using Seurat and have also tried to learn from the Bioconductor book. Could you help explain it to me? I am just very confused and have not made any progress.

Regards,

7 months ago by synat.keam ▴ 100

I asked that question because it's relevant to your post. I cannot guide you on such a broad topic. Use the links bk11 has provided you to learn more.

7 months ago by Ram 43k

score 0 · Answer 1 · 2023-10-17

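Clustering (for example, in Seurat's pipeline) is usually done on the PCA embedding, not on the UMAP. PCA is a linear projection that approximately preserves the Euclidean distances between cells in the high-dimensional expression space (exactly, if all components are kept), whereas UMAP is a non-linear, somewhat stochastic embedding that distorts distances and is best used for visualisation.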
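The PCA-then-neighbours part of that pipeline can be sketched with NumPy: cells are projected onto the top principal components, and a k-nearest-neighbour graph is built from Euclidean distances in that space. In Seurat, a shared-nearest-neighbour graph built from such neighbours is what the clustering step (Louvain community detection by default) operates on; that step is not reproduced here. The function names, the toy data and the brute-force O(n²) distance computation are assumptions for illustration only.

```python
import numpy as np


def pairwise_distances(A):
    """Euclidean distance between every pair of rows (cells) of A."""
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)


def pca_embed(X, n_components):
    """Project cells (rows of X) onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)  # centre each gene (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # cell coordinates in PC space


def knn_indices(emb, k):
    """Indices of the k nearest neighbours of each cell, excluding the cell itself."""
    d = pairwise_distances(emb)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 40 cells x 50 genes, two well-separated groups of 20 cells.
    X = np.vstack([rng.normal(0, 1, (20, 50)), rng.normal(5, 1, (20, 50))])
    emb = pca_embed(X, n_components=10)
    print(knn_indices(emb, k=5)[:3])  # neighbours of the first three cells
    # Keeping every component preserves all pairwise distances exactly.
    full = pca_embed(X, n_components=X.shape[0])
    print(np.allclose(pairwise_distances(full), pairwise_distances(X)))  # True
```

Because the projection is deterministic, rerunning it on the same data gives the same neighbour graph, which is one reason clustering is done here rather than on UMAP coordinates.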

The DEGs can be found with a very nice package called presto, which, as an added benefit, doesn't assume any particular distribution for your data, because it uses non-parametric (i.e. rank-based) statistics: the Wilcoxon rank-sum test and the AUC.
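presto (an R package whose main function is `wilcoxauc`) reports, for each gene and each group versus the rest, a Wilcoxon rank-sum statistic and the area under the ROC curve (AUC). To show why no distributional assumption is needed, here is a small pure-Python sketch of the same two quantities for a single gene: only the ranks of the expression values enter the calculation. This is not presto's code; the p-value uses the normal approximation with a tie correction and no continuity correction, an assumption that may differ from presto's or Seurat's exact settings.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def wilcoxon_auc(in_group, out_group):
    """Rank-sum test for one gene: cells in a cluster vs all other cells.

    Returns (U, AUC, two-sided p-value from the tie-corrected normal approximation).
    """
    n1, n2 = len(in_group), len(out_group)
    pooled = list(in_group) + list(out_group)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for the cluster
    auc = u / (n1 * n2)  # P(random in-cluster cell > random other cell)
    n = n1 + n2
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # every value identical: no evidence of a difference
        return u, auc, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, auc, p


if __name__ == "__main__":
    # Made-up expression of one gene: 3 cells in the cluster, 4 elsewhere.
    print(wilcoxon_auc([5, 6, 7], [0, 0, 0, 1]))
```

An AUC near 1 means the gene is higher in the cluster than in almost all other cells, and 0.5 means no separation. Repeating this for every gene and every cluster, followed by multiple-testing correction, gives a one-vs-rest marker table.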