Question

Single cell cluster annotation and validation tools: recommendations?

3


4.7 years ago

A248 ▴ 30

Hi, I am working on a single-cell dataset. Would you have any recommendations for tools/algorithms for the following two tasks?

1. After dimensionality reduction and cell clustering, we annotated clusters based on a general understanding of immunology (e.g. B cells annotated as such because they strongly express MS4A1, CD79A, etc.). Is there a way to quantitatively compare how well the transcriptional signatures of populations in the published literature (say, of B cells, taken from bulk RNA-seq) match our cluster annotation as B cells? What would be the best algorithm for this "scoring"?
2. We have carried out bulk RNA sequencing of certain sub-populations purified by flow cytometry, and we have a list of the top DEGs expressed by these purified populations. Is there a way to highlight the cells that express these DEGs in the single-cell dataset? (Using Seurat, we can light up cells expressing one gene. But how would we do the same for a list of genes?)

Many thanks for any suggestions!

single cell sequencing sc-RNA RNA-Seq • 3.4k views

ADD COMMENT • link updated 4.7 years ago by jared.andrews07 ★ 17k • written 4.7 years ago by A248 ▴ 30

score 6 · Answer 1 · 2019-11-20


4.7 years ago

jared.andrews07 ★ 17k

SingleR is perfect for your first problem (full disclosure: I've been involved in its development). It can annotate your cell types based on purified bulk RNA-seq or single-cell reference datasets, eliminating the need for you to manually annotate based on a handful of marker genes for each cell type. It works quite well, is workflow-agnostic, is very fast, and has a number of diagnostic visuals to help you decide whether to believe the assigned labels. It can annotate both individual cells and clusters, and it has several reference datasets (particularly of the immune variety) built in.
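To make the idea of a quantitative match concrete, here is a minimal Python sketch of correlation-based scoring: it compares a cluster's average expression profile with each reference profile over their shared genes using Spearman rank correlation. SingleR's scoring is also built on Spearman correlations, but this is not its implementation (it leaves out marker gene selection and fine-tuning, among other things). The function names and all expression values below are made up for illustration.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def score_cluster(cluster_profile, references):
    """Score one cluster profile against each reference over shared genes."""
    scores = {}
    for label, ref in references.items():
        genes = sorted(set(cluster_profile) & set(ref))
        scores[label] = spearman(
            [cluster_profile[g] for g in genes],
            [ref[g] for g in genes],
        )
    return scores


# Toy, made-up expression values (not real data).
cluster = {"MS4A1": 5.1, "CD79A": 4.8, "CD3E": 0.2, "LYZ": 0.5, "NKG7": 0.1}
references = {
    "B cell": {"MS4A1": 9.0, "CD79A": 8.5, "CD3E": 0.5, "LYZ": 1.0, "NKG7": 0.3},
    "T cell": {"MS4A1": 0.2, "CD79A": 0.4, "CD3E": 8.0, "LYZ": 0.8, "NKG7": 2.0},
}
cluster_scores = score_cluster(cluster, references)
best_label = max(cluster_scores, key=cluster_scores.get)
print(cluster_scores, best_label)
```

Rank-based correlation is a natural fit here because bulk and single-cell measurements sit on different scales, and ranks only depend on the ordering of genes within each profile.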
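For your second problem, I usually compute a summary score for the list of genes and display that. The AddModuleScore function in Seurat will do this for you; you just need to feed it a list of genes. The resulting score is stored in the object's metadata, so you can plot it with FeaturePlot just as you would a single gene.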
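As a rough illustration of what a per-cell score for a gene list looks like, the Python sketch below takes the mean expression of the genes in the list and subtracts the mean expression of a set of control genes. This is a simplification: AddModuleScore chooses its control genes at random from expression-matched bins, whereas here the controls are passed in by hand. The `module_score` function, the choice of ACTB and GAPDH as controls, and all expression values are assumptions for the example.

```python
from statistics import mean


def module_score(cell, gene_set, control_genes):
    """Mean expression of gene_set minus mean expression of control_genes.

    Genes missing from the cell's profile are skipped.
    """
    signal = [cell[g] for g in gene_set if g in cell]
    background = [cell[g] for g in control_genes if g in cell]
    if not signal or not background:
        raise ValueError("no overlap between the gene lists and the cell's genes")
    return mean(signal) - mean(background)


# Toy normalized expression values per cell (made up, not real data).
cells = {
    "cell_1": {"MS4A1": 4.0, "CD79A": 3.0, "ACTB": 1.0, "GAPDH": 2.0},
    "cell_2": {"MS4A1": 0.0, "CD79A": 0.5, "ACTB": 1.5, "GAPDH": 2.5},
}
deg_list = ["MS4A1", "CD79A"]   # stand-in for a bulk-derived DEG list
controls = ["ACTB", "GAPDH"]    # hand-picked controls (assumption)

module_scores = {
    name: module_score(profile, deg_list, controls)
    for name, profile in cells.items()
}
print(module_scores)
```

Cells with a high score express the list as a whole above background, which is the quantity you would then color the embedding by.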

ADD COMMENT • link 4.7 years ago by jared.andrews07 ★ 17k

2


Hello, I have scRNA-seq data with Ensembl gene IDs, and I don't know how to make SingleR use gene IDs instead of gene names to annotate the cell types. Any recommendations would be appreciated.

ADD REPLY • link 3.7 years ago by Maria17 ▴ 40

2


If you're using any of the built-in reference datasets, you can pass ensembl=TRUE to the function when you retrieve them to use Ensembl IDs rather than gene symbols. You'll likely lose a handful of genes, as the mapping is never perfect, but it generally works well.
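To show why a few genes drop out, here is a small Python sketch that re-keys a symbol-indexed profile by Ensembl ID and reports the symbols that have no mapping. It is only a toy model of the idea, not what the reference retrieval functions do internally; `rekey_by_ensembl` is a hypothetical helper, and the IDs in the mapping are placeholders rather than real Ensembl identifiers.

```python
def rekey_by_ensembl(profile, symbol_to_ensembl):
    """Re-key a symbol-indexed profile by Ensembl ID.

    Returns the re-keyed profile and the list of symbols that had no mapping.
    """
    rekeyed, dropped = {}, []
    for symbol, value in profile.items():
        ensembl_id = symbol_to_ensembl.get(symbol)
        if ensembl_id is None:
            dropped.append(symbol)  # no mapping available, so this gene is lost
        else:
            rekeyed[ensembl_id] = value
    return rekeyed, dropped


# Toy reference profile and a deliberately incomplete mapping.
# The IDs are placeholders, not real Ensembl identifiers.
reference = {"MS4A1": 9.0, "CD79A": 8.5, "LYZ": 1.0}
toy_mapping = {"MS4A1": "ENSG_TOY_0001", "CD79A": "ENSG_TOY_0002"}

rekeyed, dropped = rekey_by_ensembl(reference, toy_mapping)
print(rekeyed)
print(dropped)
```

Checking the size of the dropped list against the total number of genes is a quick way to confirm that only a handful were lost.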

ADD REPLY • link 3.7 years ago by jared.andrews07 ★ 17k