Question: Clustering Cells using known marker genes (Single cell RNAseq)
21 months ago by
V100
UK/London
V100 wrote:

Hello,

I've got a dataset of single cells that were sequenced, and I have generated the associated count files etc.

Up until now I've been using the Seurat package in R, which is amazing at clustering cells by itself (unsupervised), and it will then give you the genes differentially expressed between clusters A vs B etc.

One question I have, though, and can't find the solution to, is how to do "supervised" clustering (??). Basically, I've got some cells that are, for example, Pax3+/CD146+ and other cells that are Pax3+/CD146-. On the tSNE plot I can see that these cells fall into different clusters when I do conventional clustering (together with other unrelated cells). Does anyone know of a way that I can group all of the cells I want into two clusters (Pax3+/CD146+ and Pax3+/CD146-) and then run differential expression testing (or even just get the gene lists) on those?

Thanks!

single cell rnaseq • 1.9k views
modified 14 months ago by PR10 • written 21 months ago by V100

Supervised "clustering" is usually called classification in machine learning while the term clustering is typically reserved for unsupervised approaches. Maybe this clarification of terms will help you find the relevant literature.
The way to do supervised learning is to use a training set, i.e. a data set for which you know the ground truth, and use it to "train" an algorithm to classify the data. How to do this more precisely depends on the type of data you have and the algorithm you want to use. There are a few things to pay attention to. For example, if you only train with two classes, every sample will end up in one or the other. If the "unrelated cells" are the problem, you would need more classes and ground truth data for all of them.
If getting ground truth data is an issue, you could also try refining the clustering (maybe using other clustering methods) so that you get more clusters that are "purer".
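
To make the two-class caveat concrete, here is a minimal sketch in plain Python (standard library only; this is not Seurat or any particular single-cell tool). It assumes a simple nearest-centroid classifier, which is just one possible algorithm, and the expression values, gene order and the "other" class name are made up for illustration.

```python
import math

def centroid(rows):
    """Per-gene mean over a list of expression vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train(labelled):
    """labelled: dict mapping class name -> list of expression vectors."""
    return {name: centroid(rows) for name, rows in labelled.items()}

def classify(model, cell):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda name: math.dist(model[name], cell))

# Toy vectors ordered as [Pax3, CD146, some unrelated gene] (hypothetical values).
training = {
    "Pax3+/CD146+": [[5.0, 4.0, 0.0], [6.0, 5.0, 1.0]],
    "Pax3+/CD146-": [[5.0, 0.0, 0.0], [6.0, 1.0, 1.0]],
}
unrelated_cell = [0.0, 0.0, 8.0]

# With only two classes, the unrelated cell is forced into one of them.
two_class = train(training)
forced_label = classify(two_class, unrelated_cell)

# Adding ground truth for a third class gives the cell somewhere sensible to go.
training["other"] = [[0.0, 0.0, 7.0], [1.0, 0.0, 9.0]]
three_class = train(training)
better_label = classify(three_class, unrelated_cell)

print(forced_label, "->", better_label)
```

With real data you would train on many more cells and genes and hold some labelled cells back to check the classifier, but the behaviour is the same: a classifier can only return the classes it was trained on.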

modified 21 months ago • written 21 months ago by Jean-Karim Heriche18k

Thank you for clarifying the terminology. Sometimes having the correct term can save hours of random searching!

written 21 months ago by V100
14 months ago by
PR10
PR10 wrote:

Not sure if my answer will still be useful to you, as the post is pretty old; I only just came across your question. I'm guessing you are using the FindMarkers function in Seurat to call DE. You can make this function test for differential expression between any two subgroups of cells by first assigning new subgroup identifiers to the cells with the AddMetaData function, and then making those subgroup identifiers the default cell "idents" with the SetAllIdent function. Then you can pass the new idents as the "ident.1" and "ident.2" parameters of FindMarkers. Hope this works! If you have found another way, please post that too as a reply. Good luck.
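
The logic of those steps (label each cell by its marker status, then compare the two labelled groups) can be sketched without any package. The snippet below is plain Python, not the Seurat API: the expression values, the zero threshold for calling a marker "positive", the pseudocount, and the genes GeneX and GeneY are all assumptions made up for illustration, and it only ranks genes by log2 fold change rather than running the statistical test that FindMarkers performs.

```python
import math

# Toy normalised expression: gene -> one value per cell (hypothetical numbers).
expression = {
    "Pax3":  [3.0, 2.5, 2.8, 3.1, 0.0, 0.0],
    "CD146": [2.0, 1.8, 0.0, 0.0, 2.2, 0.0],
    "GeneX": [7.0, 5.0, 1.0, 1.0, 3.0, 3.0],
    "GeneY": [1.0, 1.0, 1.0, 1.0, 4.0, 4.0],
}
n_cells = 6

def label_cell(i, threshold=0.0):
    """Return the subgroup of cell i, or None for cells that are not Pax3+."""
    if expression["Pax3"][i] <= threshold:
        return None
    if expression["CD146"][i] > threshold:
        return "Pax3+/CD146+"
    return "Pax3+/CD146-"

# Step 1: one subgroup label per cell (the role of the new metadata column / idents).
labels = [label_cell(i) for i in range(n_cells)]

def mean_in_group(gene, group):
    values = [v for v, lab in zip(expression[gene], labels) if lab == group]
    return sum(values) / len(values)

def log2_fold_change(gene, group_a, group_b, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount avoids division by zero."""
    a = mean_in_group(gene, group_a) + pseudocount
    b = mean_in_group(gene, group_b) + pseudocount
    return math.log2(a / b)

# Step 2: compare the two subgroups gene by gene, largest absolute change first.
ranked = sorted(
    ((g, log2_fold_change(g, "Pax3+/CD146+", "Pax3+/CD146-")) for g in expression),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for gene, lfc in ranked:
    print(f"{gene}\t{lfc:.2f}")
```

Cells that are not Pax3+ get no label and are left out of the comparison, which is the point of setting your own identities instead of relying on the unsupervised clusters.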

written 14 months ago by PR10