Question: Clustering Cells using known marker genes (Single cell RNAseq)
3.3 years ago by
V230 wrote:


I've got a dataset of single cells that were sequenced, and I have generated the associated count files etc.

Up until now I've been using the Seurat package in R, which is amazing at clustering cells by itself (unsupervised); it will then give you the genes differentially expressed between clusters (A vs B etc.).

One question I have, though, and can't find the solution to, is how to do "supervised" clustering (??). Basically, I've got some cells that are, for example, Pax3+/CD146+ and other cells that are Pax3+/CD146-. On the tSNE plot I can see that these cells fall in different clusters when I do conventional clustering (together with other unrelated cells). Does anyone know of a way to group all of the cells I want into two clusters (Pax3+/CD146+ and Pax3+/CD146-) and then run differential expression testing (or even just get the gene lists) on those?


single cell rnaseq • 2.9k views
modified 2.7 years ago by PR20 • written 3.3 years ago by V230
3.3 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche23k wrote:

Supervised "clustering" is usually called classification in machine learning while the term clustering is typically reserved for unsupervised approaches. Maybe this clarification of terms will help you find the relevant literature.
The way to do supervised learning is to use a training set, i.e. a data set for which you know the ground truth, and use it to "train" an algorithm to classify the data. How to do this more precisely depends on the type of data you have and the algorithm you want to use. There are a few things to pay attention to. For example, if you only train with two classes, all samples will end up in one or the other. If the "unrelated cells" are the problem, you would need more classes and ground truth data for all of them.
If getting ground truth data is an issue, you could also try refining the clustering (maybe using other clustering methods) so that you get more clusters that are "purer".
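
The point about training with only two classes can be illustrated with a minimal sketch. It uses a nearest-centroid classifier, chosen here only because it is short, not because it is a recommended method for single-cell data. The expression values, the two-gene feature space and the "other" class are all invented for illustration.

```python
from math import dist

# Hypothetical training set: each cell is a (Pax3, CD146) expression pair.
# All values and the "other" class are made up for illustration.
training = {
    "Pax3+/CD146+": [(5.0, 4.0), (6.0, 5.0)],
    "Pax3+/CD146-": [(5.0, 0.0), (6.0, 1.0)],
    "other":        [(0.0, 0.0), (1.0, 1.0)],
}

def centroids(labelled):
    """Average the training cells of each class, gene by gene."""
    return {
        label: tuple(sum(col) / len(cells) for col in zip(*cells))
        for label, cells in labelled.items()
    }

def classify(cell, cents):
    """Return the label of the closest class centroid (Euclidean distance)."""
    return min(cents, key=lambda label: dist(cell, cents[label]))

# A cell that expresses neither marker strongly.
unrelated_cell = (0.5, 0.2)

three_class = centroids(training)
two_class = centroids({k: v for k, v in training.items() if k != "other"})

print(classify(unrelated_cell, three_class))  # closest to the "other" centroid
print(classify(unrelated_cell, two_class))    # forced into one of the two classes
```

With the "other" class available, the unrelated cell is assigned to it. With only the two Pax3+ classes, the same cell is still assigned to one of them, which is why ground truth for the unrelated cells matters.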

modified 3.3 years ago • written 3.3 years ago by Jean-Karim Heriche23k

Thank you for clarifying the terminology. Sometimes having the correct term can save hours of random searching!

written 3.3 years ago by V230
2.7 years ago by
PR20 wrote:

Not sure if my answer will still be useful to you, as the post is pretty old; I only just came across your question. I'm guessing you are using the FindMarkers function in Seurat to call DE. You can make this function call differential expression between any two subgroups of cells by first assigning new subgroup identifiers to the cells with the AddMetaData function, and then making those subgroup identifiers the default cell "idents" with the SetAllIdent function. Then you can pass the new "idents" to the "ident.1" and "ident.2" parameters of FindMarkers. Hope this works! If you have found another way, please post that too as a reply. Good luck.
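
The three steps described above (label the cells, make the labels the grouping, compare two groups) can be sketched in plain Python, independent of Seurat. This is not the Seurat API: the cell names, the gene "GeneX", the expression values, the threshold and the helper functions are all hypothetical, and the comparison is only a log2 fold change of means, not a statistical test.

```python
from math import log2
from statistics import mean

# Hypothetical normalized expression values; cells, genes and numbers are invented.
cells = {
    "cell1": {"Pax3": 3.0, "CD146": 2.0, "GeneX": 8.0},
    "cell2": {"Pax3": 2.5, "CD146": 3.0, "GeneX": 6.0},
    "cell3": {"Pax3": 4.0, "CD146": 0.0, "GeneX": 1.0},
    "cell4": {"Pax3": 3.5, "CD146": 0.0, "GeneX": 3.0},
    "cell5": {"Pax3": 0.0, "CD146": 1.0, "GeneX": 5.0},  # unrelated cell
}

def assign_subgroup(expr, threshold=0.0):
    """Label a cell by marker status; a gene is '+' if expression > threshold."""
    if expr["Pax3"] <= threshold:
        return None  # not Pax3+, so excluded from the comparison
    return "Pax3+/CD146+" if expr["CD146"] > threshold else "Pax3+/CD146-"

# Step 1: attach a subgroup label to each cell (analogous to adding metadata).
labels = {name: assign_subgroup(expr) for name, expr in cells.items()}

# Step 2: group cells by label (analogous to making the labels the identities).
groups = {}
for name, label in labels.items():
    if label is not None:
        groups.setdefault(label, []).append(name)

def log2_fold_change(gene, group_1, group_2, pseudocount=1.0):
    """log2 ratio of mean expression between two groups of cells."""
    m1 = mean(cells[c][gene] for c in group_1)
    m2 = mean(cells[c][gene] for c in group_2)
    return log2((m1 + pseudocount) / (m2 + pseudocount))

# Step 3: compare the two subgroups for one gene.
lfc = log2_fold_change("GeneX", groups["Pax3+/CD146+"], groups["Pax3+/CD146-"])
print(groups)
print(round(lfc, 3))
```

Cells that are not Pax3+ get no label and drop out of the comparison, so the unrelated cells no longer influence the result.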

written 2.7 years ago by PR20