Alternative for unsupervised clustering in scRNA Data
6.2 years ago
John ▴ 270

Hi all, as far as I have seen in the literature, single-cell RNA-seq data are always subgrouped by unsupervised clustering on the top 100/200/300 most variable genes. Is there any way to subgroup single cells using known genes that are specific to a cell type? Wouldn't that give more accurate subgrouping?

Thanks in advance!

RNA-Seq unsupervised clustering gene expression • 1.9k views

Well, just for the scRNA community: when one chooses the top 100/200/300 genes based on variance and then performs clustering, I would not call this unsupervised at all; to me it is supervised clustering based on highly variable genes.

When you say "specific to cell-type", do you mean that you want to relate your scRNA data to tissue-specific data so that you could, for example, segregate your scRNA population by tissue based on their different expression patterns? There was a recent question posted on this, here: Normalizing transcriptome data by tissue type

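Other clustering methods include k-means and PAM. t-SNE is often used alongside them, although strictly speaking it is a dimensionality-reduction method for visualization rather than a clustering algorithm.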
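
To illustrate the idea raised in the question (grouping cells by known cell-type-specific genes), here is a minimal sketch. The marker lists, the gene names, the toy matrix, and the `assign_by_markers` helper are all hypothetical, and averaging marker expression is only one simple scoring choice among many.

```python
import numpy as np

# Hypothetical marker genes per cell type (names are for illustration only).
markers = {
    "neuron": ["GeneA", "GeneB"],
    "astrocyte": ["GeneC", "GeneD"],
}

genes = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE"]

# Toy log-expression matrix: rows are cells, columns follow `genes`.
expr = np.array([
    [5.0, 4.0, 0.0, 1.0, 2.0],
    [0.5, 0.0, 6.0, 5.0, 2.0],
    [4.0, 6.0, 1.0, 0.0, 3.0],
])

def assign_by_markers(expr, genes, markers):
    """Label each cell with the type whose markers have the highest mean expression."""
    gene_index = {g: i for i, g in enumerate(genes)}
    types = list(markers)
    # One score column per cell type: mean expression over that type's markers.
    scores = np.column_stack([
        expr[:, [gene_index[g] for g in markers[t]]].mean(axis=1)
        for t in types
    ])
    return [types[i] for i in scores.argmax(axis=1)]

labels = assign_by_markers(expr, genes, markers)
print(labels)
```

Note that a marker-based grouping like this relies on prior knowledge, so it can only recover the cell types you already expect to see.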


Dear Kevin,

What you describe as supervised clustering is described as unsupervised clustering in the following paper:

Statement: "The “autoAnalysis()” command was used to perform unsupervised clustering, principal component analysis, and expression heat mapping of the remaining 64 cells using the top 400 most variable genes as determined by ANOVA"

Article: Integrative Single-Cell Transcriptomics Reveals Molecular Networks Defining Neuronal Maturation During Postnatal Neurogenesis

Can you please help me by explaining the difference between unsupervised and supervised clustering?

Thanks in advance.


Depends on your perspective, but for me that is not unsupervised clustering:

  • They look at their original dataset
  • They decide to filter out genes based on high/low variance for whatever reason
  • They perform hierarchical clustering using the highly-variable genes
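
The three steps above can be sketched as follows. This is a minimal illustration on simulated data, not the cited paper's pipeline: the expression matrix, the `top_variable_genes` helper, the choice of 20 genes, and Ward linkage are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Step 1: simulated log-expression matrix, 40 cells x 200 genes (hypothetical data).
# Two groups of 20 cells differ only in the first 10 genes.
n_cells, n_genes = 40, 200
expr = rng.normal(loc=0.0, scale=1.0, size=(n_cells, n_genes))
expr[:20, :10] += 6.0

def top_variable_genes(matrix, n_top):
    """Return column indices of the n_top genes with the highest variance."""
    variances = matrix.var(axis=0)
    return np.argsort(variances)[::-1][:n_top]

# Step 2: keep only the most variable genes (no cell labels are used here).
hvg_idx = top_variable_genes(expr, n_top=20)

# Step 3: hierarchical clustering (Ward linkage) on the filtered matrix,
# cut into two clusters.
tree = linkage(expr[:, hvg_idx], method="ward")
clusters = fcluster(tree, t=2, criterion="maxclust")
print(clusters)
```

The gene filter is computed from the data alone, but it still shapes the result: the genes that drive the split are exactly the ones most likely to survive the variance cut.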

Therefore, the clustering is biased because it is generated from a set of highly variable genes that will better segregate the sample cohort. If it were entirely unbiased, they would have performed the clustering on all genes that passed QC.

That said, they may have used the term unsupervised in the sense that the clustering was performed on a hypothesis-free basis. Even so, the clustering is biased by using only highly variable genes; it's a neat trick to better segregate your cohort.

Also, forgive me, I would be wary of using a function called autoAnalysis(). We need less automation and more human brains looking over things.
