Question: Cluster analysis for hypothesis verification
4.4 years ago by
krzysztof.szade wrote:


I am new to cluster analysis of RNA-seq data and would like to ask more experienced people whether the strategy I plan to apply makes sense.

The hypothesis I want to verify is:
Sorted cells from young knock-out (KO) mice have a transcriptome profile resembling that of cells from old wild-type (WT) mice.
In other words, I suspect that KO cells show premature aging.

Experimental design:
I did RNA-seq on 4 groups, with 4 samples per group:
1. WT - young
2. WT - old
3. KO - young
4. KO - old
Differential expression analysis was done with DESeq2.

Now, does it make sense to use supervised sample clustering to verify the hypothesis?
I am thinking of selecting "classifier genes" by comparing WT young and WT old, i.e. selecting marker genes for the aged phenotype. I have around 1000 genes called significantly changed by DESeq2 in that comparison.
Then, based on these "classifier genes", I want to do supervised clustering (k-means?) of all WT and KO samples, to see whether young KO cells cluster together with old WT cells.
Can I verify my hypothesis with this strategy? If yes, what tools (R packages or other programs) for supervised clustering do you recommend?

I would be grateful for any help and tips.


4.4 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche wrote:

Clustering is an unsupervised approach: it does not use class labels (e.g. young/old) to group your samples. To make sure I understand: you represent each sample by a vector of expression levels of some marker genes, then compute a similarity measure between samples and apply a clustering algorithm. Because it's unsupervised, if you're lucky the main structure that gets picked up is the one you want, but the devil is in the details. I would first try computing a cosine similarity and performing hierarchical clustering to see if the samples cluster as expected. I tend to compare the results from complete and average linkage to see whether they agree, because when they do not, it usually indicates that the structure is not very robust. In R, hierarchical clustering is available through the hclust function (part of the base stats package).
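As a rough illustration of the idea above, here is a minimal sketch in plain Python (standard library only) that computes cosine distances between samples and runs a naive agglomerative clustering with complete and with average linkage, so the two results can be compared. The sample names and expression values are invented toy numbers (imagine centered log-expression of four marker genes), not real data, and the `agglomerate` helper is a hypothetical name introduced only for this example. For a real analysis you would use an established implementation such as hclust in R.

```python
import math
from itertools import combinations
from statistics import mean


def cosine_similarity(u, v):
    """Cosine of the angle between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def agglomerate(profiles, n_clusters, linkage):
    """Naive agglomerative clustering on cosine distance (1 - similarity).

    linkage is a function applied to the list of pairwise distances
    between two clusters: max gives complete linkage, mean gives average.
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - cosine_similarity(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    clusters = [frozenset([n]) for n in names]

    def cluster_distance(c1, c2):
        return linkage([dist[frozenset((a, b))] for a in c1 for b in c2])

    # Repeatedly merge the two closest clusters.
    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)


# Toy, made-up values for illustration only (2 samples per group, 4 genes).
profiles = {
    "WT_young_1": [-1.0, -0.8, 0.9, 1.1],
    "WT_young_2": [-0.9, -1.1, 1.0, 0.8],
    "WT_old_1": [1.0, 0.9, -1.0, -0.7],
    "WT_old_2": [0.8, 1.2, -0.9, -1.0],
    "KO_young_1": [0.7, 1.0, -0.6, -0.9],
    "KO_young_2": [0.9, 0.6, -0.8, -0.5],
}

complete = agglomerate(profiles, n_clusters=2, linkage=max)
average = agglomerate(profiles, n_clusters=2, linkage=mean)

print("complete linkage:", [sorted(c) for c in complete])
print("average linkage: ", [sorted(c) for c in average])
print("linkages agree:  ", complete == average)
```

If the two linkages give different groupings on your real data, that would be the warning sign mentioned above that the structure is not very robust.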

However, given your small sample size, why not simply compute all pairwise similarities and make a judgement call, i.e. decide whether the similarity between KO-young and WT-old is sufficiently higher than that between KO-young and WT-young?
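A minimal sketch of that pairwise comparison, again in plain Python and again on invented toy values rather than real data: it averages the cosine similarity over all sample pairs between two groups, so the KO-young vs WT-old figure can be set next to the KO-young vs WT-young one. The function name `mean_between_group_similarity` and the group labels are assumptions made for this example.

```python
import math
from statistics import mean


def cosine_similarity(u, v):
    """Cosine of the angle between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_between_group_similarity(group_a, group_b):
    """Average cosine similarity over every pair (one sample from each group)."""
    return mean([cosine_similarity(u, v) for u in group_a for v in group_b])


# Toy, made-up marker-gene profiles (centered values), for illustration only.
groups = {
    "WT_young": [[-1.0, -0.8, 0.9, 1.1], [-0.9, -1.1, 1.0, 0.8]],
    "WT_old": [[1.0, 0.9, -1.0, -0.7], [0.8, 1.2, -0.9, -1.0]],
    "KO_young": [[0.7, 1.0, -0.6, -0.9], [0.9, 0.6, -0.8, -0.5]],
}

sim_to_old = mean_between_group_similarity(groups["KO_young"], groups["WT_old"])
sim_to_young = mean_between_group_similarity(groups["KO_young"], groups["WT_young"])

print(f"KO-young vs WT-old:   {sim_to_old:.3f}")
print(f"KO-young vs WT-young: {sim_to_young:.3f}")
```

How large a difference counts as "sufficiently higher" remains the judgement call; the sketch only puts the two numbers side by side.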
