I am new to cluster analysis of RNAseq data and would like to ask more experienced users whether the strategy I plan to apply makes sense.
The hypothesis I want to verify states:
Sorted cells from young knock-out (KO) mice have a transcriptome profile resembling that of cells from old wild-type (WT) mice.
In other words, I suppose that KO cells show premature aging.
I did RNAseq on 4 groups, 4 samples per group:
1. WT - young
2. WT - old
3. KO - young
4. KO - old
Differential expression was done with DESeq2.
Now, does it make sense to use supervised sample-clustering to verify the hypothesis?
I am thinking of selecting the "classifier genes" by comparing WT young and WT old, i.e. selecting marker genes for the aged phenotype. DESeq2 calls around 1000 genes significantly changed in that comparison.
Then, based on these "classifier genes", I want to do supervised clustering (k-means?) of all WT and KO samples, to see whether young KO cells cluster together with old WT cells.
Can I verify my hypothesis by this strategy? If yes, what tools (R packages or other programs) for supervised clustering do you recommend?
I would be grateful for any help and tips.
Clustering is an unsupervised approach: it does not use class labels (e.g. young/old) to group your samples. To make sure I understand: you represent each sample by a vector of expression levels of the marker genes, then compute a similarity measure between samples and apply a clustering algorithm. Because it is unsupervised, the main structure it picks up is not guaranteed to be the one you care about; if you're lucky it will be, but the devil is in the details. I would first try computing cosine similarity and performing hierarchical clustering to see if the samples cluster as expected. I tend to compare the results from complete and average linkage: when they disagree, it usually indicates that the structure is not very robust. In R, this is done with the hclust function (part of the base stats package, not a separate package).
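To make the complete-versus-average linkage check concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the sample names, the three "marker genes" and their expression values are made up, the per-gene centering step is my own choice (without it, cosine similarities between all-positive expression vectors tend to sit close to 1), and the clustering routine is a naive hand-written one, not what hclust does internally. In R the analogous step would be calling hclust on a distance object with method = "complete" and method = "average".

```python
import math

# Hypothetical toy data: rows are samples, columns are three marker genes.
# The values are made up (log-scale expression), for illustration only.
names = ["WT_young_1", "WT_young_2", "WT_old_1", "WT_old_2",
         "KO_young_1", "KO_young_2"]
expr = [
    [2.0, 8.0, 5.0],
    [2.5, 7.5, 5.5],
    [8.0, 2.0, 5.0],
    [7.5, 2.5, 4.5],
    [7.0, 3.0, 5.0],
    [6.5, 3.5, 5.5],
]


def center_genes(rows):
    """Subtract each gene's mean across samples (assumed preprocessing)."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / norm


def agglomerate(rows, n_clusters, linkage):
    """Naive agglomerative clustering; returns clusters as lists of row indices."""
    if linkage not in ("complete", "average"):
        raise ValueError("linkage must be 'complete' or 'average'")
    dist = [[cosine_distance(u, v) for v in rows] for u in rows]
    clusters = [[i] for i in range(len(rows))]

    def cluster_distance(a, b):
        pairwise = [dist[i][j] for i in a for j in b]
        if linkage == "complete":
            return max(pairwise)              # farthest pair of members
        return sum(pairwise) / len(pairwise)  # mean over all member pairs

    while len(clusters) > n_clusters:
        # Find the two closest clusters under the chosen linkage and merge them.
        _, p, q = min(
            (cluster_distance(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


centered = center_genes(expr)
complete_clusters = agglomerate(centered, 2, "complete")
average_clusters = agglomerate(centered, 2, "average")

for label, result in (("complete", complete_clusters),
                      ("average", average_clusters)):
    print(label, [[names[i] for i in cluster] for cluster in result])
print("linkages agree:", complete_clusters == average_clusters)
```

On real data you would replace expr with the normalised expression of the selected genes and check whether the two linkages give the same grouping at the cut you care about.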
However, given your small sample size, why not simply compute all pairwise similarities and make a judgement call, i.e. decide whether the similarity between KO-young and WT-old is sufficiently higher than that between KO-young and WT-young?
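The pairwise comparison can be sketched in a few lines of plain Python. As before, this is only an illustration under stated assumptions: the sample names and expression values are made up, the genes are centered across samples before computing cosine similarity (my choice, not something the data requires), and averaging the similarities within each pair of groups is just one simple way to summarise them.

```python
import math

# Hypothetical toy data: group label and marker-gene expression per sample.
# The values are made up (log-scale expression), for illustration only.
labels = ["WT_young", "WT_young", "WT_old", "WT_old", "KO_young", "KO_young"]
expr = [
    [2.0, 8.0, 5.0],
    [2.5, 7.5, 5.5],
    [8.0, 2.0, 5.0],
    [7.5, 2.5, 4.5],
    [7.0, 3.0, 5.0],
    [6.5, 3.5, 5.5],
]


def center_genes(rows):
    """Subtract each gene's mean across samples (assumed preprocessing)."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / norm


def mean_group_similarity(rows, row_labels, group_a, group_b):
    """Average similarity over all sample pairs (one from each group)."""
    sims = [
        cosine_similarity(rows[i], rows[j])
        for i in range(len(rows))
        for j in range(len(rows))
        if i != j and row_labels[i] == group_a and row_labels[j] == group_b
    ]
    return sum(sims) / len(sims)


centered = center_genes(expr)
ko_vs_old = mean_group_similarity(centered, labels, "KO_young", "WT_old")
ko_vs_young = mean_group_similarity(centered, labels, "KO_young", "WT_young")

print("KO-young vs WT-old:  ", round(ko_vs_old, 3))
print("KO-young vs WT-young:", round(ko_vs_young, 3))
print("KO-young closer to WT-old:", ko_vs_old > ko_vs_young)
```

With 4 samples per group it is also worth looking at the individual pairwise values, not only the group means, before making the judgement call.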