Question

Cluster analysis for hypothesis verification

8.3 years ago

krzysztof.szade • 0

Hi,

I am new to cluster analysis of RNA-seq data and would like to ask more experienced users whether the strategy I plan to apply makes sense.

The hypothesis I want to verify is: sorted cells from young knock-out (KO) mice have a transcriptome profile resembling that of cells from old wild-type (WT) mice.

In other words, I suppose that KO cells show premature aging.

Experiment scheme:

I did RNA-seq on 4 groups, with 4 samples per group:

WT - young
WT - old
KO - young
KO - old

Differential expression was done with DESeq2.

Now, does it make sense to use supervised sample-clustering to verify the hypothesis?

I am thinking of selecting "classifier genes" by comparing WT young and WT old, to obtain marker genes for the aged phenotype. Around 1000 genes are significantly changed according to DESeq2.

Then, based on these "classifier genes", I want to do supervised clustering (k-means?) of all WT and KO samples, to see whether young KO cells cluster together with old WT cells.

Can I verify my hypothesis by this strategy? If yes, what tools (R packages or other programs) for supervised clustering do you recommend?

I would be grateful for any help and tips.

Best,
Krzysiek

RNA-Seq supervised-cluster-analysis clustering • 1.8k views

ADD COMMENT • link updated 21 months ago by Ram 43k • written 8.3 years ago by krzysztof.szade • 0

score 0 · Answer 1 · 2016-01-07

Clustering is an unsupervised approach: it does not use class labels (e.g. young/old) to group your samples. To make sure I understand: you represent each sample by a vector of expression levels of some marker genes, then compute a similarity measure between samples and apply a clustering algorithm. Because it is unsupervised, if you're lucky the main structure picked up will be the one you want, but the devil is in the details. I would first compute cosine similarity and perform hierarchical clustering to see if the samples cluster as expected. I tend to compare results from complete and average linkage: when they disagree, it usually indicates the structure is not very robust. In R, hierarchical clustering is available through the hclust function (part of the base stats package, not a separate package).
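To make the linkage comparison concrete, here is a minimal sketch in plain Python (a stand-in for illustration, not R's hclust). The sample names and expression values are invented toy numbers, assumed to be per-gene centered log-expression for four hypothetical marker genes. The code clusters samples on cosine distance with average and with complete linkage, then checks whether the two partitions agree.

```python
import math
from itertools import combinations
from statistics import mean

# Toy data (invented for illustration): centered log-expression of
# four hypothetical marker genes per sample.
profiles = {
    "WT_young_1": [-1.0, -0.8, 0.9, 1.1],
    "WT_young_2": [-0.9, -1.1, 1.0, 0.8],
    "WT_old_1": [1.0, 0.9, -1.1, -0.9],
    "WT_old_2": [0.8, 1.2, -0.9, -1.0],
    "KO_young_1": [0.7, 0.6, -0.5, -0.8],
    "KO_young_2": [0.9, 0.5, -0.7, -0.6],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerate(data, n_clusters, linkage):
    """Merge the two closest clusters until n_clusters remain.

    linkage summarises all between-cluster pairwise distances:
    max gives complete linkage, mean gives average linkage.
    """
    clusters = [frozenset([name]) for name in data]

    def cluster_distance(a, b):
        return linkage([cosine_distance(data[x], data[y]) for x in a for y in b])

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


average_result = agglomerate(profiles, n_clusters=2, linkage=mean)
complete_result = agglomerate(profiles, n_clusters=2, linkage=max)

print("average :", sorted(sorted(c) for c in average_result))
print("complete:", sorted(sorted(c) for c in complete_result))
print("linkages agree:", average_result == complete_result)
```

With real data, disagreement between the two partitions would be the warning sign mentioned above. The number of clusters to cut at is itself a choice you would have to justify.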

However, given your small sample size, why not simply compute all pairwise similarities and make a judgement call, i.e. decide whether the similarity between KO-young and WT-old is sufficiently higher than that between KO-young and WT-young?
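A rough sketch of that pairwise comparison, again in plain Python with invented toy values (assumed to be per-gene centered log-expression of four hypothetical marker genes; group and variable names are made up for the example). It averages the cosine similarity over all KO-young vs WT-old sample pairs and over all KO-young vs WT-young pairs, so the two numbers can be compared directly.

```python
import math
from statistics import mean

# Toy data (invented for illustration): two samples per group,
# four hypothetical marker genes, centered log-expression.
groups = {
    "WT_young": [[-1.0, -0.8, 0.9, 1.1], [-0.9, -1.1, 1.0, 0.8]],
    "WT_old": [[1.0, 0.9, -1.1, -0.9], [0.8, 1.2, -0.9, -1.0]],
    "KO_young": [[0.7, 0.6, -0.5, -0.8], [0.9, 0.5, -0.7, -0.6]],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_between_group_similarity(group_a, group_b):
    """Average cosine similarity over every cross-group sample pair."""
    return mean(cosine_similarity(u, v) for u in group_a for v in group_b)


sim_to_old = mean_between_group_similarity(groups["KO_young"], groups["WT_old"])
sim_to_young = mean_between_group_similarity(groups["KO_young"], groups["WT_young"])

print(f"KO-young vs WT-old:   {sim_to_old:.3f}")
print(f"KO-young vs WT-young: {sim_to_young:.3f}")
```

How large a difference counts as "sufficiently higher" remains a judgement call. The similarities among WT-young replicates, or among WT-old replicates, could serve as a reference for what "same group" looks like.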