Cluster analysis for hypothesis verification
8.3 years ago

Hi,

I am new to the field of cluster analysis of RNAseq data and would like to ask more experienced guys whether the strategy I plan to apply makes sense.

The hypothesis I want to verify is: sorted cells from young knock-out (KO) mice have a transcriptome profile resembling that of cells from old wild-type (WT) mice.

In other words, I suppose that KO cells show premature aging.

Experiment scheme:

I did RNAseq with 4 groups. 4 samples/group:

  1. WT - young
  2. WT - old
  3. KO - young
  4. KO - old

Differential expression was done with DESeq2.

Now, does it make sense to use supervised sample-clustering to verify the hypothesis?

I am thinking of selecting the "classifier genes" by comparing WT young and WT old, i.e. selecting marker genes for the aged phenotype. I have around 1000 genes significantly changed according to DESeq2.

Then, based on these "classifier genes", I want to do supervised clustering (k-means?) of all WT and KO samples, to see if young KO cells cluster together with old WT cells.

Can I verify my hypothesis by this strategy? If yes, what tools (R packages or other programs) for supervised clustering do you recommend?

I would be grateful for any help and tips.

Best,
Krzysiek

RNA-Seq supervised-cluster-analysis clustering • 1.8k views
8.3 years ago

Clustering is an unsupervised approach because it does not use class labels (e.g. young/old) to group your samples. To make sure I understand: you represent each sample by a vector of expression levels of some marker genes, then compute some similarity measure between samples and apply a clustering algorithm. Because it's unsupervised, if you're lucky the main structure that gets picked up is the one you want, but the devil is in the details. I would first try computing cosine similarity and performing hierarchical clustering to see if the samples cluster as expected. I tend to compare results from complete and average linkage to see if they agree, because when they do not, it usually indicates the structure is not very robust. In R, the function for this is hclust (part of the base stats package, not a separate package); it takes a dissimilarity, e.g. 1 minus the cosine similarity.
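To make the linkage comparison concrete, here is a minimal sketch in plain Python (standard library only) rather than R. The sample names and expression values are invented toy numbers, assumed to be centred per gene (e.g. z-scores) so that cosine distance is meaningful, and `agglomerate` is a hypothetical helper written for illustration, not a library function. In practice you would more likely use `hclust` in R or `scipy.cluster.hierarchy.linkage` in Python.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerate(profiles, linkage, n_clusters=2):
    """Naive agglomerative clustering on cosine distance.

    linkage is "complete" (max pairwise distance between clusters)
    or "average" (mean pairwise distance between clusters).
    Returns the partition as a set of frozensets of sample names.
    """
    names = list(profiles)
    dist = {
        frozenset(pair): cosine_distance(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(names, 2)
    }
    clusters = [frozenset([name]) for name in names]

    def cluster_distance(a, b):
        d = [dist[frozenset((x, y))] for x in a for y in b]
        return max(d) if linkage == "complete" else sum(d) / len(d)

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


# Toy, made-up marker-gene profiles (4 genes, centred per gene).
profiles = {
    "WT_young_1": [-1.0, -0.8, 1.1, 0.9],
    "WT_young_2": [-0.9, -1.1, 0.8, 1.0],
    "WT_old_1": [1.0, 0.9, -1.0, -0.8],
    "WT_old_2": [0.8, 1.1, -0.9, -1.1],
    "KO_young_1": [0.7, 0.6, -0.5, -0.9],
    "KO_young_2": [0.9, 0.4, -0.7, -0.6],
}

complete = agglomerate(profiles, "complete")
average = agglomerate(profiles, "average")

# If the two linkages disagree, the structure is probably not robust.
print("linkages agree:", complete == average)
for cluster in complete:
    print(sorted(cluster))
```

With real data, the interesting question is whether the KO-young samples end up in the same cluster as WT-old under both linkages, not just one.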

However, given your small sample size, why not simply compute all pairwise similarities and make a judgement call, i.e. decide whether the similarity between KO-young and WT-old is sufficiently higher than that between KO-young and WT-young?
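The pairwise comparison could be sketched as follows, again in plain Python with invented toy values (assumed centred per gene). `mean_between_group_similarity` is a hypothetical helper name, and what counts as "sufficiently higher" remains a judgement call that this sketch does not settle.

```python
import math
from statistics import mean


def cosine_similarity(u, v):
    """Cosine of the angle between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_between_group_similarity(group_a, group_b):
    """Average cosine similarity over all cross-group sample pairs."""
    return mean(cosine_similarity(u, v) for u in group_a for v in group_b)


# Toy, made-up marker-gene profiles (one list per sample, centred per gene).
wt_young = [[-1.0, -0.8, 1.1, 0.9], [-0.9, -1.1, 0.8, 1.0]]
wt_old = [[1.0, 0.9, -1.0, -0.8], [0.8, 1.1, -0.9, -1.1]]
ko_young = [[0.7, 0.6, -0.5, -0.9], [0.9, 0.4, -0.7, -0.6]]

sim_to_old = mean_between_group_similarity(ko_young, wt_old)
sim_to_young = mean_between_group_similarity(ko_young, wt_young)

print(f"KO-young vs WT-old:   {sim_to_old:.3f}")
print(f"KO-young vs WT-young: {sim_to_young:.3f}")
```

One caveat worth keeping in mind: if the marker genes were selected by comparing WT-young with WT-old, those two groups are separated on these genes by construction, so only the position of the KO samples is informative.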
