PCA-based clustering of gene expression data
4.8 years ago
Gene_MMP8 ▴ 240

I have gene expression data and mortality labels for a group of patients. I performed a differential gene expression analysis and selected the top 40 most differentially expressed genes (ranked by adjusted p-value). I then extracted the expression data for these 40 genes and ran a PCA on it. The PCA results were then clustered using k-means in R, which gives two distinct clusters. The details are below:
Dataset description:
Total patients: 275
Dead: 52
Alive: 223
Clustering results on the PCA-reduced dataset for the top 40 most differentially expressed genes:
Cluster 1: 136 alive, 1 dead
Cluster 2: 87 alive, 51 dead
So cluster 1 is enriched for patients who are alive, which looks promising to me. What else can I conclude from this analysis? One conclusion is that the expression of the top 40 genes can separate alive from dead patients in this dataset. But how well does it separate them? Is there a metric I can attach to these results?

RNA-Seq R • 960 views

How many dimensions of the PCA are you using for clustering?

I am using the first three dimensions (principal components) of the PCA.
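
For readers following along, the workflow described in this thread (PCA on the 40 selected genes, then k-means on the first three components) might look roughly like the sketch below. The data are simulated, and every object name (`expr`, `status`, `pcs`, `km`) is a placeholder rather than anything from the original analysis; only base R functions (`prcomp`, `kmeans`, `table`) are used.

```r
# Simulated stand-in for the real data: 275 patients x 40 genes.
# All object names here are hypothetical placeholders.
set.seed(1)
n_alive <- 223
n_dead  <- 52
status  <- factor(c(rep("alive", n_alive), rep("dead", n_dead)))
expr    <- matrix(rnorm((n_alive + n_dead) * 40), ncol = 40,
                  dimnames = list(NULL, paste0("gene", 1:40)))
# Shift the dead patients so the simulated genes carry some signal.
expr[status == "dead", ] <- expr[status == "dead", ] + 1

# PCA on centred, scaled expression values (patients in rows).
pca <- prcomp(expr, center = TRUE, scale. = TRUE)
pcs <- pca$x[, 1:3]  # keep the first three principal components

# Proportion of variance captured by each retained component.
var_explained <- summary(pca)$importance["Proportion of Variance", 1:3]
print(var_explained)

# k-means with two centres; nstart > 1 reduces sensitivity to initialisation.
km <- kmeans(pcs, centers = 2, nstart = 25)

# Cross-tabulate cluster membership against mortality.
tab <- table(cluster = km$cluster, status = status)
print(tab)
```

Checking `var_explained` is worthwhile, because three components may or may not capture most of the variation in the 40 genes.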

4.8 years ago

You should qualify your statements and results with:

  1. A chi-squared test to check the relative proportions of alive and dead patients in each cluster.
  2. A multivariable binary logistic regression model with all 40 genes as predictors and alive/dead status as the end-point. This would be followed by cross-validation of the model and then ROC analysis. Prior to cross-validation and ROC analysis, you could aim to reduce the number of model parameters via stepwise regression or manual inspection of the errors, residuals, p-values, etc.
  3. Hierarchical clustering using just the 40 genes, to gauge the separation of samples.
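
As a minimal sketch of point 1, the cluster-by-outcome counts reported in the question can be entered directly as a 2x2 table. Fisher's exact test is included alongside the chi-squared test as an assumption on my part, since one cell contains a single patient. The table also gives simple descriptive metrics if "cluster 2" is read as a prediction of death.

```r
# Counts reported in the question (rows: clusters, columns: outcome).
tab <- matrix(c(136,  1,
                 87, 51),
              nrow = 2, byrow = TRUE,
              dimnames = list(cluster = c("1", "2"),
                              status  = c("alive", "dead")))

# Chi-squared test of independence; for a 2x2 table R applies
# Yates' continuity correction by default.
chi <- chisq.test(tab)
# Fisher's exact test as an alternative for the sparse cell.
fis <- fisher.test(tab)
print(chi)
print(fis)

# Descriptive metrics, treating membership of cluster 2 as "predicted dead".
sensitivity <- tab["2", "dead"]  / sum(tab[, "dead"])   # 51 / 52
specificity <- tab["1", "alive"] / sum(tab[, "alive"])  # 136 / 223
ppv         <- tab["2", "dead"]  / sum(tab["2", ])      # 51 / 138
npv         <- tab["1", "alive"] / sum(tab["1", ])      # 136 / 137
print(round(c(sensitivity = sensitivity, specificity = specificity,
              ppv = ppv, npv = npv), 3))
```

Bear in mind that the 40 genes were selected using the same mortality labels, so a test on these clusters is likely to look more favourable than it would on independent data.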

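Point 2 (logistic regression, cross-validation, then ROC analysis) could be sketched as follows. This is an illustration on simulated data, not the original dataset: the data frame `dat`, the fold count `k = 5` and the helper `auc_rank` are all assumptions made for the example. The AUC is computed from the rank-sum (Mann-Whitney) identity so that only base R is needed; a dedicated ROC package would be an equally reasonable choice.

```r
# Simulated stand-in: 275 patients, 40 genes, 52 deaths (hypothetical data).
set.seed(42)
n_alive <- 223
n_dead  <- 52
dead <- c(rep(0, n_alive), rep(1, n_dead))
expr <- matrix(rnorm(length(dead) * 40), ncol = 40,
               dimnames = list(NULL, paste0("gene", 1:40)))
# Give the dead patients a modest shift so there is some signal to model.
expr[dead == 1, ] <- expr[dead == 1, ] + 0.25
dat <- data.frame(dead = dead, expr)

# Randomly assign each patient to one of k folds.
k    <- 5
fold <- sample(rep(1:k, length.out = nrow(dat)))

# Fit on k - 1 folds, predict the held-out fold, and collect the predictions.
pred <- numeric(nrow(dat))
for (i in 1:k) {
  train <- dat[fold != i, ]
  test  <- dat[fold == i, ]
  fit   <- glm(dead ~ ., data = train, family = binomial)
  pred[fold == i] <- predict(fit, newdata = test, type = "response")
}

# AUC via the rank-sum identity: the probability that a randomly chosen
# dead patient receives a higher score than a randomly chosen alive patient.
auc_rank <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

cv_auc <- auc_rank(pred, dat$dead)
print(cv_auc)
```

With only 52 deaths and 40 predictors the full model is prone to overfitting, which is one reason to reduce the number of parameters first. For an honest estimate on real data, the gene selection step would also need to be repeated inside each training fold rather than done once on all patients.
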
Kevin
