Clustering cell populations into different phenotypes
6.9 years ago

I have gene expression data for 4 phenotypic markers in 8 cell lines. Each cell line has 6 biological replicates. I need to cluster these 8 cell lines into 5 different cell populations/phenotypes based on the expression data.

Since I don't have a training data set, I used K-means clustering to group these cell lines into 5 fixed clusters (phenotypes). What I don't understand is how to deal with the replicates: should I take the mean or the geometric mean to combine all the replicates? I also tried clustering the cell lines with all their replicates included, but that split the replicates of the same cell line across different clusters.
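For reference, a minimal NumPy sketch of the two ways of combining replicates asked about here. It assumes the 48 samples are stored as a 48 x 4 array with the 6 replicates of each cell line in consecutive rows, and that expression values are strictly positive (the geometric mean is undefined for zeros). The function name `collapse_replicates` and the random data are hypothetical placeholders, not the poster's data.

```python
import numpy as np

def collapse_replicates(X, n_lines, n_reps, how="mean"):
    """Collapse an (n_lines * n_reps, n_genes) matrix to (n_lines, n_genes).

    Hypothetical helper; assumes the replicates of each cell line
    occupy consecutive rows of X.
    """
    # Reshape to (cell line, replicate, gene) so replicates share axis 1.
    cube = X.reshape(n_lines, n_reps, -1)
    if how == "mean":
        return cube.mean(axis=1)
    if how == "geomean":
        # Geometric mean = exp(mean(log x)); requires strictly positive values.
        if np.any(cube <= 0):
            raise ValueError("geometric mean needs strictly positive values")
        return np.exp(np.log(cube).mean(axis=1))
    raise ValueError(f"unknown method: {how}")

# Synthetic placeholder data: 8 cell lines x 6 replicates, 4 marker genes.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=2.0, sigma=0.5, size=(48, 4))

profiles_mean = collapse_replicates(X, n_lines=8, n_reps=6, how="mean")
profiles_geo = collapse_replicates(X, n_lines=8, n_reps=6, how="geomean")
print(profiles_mean.shape, profiles_geo.shape)  # (8, 4) (8, 4)
```

Either 8 x 4 result could then be passed to K-means. The geometric mean is never larger than the arithmetic mean and is less pulled up by a single unusually high replicate.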

Can anyone help?

Thank you.

R gene rna-seq MATLAB K-means Clustering

Hi,

An additional question: how many genes do you have expression data for?

The reason I am asking is that if you have expression data for many genes, it sometimes makes sense to preselect genes before K-means clustering (feature selection).
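As a sketch of the feature-selection idea, assuming a samples x genes matrix: one simple filter keeps the genes with the highest variance across samples before clustering. The helper `top_variance_genes`, the cutoff `n_keep`, and the random data are hypothetical; with only a handful of marker genes this step would not be needed.

```python
import numpy as np

def top_variance_genes(X, n_keep):
    """Return column indices of the n_keep genes with the highest variance.

    X: (n_samples, n_genes) expression matrix, ideally log-transformed
    so that a few very highly expressed genes do not dominate.
    """
    variances = X.var(axis=0)
    # argsort is ascending, so reverse it and take the first n_keep.
    return np.argsort(variances)[::-1][:n_keep]

# Synthetic placeholder: 48 samples x 1000 genes, first 20 genes made more variable.
rng = np.random.default_rng(1)
X = rng.normal(size=(48, 1000))
X[:, :20] *= 5.0

keep = top_variance_genes(X, n_keep=20)
X_selected = X[:, keep]
print(X_selected.shape)  # (48, 20)
```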

These 4 phenotypic markers are genes, so I have expression data for 4 genes.

Personally, I would use all biological replicates.
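If all 48 replicates are clustered directly, it helps to quantify how often the replicates of each cell line land in the same cluster. A minimal sketch, assuming `labels` holds one cluster assignment per sample (from any K-means run) and `line_ids` names the cell line of each sample; both names, the helper `replicate_agreement`, and the toy labels below are hypothetical.

```python
from collections import Counter

def replicate_agreement(labels, line_ids):
    """For each cell line, return (majority cluster, fraction of replicates in it)."""
    groups = {}
    for label, line in zip(labels, line_ids):
        groups.setdefault(line, []).append(label)
    summary = {}
    for line, line_labels in groups.items():
        # most_common(1) gives the most frequent cluster and its count.
        cluster, count = Counter(line_labels).most_common(1)[0]
        summary[line] = (cluster, count / len(line_labels))
    return summary

# Toy example: 2 cell lines x 6 replicates each.
line_ids = ["A"] * 6 + ["B"] * 6
labels = [0, 0, 0, 0, 1, 0, 2, 2, 2, 2, 2, 2]
for line, (cluster, frac) in replicate_agreement(labels, line_ids).items():
    print(line, cluster, round(frac, 2))
# A 0 0.83
# B 2 1.0
```

A cell line whose majority fraction is well below 1 either is heterogeneous or sits between two clusters; assigning each line its majority cluster is one possible way to get a single phenotype per cell line.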

On a side note: have you run K-means clustering multiple times? It is a randomized algorithm, so the clusters it returns can differ between runs.
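To illustrate why repeated runs matter: K-means converges to a local optimum that depends on the random starting centroids, so a common practice is to run it several times and keep the solution with the lowest within-cluster sum of squares (R's `kmeans()` exposes this through its `nstart` argument, MATLAB's `kmeans` through the `'Replicates'` option). Below is a self-contained NumPy sketch of Lloyd's algorithm with restarts; the function names, `n_init`, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def kmeans_once(X, k, rng, max_iter=100):
    """One run of Lloyd's algorithm, initialized at k randomly chosen samples."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each sample to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its samples; keep it if the cluster emptied.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and within-cluster sum of squares (inertia).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, centers, inertia

def kmeans_restarts(X, k, n_init=30, seed=0):
    """Run K-means n_init times and keep the run with the lowest inertia."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        result = kmeans_once(X, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best

# Synthetic placeholder: 3 well-separated groups of 16 samples in 4 dimensions.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(16, 4)) for c in (0.0, 3.0, 6.0)])

labels, centers, inertia = kmeans_restarts(X, k=3)
print(np.bincount(labels, minlength=3), round(float(inertia), 2))
```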

So if I understand correctly, you have a 48-row (samples) x 4-column (genes) data matrix and you want to cluster the rows into 5 clusters. Depending on what you believe (or what assumptions you can make) about the replicates, you can combine them into one vector per cell line either by averaging or by taking the median of each gene. Regardless, you should first plot the data to see whether there are 5 easily distinguishable clusters and what shape they have. K-means is only good at finding roughly spherical clusters; if you have a mixture of ball-like and elongated clusters, for example, K-means will most likely fail. If you don't see clusters in the original space, try looking in PCA space or using multidimensional scaling.
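A minimal NumPy sketch of two of the steps suggested here, assuming the same 48 x 4 layout with the 6 replicates of each cell line in consecutive rows and values already on a suitable scale (e.g. log-transformed): collapse replicates with the per-gene median, then project onto the first two principal components (via SVD of the column-centered data) so the points can be plotted. The helper name `pca_2d` and the random data are hypothetical placeholders; any plotting library can draw the resulting coordinates.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto the first two principal components.

    Returns (scores, explained_variance_ratio) computed from the SVD
    of the column-centered matrix.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T           # coordinates in PC1/PC2 space
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return scores, explained[:2]

# Synthetic placeholder: 8 cell lines x 6 replicates x 4 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(48, 4))

# Per-gene median over the 6 replicates of each cell line -> shape (8, 4).
profiles = np.median(X.reshape(8, 6, 4), axis=1)

scores_all, ev_all = pca_2d(X)             # one point per replicate
scores_lines, ev_lines = pca_2d(profiles)  # one point per cell line
print(scores_all.shape, scores_lines.shape)  # (48, 2) (8, 2)
```

Plotting `scores_all` colored by cell line shows at once whether the replicates of each line sit together and whether 5 separate, roughly round groups are visible.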
