Question

Question About Selecting Genes From A Subcluster Of Hierarchical Clustering (HC)

Asked by Nitin, 10.6 years ago

My question is partly about biology and interpretation, but it is related to bioinformatics analysis. I have an expression dataset with a limited number of samples and no replicates. There are 4 cell lines: one is primary and the rest are differentiating along two different lineages. I recognize the importance of replicates and the limits on how reliable such data can be.

Based on differences in expression in two-sample comparisons, I selected around 1,500 genes that passed a specified threshold cutoff. Hierarchical clustering of these genes (average linkage, Euclidean distance) gives three clusters of samples, which makes sense from a biological point of view. However, several genes do not discriminate cleanly between two or three of the samples, so I want to select a subset of genes that clearly discriminates the three groups.

In the same clustering, the gene dendrogram gives 5 clusters of genes. When I cut the tree, I get one gene cluster that perfectly discriminates between the three groups. Re-clustering only that subset (one gene cluster out of the 1,500 genes) gives meaningful results. I understand that I am not basing this interpretation on statistics. I also ran a PCA and k-means clustering on that subset, and both reproduce the same separation into three groups.

My questions: Is this a reasonable approach, even though it is qualitative? Is anyone aware of genes being selected from sub-clusters like this? I tried searching but could not find a publication that does this per se.

Thanks for your help.

Tag: selection

Last updated 10.6 years ago by Sean Davis.

Answer (2013-10-01)

Working without replicates can be a useful exercise as long as you treat it as hypothesis-generating. That is, whatever you think you have learned about gene expression and its relationship to biology in your system will need to be validated: with more samples, with some biological manipulation in a model system, by replication using another technology, or by observing it in another analogous dataset (e.g., human disease). In your case, you have a set of genes that you believe is biologically important based on an ad hoc approach. You will need to ask yourself how you are going to truly test your hypothesis that these genes ARE important, typically using one or more of the approaches I mentioned (there may be others).
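A small simulation can make the "hypothesis-generating" caveat concrete. This is an illustrative sketch added here, not part of the original answer. It assumes NumPy, 4 samples with no replicates, and a hypothetical expectation that samples 0 and 1 belong together. With a single gene and 4 samples, hierarchical clustering cut into 3 clusters gives {0, 1}, {2}, {3} exactly when samples 0 and 1 are the closest pair, because the first merge joins the closest pair. For independent noise, each of the 6 sample pairs is equally likely to be closest, so about 1 in 6 random genes already "discriminates" the expected groups.

```python
import numpy as np


def closest_pair_is(expr, pair=(0, 1)):
    """Per gene (row of a genes x samples matrix): True if `pair` (with
    pair[0] < pair[1]) is the pair of samples with the smallest absolute
    expression difference."""
    n_samples = expr.shape[1]
    i_idx, j_idx = np.triu_indices(n_samples, k=1)  # all sample pairs i < j
    gaps = np.abs(expr[:, i_idx] - expr[:, j_idx])   # genes x pairs
    target = np.flatnonzero((i_idx == pair[0]) & (j_idx == pair[1]))[0]
    return gaps.argmin(axis=1) == target


# Hypothetical null data: 1,500 genes x 4 samples of pure noise (no group structure).
rng = np.random.default_rng(0)
noise = rng.normal(size=(1500, 4))
hits = closest_pair_is(noise)
# Every one of the 6 pairs is equally likely to be closest for iid noise,
# so the expected fraction is 1/6 (about 250 of 1,500 genes).
print(hits.sum(), "of", len(hits), "noise genes group samples 0 and 1 together")
```

Selecting genes by how well they reproduce an expected grouping will therefore always find some agreement, which is why the selected set needs testing in independent data.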