hclust with similar data gives different clusters
7 months ago

I have RNA-seq data with expression values keyed by Ensembl ID, which I convert to gene symbols for further analysis. I performed hierarchical clustering with hclust and then cut the tree using dynamicTreeCut on a geneset of 20010 genes, and got 27 different gene clusters. Now, a few months later, I have a geneset consisting of 20149 genes (139 more genes than in the earlier geneset, because more Ensembl IDs have received HGNC symbols in the meantime). But when I run hclust and dynamicTreeCut on the new geneset, I get a totally different gene clustering, even though there are only 139 new genes. The clustering is so drastically different that the GO enrichment looks totally different too. I don't understand how such a drastic difference is even possible, so any insights would be helpful. It also makes me wonder: will my code give the same results for others if they work from the raw data later on?

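# Keep genes with expression > 1 in at least one sample
mydata_filter <- mydata[rowSums(mydata > 1) >= 1, ]
mydata_filter_matrix <- as.matrix(mydata_filter)  # assumed conversion; not shown in my original code
scaledata <- t(scale(t(mydata_filter_matrix)))    # z-score each gene across samples
# Drop genes whose scaled values contain NA
scaledata <- scaledata[complete.cases(scaledata), ]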

library(dynamicTreeCut)  # provides cutreeDynamic()
gene_dist <- as.dist(1 - cor(t(scaledata), method = "pearson"))
hr <- hclust(gene_dist, method = "complete")

cutree_1.5 <- cutreeDynamic(hr, distM = as.matrix(gene_dist), method = "hybrid", minAbsSplitHeight = 1.5)

I have run the same code on both datasets side by side, but I still don't understand the reason for the difference.
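One way to quantify how different the two clusterings really are is to cross-tabulate the cluster labels of the genes present in both runs. The sketch below is only an illustration: `compare_clusterings` is a hypothetical helper, the toy labels are made up, and it assumes the labels returned by cutreeDynamic are in the same order as the rows of `scaledata`, so they can be named with `rownames(scaledata)`.

```r
# Hypothetical helper: cross-tabulate cluster labels for genes present in both runs.
# labels_old / labels_new are named vectors (names = gene symbols), e.g.
# setNames(cutree_1.5, rownames(scaledata)) from each run (assumed ordering).
compare_clusterings <- function(labels_old, labels_new) {
  shared <- intersect(names(labels_old), names(labels_new))
  table(old = labels_old[shared], new = labels_new[shared])
}

# Toy labels standing in for the two real runs (made-up values).
labels_old <- c(A = 1, B = 1, C = 2, D = 2)
labels_new <- c(A = 1, B = 1, C = 2, D = 3, E = 3)

overlap <- compare_clusterings(labels_old, labels_new)
print(overlap)
```

If most of the counts in a row fall into a single column, the old cluster has largely survived under a new label number; counts spread across many columns would indicate a genuine reshuffling.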

cutreeDynamic Clustering hclust dynamicTreeCut • 380 views

My speculation is that this happens because I use DESeq2 normalized counts. I think the normalized count values for a gene change slightly depending on the total number of genes analyzed in DESeq2, and that this is responsible for the different clustering patterns. Is there a way to control for this, or should I start from some other data rather than DESeq2 normalized counts?
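This speculation could be checked directly by comparing the values of the shared genes in the two normalized tables. A minimal sketch, assuming both tables are numeric matrices with gene symbols as row names and the same sample columns; `old_mat` and `new_mat` are hypothetical names filled with made-up toy values here:

```r
# Toy matrices standing in for the two normalized count tables (made-up values).
old_mat <- matrix(c(10, 20, 30, 40, 5, 6), nrow = 3, byrow = TRUE,
                  dimnames = list(c("G1", "G2", "G3"), c("s1", "s2")))
new_mat <- rbind(old_mat, G4 = c(7, 8))  # one extra gene in the newer table
new_mat["G2", "s1"] <- 30.5              # simulate a small shift in one value

# Compare only genes present in both tables, with columns in the same order.
shared <- intersect(rownames(old_mat), rownames(new_mat))
diffs <- abs(old_mat[shared, ] - new_mat[shared, colnames(old_mat)])

max_diff <- max(diffs)                               # largest absolute change
changed_genes <- shared[rowSums(diffs > 1e-8) > 0]   # genes whose values moved
print(max_diff)
print(changed_genes)
```

If `max_diff` is zero on the real data, the normalized values themselves are not what changed, and the difference would have to come from the added genes or from a later step.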
