Clustering and dynamic tree cutting
22 months ago
harish

I am trying to cut a dendrogram using the dynamicTreeCut package, because I prefer dynamic tree cutting for clustering. I ran the code below:

clusDyn <- cutreeDynamic(hr, distM = as.matrix(as.dist(1-cor(t(scaledata)))), method = "hybrid")

However, it produces 160 clusters, which is too many to analyze individually. Is it possible to cut the tree dynamically but still group the genes so that it produces a specific number of clusters? For example, I would like 20 clusters after the dynamic tree cut instead of 160.

I know that cutting the dendrogram at a specific height would let me control the number of clusters, but I prefer dynamic tree cutting.
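For contrast, a static cut with base R's cutree() does accept the number of groups directly through its k argument (or a height through h). A minimal sketch, assuming simulated data in place of the real scaledata; the objects patterns, groups, expr and nGenes are hypothetical stand-ins, while the tree is built the same way as hr in this thread:

```r
# Hypothetical simulated data: genes in rows, samples in columns
set.seed(1)
nGenes <- 600
nSamples <- 12
patterns <- matrix(rnorm(5 * nSamples), nrow = 5)   # 5 made-up expression profiles
groups <- sample(1:5, nGenes, replace = TRUE)        # each gene follows one profile
expr <- patterns[groups, ] +
  matrix(rnorm(nGenes * nSamples, sd = 0.5), nrow = nGenes)  # profile plus noise
rownames(expr) <- paste0("gene", seq_len(nGenes))

# Scale each gene (row) to mean 0 and sd 1, as in the linked protocol
scaledata <- t(scale(t(expr)))

# Same tree as in the thread: 1 - Pearson correlation, complete linkage
hr <- hclust(as.dist(1 - cor(t(scaledata), method = "pearson")), method = "complete")

# Static cut: k fixes the number of groups (h would fix the height instead)
fixedClusters <- cutree(hr, k = 20)
table(fixedClusters)
```

The trade-off is that a single global cut ignores branch shape, which is what the dynamic method tries to take into account.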


> it produces 160 clusters.

This is happening because the input is a simple correlation matrix that is affected by spurious or missing connections (see this paper).


I am very new to RNAseq analysis and clustering. Could you please elaborate? Do you mean that Pearson correlation is not sufficient for this clustering and that I should look for other methods? Is WGCNA a better workflow?


Help me understand: is this a clustering analysis of differentially expressed genes, or an unsupervised clustering analysis (e.g. WGCNA)?


These are differentially expressed genes, around 15K out of a total of 30K genes. I then follow the clustering protocol described in this link (the genes are scaled and then clustered by Pearson correlation): https://2-bitbio.com/2017/04/clustering-rnaseq-data-making-heatmaps.html


I don't think the cutreeDynamic function will work very well with a distance matrix calculated from Pearson correlation values: as.matrix(as.dist(1-cor(t(scaledata)))). Just to be sure, how did you calculate hr? (The link doesn't work for me.)


Thank you for the effort. I did calculate hr as you have shown:

hr <- hclust(as.dist(1-cor(t(scaledata), method="pearson")), method="complete")
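A minimal sketch of the whole pipeline, assuming simulated data in place of the real scaledata (patterns, groups, expr and nGenes are hypothetical). It computes the 1 - Pearson distance once and reuses it both for hclust() and for the distM argument, so the tree and the distances used by the hybrid method describe the same genes in the same order. Pearson correlation is unchanged by per-gene scaling, so the scaling step does not alter this distance.

```r
library(dynamicTreeCut)   # CRAN package providing cutreeDynamic()

# Hypothetical simulated data: genes in rows, samples in columns
set.seed(1)
nGenes <- 600
nSamples <- 12
patterns <- matrix(rnorm(5 * nSamples), nrow = 5)
groups <- sample(1:5, nGenes, replace = TRUE)
expr <- patterns[groups, ] +
  matrix(rnorm(nGenes * nSamples, sd = 0.5), nrow = nGenes)
rownames(expr) <- paste0("gene", seq_len(nGenes))
scaledata <- t(scale(t(expr)))   # scale each gene (row)

# Compute the 1 - Pearson distance once and reuse it
distM <- 1 - cor(t(scaledata), method = "pearson")
hr <- hclust(as.dist(distM), method = "complete")

# Hybrid dynamic cut with default settings; verbose = 0 silences progress messages
clusDyn <- cutreeDynamic(hr, distM = distM, method = "hybrid", verbose = 0)
table(clusDyn)   # label 0, if present, marks genes left unassigned
```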

Since Pearson correlation values do not seem to work well with cutreeDynamic, could you suggest something I could look into to build a better correlation matrix?


> can you please suggest something that I can look into, to make a better correlation matrix?

Look, I am not familiar with workflows used to detect clusters of differentially expressed genes. What I can tell you is that cutreeDynamic, with the default settings, doesn't work very well when the distance matrix is calculated just from Pearson correlation values.

If you want to use cutreeDynamic, there are settings you can change to reduce the number of clusters. For example, see minClusterSize, deepSplit, cutHeight, and maxCoreScatter (usage).
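One way to see how these settings affect the outcome is to run a small grid and count the resulting clusters. A sketch, assuming simulated data (patterns, groups, expr, nGenes and settings are hypothetical names) and varying only deepSplit and minClusterSize, with cutHeight and maxCoreScatter left at their defaults. Per the package documentation, higher deepSplit values give more and smaller clusters, so lower deepSplit and larger minClusterSize would generally be expected to give fewer clusters; the exact counts depend on the data.

```r
library(dynamicTreeCut)

# Hypothetical simulated data: genes in rows, samples in columns
set.seed(1)
nGenes <- 600
nSamples <- 12
patterns <- matrix(rnorm(5 * nSamples), nrow = 5)
groups <- sample(1:5, nGenes, replace = TRUE)
expr <- patterns[groups, ] +
  matrix(rnorm(nGenes * nSamples, sd = 0.5), nrow = nGenes)
scaledata <- t(scale(t(expr)))

distM <- 1 - cor(t(scaledata), method = "pearson")
hr <- hclust(as.dist(distM), method = "complete")

# Grid of settings to compare (values chosen for illustration only)
settings <- expand.grid(deepSplit = 0:4, minClusterSize = c(20, 50, 100))

# For each combination, run the hybrid cut and count clusters, ignoring label 0
settings$nClusters <- mapply(function(ds, mcs) {
  labels <- cutreeDynamic(hr, distM = distM, method = "hybrid",
                          deepSplit = ds, minClusterSize = mcs, verbose = 0)
  length(setdiff(unique(labels), 0))
}, settings$deepSplit, settings$minClusterSize)

settings
```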


Hi @andres.firrincieli,

Although this is late, I hope you can still help. Regarding cutreeDynamic, you recommended changing some settings such as cutHeight. Does that mean we have to choose a cutHeight value even with cutreeDynamic? My understanding was that we do not need to specify the cutHeight parameter explicitly for cutreeDynamic. Is that not correct?


I typically set the minimum cluster size to 100 and leave the others with the default settings.
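A sketch of that setting on simulated data (patterns, groups, expr and nGenes are hypothetical). Only minClusterSize is changed; cutHeight is not passed, so it stays at its default of NULL. As far as I understand the package documentation, in that case the hybrid method derives a cut height from the dendrogram's own joining heights, so it does not have to be set explicitly; it is worth confirming this in the help page of the installed version.

```r
library(dynamicTreeCut)

# Hypothetical simulated data: genes in rows, samples in columns
set.seed(1)
nGenes <- 600
nSamples <- 12
patterns <- matrix(rnorm(5 * nSamples), nrow = 5)
groups <- sample(1:5, nGenes, replace = TRUE)
expr <- patterns[groups, ] +
  matrix(rnorm(nGenes * nSamples, sd = 0.5), nrow = nGenes)
scaledata <- t(scale(t(expr)))

distM <- 1 - cor(t(scaledata), method = "pearson")
hr <- hclust(as.dist(distM), method = "complete")

# Minimum cluster size of 100; all other arguments (including cutHeight) at defaults
clus100 <- cutreeDynamic(hr, distM = distM, method = "hybrid",
                         minClusterSize = 100, verbose = 0)
table(clus100)
```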
