Question: Tools for finding gene clusters in RNA-seq differential expression data?
17 months ago by
am0 wrote:

I have a list of differentially expressed genes and their gene counts obtained via RSEM/EBSeq. My raw data consists of RNA-seq reads from disease-free and relapse patients. When I make a heatmap, I can see that there are clusters of over- or under-expressed genes within each condition (relapse vs. disease-free). What tools exist to identify these clusters of genes? Bioconductor's ConsensusClusterPlus is the only tool I'm familiar with. What other tools exist?

Note: I'm not seeking gene enrichment or over-representation tools, e.g. topGO, ConsensusPathDB, DAVID, PANTHER, etc.

rsem rna-seq ebseq R cluster • 2.0k views
modified 17 months ago by Farbod3.2k • written 17 months ago by am0

WGCNA might do the trick for ya:

If you want to read about the basics, this paper might be useful:


written 17 months ago by biofalconch390

This was a useful comment. Gets my up vote! Possibly could have been an answer.

written 17 months ago by Kevin Blighe41k

Dear @biofalconch Hi,

I have done de novo assembly (Trinity) and DEG analysis using edgeR.

1- Can I use WGCNA for clustering or network analysis of my differentially expressed transcripts/genes?

2- Which Trinity or edgeR output files can I use as input for WGCNA? Should I create a special file containing the DEG information for both Condition-1 and Condition-2?

Thank you in advance

modified 17 months ago • written 17 months ago by Farbod3.2k

If you have a large number of samples (more than 15 per condition), you can use WGCNA, as mentioned by others, to find modules of genes. But if you don't have many samples, I would suggest simple clustering methods such as hierarchical or k-means clustering. You can use methods such as the elbow method or the gap statistic to estimate a plausible number of clusters, and then use that number as k for k-means.
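To make the elbow idea concrete, here is a minimal sketch in plain Python (standard library only). Everything in it is an assumption for illustration: the profiles are made-up, already-scaled values for 3 + 3 samples, the helper names are hypothetical, and the deterministic farthest-point start is used only so the result is reproducible. In R you would more likely call kmeans() for a range of k and compare its tot.withinss values.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(points):
    """Column-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(points, centres):
    """Index of the nearest centre for every profile."""
    return [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, total within-cluster sum of squares)."""
    # Deterministic farthest-point start (an assumption, chosen for reproducibility).
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    for _ in range(max_iter):
        labels = assign(points, centres)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster happens to be empty.
            updated.append(centroid(members) if members else centres[j])
        if updated == centres:
            break
        centres = updated
    labels = assign(points, centres)
    wss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, wss


# Hypothetical scaled profiles: 3 samples of condition A, then 3 of condition B.
up = [[-1.0, -1.1, -0.9, 1.0, 0.9, 1.1],
      [-0.8, -1.0, -1.2, 1.1, 1.0, 0.9],
      [-1.1, -0.9, -1.0, 0.9, 1.1, 1.0]]
down = [[-v for v in row] for row in up]
flat = [[0.1, -0.1, 0.0, 0.0, 0.1, -0.1],
        [0.0, 0.1, -0.1, -0.1, 0.0, 0.1],
        [-0.1, 0.0, 0.1, 0.1, -0.1, 0.0]]
profiles = up + down + flat

# Elbow method: look for the k after which the WSS stops dropping sharply.
wss_by_k = {k: kmeans(profiles, k)[1] for k in range(1, 6)}
for k, wss in wss_by_k.items():
    print(f"k={k}  WSS={wss:.2f}")

labels_k3, _ = kmeans(profiles, 3)
print(labels_k3)
```

With this toy data the within-cluster sum of squares falls steeply up to k = 3 and then flattens, which is the "elbow". Real data is rarely this clean, so treat the elbow as a hint rather than a definitive answer.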

written 17 months ago by geek_y9.4k

Hi and thanks,

I have 3 replicates for condition A and 3 for condition B.

written 17 months ago by Farbod3.2k
17 months ago by
Kevin Blighe41k wrote:

Just off the top of my head:

Cluster identification is an interesting area and there exists no consistent and standardised way to do it. To be honest, simply generating a dendrogram and cutting the tree at a certain chosen height with cutree can be one of the most effective ways to do it, but this is obviously then biased because it's the human brain that's choosing the clusters indirectly via a height metric. That said, I cannot see your dendrogram and don't know how different these clusters you mention are.

Also be aware that your distance and linkage methods will ultimately affect how clusters are chosen in pretty much all of the methods that I list above. Not many people realise that, and thus the default of Euclidean distance with average linkage is usually chosen, even though these may not be suitable for all data types.
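As a rough illustration of both points (cutting a tree at a chosen height, and the effect of the distance metric), here is a small pure-Python sketch. The gene profiles, the function names and the cut heights are all made up for the example, and cluster_at_height is a hypothetical helper rather than R's cutree; in R the usual route would be dist(), hclust() and cutree().

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance: sensitive to the magnitude of expression."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_distance(x, y):
    """1 - Pearson correlation: 0 for the same shape, 2 for the opposite shape.
    Undefined for a constant profile (division by zero)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


# How the distance between two clusters is summarised from pairwise distances.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda d: sum(d) / len(d),
}


def cluster_at_height(profiles, dist, linkage, height):
    """Agglomerative clustering that stops once the next merge would exceed
    `height`, which is equivalent to cutting the dendrogram at that height."""
    link = LINKAGES[linkage]
    clusters = [[name] for name in profiles]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = link([dist(profiles[i], profiles[j]) for i in a for j in b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > height:
            break
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical genes: A and B rise, C and D fall; B and D are 10x more abundant.
genes = {
    "A": [1, 2, 3, 4],
    "B": [10, 20, 30, 40],
    "C": [4, 3, 2, 1],
    "D": [40, 30, 20, 10],
}

by_euclidean = cluster_at_height(genes, euclidean, "average", height=48)
by_correlation = cluster_at_height(genes, correlation_distance, "average", height=1.0)
print(by_euclidean)
print(by_correlation)
```

On this toy input, Euclidean distance groups the genes by abundance (A with C, B with D), while correlation distance groups them by the direction of change (A with B, C with D). The cut heights were picked by eye for each metric, which is exactly the human bias mentioned above.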


written 17 months ago by Kevin Blighe41k
17 months ago by
Farbod3.2k wrote:

Dear @am, Hi and welcome to Biostars.

Please have a look at "Automatically Partitioning Genes into Expression Clusters" on the Trinity website. HTH.



modified 17 months ago • written 17 months ago by Farbod3.2k