Question

Tools for finding gene clusters in RNA-seq differential expression data?

1

6.4 years ago

am ▴ 10

I have a list of differentially expressed genes and their gene counts obtained via RSEM/EBSeq. My raw data consists of RNA-seq reads from disease-free and relapse patients. When I make a heatmap, I can see clusters of over- or under-expressed genes within each condition (relapse vs. disease-free). What tools exist to identify these clusters of genes? Bioconductor's ConsensusClusterPlus is the only tool I'm familiar with. What other tools exist?

Note: I'm not seeking gene enrichment or over-representation tools, e.g. topGO, ConsensusPathDB, DAVID, PANTHER.

RNA-Seq R rsem ebseq cluster • 5.2k views

ADD COMMENT • link updated 6.4 years ago by Farbod ★ 3.4k • written 6.4 years ago by am ▴ 10

2

If you have a large number of samples (more than 15 per condition), you can use WGCNA, as mentioned by others, to find modules of genes. But if you don't have many samples, I would suggest simple clustering methods like hierarchical or k-means clustering. You can use methods like the elbow method or the gap statistic to estimate the likely number of clusters, then use that number as k when running k-means.
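
To illustrate the elbow idea, here is a minimal sketch in plain Python (standard library only). The toy matrix, the function names (`sqdist`, `farthest_first`, `kmeans`, `elbow`) and the deterministic farthest-first initialisation are all assumptions made for this example, not something from the thread; in R you would more likely call `kmeans()` directly. The gap statistic is not implemented here.

```python
# Toy matrix: rows = genes, columns = 3 disease-free + 3 relapse samples.
# Made-up, already-scaled values for illustration only (NOT real data).
genes = [
    [-1.0, -0.9, -1.1, 1.0, 1.1, 0.9],
    [-0.8, -1.0, -1.2, 1.2, 0.9, 0.9],
    [-1.1, -1.0, -0.9, 0.9, 1.0, 1.1],
    [1.0, 1.1, 0.9, -1.0, -0.9, -1.1],
    [0.9, 1.0, 1.2, -1.1, -1.0, -0.9],
    [1.2, 0.8, 1.0, -0.9, -1.1, -1.0],
]


def sqdist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(rows, k):
    """Deterministic start: first row, then repeatedly the row farthest
    from all centers chosen so far (an assumption made for reproducibility)."""
    centers = [list(rows[0])]
    while len(centers) < k:
        nxt = max(rows, key=lambda r: min(sqdist(r, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(rows, k, n_iter=50):
    """Plain Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    centers = farthest_first(rows, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sqdist(r, centers[c]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sqdist(r, centers[lab]) for r, lab in zip(rows, labels))
    return labels, wss


def elbow(rows, k_max):
    """Within-cluster sum of squares for k = 1..k_max."""
    return {k: kmeans(rows, k)[1] for k in range(1, k_max + 1)}


wss_by_k = elbow(genes, 4)
for k, wss in wss_by_k.items():
    print(k, round(wss, 3))

labels, _ = kmeans(genes, 2)
print(labels)
```

The "elbow" is the k after which the within-cluster sum of squares stops dropping sharply. On this toy matrix the big drop is from k = 1 to k = 2, which matches the two expression patterns that were built into it.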

ADD REPLY • link 6.4 years ago by GouthamAtla 12k

0

Hi and thanks,

I have 3 replicates for condition A and 3 for condition B.

ADD REPLY • link 6.4 years ago by Farbod ★ 3.4k

score 2 · Answer 1 · 2017-11-24

2

6.4 years ago

biofalconch ★ 1.1k

WGCNA might do the trick for ya: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/

If you want to read about the basics, this paper might be useful: https://www.nature.com/articles/nbt1205-1499.pdf

Cheerio

ADD COMMENT • link 6.4 years ago by biofalconch ★ 1.1k

1

This was a useful comment. Gets my upvote! It could possibly have been an answer.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

0

Dear @biofalconch Hi,

I have done de novo assembly (Trinity) and DEG analysis using edgeR.

1- Can I use WGCNA for clustering or network analysis of my differentially expressed transcripts/genes?

2- Which Trinity or edgeR output files can I use as input for WGCNA? Should I create a single file containing the DEG information for both Condition-1 and Condition-2?

Thank you in advance

ADD REPLY • link 6.4 years ago by Farbod ★ 3.4k

score 2 · Answer 2 · 2017-11-24

Just off the top of my head:

cutree in R
PAM clustering with various metrics to identify ideal cluster groups (my own work)
ConsensusClusterPlus (as you mentioned)
WGCNA (as mentioned by our colleague, biofalconch)
Community structure identification via networks (Network plot from expression data in R using igraph)

Cluster identification is an interesting area, and there is no consistent, standardised way to do it. To be honest, simply generating a dendrogram and cutting the tree at a chosen height with cutree can be one of the most effective ways to do it, but this is obviously biased, because a human is choosing the clusters indirectly via the height. That said, I cannot see your dendrogram and don't know how different the clusters you mention are.
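
For readers who want to see what "cutting the tree at a height" amounts to, here is a rough sketch in plain Python. It is not the R `hclust`/`cutree` code itself: the toy matrix, the function names (`euclidean`, `average_linkage`, `cut_tree`) and the choice of Euclidean distance with average linkage are assumptions made for illustration. It merges the two closest clusters until the smallest between-cluster distance exceeds the chosen height.

```python
import math
from itertools import combinations

# Toy matrix: rows = genes, columns = 3 disease-free + 3 relapse samples.
# Made-up, already-scaled values for illustration only (NOT real data).
genes = [
    [-1.0, -0.9, -1.1, 1.0, 1.1, 0.9],
    [-0.8, -1.0, -1.2, 1.2, 0.9, 0.9],
    [-1.1, -1.0, -0.9, 0.9, 1.0, 1.1],
    [1.0, 1.1, 0.9, -1.0, -0.9, -1.1],
    [0.9, 1.0, 1.2, -1.1, -1.0, -0.9],
    [1.2, 0.8, 1.0, -0.9, -1.1, -1.0],
]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, rows, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(rows[i], rows[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cut_tree(rows, height, dist=euclidean):
    """Agglomerative clustering, stopped once the closest pair of clusters
    is farther apart than `height`. Returns one cluster label per row."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        (a, b), d = min(
            (((a, b), average_linkage(clusters[a], clusters[b], rows, dist))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair_and_dist: pair_and_dist[1],
        )
        if d > height:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# The number of clusters depends entirely on the height the analyst picks.
for h in (0.1, 1.0, 10.0):
    found = cut_tree(genes, h)
    print(h, len(set(found)), found)
```

The loop at the end makes the bias visible: the same data gives a different number of clusters at each height, so the height is effectively a human choice about how many clusters to see.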

Also be aware that your distance and linkage methods will ultimately affect how clusters are chosen in pretty much all of the methods that I list above. Not many people realise that, and thus the default of Euclidean distance with average linkage is usually chosen, even though these may not be suitable for all data types.
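
A small sketch of why the distance choice matters, again in plain Python with made-up numbers. The three profiles and the helper names (`euclidean`, `pearson`, `correlation_distance`, `nearest`) are hypothetical, chosen only to make the contrast obvious: `gene_a` and `gene_b` have the same shape at different magnitudes, while `gene_c` has the opposite shape at a magnitude similar to `gene_a`.

```python
import math

# Hypothetical expression profiles over 3 disease-free + 3 relapse samples.
profiles = {
    "gene_a": [1, 1, 1, 3, 3, 3],        # up in relapse, low expression
    "gene_b": [10, 10, 10, 30, 30, 30],  # same shape, 10x the magnitude
    "gene_c": [3, 3, 3, 1, 1, 1],        # opposite shape, similar magnitude
}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_distance(a, b):
    """1 - r: 0 for identical shape, 2 for perfectly opposite shape."""
    return 1.0 - pearson(a, b)


def nearest(name, dist):
    """Name of the profile closest to `name` under the given distance."""
    others = [n for n in profiles if n != name]
    return min(others, key=lambda n: dist(profiles[name], profiles[n]))


print("Euclidean nearest to gene_a:  ", nearest("gene_a", euclidean))
print("Correlation nearest to gene_a:", nearest("gene_a", correlation_distance))
```

Under Euclidean distance `gene_a` pairs with `gene_c`, the gene with the opposite pattern, simply because their magnitudes are close; under correlation distance it pairs with `gene_b`, the gene with the same pattern. Any clustering built on top of these distances inherits that difference.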

Kevin

score 1 · Answer 3 · 2017-11-25

1

6.4 years ago

Farbod ★ 3.4k

Dear @am, Hi and welcome to Biostars.

Please have a look at "Automatically Partitioning Genes into Expression Clusters" on the Trinity website. HTH.

~Best

ADD COMMENT • link 6.4 years ago by Farbod ★ 3.4k