Question: Tools for finding gene clusters in RNA-seq differential expression data?
am wrote, 2.1 years ago:

I have a list of differentially expressed genes and their gene counts obtained via RSEM/EBSeq. My raw data consists of RNA-seq reads from disease-free and relapse patients. When I make a heatmap, I can see clusters of over- or under-expressed genes within each condition (relapse vs. disease-free). What tools exist to identify these clusters of genes? Bioconductor's ConsensusClusterPlus is the only tool I'm familiar with. What other tools exist?

Note: I'm not seeking gene enrichment or over-representation tools, e.g. topGO, ConsensusPathDB, DAVID, PANTHER.

Tags: rsem, rna-seq, ebseq, R, cluster

If you have a large number of samples (more than 15 per condition), you can use WGCNA, as mentioned by others, to find modules of genes. If you don't have that many samples, I would suggest simple clustering methods such as hierarchical or k-means clustering. You can use methods like the elbow method or the gap statistic to estimate a plausible number of clusters, then use that number as k for k-means.
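As a rough illustration of the elbow idea, here is a minimal pure-Python sketch. Everything in it is an assumption made for the example: the gene names and expression values are invented (six genes across 3 disease-free and 3 relapse samples), and a deterministic farthest-first initialisation is swapped in for the random restarts that library implementations such as R's kmeans() typically rely on. It runs k-means for several values of k and reports the total within-cluster sum of squares (WSS); the "elbow" is the k after which WSS stops dropping sharply.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(rows, k):
    """Deterministic initialisation (an assumption of this sketch):
    start from the first row, then repeatedly add the row that is
    farthest from all centers chosen so far."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    return centers

def kmeans(rows, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    centers = farthest_first(rows, k)
    for _ in range(n_iter):
        # Assign each gene to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        # Recompute each center as the mean profile of its members.
        new_centers = []
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centers[c]
            )
        if new_centers == centers:  # converged
            break
        centers = new_centers
    wss = sum(sq_dist(r, centers[lab]) for r, lab in zip(rows, labels))
    return labels, wss

# Hypothetical scaled expression values: columns 1-3 disease-free, 4-6 relapse.
genes = {
    "geneA": [-1.0, -0.9, -1.1, 1.0, 1.1, 0.9],
    "geneB": [-1.1, -1.0, -0.8, 0.9, 1.0, 1.0],
    "geneC": [-0.9, -1.2, -1.0, 1.1, 0.9, 1.1],
    "geneD": [1.0, 1.1, 0.9, -1.0, -0.9, -1.1],
    "geneE": [0.9, 1.0, 1.2, -1.1, -1.0, -1.0],
    "geneF": [1.1, 0.8, 1.0, -0.9, -1.1, -1.0],
}
rows = list(genes.values())

# WSS for k = 1..4; look for the k where the curve flattens.
elbow = {k: kmeans(rows, k)[1] for k in range(1, 5)}
for k, wss in elbow.items():
    print(k, round(wss, 3))
```

On this toy data the WSS should fall steeply from k = 1 to k = 2 and only marginally afterwards, which is the pattern the elbow method looks for. Real data is rarely this clean, so the gap statistic can be a useful second opinion.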

Written 2.1 years ago by geek_y

Hi and thanks,

I have 3 replicates for condition A and 3 for condition B.

Written 2.1 years ago by Farbod
biofalconch (Mexico) wrote, 2.1 years ago:

WGCNA might do the trick for ya: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/

If you want to read about the basics, this paper might be useful: https://www.nature.com/articles/nbt1205-1499.pdf

Cheerio


This was a useful comment. Gets my upvote! Possibly could have been an answer.

Written 2.1 years ago by Kevin Blighe

Dear @biofalconch Hi,

I have done de novo assembly (Trinity) and DEG analysis using edgeR.

1- Can I use WGCNA for clustering or network analysis of my differentially expressed transcripts/genes?

2- Which Trinity or edgeR output files can I use as input for WGCNA? Should I create a special file containing the DEG information for both Condition-1 and Condition-2?

Thank you in advance

Written 2.1 years ago by Farbod (modified 2.1 years ago)
Kevin Blighe wrote, 2.1 years ago:

Just off the top of my head:

Cluster identification is an interesting area and there exists no consistent and standardised way to do it. To be honest, simply generating a dendrogram and cutting the tree at a certain chosen height with cutree can be one of the most effective ways to do it, but this is obviously then biased because it's the human brain that's choosing the clusters indirectly via a height metric. That said, I cannot see your dendrogram and don't know how different these clusters you mention are.

Also be aware that your distance and linkage methods will ultimately affect how clusters are chosen in pretty much all of the methods that I list above. Not many people realise that, and thus the default of Euclidean distance with average linkage is usually chosen, even though these may not be suitable for all data types.
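To make the point about distance choice concrete, here is a small pure-Python sketch of average-linkage clustering cut at a fixed height, which is roughly what dist(), hclust(method = "average") and cutree(h = ...) do together in R. The gene names, expression values and cut heights are all invented for illustration. The same four genes are clustered twice, once with Euclidean distance and once with a correlation distance (1 - Pearson r), so the two outcomes can be compared.

```python
from itertools import combinations
from math import sqrt

def euclidean(a, b):
    """Euclidean distance: sensitive to the magnitude of expression."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson r: small when two profiles have the same shape,
    regardless of how large the change is."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / sqrt(var_a * var_b)

def cut_average_linkage(profiles, dist, height):
    """Agglomerative clustering with average linkage. Merging stops once
    the closest pair of clusters is farther apart than `height`, which
    corresponds to cutting the dendrogram at that height."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average of all pairwise distances between the two clusters.
        total = sum(dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > height:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return sorted(sorted(c) for c in clusters)

# Hypothetical log-expression values: columns 1-3 disease-free, 4-6 relapse.
profiles = {
    "up_weak":     [5.0, 5.1, 4.9, 6.0, 6.1, 5.9],
    "up_strong":   [5.0, 5.2, 4.8, 9.0, 9.2, 8.8],
    "down_weak":   [6.0, 6.1, 5.9, 5.0, 5.1, 4.9],
    "down_strong": [9.0, 9.2, 8.8, 5.0, 5.2, 4.8],
}

print(cut_average_linkage(profiles, euclidean, height=3.0))
print(cut_average_linkage(profiles, correlation_distance, height=0.5))
```

With these made-up numbers, Euclidean distance pairs the two weakly changing genes even though they move in opposite directions, while the correlation distance groups genes by direction of change. Neither is "right" in general; which one suits depends on whether magnitude or shape matters for the question being asked.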

Kevin

Farbod (Toronto) wrote, 2.1 years ago:

Dear @am, Hi and welcome to Biostars.

Please have a look at "Automatically Partitioning Genes into Expression Clusters" on the Trinity website. HTH.

~Best
