I have a CSV file of differentially expressed genes. The data is already normalized as TPM. I have two treatments and 12 samples. Both up-regulated and down-regulated genes are included because I also want to find inversely correlated genes. Currently, the data is formatted like this:
Name Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7...
gene1 5.6 12.0 0.0 0.5 0.8 0.6 0.0 0.0
gene2 1.4 0.0 0.0 0.0 0.0 0.0 0.3 0.0
gene3 52.5 58.9 1.5 3.5 1.9 2.4 2.1 1.5
gene4 11.1 0.0 0.0 0.1 0.0 0.0 0.3 0.1
gene5 6.1 39.8 3.5 6.5 4.7 0.8 0.8 0.8
gene6 36.9 40.0 2.9 2.0 1.6 9.1 5.2 2.0
gene7 107.5 321.3 1.0 0.4 1.7 0.8 0.6 0.3
However, I have 4,300 genes. Most are mRNAs, but some are long non-coding RNAs (lncRNAs). I am trying to use the Cytoscape plugin ExpressionCorrelation to build a network and split the genes into clusters based on their correlation, but I'm not sure it's working correctly. Once that works, I plan to run functional annotation on each cluster and BLAST the nodes with high centrality against a transcription factor database.
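One thing that may be worth checking before building the network: near-zero genes such as gene2 or gene4, where a single sample carries all the signal, can get very high Pearson correlations from that one outlier. Below is a minimal pandas sketch that loads the table, drops barely expressed or constant genes, and log-transforms TPM before any correlation is computed. The column names Sample1..Sample8, the thresholds, and the log2(TPM + 1) step are all assumptions on my part, not part of the original workflow.

```python
import io

import numpy as np
import pandas as pd

# First rows of the table from the question. Column names Sample1..Sample8 are an
# assumption (each row shows eight values); the real file has 12 sample columns.
EXAMPLE = """Name Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8
gene1 5.6 12.0 0.0 0.5 0.8 0.6 0.0 0.0
gene2 1.4 0.0 0.0 0.0 0.0 0.0 0.3 0.0
gene3 52.5 58.9 1.5 3.5 1.9 2.4 2.1 1.5
gene4 11.1 0.0 0.0 0.1 0.0 0.0 0.3 0.1
gene5 6.1 39.8 3.5 6.5 4.7 0.8 0.8 0.8
gene6 36.9 40.0 2.9 2.0 1.6 9.1 5.2 2.0
gene7 107.5 321.3 1.0 0.4 1.7 0.8 0.6 0.3
"""


def load_tpm(handle, sep=","):
    """Read a genes x samples TPM table whose first column holds gene names."""
    return pd.read_csv(handle, sep=sep, index_col=0)


def prepare_for_correlation(tpm, min_tpm=1.0, min_samples=3):
    """Keep genes with TPM >= min_tpm in at least min_samples samples, then log2(TPM + 1).

    Genes with zero variance are dropped too, because their Pearson correlation is
    undefined. Both thresholds are hypothetical defaults, not recommendations.
    """
    expressed = (tpm >= min_tpm).sum(axis=1) >= min_samples
    logged = np.log2(tpm.loc[expressed] + 1.0)
    return logged.loc[logged.var(axis=1) > 0]


# The real file is comma-separated (the default); the example above uses whitespace.
tpm = load_tpm(io.StringIO(EXAMPLE), sep=r"\s+")
expr = prepare_for_correlation(tpm)
print(expr.index.tolist())  # genes that survive the filter
```

On the seven rows above, this keeps gene3, gene5, gene6 and gene7; gene1, gene2 and gene4 reach 1 TPM in fewer than three samples.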
After building the gene network (preview histogram), I chose -0.8 and 0.8 as cutoffs. Then, under Tools, I click Analyze Network, and then I use subnetwork creation (Analyze Connected Components). When I do this, the network splits into 8 subnetworks, but one of them contains 4,290 nodes! So it didn't do a good job of creating subnetworks.
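A likely reason for the single 4,290-node component: these are DE genes from a two-group design, so most of them track the treated/untreated split and correlate strongly (positively or negatively) with one another. Connected components is not really a clustering method; a single edge above |r| = 0.8 is enough to join two groups, so everything chains together. The sketch below tries to reproduce this on simulated data (the simulation and its parameters are made up): DE-like genes collapse into one component at the same cutoff, while pure noise stays fragmented.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def correlation_components(expr, cutoff=0.8):
    """Link genes with |Pearson r| >= cutoff and label the connected components.

    expr: genes x samples array. Returns (number of components, label per gene).
    """
    r = np.corrcoef(expr)                   # genes x genes correlation matrix
    adjacency = np.abs(r) >= cutoff         # positive and negative edges, like -0.8/0.8
    np.fill_diagonal(adjacency, False)      # drop self-loops
    return connected_components(csr_matrix(adjacency.astype(float)), directed=False)


def simulate_two_group(n_genes=300, effect=3.0, noise=0.5, seed=0):
    """Hypothetical data: 6 treated + 6 untreated samples; each gene goes up or down."""
    rng = np.random.default_rng(seed)
    treated = np.r_[np.ones(6), np.zeros(6)]
    signs = rng.choice([-1.0, 1.0], size=n_genes)
    return signs[:, None] * treated[None, :] * effect + rng.normal(scale=noise, size=(n_genes, 12))


for effect in (3.0, 0.0):  # DE-like genes vs. pure noise
    n, labels = correlation_components(simulate_two_group(effect=effect))
    print(f"effect={effect}: {n} components, largest has {np.bincount(labels).max()} genes")
```

If this is what is happening, raising the cutoff will mostly just shed nodes; a method that partitions the graph, or one that clusters the expression profiles directly, is more likely to give usable groups.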
Is there something I'm doing wrong? I haven't added any phenotypic data or anything; it's analyzing all samples together. Ideally, I'd like to add layers for visualization, such as different colors for lncRNA and mRNA, but my understanding is that this isn't needed yet, since the network itself should be unbiased.
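For the lncRNA/mRNA coloring, one option is to build the edge list outside Cytoscape and import it together with a node table that carries an RNA-type column, which can then be mapped to node color in the Style panel. A minimal sketch; the file layout and column names (source, target, pearson_r, sign, rna_type) are my own choice, and the rna_types mapping is a hypothetical input you would take from the genome annotation.

```python
import csv
import io

import numpy as np


def write_cytoscape_tables(genes, expr, rna_types, edge_out, node_out, cutoff=0.8):
    """Write an edge list (|r| >= cutoff) and a node table with an RNA-type column.

    genes: list of gene names; expr: genes x samples array in the same order;
    rna_types: dict gene -> "mRNA" / "lncRNA" (hypothetical annotation source).
    """
    r = np.corrcoef(expr)
    # Upper triangle only, so each undirected edge is written once.
    rows, cols = np.nonzero(np.triu(np.abs(r) >= cutoff, k=1))
    edges = csv.writer(edge_out)
    edges.writerow(["source", "target", "pearson_r", "sign"])
    for i, j in zip(rows, cols):
        sign = "positive" if r[i, j] > 0 else "negative"
        edges.writerow([genes[i], genes[j], f"{r[i, j]:.3f}", sign])
    nodes = csv.writer(node_out)
    nodes.writerow(["name", "rna_type"])
    for gene in genes:
        nodes.writerow([gene, rna_types.get(gene, "unknown")])


# Toy example; in practice pass open file handles instead of StringIO buffers.
edge_buf, node_buf = io.StringIO(), io.StringIO()
write_cytoscape_tables(
    ["a", "b", "c"],
    np.array([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]], dtype=float),
    {"a": "mRNA", "c": "lncRNA"},
    edge_buf,
    node_buf,
)
print(edge_buf.getvalue())
print(node_buf.getvalue())
```

Keeping the sign column lets positive and negative edges be styled differently, which may help when looking for the inversely correlated genes.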
I can provide additional info if it would help, please let me know!
What is your objective?
Are you trying to make clusters from expression data?
There are Cytoscape plugins such as MCODE, ClusterViz, and clusterMaker available for making clusters.
Otherwise, you can apply the K-means algorithm to create the desired clusters.
Thank you!
Yes, I’m trying to cluster the genes by expression to find co-expressed or “related” genes. Basically, I’m trying to split them into manageable groups, because target prediction is impossible with this many genes.
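As a sketch of the K-means suggestion applied to this goal: z-score each gene across samples, then cluster the profiles. After z-scoring with the population SD, the squared Euclidean distance between two genes over n samples is 2n(1 - r), so K-means effectively groups by correlation and puts inversely correlated genes in different clusters. This is a plain numpy version of Lloyd's algorithm with k-means++ seeding, written out only to stay dependency-free (scikit-learn's KMeans is the usual choice in practice); the simulated profiles are hypothetical.

```python
import numpy as np


def zscore_rows(expr):
    """Standardize each gene across samples so clusters reflect profile shape, not level."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    sd = centered.std(axis=1, keepdims=True)
    return centered / np.where(sd > 0, sd, 1.0)


def _assign(x, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every row."""
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def _kmeanspp_init(x, k, rng):
    """k-means++ seeding: later centroids are drawn with probability proportional to d^2."""
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(x, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding and restarts; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = _kmeanspp_init(x, k, rng)
        for _ in range(max_iter):
            labels = _assign(x, centroids)
            updated = np.array([
                x[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                for c in range(k)
            ])
            if np.allclose(updated, centroids):
                break
            centroids = updated
        labels = _assign(x, centroids)
        inertia = ((x - centroids[labels]) ** 2).sum()  # within-cluster sum of squares
        if best is None or inertia < best[0]:
            best = (inertia, labels, centroids)
    return best[1], best[2]


rng = np.random.default_rng(1)
up = np.r_[np.ones(6), np.zeros(6)]           # hypothetical "up in treated" profile
alt = np.tile([1.0, 0.0], 6)                  # hypothetical unrelated profile
profiles = np.vstack([np.tile(up, (20, 1)), np.tile(-up, (20, 1)), np.tile(alt, (20, 1))])
x = zscore_rows(profiles + rng.normal(scale=0.1, size=profiles.shape))
labels, centroids = kmeans(x, k=3)
print(labels)
```

The number of clusters k still has to be chosen, for example by comparing the within-cluster sum of squares across several values of k.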
If I want to compare expression data and then investigate colocalized genes (lncRNAs and mRNAs on the same chromosome within 1 kb of each other), what would be the best way to do that?
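For the colocalization question, one approach is to find lncRNA-mRNA pairs from the gene coordinates first and then look at their expression correlation. A minimal sketch, assuming you have (name, chromosome, start, end, type) for each gene from whatever annotation was used for quantification, and reading "1 kb apart" as at most 1,000 bases between the two genes; strand is ignored. The function names, coordinates and values are invented for illustration.

```python
from collections import defaultdict
from itertools import product

import numpy as np


def interval_gap(start1, end1, start2, end2):
    """Number of bases separating two intervals; 0 if they overlap or touch."""
    return max(0, start2 - end1, start1 - end2)


def colocalized_pairs(genes, max_gap=1000):
    """Find lncRNA-mRNA pairs on the same chromosome within max_gap bases.

    genes: iterable of (name, chrom, start, end, rna_type) tuples, e.g. parsed from
    the annotation used for quantification (parsing not shown here).
    """
    by_chrom = defaultdict(lambda: {"lncRNA": [], "mRNA": []})
    for name, chrom, start, end, rna_type in genes:
        if rna_type in ("lncRNA", "mRNA"):
            by_chrom[chrom][rna_type].append((name, start, end))
    pairs = []
    for groups in by_chrom.values():
        # All lncRNA x mRNA combinations per chromosome; fine for a few thousand genes.
        for (lnc, l_start, l_end), (mrna, m_start, m_end) in product(groups["lncRNA"], groups["mRNA"]):
            gap = interval_gap(l_start, l_end, m_start, m_end)
            if gap <= max_gap:
                pairs.append((lnc, mrna, gap))
    return pairs


def pair_correlations(pairs, expr):
    """Attach Pearson r across samples; expr maps gene name -> 1-D expression array."""
    return [
        (lnc, mrna, gap, float(np.corrcoef(expr[lnc], expr[mrna])[0, 1]))
        for lnc, mrna, gap in pairs
    ]


# Hypothetical coordinates and expression values, for illustration only.
genes = [
    ("lnc1", "chr1", 100, 500, "lncRNA"),
    ("m1", "chr1", 1200, 3000, "mRNA"),
    ("m2", "chr1", 5000, 6000, "mRNA"),
    ("m3", "chr2", 100, 200, "mRNA"),
]
expr = {"lnc1": np.array([1.0, 2.0, 3.0, 4.0]), "m1": np.array([8.0, 6.0, 4.0, 2.0])}
print(pair_correlations(colocalized_pairs(genes), expr))
```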
I would suggest using WGCNA to create modules based on co-expression. Then, perform a Gene Ontology analysis on the modules obtained from WGCNA.
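For a sense of what WGCNA does under the hood, here is a simplified Python sketch (not the WGCNA package itself): an unsigned soft-threshold adjacency |r|^beta, the topological overlap measure (TOM), and average-linkage hierarchical clustering of 1 - TOM. beta = 6 and the fixed cut height of 0.9 are assumptions; WGCNA picks the power from a scale-free fit and cuts the tree with dynamic tree cutting, which this does not replicate. Also worth knowing: the WGCNA FAQ advises against running it on fewer than 15 samples, and this data set has 12.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def tom_modules(expr, beta=6, cut_height=0.9):
    """Simplified WGCNA-style modules: soft threshold -> TOM -> hierarchical clustering.

    expr: genes x samples array. beta and cut_height are assumed values.
    """
    a = np.abs(np.corrcoef(expr)) ** beta      # unsigned soft-threshold adjacency
    np.fill_diagonal(a, 0.0)                   # zero diagonal so sums skip u == i, j
    k = a.sum(axis=1)                          # connectivity of each gene
    shared = a @ a                             # sum_u a_iu * a_uj over shared neighbours
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    dissimilarity = 1.0 - tom
    tree = linkage(squareform(dissimilarity, checks=False), method="average")
    return fcluster(tree, t=cut_height, criterion="distance")


rng = np.random.default_rng(0)
treatment = np.r_[np.ones(6), -np.ones(6)]    # hypothetical treated/untreated profile
other = np.tile([1.0, -1.0], 6)               # hypothetical unrelated profile
expr = np.vstack([np.tile(treatment, (20, 1)), np.tile(other, (20, 1))])
expr = expr + rng.normal(scale=0.3, size=expr.shape)
print(tom_modules(expr))
```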
Consider the GO terms that fit your disease or condition of interest, and perform the downstream analysis on the genes filtered by those GO terms.
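For the GO step, a common test per module is a one-sided hypergeometric (over-representation) test, with a multiple-testing correction across terms. A minimal sketch using scipy.stats.hypergeom and a hand-written Benjamini-Hochberg adjustment; the GO-term-to-gene mapping is assumed to come from an annotation source not shown here, and the function names are hypothetical. Using the 4,300 input genes as the background, rather than the whole genome, is a choice I'd suggest, since modules can only contain those genes.

```python
import numpy as np
from scipy.stats import hypergeom


def go_term_enrichment(module_genes, term_genes, universe):
    """Over-representation of one GO term in one module (one-sided hypergeometric test).

    universe: all genes that could have been placed in a module (e.g. the input genes).
    Returns (overlap, p) where p = P(X >= overlap), X ~ Hypergeom(|universe|, |term|, |module|).
    """
    universe = set(universe)
    module = set(module_genes) & universe
    term = set(term_genes) & universe
    overlap = len(module & term)
    p = hypergeom.sf(overlap - 1, len(universe), len(term), len(module))
    return overlap, float(p)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (false discovery rate) across many terms."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * len(p) / np.arange(1, len(p) + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    out = np.empty_like(p)
    out[order] = np.minimum(adjusted, 1.0)
    return out


universe = [f"g{i}" for i in range(20)]       # hypothetical background genes
overlap, p = go_term_enrichment(universe[:5], universe[:5], universe)
print(overlap, p)
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```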
I have tried WGCNA; however, I don't have any phenotypic data to associate with the samples other than treated vs. untreated.
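The treated vs. untreated label can itself serve as the trait: code it as 0/1 and correlate it with each module's eigengene (the first principal component of the module's standardized expression), which is essentially what WGCNA's module-trait step does with a numeric trait. A minimal numpy sketch, assuming modules have already been assigned; the function names and simulated modules are hypothetical.

```python
import numpy as np


def module_eigengene(module_expr):
    """First principal component across samples of a genes x samples module matrix.

    Rows are standardized first; genes with zero variance must be removed beforehand.
    """
    z = module_expr - module_expr.mean(axis=1, keepdims=True)
    z = z / z.std(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0]
    # The SVD sign is arbitrary; orient it to agree with the module's average profile.
    if np.corrcoef(eigengene, z.mean(axis=0))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


def trait_correlation(eigengene, treated):
    """Pearson r between an eigengene and a 0/1 treatment indicator (point-biserial r)."""
    return float(np.corrcoef(eigengene, np.asarray(treated, dtype=float))[0, 1])


rng = np.random.default_rng(2)
treated = np.r_[np.ones(6), np.zeros(6)]      # 1 = treated, 0 = untreated
up_module = 3 * treated + rng.normal(scale=0.5, size=(10, 12))     # hypothetical module
down_module = -3 * treated + rng.normal(scale=0.5, size=(10, 12))  # hypothetical module
for name, module in (("up", up_module), ("down", down_module)):
    print(name, round(trait_correlation(module_eigengene(module), treated), 3))
```

With only 12 samples in two groups, a high eigengene-trait correlation mostly says that a module follows the treatment split, which is expected for DE genes.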