Problem constructing the Topological Overlap Matrix (TOM) in the WGCNA algorithm
5.8 years ago
modarzi

Hi,

I ran WGCNA for my study. Now I want to import my network into Cytoscape for visualization. According to the WGCNA tutorial, I have to run the code below for that purpose:

library(WGCNA)  # provides exportNetworkToCytoscape()
# Select modules
modules = c("blue", "brown")
# Select module probes
inModule = is.finite(match(moduleColorsFemale, modules))
modProbes = probes[inModule]
match1 = match(modProbes, GeneAnnotation$substanceBXH)
modGenes = GeneAnnotation$gene_symbol[match1]
# Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule]
dimnames(modTOM) = list(modProbes, modProbes)
# Export the network into edge and node list files for Cytoscape
cyt = exportNetworkToCytoscape(modTOM,
  edgeFile = paste("CytoEdge", paste(modules, collapse = "-"), ".txt", sep = ""),
  nodeFile = paste("CytoNode", paste(modules, collapse = "-"), ".txt", sep = ""),
  weighted = TRUE, threshold = 0.02, nodeNames = modProbes,
  altNodeNames = modGenes, nodeAttr = moduleColorsFemale[inModule])

When I run:

modTOM = TOM[inModule, inModule]

I get the following error:

Error: object 'TOM' not found.

So my question is: what is TOM? Should I calculate it with the code below?

TOM = TOMsimilarityFromExpr(datExpr, power = 7)

I would appreciate it if anybody could share their comments.

Best Regards,

Mohammad


For Your Information (FYI): double-posted at Bioconductor: https://support.bioconductor.org/p/110777/

5.8 years ago

Dear Mohammad,

A TOM is a topological overlap matrix. It can be created from the adjacency matrix of your expression matrix:

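library(WGCNA)
softPower <- 6
adjacency <- adjacency(datExpr, power = softPower)  # unsigned adjacency (default type): |cor|^softPower
TOM <- TOMsimilarity(adjacency)                     # topological overlap computed from the adjacency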

Please refer to page 3 of the WGCNA tutorial '2.b Step-by-step network construction and module detection' for the finer details.
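To make the idea of a TOM concrete, below is a minimal base-R sketch on a small random toy matrix. WGCNA is not needed to run it. The sketch makes two assumptions:

- The adjacency is the unsigned one, a_ij = |cor(x_i, x_j)|^power, which is WGCNA's default type.
- The overlap is the standard unsigned topological overlap, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij). Here l_ij sums a_iu * a_uj over the other genes u, and k_i is the connectivity of gene i.

The helper name tomFromAdjacency and the toy data are illustrative only. For real data, use WGCNA's own adjacency() and TOMsimilarity().

```r
# Toy data (hypothetical): WGCNA orientation, samples in rows, genes in columns
set.seed(1)
datToy <- matrix(rnorm(50), nrow = 10,
                 dimnames = list(NULL, paste0("gene", 1:5)))

softPower <- 6
# Unsigned adjacency: a_ij = |cor(x_i, x_j)|^softPower
adjToy <- abs(cor(datToy))^softPower

# Illustrative helper (not a WGCNA function): unsigned topological overlap
# TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)
tomFromAdjacency <- function(adj) {
  a <- adj
  diag(a) <- 0                 # drop self-connections so they do not enter l or k
  l <- a %*% a                 # l_ij = sum over u of a_iu * a_uj (u != i, j)
  k <- rowSums(a)              # connectivity k_i
  kMin <- outer(k, k, pmin)    # min(k_i, k_j) for every pair
  tom <- (l + a) / (kMin + 1 - a)
  diag(tom) <- 1               # by convention a gene fully overlaps itself
  tom
}

TOMtoy <- tomFromAdjacency(adjToy)
round(TOMtoy, 3)
```

The result is an n x n matrix for n genes. This is why the TOM becomes very large for whole-transcriptome data.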

Kevin


Dear Dr. Blighe

Thanks for your comment. Because my hardware resources are limited, I chose 'Automatic, one-step network construction and module detection' as a feasible option. However, to import the constructed network into Cytoscape I need the TOM, so I ran the code below:

TOM = TOMsimilarityFromExpr(datExprSTLMS, power=3)

and got the following error:

Error: cannot allocate vector of size 24.2 Gb

So, given this limitation, could you please recommend another solution that does not use the TOM?

Best Regards,

Mohammad


You are trying to create a network from the entire data-matrix, right? What are the dimensions of datExprSTLMS?

Usually, we filter the data before we generate the network.


Yes, I am trying to create a network from the entire data matrix. The dimensions of datExprSTLMS are 53 x 56900 (53 samples, 56900 genes).

If my approach is the problem, what criterion should I use to filter my data set?

Best Regards,

Mohammad


That is very large, and you will likely have issues with RAM / memory.

Take a look here: WGCNA maxBlockSize limit
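A back-of-the-envelope estimate explains the memory problem. A dense n x n matrix of doubles, such as the adjacency or the TOM, needs 8 * n^2 bytes. The sketch below assumes exactly that and ignores any temporary copies R makes during the computation, so real peak usage is likely higher. The function name matrixGiB is just for illustration.

With the 56963 genes mentioned later in the thread, one such matrix comes to about 24.2 GiB. That is consistent with the "cannot allocate vector of size 24.2 Gb" error above.

```r
# Illustrative helper: approximate memory (GiB) for ONE dense nGenes x nGenes
# matrix of doubles (e.g. an adjacency matrix or a TOM). Assumes 8 bytes per
# entry and ignores temporary copies, so actual peak usage will be higher.
matrixGiB <- function(nGenes) 8 * nGenes^2 / 1024^3

round(matrixGiB(56963), 1)  # whole data set: 24.2
round(matrixGiB(27251), 1)  # after a variance filter: 5.5
round(matrixGiB(5000), 2)   # a block of 5000 genes: 0.19
```

Memory grows quadratically with the number of genes. That is why two approaches make the computation feasible:

- filtering down to fewer genes;
- letting WGCNA's blockwiseModules() split the genes into blocks, whose size is capped by its maxBlockSize argument.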


Dear Dr. Blighe

Thanks for your comment. Based on your guidance, I should filter out some rows (genes) using a variance cut-off. If I have understood this filter correctly, how can I determine the cut-off?

I would appreciate your comments.

Best Regards,

Mohammad


Yes, using a filter based on variance would be a good idea.


What is the goal of your research? Why do you want to create the network?


Dear Dr. Blighe

I want to use a network approach to identify hub genes and pathways in two cancers. My data set was downloaded from TCGA. As you know, TCGA data sets typically contain more than 60,000 genes, so constructing a network from them means dealing with very high-dimensional data. Therefore, as I said before, I have a problem generating the TOM.

I would appreciate your comments.

Best Regards,

Mohammad


Hello again. My simple comment is that you should do some pre-filtering such that you can actually generate the networks. Removing genes based on low variance is a reasonable idea. Also, you could perform a differential expression analysis between tumour and normal samples, and then only build the network from genes that are statistically differentially expressed between these.
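The differential-expression pre-filter can be sketched in base R. This is a deliberately crude stand-in: it runs a per-gene Welch t-test (t.test) on simulated log-scale data and adjusts the p-values with Benjamini-Hochberg (p.adjust). For real TCGA RNA-seq counts, a dedicated package such as DESeq2, edgeR or limma would normally be used instead. The group labels, the 0.05 threshold and the simulated data are all assumptions for illustration.

```r
set.seed(42)
# Simulated (hypothetical) log-expression data in WGCNA orientation:
# 12 samples in rows x 20 genes in columns, 6 normal and 6 tumour samples
group <- factor(rep(c("normal", "tumour"), each = 6))
logExpr <- matrix(rnorm(12 * 20), nrow = 12,
                  dimnames = list(NULL, paste0("gene", 1:20)))
# Genes 1-5 are shifted in tumour samples, i.e. truly "differentially expressed"
logExpr[group == "tumour", 1:5] <- logExpr[group == "tumour", 1:5] + 5

# Per-gene Welch t-test p-values, then Benjamini-Hochberg adjustment
pvals <- apply(logExpr, 2, function(g) t.test(g ~ group)$p.value)
padj <- p.adjust(pvals, method = "BH")

# Keep only genes below the (assumed) 0.05 adjusted p-value threshold
keep <- padj < 0.05
datExprDE <- logExpr[, keep, drop = FALSE]
colnames(datExprDE)
```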


Thanks for your comment. I will remove genes with low variance, so I need a good function for that. While searching the Internet I found the 'genefilter' package in R, but according to its vignette it seems to be aimed at microarray data sets.

Could you please suggest a good way to do variance-based filtering in R?

Best Regards,

Mohammad


Oh, you should look into the var() function in combination with apply().

For example:

apply(MyData, 1, var)

That will get the variance of each row (gene).

filter <- which(apply(MyData, 1, var) > 0.2)

That will find all genes with variance greater than 0.2 and store their row indices in filter (which() converts the TRUE / FALSE vector produced by the comparison into indices). You could then filter your data with:

MyData.filt <- MyData[filter,]
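One detail to watch: apply(MyData, 1, var) assumes genes are in rows. WGCNA's datExpr objects have samples in rows and genes in columns, as datExprSTLMS does (53 x 56900). For those, the variance must be taken over columns (MARGIN = 2) and the columns subset. Here is a minimal sketch on a made-up toy matrix, keeping the arbitrary 0.2 cut-off:

```r
set.seed(7)
# Toy (hypothetical) matrix in WGCNA orientation: 8 samples (rows) x 6 genes (columns)
datExpr <- matrix(rnorm(48), nrow = 8,
                  dimnames = list(NULL, paste0("gene", 1:6)))
datExpr[, "gene1"] <- 0                          # constant gene: zero variance
datExpr[, "gene2"] <- datExpr[, "gene2"] * 10    # highly variable gene

geneVars <- apply(datExpr, 2, var)   # MARGIN = 2: one variance per column (gene)
keep <- geneVars > 0.2               # logical vector, one entry per gene
datExpr.filt <- datExpr[, keep, drop = FALSE]
dim(datExpr.filt)
```

Subsetting with a logical vector, as here, selects the same genes as subsetting with the indices from which(), provided no variances are missing. The drop = FALSE argument keeps the result a matrix even if only one gene survives.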

Thanks for your comment. I filtered MyData with your suggested code:

filter <- which(apply(MyData, 1, var) > 0.2)

But the number of rows (genes) only decreased from 56963 to 50580, which is still high. So my question is: how can I find the best threshold for filtering? The threshold is currently 0.2. Does it have a theoretical basis, or is it purely empirical? I know you are an expert in WGCNA, so, based on your experience, which threshold would be good in my case?

I would appreciate your comments.

Best Regards,

Mohammad


Hello again! I just picked 0.2 'randomly'. What you could do is find the range of the variances via the min() and max() (or range()) functions, and then set the cut-off at the midpoint of that range.

For example:

variances <- apply(MyData, 1, var)
min <- min(variances, na.rm=TRUE)
max <- max(variances, na.rm=TRUE)

cutoff <- min + ((max-min) / 2)

# reuse the variances computed above instead of recomputing them
filter <- which(variances > cutoff)
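Because any cut-off here is a heuristic, it can help to compare how many genes different choices keep before committing to one. The sketch below uses simulated, hypothetical variances (rexp) and compares three cut-offs:

- the midpoint of the variance range;
- the median variance;
- the upper-quartile variance.

The median and upper-quartile cut-offs are not part of the suggestion above. They are included only for comparison.

```r
set.seed(3)
# Hypothetical per-gene variances (right-skewed toy values)
variances <- rexp(1000)

cutoffs <- c(
  midpoint = min(variances) + (max(variances) - min(variances)) / 2,
  median   = median(variances),
  q75      = unname(quantile(variances, 0.75))
)

# Number of genes kept (variance strictly above the cut-off) for each choice
nKept <- sapply(cutoffs, function(cut) sum(variances > cut))
nKept
```

The midpoint depends only on the smallest and largest variances, so a single extreme gene can move it a lot. Quantile-based cut-offs instead fix the fraction of genes that is kept.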

Dear Dr. Blighe

Thanks for your comment. With your recommended cut-off, MyData has decreased to 27251 genes. Could you please give me a reference for the formula below?

min + ((max-min) / 2)

Best Regards,

Mohammad


Hello Sir. There is no reference; it is simply a heuristic for removing genes of low variance, and variance-based filters of this kind are used in a great many publications. You can phrase it like this: 'Prior to network construction, we removed X genes whose variance across all samples fell into the lower half of the variance range'.

A related variance-based filter is used, for example, in DESeq2's plotPCA function, which by default keeps only the 500 most variable genes (ntop = 500).
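For comparison, the gene selection in DESeq2's plotPCA keeps a fixed number of the most variable genes rather than applying a variance cut-off. The function below is a minimal base-R sketch of that idea for a genes-in-rows matrix such as MyData. The name selectTopVariable and the toy data are hypothetical, and the ntop = 500 default simply mirrors plotPCA's default.

```r
# Hypothetical helper: keep the ntop most variable genes (rows = genes)
selectTopVariable <- function(mat, ntop = 500) {
  rv <- apply(mat, 1, var)                                  # variance of each gene
  idx <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  mat[idx, , drop = FALSE]
}

set.seed(11)
# Toy data (hypothetical): 100 genes x 10 samples; genes 1-3 made highly variable
MyDataToy <- matrix(rnorm(100 * 10), nrow = 100,
                    dimnames = list(paste0("gene", 1:100), NULL))
MyDataToy[1:3, ] <- MyDataToy[1:3, ] * 20

topToy <- selectTopVariable(MyDataToy, ntop = 10)
dim(topToy)
```

A fixed ntop makes the size of the resulting network, and therefore its memory footprint, predictable in advance.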


Dear Dr. Blighe

Thanks for your comment. Following your guidance, I have reduced my data from about 56000 to about 27000 genes. However, with 56000 genes I had 95 modules, and now I have 141 modules. Is this normal? I would appreciate your comments.

Best Regards,

Mohammad
