Question: Problem constructing the Topological Overlap Matrix (TOM) in the WGCNA algorithm
modarzi60 wrote:

Hi,

I ran WGCNA for my study. Now I want to import my network into Cytoscape for visualization. According to the WGCNA tutorial, I have to run the code below for that purpose:

``````# Select modules
modules = c("blue","brown")
# Select module probes
inModule=is.finite(match(moduleColorsFemale,modules))
modProbes=probes[inModule]
match1=match(modProbes,GeneAnnotation$substanceBXH)
modGenes=GeneAnnotation$gene_symbol[match1]
# Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule]
dimnames(modTOM) = list(modProbes, modProbes)
# Export the network into edge and node list files for Cytoscape
cyt = exportNetworkToCytoscape(modTOM,
  edgeFile=paste("CytoEdge",paste(modules,collapse="-"),".txt",sep=""),
  nodeFile=paste("CytoNode",paste(modules,collapse="-"),".txt",sep=""),
  weighted = TRUE, threshold = 0.02,nodeNames=modProbes,
  altNodeNames = modGenes, nodeAttr = moduleColorsFemale[inModule])
``````

When I run:

``````modTOM = TOM[inModule, inModule]
``````

I get the error below:

``````Error: object 'TOM' not found.
``````

So my question is: what is TOM? Should I calculate it with the code below?

``````> TOM = TOMsimilarityFromExpr(datExpr, power=7)
``````

I would appreciate any comments.

Best Regards,

For Your Information (FYI): double-posted at Bioconductor: https://support.bioconductor.org/p/110777/

Kevin Blighe wrote:

A TOM is a topological overlap matrix, which can be created from the adjacency matrix of your expression matrix:

``````softPower <- 6 ;
``````

Please refer to page 3 of the WGCNA tutorial 2.b Step-by-step network construction and module detection for the finer details.
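The snippet above only sets the soft-thresholding power. As a rough illustration of what happens next, here is a minimal base-R sketch of an unsigned adjacency and the topological overlap computed from it. The toy `datExpr` matrix and the intermediate names (`adj`, `k`, `shared`, `minK`) are assumptions made up for this example; in WGCNA itself the tutorial uses `adjacency()` followed by `TOMsimilarity()` for these two steps.

```r
# Toy expression matrix (assumed data): 20 samples in rows, 6 genes in columns,
# which is the orientation WGCNA expects for datExpr.
set.seed(1)
datExpr <- matrix(rnorm(20 * 6), nrow = 20,
                  dimnames = list(NULL, paste0("gene", 1:6)))

softPower <- 6

# Unsigned adjacency: absolute correlation raised to the soft-thresholding power.
adj <- abs(cor(datExpr))^softPower
diag(adj) <- 0                  # drop self-connections before counting neighbours

k      <- rowSums(adj)          # connectivity of each gene
shared <- adj %*% adj           # l_ij: summed strength of shared neighbours
minK   <- outer(k, k, pmin)     # min(k_i, k_j) for every gene pair

# Topological overlap: (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)
TOM <- (shared + adj) / (minK + 1 - adj)
diag(TOM) <- 1
dissTOM <- 1 - TOM              # dissimilarity used for clustering
```

The result is a genes x genes matrix, which is why memory use grows with the square of the number of genes.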

Kevin

Dear Dr. Blighe

Thanks for your comment. Because of my limited hardware resources, I chose the 'Automatic, one-step network construction and module detection' approach as the feasible option. However, to import my constructed network into Cytoscape I need the TOM, so I ran the code below:

``````TOM = TOMsimilarityFromExpr(datExprSTLMS, power=3)
``````

and I got the error below:

``````Error: cannot allocate vector of size 24.2 Gb
``````

So, given this limitation, could you please recommend another solution that does not use the TOM?

Best Regards,

You are trying to create a network from the entire data-matrix, right? What are the dimensions of datExprSTLMS?

Usually, we filter the data before we generate the network.

Yes, I am trying to create a network from the entire data matrix. The dimensions of my datExprSTLMS are 53 * 56900.

If my approach is the problem, on what basis should I filter my data set?

Best Regards,

That is very large and you will likely have issues with RAM / memory.

Take a look here: WGCNA maxBlockSize limit
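A quick back-of-the-envelope check shows where the memory goes. This sketch assumes a dense genes x genes matrix of 8-byte doubles and uses the gene counts quoted in this thread (56963 before filtering, 27251 after); the helper name `tomSizeGiB` is made up for illustration.

```r
# Approximate size of one dense n x n matrix of doubles, in GiB.
tomSizeGiB <- function(nGenes) {
  nGenes^2 * 8 / 2^30
}

round(tomSizeGiB(56963), 1)   # full data set: 24.2
round(tomSizeGiB(27251), 1)   # after variance filtering: 5.5
```

The first figure matches the 24.2 Gb in the allocation error, and that is for a single copy of the matrix; intermediate copies need additional memory on top.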

Dear Dr. Blighe

Thanks for your comment. Based on your guidance, I need to filter out some rows (genes) using a variance cut-off. If I have understood this filter correctly, how can I determine the cut-off?

I would appreciate your comments.

Best Regards,

Yes, using a filter based on variance would be a good idea.

What is the goal of your research? Why do you want to create the network?

Dear Dr. Blighe

I want to use a network approach to identify hub genes and pathways in 2 cancers. My data set was downloaded from TCGA and, as you know, TCGA data sets typically contain more than 60000 gene types. So if you want to construct a network from these data sets, you face very high-dimensional data. That is why I have the problem generating the TOM, as I said before.

I would appreciate your comments.

Best Regards,

Hello again. My simple comment is that you should do some pre-filtering such that you can actually generate the networks. Removing genes based on low variance is a reasonable idea. Also, you could perform a differential expression analysis between tumour and normal samples, and then only build the network from genes that are statistically differentially expressed between these.

Thanks for your comment. I need to remove genes based on low variance, so I need a good function for that purpose. Searching the Internet, I found the 'genefilter' package in R. However, according to its vignette, that package is intended for microarray data.

Could you please suggest a good solution for variance-based filtering in R?

Best Regards,

Oh, you should look into `var()` function in combination with `apply()`.

For example:

``````apply(MyData, 1, var)
``````

That will get the variance of each row

``````filter <- which(apply(MyData, 1, var) > 0.2)
``````

That will find all genes with variance greater than 0.2 and store their row indices in filter (`which()` returns the positions of the TRUE values, not the TRUE / FALSE vector itself). You could then filter your data with:

``````MyData.filt <- MyData[filter,]
``````
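To make those three steps concrete, here is a small self-contained sketch on simulated data. The toy matrix, the gene names, and the variable names `geneVar` and `keep` are assumptions for illustration only; the 0.2 cut-off is the same arbitrary value as above.

```r
# Simulated data (assumed): 10 genes in rows, 10 samples in columns.
# The first 5 genes barely vary; the last 5 vary a lot.
set.seed(42)
MyData <- rbind(
  matrix(rnorm(5 * 10, sd = 0.1), nrow = 5),
  matrix(rnorm(5 * 10, sd = 2),   nrow = 5)
)
rownames(MyData) <- paste0("gene", 1:10)

geneVar <- apply(MyData, 1, var)   # one variance per row (gene)
keep    <- geneVar > 0.2           # logical vector, TRUE for genes to keep

# drop = FALSE keeps the result a matrix even if only one gene survives
MyData.filt <- MyData[keep, , drop = FALSE]
dim(MyData.filt)
```

Note that this assumes genes are in rows. A WGCNA-style `datExpr` (samples in rows, genes in columns, like the 53 * 56900 matrix mentioned earlier) would need `apply(datExpr, 2, var)` and column subsetting instead.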

Thanks for your comment. I filtered MyData using the code you suggested:

``````filter <- which(apply(MyData, 1, var) > 0.2)
``````

But the number of rows (genes) only decreased from 56963 to 50580, which is still high. So my question is: how can I find the best threshold for filtering? The threshold is currently 0.2. Does it have a theoretical basis, or is it purely empirical? I know you are an expert in WGCNA, so based on your experience, which threshold would be good in my case?

I would appreciate your comments.

Best Regards,

Hello again! I just picked 0.2 'randomly'. What you could do is find the range of the variances via the min() and max() (or range()) functions, and then set the cut-off to the midpoint of that range.

For example:

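``````variances <- apply(MyData, 1, var)
# Named minVar / maxVar so the base functions min() and max() are not masked
minVar <- min(variances, na.rm=TRUE)
maxVar <- max(variances, na.rm=TRUE)

# Midpoint of the variance range
cutoff <- minVar + ((maxVar - minVar) / 2)

# Reuse the variances computed above instead of recomputing them
filter <- which(variances > cutoff)
``````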

Dear Dr. Blighe

Thanks for your comment. With the cut-off you recommended, MyData has been reduced to 27251 genes. Could you please give me a reference for the formula below:

``````min + ((max-min) / 2)
``````

Best Regards,

Hello Sir. There is no reference - it is just a method to remove genes of low variance. It is likely used in many thousands of publications. You can phrase it like this: 'Prior to network construction, we removed X genes whose variance across all samples fell into the lower half of the variance range'.

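A similar variance-based filter is used, for example, in DESeq2's plotPCA function, which keeps only the most variable genes (the top `ntop` ranked by row variance).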
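As far as I know, plotPCA ranks genes by variance and keeps a fixed number of the most variable ones rather than applying a variance threshold. A base-R sketch of that top-N idea, on simulated data, might look like this; the toy matrix and the value `ntop <- 50` are assumptions for illustration, not a recommendation.

```r
# Simulated data (assumed): 200 genes in rows, 12 samples in columns.
set.seed(7)
MyData <- matrix(rnorm(200 * 12), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), NULL))

ntop <- 50                          # hypothetical number of genes to keep
rv   <- apply(MyData, 1, var)       # per-gene variance

# Row indices of the ntop most variable genes
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
MyData.top <- MyData[select, , drop = FALSE]
```

Fixing the number of genes instead of the cut-off makes the size of the resulting TOM predictable, which can help when memory is the limiting factor.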

Dear Dr. Blighe

Thanks for your comment. Following your guidance, I reduced my data from about 56000 genes to about 27000. However, with 56000 genes I had 95 modules, and now I have 141 modules. Is this normal? I would appreciate your comments.

Best Regards,