Question: Problem constructing the Topological Overlap Matrix (TOM) in the WGCNA algorithm
11 days ago by
modarzi20
modarzi20 wrote:

Hi,

I ran WGCNA for my study. Now I want to import my network into Cytoscape for visualization. Based on the WGCNA tutorial, I have to run the code below for that purpose:

# Select modules
modules = c("blue", "brown")
# Select module probes
inModule = is.finite(match(moduleColorsFemale, modules))
modProbes = probes[inModule]
match1 = match(modProbes, GeneAnnotation$substanceBXH)
modGenes = GeneAnnotation$gene_symbol[match1]
# Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule]
dimnames(modTOM) = list(modProbes, modProbes)
# Export the network into edge and node list files for Cytoscape
cyt = exportNetworkToCytoscape(modTOM,
  edgeFile = paste("CytoEdge", paste(modules, collapse="-"), ".txt", sep=""),
  nodeFile = paste("CytoNode", paste(modules, collapse="-"), ".txt", sep=""),
  weighted = TRUE, threshold = 0.02, nodeNames = modProbes,
  altNodeNames = modGenes, nodeAttr = moduleColorsFemale[inModule])

When I run:

modTOM = TOM[inModule, inModule]

I get the error below:

Error: object 'TOM' not found.

So my question is: what is TOM? Should I calculate it with the code below?

TOM = TOMsimilarityFromExpr(datExpr, power=7)

I would appreciate it if anybody could share their comments with me.

Best Regards,

Mohammad

ADD COMMENTlink modified 9 days ago by WouterDeCoster30k • written 11 days ago by modarzi20

For Your Information (FYI): double-posted at Bioconductor: https://support.bioconductor.org/p/110777/

ADD REPLYlink written 10 days ago by Kevin Blighe24k
10 days ago by
Kevin Blighe24k
Republic of Ireland
Kevin Blighe24k wrote:

Dear Mohammad,

A TOM is a topological overlap matrix, which can be created from the adjacency matrix of your expression matrix:

softPower <- 6 ;
adjacency <- adjacency(datExpr, power = softPower) ;
TOM <- TOMsimilarity(adjacency) ;

Please refer to page 3 of the WGCNA tutorial 2.b Step-by-step network construction and module detection for the finer details.
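For intuition about what TOMsimilarity() computes, here is a minimal base-R sketch of the unsigned topological overlap on a toy matrix. The toy data, the variable names (adj, k, shared, tom) and the direct use of abs(cor())^power in place of WGCNA's adjacency() are assumptions for illustration only; on real data, the WGCNA functions above are the ones to use.

```r
# Toy expression matrix: 20 samples (rows) x 6 genes (columns), as WGCNA expects
set.seed(1)
datExpr <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6,
                  dimnames = list(NULL, paste0("gene", 1:6)))
softPower <- 6

# Unsigned adjacency: absolute correlation raised to the soft-thresholding power
adj <- abs(cor(datExpr))^softPower
diag(adj) <- 0                      # ignore self-connections in the sums below

k <- rowSums(adj)                   # connectivity of each gene
shared <- adj %*% adj               # sum over u of a_iu * a_uj (shared neighbours)
kmin <- outer(k, k, pmin)           # min(k_i, k_j) for every gene pair

# Topological overlap: (shared neighbours + direct link) / (min connectivity + 1 - direct link)
tom <- (shared + adj) / (kmin + 1 - adj)
diag(tom) <- 1                      # a gene overlaps fully with itself

round(tom, 3)
```

The result is a genes x genes matrix with values between 0 and 1, which is why its memory footprint grows with the square of the number of genes.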

Kevin

ADD COMMENTlink modified 10 days ago • written 10 days ago by Kevin Blighe24k

Dear Dr. Blighe

Thanks for your comment. Because of limited hardware resources, I selected 'Automatic, one-step network construction and module detection' as a feasible option. But to import my constructed network into Cytoscape, I have to use the TOM. For this purpose, I ran the code below:

TOM = TOMsimilarityFromExpr(datExprSTLMS, power=3)

and I got the error below:

Error: cannot allocate vector of size 24.2 Gb

So, given my limitation, could you please recommend another solution that does not use the TOM?

Best Regards,

Mohammad

ADD REPLYlink modified 10 days ago • written 10 days ago by modarzi20

You are trying to create a network from the entire data-matrix, right? What are the dimensions of datExprSTLMS?

Usually, we filter the data before we generate the network.

ADD REPLYlink written 10 days ago by Kevin Blighe24k

Yes, I am trying to create a network from the entire data matrix. The dimensions of my datExprSTLMS are 53 * 56900.

If my process has a problem, on what basis can I filter my data set?

Best Regards,

Mohammad

ADD REPLYlink written 9 days ago by modarzi20

That is very large and you will likely have issues with RAM / memory.

Take a look here: WGCNA maxBlockSize limit
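As a rough check on why the allocation fails, the size of one dense genes x genes matrix of doubles can be estimated as below. The helper name tom_gib is hypothetical, and the gene count 56963 is taken from the row count mentioned later in this thread; the estimate covers a single matrix only, while the actual calculation may need several copies.

```r
# Approximate size (in GiB) of one dense n x n matrix of 8-byte doubles
tom_gib <- function(n_genes) n_genes^2 * 8 / 2^30

tom_gib(56963)   # about 24.2, in line with the 'cannot allocate' error above
tom_gib(27251)   # roughly 5.5 after filtering to about 27000 genes
tom_gib(5000)    # well under 1 for a 5000-gene block
```

Because the size grows with the square of the gene count, halving the number of genes cuts the memory needed to roughly a quarter.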

ADD REPLYlink written 9 days ago by Kevin Blighe24k

Dear Dr. Blighe

Thanks for your comment. Based on your guidance, I have to filter out some rows (genes) using a variance cut-off. If I have understood this filter correctly, how can I determine the cut-off?

I would appreciate it if you could share your comments with me.

Best Regards,

Mohammad

ADD REPLYlink written 9 days ago by modarzi20

Yes, using a filter based on variance would be a good idea.

ADD REPLYlink written 9 days ago by Kevin Blighe24k

What is the goal of your research? Why do you want to create the network?

ADD REPLYlink written 9 days ago by Kevin Blighe24k

Dear Dr. Blighe

I want to use a network approach to identify hub genes and pathways in 2 cancers. My data set was downloaded from TCGA and, as you know, data sets in TCGA typically have more than 60000 genes. So, if you want to construct a network from these data sets, you face high-dimensional data. Therefore, as I said before, I have a problem generating the TOM.

I would appreciate it if you could share your comments with me.

Best Regards,

Mohammad

ADD REPLYlink written 8 days ago by modarzi20

Hello again. My simple comment is that you should do some pre-filtering such that you can actually generate the networks. Removing genes based on low variance is a reasonable idea. Also, you could perform a differential expression analysis between tumour and normal samples, and then only build the network from genes that are statistically differentially expressed between these.

ADD REPLYlink written 8 days ago by Kevin Blighe24k

Thanks for your comment. I have to remove genes based on low variance, so I need a good function for that purpose. Searching the Internet, I found the 'genefilter' package in R. But, according to its vignette, this package is intended for microarray data sets.

Could you please suggest a good solution for variance-based filtering in R?

Best Regards,

Mohammad

ADD REPLYlink written 8 days ago by modarzi20

Oh, you should look into the var() function in combination with apply().

For example:

apply(MyData, 1, var)

That will get the variance of each row (this assumes genes are in rows).

filter <- which(apply(MyData, 1, var) > 0.2)

That will find all genes with variance greater than 0.2 and store their row indices in filter (which() returns positions, not TRUE / FALSE). You could then filter your data with:

MyData.filt <- MyData[filter,]
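
If a fixed cut-off is hard to choose, one alternative is to keep a fixed number of the most variable genes instead. The following is only a sketch on simulated data: the toy MyData matrix, the value nKeep = 50 and the variable names geneVars and keep are assumptions, and it assumes genes are in rows, as in the example above.

```r
# Toy data: 200 genes (rows) x 10 samples (columns)
set.seed(42)
MyData <- matrix(rnorm(200 * 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("sample", 1:10)))

# Variance of each gene (row), computed once
geneVars <- apply(MyData, 1, var)

# Keep the nKeep most variable genes (hypothetical value; choose to fit your RAM)
nKeep <- 50
keep <- order(geneVars, decreasing = TRUE)[seq_len(min(nKeep, nrow(MyData)))]
MyData.filt <- MyData[keep, , drop = FALSE]

# WGCNA expects samples in rows and genes in columns, so transpose before use
datExpr <- t(MyData.filt)
dim(datExpr)
```

Note that if the matrix is already in WGCNA orientation (samples x genes), the variance should be taken over columns (apply(x, 2, var)) rather than rows.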
ADD REPLYlink written 8 days ago by Kevin Blighe24k

Thanks for your comment. I filtered MyData with the code you suggested:

filter <- which(apply(MyData, 1, var) > 0.2)

But the number of rows (genes) only decreased from 56963 to 50580, which is still high. So my question is: how can I find the best threshold for filtering? Currently the threshold is 0.2. Does it have a theoretical basis or is it only empirical? I know you are an expert in WGCNA, so based on your experience, which threshold would be good for my case?

I would appreciate it if you could share your comments with me.

Best Regards,

Mohammad

ADD REPLYlink written 7 days ago by modarzi20

Hello again! I just picked 0.2 'randomly'. What you could do is find the range of the variances via the min() and max() (or range()) functions, and then set the cut-off to the midpoint of that range.

For example:

# Per-gene (row) variances, computed once and reused below
variances <- apply(MyData, 1, var)
# Named minVar / maxVar to avoid masking the base min() and max() functions
minVar <- min(variances, na.rm=TRUE)
maxVar <- max(variances, na.rm=TRUE)

cutoff <- minVar + ((maxVar - minVar) / 2)

filter <- which(variances > cutoff)
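
To see how much a given cut-off removes before committing to it, the number of genes retained can be compared across candidate thresholds. This is a sketch on simulated data: the toy matrix, the 75th-percentile choice and the names midpoint, q75, nMid and nQ75 are assumptions, and a quantile cut-off is shown only as one possible alternative to the midpoint rule.

```r
# Toy data: 1000 genes (rows) x 12 samples (columns), skewed like expression data
set.seed(7)
MyData <- matrix(rexp(1000 * 12), nrow = 1000)

variances <- apply(MyData, 1, var)

# Cut-off 1: midpoint of the variance range (sensitive to a few extreme genes)
midpoint <- min(variances, na.rm = TRUE) +
  (max(variances, na.rm = TRUE) - min(variances, na.rm = TRUE)) / 2

# Cut-off 2: 75th percentile, which keeps about the top quarter of genes
q75 <- unname(quantile(variances, probs = 0.75, na.rm = TRUE))

nMid <- sum(variances > midpoint)
nQ75 <- sum(variances > q75)
c(midpoint = nMid, quantile75 = nQ75)
```

A quantile-based cut-off makes the number of retained genes predictable, which helps when the limit is available RAM rather than a biological threshold.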
ADD REPLYlink written 7 days ago by Kevin Blighe24k

Dear Dr. Blighe

Thanks for your comment. With the cut-off you recommended, MyData has decreased to 27251 genes. Could you please give me a reference for the formula below:

min + ((max-min) / 2)

Best Regards,

Mohammad

ADD REPLYlink modified 7 days ago • written 7 days ago by modarzi20

Hello Sir. There is no reference - it is just a method to remove genes of low variance. It is likely used in many thousands of publications. You can phrase it like this: 'Prior to network construction, we removed X genes whose variance across all samples fell into the lower half of the variance range'.

A related variance-based selection is used, for example, in DESeq2's plotPCA function, which by default keeps only the 500 genes with the highest variance (ntop = 500).

ADD REPLYlink written 7 days ago by Kevin Blighe24k

Dear Dr. Blighe

Thanks for your comment. Based on your guidance, I have reduced my data from 56000 to 27000 genes. But with 56000 genes I had 95 modules, and now I have 141 modules. Is this normal? I would appreciate it if you could share your comments with me.

Best Regards,

Mohammad

ADD REPLYlink modified 5 days ago • written 7 days ago by modarzi20