Hi all, I have a basic question about the first step of WGCNA: creating the weighted correlation network. I am doing this for the first time and want to apply it to my proteomics data, which is already preprocessed. I have spectral counts and normalised spectral abundance factors for the genes (proteins with gene IDs) for the control and the mutant, with three replicates each. From what I understand, I first need to transpose my data so that genes are in the columns. Second, I need to calculate the Pearson correlation between the genes, then do soft thresholding and calculate the mean connectivity. It would be really helpful if you could tell me whether my approach is right, or share a detailed R script for creating the weighted correlation network. One more important question is how to select the database for tomato for the coexpression analysis. For my data I have done the analysis with ITAG4.1. Thank you in advance.
In total you have only 6 samples. That's not enough for WGCNA.
Thank you for your reply. I actually have three stages, each with mutant and control and three replicates each, so 18 samples in total. The problem is that only a few proteins are common to all three stages, so when I merge the three stages there would be many missing values. I therefore assumed it is better to go stage by stage. Any suggestions, please?
These are differentially expressed proteins with significant p-values, around 300-400 proteins for each stage.
You should include all proteins. Can you do that?
I can include all proteins, but still there will be many missing values.
Do I also need to include proteins that do not have values for all three replicates, i.e. where one or another replicate value is missing?
Usually, I filter out genes not expressed in at least 80% of the samples.
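The 80% rule above can be sketched roughly as follows. This is a plain-Python illustration, not WGCNA code. The function name `filter_by_presence`, the dict-of-lists layout, the toy protein IDs and the use of `None` for a missing value are all assumptions made for the example. Whether a zero spectral count should also count as "not expressed" is left to the analyst.

```python
def filter_by_presence(abundance, min_fraction=0.8):
    """Keep proteins that have a value in at least `min_fraction` of samples.

    `abundance` maps a protein ID to a list with one entry per sample.
    A missing measurement is stored as None (an assumption of this sketch).
    """
    kept = {}
    for protein, values in abundance.items():
        if not values:
            continue  # no samples at all: nothing to keep
        present = sum(1 for v in values if v is not None)
        if present / len(values) >= min_fraction:
            kept[protein] = values
    return kept


# Hypothetical toy data: 3 proteins measured across 5 samples.
example = {
    "protA": [1.2, 0.9, 1.1, 1.0, 1.3],      # present in 5/5 samples
    "protB": [0.4, None, 0.5, 0.6, 0.5],     # present in 4/5 samples
    "protC": [None, None, 0.2, None, 0.3],   # present in 2/5 samples
}
filtered = filter_by_presence(example)
print(sorted(filtered))
```

With 18 samples the same rule would require a value in at least 15 of them, which shows how quickly stage-specific proteins drop out once the stages are merged.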
If you end up with too few proteins to build a meaningful coexpression network, then don't use WGCNA.
Thank you for your reply. I also tried CEMiTool on all the stages together, but it was not able to detect a soft-threshold beta value. Are there any other approaches I could use to get a coexpression network?
WGCNA and CEMiTool calculate the soft threshold starting from a correlation matrix. I suspect you are not getting a good soft threshold because the three stages you have here are hardly comparable: the few features (proteins) they share do not correlate.
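To make "starting from a correlation matrix" concrete, here is a minimal sketch of the underlying arithmetic for an unsigned network: pairwise Pearson correlations, adjacency a_ij = |cor(i, j)|^beta, and the mean connectivity for a few candidate powers. It is plain Python written for illustration and is not the WGCNA or CEMiTool implementation. The helper names `pearson` and `mean_connectivity` and the toy profiles are hypothetical. The scale-free topology fit that these tools also use when choosing beta is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treated as uncorrelated (assumption)
    return sxy / math.sqrt(sxx * syy)


def mean_connectivity(profiles, beta):
    """Mean over proteins of k_i = sum over j != i of |cor(i, j)| ** beta."""
    names = list(profiles)
    total = 0.0
    for i in names:
        for j in names:
            if i != j:
                total += abs(pearson(profiles[i], profiles[j])) ** beta
    return total / len(names)


# Hypothetical toy profiles: one list of abundances per protein, 6 samples each.
profiles = {
    "protA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "protB": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],  # tracks protA closely
    "protC": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],   # largely unrelated to protA
}
for beta in (1, 6, 12):
    print(beta, round(mean_connectivity(profiles, beta), 3))
```

Raising beta shrinks weak correlations towards zero much faster than strong ones, so mean connectivity falls as beta grows. If the shared proteins barely correlate across stages, almost every adjacency is already small and no power gives a usable network, which would fit the behaviour described above.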