I have raw count data for 19803 genes and 201 samples. I used the DESeq2 package to get normalised counts, which I give as input to WGCNA.
Before that, I applied a filtering step in DESeq2, keeping only genes with more than 20 counts in total. This leaves 18000 genes for WGCNA.
The scale-free topology fit index gave a plot like the one below:
1) From that, should I take softPower 6 or 7?
2) Is there a way to reduce the number of input genes with some other, stricter filtering? [Of course I could take only the top 50% most variable genes and do the co-expression analysis, but then the gene I'm interested in gets filtered out.]
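On the second question, one option is to rank genes by variance, keep the top fraction, and force-include the gene of interest. The sketch below shows that idea in plain Python rather than R so that it is self-contained. It is only an illustration under assumptions: the function name `filter_top_variable`, its parameters, and the toy gene names are hypothetical and are not part of DESeq2 or WGCNA.

```python
from statistics import pvariance


def filter_top_variable(expr, keep_fraction=0.5, always_keep=()):
    """Keep the most variable genes plus any genes listed in always_keep.

    expr: dict mapping gene name -> list of expression values across samples.
    (Hypothetical helper for illustration; not a DESeq2 or WGCNA function.)
    """
    # Rank genes from most to least variable across samples.
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    selected = set(ranked[:n_keep])
    # Force-include genes of interest even if they fall below the cut-off.
    selected.update(g for g in always_keep if g in expr)
    # Preserve the original gene order in the output.
    return {g: values for g, values in expr.items() if g in selected}


# Toy data with made-up values: gene "B" has zero variance but is kept on request.
toy = {"A": [1, 10, 20], "B": [5, 5, 5], "C": [1, 2, 3], "D": [0, 100, 0]}
kept = filter_top_variable(toy, keep_fraction=0.5, always_keep=("B",))
print(list(kept))  # ['A', 'B', 'D']
```

A gene that is force-included this way may have very little variance, so it may not correlate strongly with any module; that is worth checking before interpreting its module membership.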
Thanks a lot Kevin. Sure I will also try with log transformed normalised counts. I see in the DESeq2 tutorial there are two types, rlog and vst, for log transformed values. Which one should I use?
Hey, there is no real preference. I would start with vst and then also try with rlog after. Hopefully, results will be similar.
Depends also on the number of samples. If you have many (like > 50), rlog might take several hours to complete because it fits a shrinkage term for every sample, which vst doesn't.
I have tried both ways, with normalised counts and also with vst log transformed normalised values. In both cases I set the minimum module size to 50.
With normalized counts I see 56 modules; with log transformed values I see 14 modules.
With the log transformed data, I took soft power = 3, based on the following plot, where R-squared is > 0.8.
Among all the modules, I'm interested in the one that contains my gene of interest. In both cases, the module that my gene of interest belongs to has a similar number of genes.
But which way do you think is better: normalized only, or log transformed normalized?
Well, this is one of the issues with network analysis approaches. Although the developer of WGCNA implies at one point that your input data is not critical so long as it is normalised and all samples are simply processed in the same way, in practice, results are highly variable, and I frequently see people banging their heads against a brick wall trying to interpret the results from WGCNA. This is why I specifically never use WGCNA anymore (unless instructed).
In all honesty, I cannot answer your question. To make it easier, I would suggest using vst counts and taking the 14 modules. This is the same recommendation as in the contradicting FAQ:
[source: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html]
Thank you very much for the link.
In your figure, which power is the first above 0.9, though? It looks like it may be 6 or 7.
@Kevin Blighe In the second figure in my comments, I see that 3 is above R-squared 0.8. So I took softPower 3 when using log transformed values for WGCNA.
Do you think this is right? Or should I always use 6 or 7 as softPower?
In most of the tutorials I see, they use 7 or 8.
In this tutorial, 5 is above 0.8 and they took 5 as softPower [https://github.com/hms-dbmi/scw/blob/master/scw2016/tutorials/wgcna/WGCNA.md]
In this one they took 8 as softPower [https://hms-dbmi.github.io/scw/WGCNA.html]
In this one, 7 as softPower [http://pklab.med.harvard.edu/scw2014/WGCNA.html]
You should generally choose the first soft power that passes 0.9 (not 0.8). This is usually 6 or 7 in most datasets.
Regarding 0.9, in the tutorial at least, it is drawn at 0.9: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-02-networkConstr-man.pdf
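The rule above can be written down as a small selection step. The following is a minimal Python sketch, assuming you already have a table of (power, scale-free fit R-squared) pairs from the soft-threshold analysis. The function name `pick_soft_power` and the numbers in `toy_fit` are hypothetical and are not read from the figures in this thread.

```python
def pick_soft_power(fit_table, threshold=0.9):
    """Return the smallest power whose fit index reaches the threshold.

    fit_table: iterable of (power, r_squared) pairs.
    Returns None if no power reaches the threshold.
    (Hypothetical helper for illustration; not a WGCNA function.)
    """
    for power, r_squared in sorted(fit_table):
        if r_squared >= threshold:
            return power
    return None


# Made-up fit values for illustration only.
toy_fit = [(1, 0.20), (2, 0.55), (3, 0.82), (4, 0.85),
           (5, 0.88), (6, 0.91), (7, 0.93)]

print(pick_soft_power(toy_fit))                 # 6
print(pick_soft_power(toy_fit, threshold=0.8))  # 3
```

With these toy values, a cut-off of 0.8 picks power 3 while 0.9 picks power 6, which is the same kind of difference discussed in the comments above.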
Thanks a lot for the link, Kevin.
Sure. Thank you for the quick reply.