Question: Pick soft threshold for co-expression analysis and filtering step
Biologist wrote:

I have raw count data for 19803 genes and 201 samples. I used the DESeq2 package to obtain normalised counts, which are given as input to WGCNA.

Before that, I applied the filtering step from the DESeq2 package, keeping only genes with more than 20 counts in total. This leaves 18000 genes for WGCNA.
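The filtering rule described above can be sketched as follows. This is a minimal Python illustration, not the DESeq2 code the poster ran; the function name `filter_low_count_genes` and the toy counts are hypothetical.

```python
def filter_low_count_genes(counts, min_total=20):
    """Keep genes whose raw counts, summed over all samples, exceed min_total."""
    return {gene: row for gene, row in counts.items() if sum(row) > min_total}


# Hypothetical toy data: gene name -> raw counts across four samples.
toy_counts = {
    "geneA": [0, 3, 5, 2],    # total 10, dropped
    "geneB": [12, 9, 30, 4],  # total 55, kept
    "geneC": [5, 5, 5, 5],    # total 20, dropped (not strictly more than 20)
}

kept = filter_low_count_genes(toy_counts)
print(sorted(kept))  # ['geneB']
```

The comparison is strict because the post says "more than 20"; change it to `>=` if the cutoff itself should be kept.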

The scale-free topology fit index gave the plot below:

[Figure: scale-free topology fit index plot (image not included)]

1) Based on this plot, should I take softPower 6 or 7?

2) Is there a way to reduce the number of input genes with some other, stricter filtering? [Of course I could take only the top 50% most variable genes and do the co-expression analysis, but that way the gene I'm interested in is filtered out]

modified 14 months ago by Kevin Blighe • written 14 months ago by Biologist

Kevin Blighe wrote:

1) Based on this plot, should I take softPower 6 or 7?

Take 6. In my former supervisor's words: "generally, the first past 0.9" - she teaches WGCNA and works in the lab where the developer used to be based.
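The "first past 0.9" rule can be sketched like this. It is only an illustration in Python: `first_power_passing` is a hypothetical helper, and the fit values are made up for the example, not read from the poster's plot.

```python
def first_power_passing(powers, fit_indices, threshold=0.9):
    """Return the lowest power whose fit index reaches the threshold, or None."""
    for power, fit in sorted(zip(powers, fit_indices)):
        if fit >= threshold:
            return power
    return None


# Made-up scale-free fit values, for illustration only.
powers = [1, 2, 3, 4, 5, 6, 7, 8]
fits = [0.10, 0.35, 0.62, 0.78, 0.87, 0.91, 0.93, 0.94]

print(first_power_passing(powers, fits))       # 6
print(first_power_passing(powers, fits, 0.8))  # 5
```

With these toy numbers a cutoff of 0.9 gives 6, while a looser cutoff of 0.8 gives 5, so the choice of cutoff directly changes the selected power.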

2) Is there a way to reduce the number of input genes with some other, stricter filtering? [Of course I could take only the top 50% most variable genes and do the co-expression analysis, but that way the gene I'm interested in is filtered out]

Indeed, reducing variables based on low variance is another option; however, the genes of low variance may actually be of interest to a network analysis. Why not just continue with the 18000, provided that there is no computational or infrastructural issue in doing so?
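If a variance filter were used anyway, one way to avoid losing a gene of interest is to keep it explicitly. This is a hypothetical sketch in Python, not something recommended in this thread; the function name, the `always_keep` parameter and the toy values are all assumptions made for illustration.

```python
from statistics import variance


def top_variable_genes(expr, fraction=0.5, always_keep=()):
    """Keep the top `fraction` of genes by variance, plus any genes in always_keep."""
    ranked = sorted(expr, key=lambda g: variance(expr[g]), reverse=True)
    n_keep = int(len(ranked) * fraction)
    # Genes named in always_keep are retained only if they exist in the data.
    selected = set(ranked[:n_keep]) | (set(always_keep) & set(expr))
    return {g: expr[g] for g in expr if g in selected}


# Hypothetical toy expression values: gene name -> values across four samples.
toy_expr = {
    "g1": [1.0, 9.0, 2.0, 8.0],  # high variance
    "g2": [5.0, 5.1, 4.9, 5.0],  # low variance, the hypothetical gene of interest
    "g3": [2.0, 6.0, 3.0, 7.0],  # medium variance
    "g4": [4.0, 4.2, 4.1, 3.9],  # low variance
}

kept = top_variable_genes(toy_expr, fraction=0.5, always_keep=["g2"])
print(sorted(kept))  # ['g1', 'g2', 'g3']
```

Note that forcing a low-variance gene into the input does not guarantee it will land in an informative module.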

You may also want to try the analysis with the log-transformed normalised counts, by the way.
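As a rough illustration of what a log transformation of normalised counts does, here is a minimal Python sketch. The pseudocount of 1 and the function name `log2_transform` are assumptions for the example; the thread does not specify either.

```python
import math


def log2_transform(norm_counts, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every normalised count, gene by gene."""
    return {
        gene: [math.log2(x + pseudocount) for x in row]
        for gene, row in norm_counts.items()
    }


# Hypothetical normalised counts for one gene across four samples.
toy_norm = {"geneA": [0.0, 1.0, 3.0, 7.0]}

logged = log2_transform(toy_norm)
print(logged["geneA"])  # [0.0, 1.0, 2.0, 3.0]
```

The pseudocount keeps zero counts defined; the transformation compresses large counts so that a few highly expressed genes dominate the correlations less.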

Kevin

modified 14 months ago • written 14 months ago by Kevin Blighe

Thanks a lot, Kevin. Sure, I will also try with log-transformed normalised counts. In the DESeq2 tutorial I see two types of log transformation, rlog and vst. Which one should I use?

written 14 months ago by Biologist

Hey, there is no real preference. I would start with vst and then also try with rlog after. Hopefully, results will be similar.

written 14 months ago by Kevin Blighe

It also depends on the number of samples. If you have many (say, > 50), rlog might take several hours to complete because it fits a shrinkage term for every sample, which vst doesn't.

modified 14 months ago • written 14 months ago by ATpoint

I have tried both ways: with normalised counts and with vst log-transformed normalised values. In both cases I set the minimum module size to 50.

With normalised counts I see 56 modules; with log-transformed values I see 14 modules.

With the log-transformed data I took soft power = 3, based on the following plot, where R-squared is > 0.8:

[Figure: scale-free topology fit index plot for the log-transformed data (image not included)]

Of all the modules, I'm interested in the one that contains my gene of interest. In both cases, that module has a similar number of genes.

But which way do you think is better: normalised only, or log-transformed normalised?

modified 14 months ago • written 14 months ago by Biologist

Well, this is one of the issues with network analysis approaches. Although the developer of WGCNA implies at one point that the form of your input data is not critical, so long as it is normalised and all samples are processed in the same way, in practice results are highly variable, and I frequently see people banging their heads against a brick wall trying to interpret the results from WGCNA. This is why I no longer use WGCNA (unless instructed).

In all honesty, I cannot answer your question. To make it easier, I would suggest using vst counts and taking the 14 modules. This matches the recommendation in the (somewhat contradictory) FAQ:

As far as WGCNA is concerned, working with (properly normalized) RNA-seq data isn't really any different from working with (properly normalized) microarray data. ... We then recommend a variance-stabilizing transformation.

...

Whether one uses RPKM, FPKM, or simply normalized counts doesn't make a whole lot of difference for WGCNA analysis as long as all samples were processed the same way. These normalization methods make a big difference if one wants to compare expression of gene A to expression of gene B; but WGCNA calculates correlations for which gene-wise scaling factors make no difference. (Sample-wise scaling factors of course do, so samples do need to be normalized.)

[source: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html]
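The FAQ's point about scaling factors can be checked on toy numbers. The sketch below is plain Python with made-up values; `pearson` is written out by hand here and is not a WGCNA function.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up expression values for two genes across four samples.
gene_a = [10.0, 20.0, 15.0, 40.0]
gene_b = [3.0, 8.0, 5.0, 9.0]

r_original = pearson(gene_a, gene_b)

# Gene-wise scaling: every value of gene A is multiplied by the same constant.
r_gene_scaled = pearson([v * 0.25 for v in gene_a], gene_b)

# Sample-wise scaling: each sample gets its own factor, applied to both genes.
sample_factors = [1.0, 0.2, 3.0, 0.5]
r_sample_scaled = pearson(
    [v * f for v, f in zip(gene_a, sample_factors)],
    [v * f for v, f in zip(gene_b, sample_factors)],
)

print(math.isclose(r_original, r_gene_scaled))   # True
print(abs(r_original - r_sample_scaled) > 0.05)  # True
```

The gene-wise factor cancels out of the correlation, while uncorrected sample-wise factors change it, which is why the samples need to be normalised first.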

written 14 months ago by Kevin Blighe

Thank you very much for the link.

written 14 months ago by Biologist

In your figure, which power is the first above 0.9, though? It looks like it may be 6 or 7.

modified 14 months ago • written 14 months ago by Kevin Blighe

@Kevin Blighe In the second figure in my comments, I see that 3 is above an R-squared of 0.8. So I took softPower 3 when using log-transformed values for WGCNA.

Do you think this is right? Or should I always use 6 or 7 as softPower?

Most of the tutorials I see use 7 or 8.

In this tutorial, 5 is above 0.8 and they took 5 as softPower [https://github.com/hms-dbmi/scw/blob/master/scw2016/tutorials/wgcna/WGCNA.md]

In this one they took 8 as softPower [https://hms-dbmi.github.io/scw/WGCNA.html]

In this one, 7 as softPower [http://pklab.med.harvard.edu/scw2014/WGCNA.html]

modified 14 months ago • written 14 months ago by Biologist

You should generally choose the first soft power that passes 0.9 (not 0.8). In most datasets this is 6 or 7.

written 14 months ago by Kevin Blighe

Regarding 0.9: in the tutorial, at least, the cutoff line is drawn at 0.9: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-02-networkConstr-man.pdf

written 14 months ago by Kevin Blighe

Thanks a lot for the link, Kevin.

written 14 months ago by Biologist

Sure. Thank you for the quick reply.

written 14 months ago by Biologist