Question: Pick soft threshold for co-expression analysis and filtering step
Biologist wrote (5 months ago):

I have raw count data for 19803 genes and 201 samples. I used the DESeq2 package to obtain normalised counts, which I give as input to WGCNA.

Before that, I applied a filtering step in DESeq2, keeping only genes with more than 20 counts in total across all samples. This leaves 18000 genes for WGCNA.
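A minimal sketch of the total-count filter described above, written in plain Python rather than with DESeq2. The function name `filter_low_count_genes` and the toy counts are hypothetical; the cutoff of 20 is the one stated in the post.

```python
def filter_low_count_genes(counts, min_total=20):
    """Keep genes whose summed raw count across all samples exceeds min_total."""
    return {gene: row for gene, row in counts.items() if sum(row) > min_total}


# Toy data (hypothetical values): gene -> raw counts, one entry per sample.
toy_counts = {
    "geneA": [0, 3, 5, 1],     # total 9, dropped
    "geneB": [12, 30, 8, 22],  # total 72, kept
    "geneC": [7, 7, 7, 0],     # total 21, kept
}

kept = filter_low_count_genes(toy_counts)
print(sorted(kept))
```

The comparison is strict (`>`), matching "more than 20 counts in total".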

The scale-free topology fit index gave a plot like the one below:

[Figure: scale-free topology fit index plotted against soft-threshold power]

1) Based on that plot, should I take softPower 6 or 7?

2) Is there a way to reduce the number of input genes with some other, stricter filtering? [Of course, I could take only the top 50% most variable genes and do the co-expression analysis, but then the gene I'm interested in would be filtered out.]

Kevin Blighe (Republic of Ireland) wrote (5 months ago):

1) Based on that plot, should I take softPower 6 or 7?

Take 6. In my former supervisor's words: "generally, the first past 0.9" - she teaches WGCNA and works in the lab where the developer used to be based.

2) Is there a way to reduce the number of input genes with some other, stricter filtering? [Of course, I could take only the top 50% most variable genes and do the co-expression analysis, but then the gene I'm interested in would be filtered out.]

Indeed, reducing the number of variables based on low variance is another option; however, the genes of low variance may actually be of interest to a network analysis. Why not just continue with the 18000, provided that there is no computational / infrastructural issue in doing so?
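If filtering does become necessary for computational reasons, one possible compromise is to keep the most variable genes and also force-keep the gene of interest. The sketch below is only an illustration in plain Python: `top_variable_genes`, its `always_keep` parameter, and the toy values are hypothetical and not part of WGCNA or DESeq2.

```python
from statistics import pvariance


def top_variable_genes(expr, fraction=0.5, always_keep=()):
    """Keep the top `fraction` of genes by variance, plus any genes in always_keep."""
    variances = {gene: pvariance(values) for gene, values in expr.items()}
    n_keep = max(1, int(len(expr) * fraction))
    ranked = sorted(variances, key=variances.get, reverse=True)
    # Union of the most variable genes and the protected genes present in the data.
    selected = set(ranked[:n_keep]) | (set(always_keep) & set(expr))
    return {gene: values for gene, values in expr.items() if gene in selected}


# Toy data (hypothetical values): gene -> normalised expression per sample.
toy_expr = {
    "g1": [1, 9, 1, 9],
    "g2": [5, 5, 5, 6],
    "g3": [2, 8, 3, 7],
    "g4": [4, 4, 5, 4],  # low variance, but the gene of interest
}

kept = top_variable_genes(toy_expr, fraction=0.5, always_keep=("g4",))
print(list(kept))
```

Here `g1` and `g3` are retained for their variance, and `g4` is retained only because it is listed in `always_keep`.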

You may also want to try the analysis with the log-transformed normalised counts, by the way.
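As a rough illustration of log-transforming normalised counts, the sketch below applies log2(count + 1) in plain Python. The pseudocount of 1 and the function name `log2_transform` are assumptions made for this example; this is a simple stand-in and not the vst or rlog transformation from DESeq2.

```python
from math import log2


def log2_transform(normalised_counts, pseudocount=1.0):
    """Return log2(count + pseudocount) for each normalised count."""
    return [log2(value + pseudocount) for value in normalised_counts]


# Toy normalised counts for one gene across four samples (hypothetical values).
transformed = log2_transform([0.0, 1.0, 3.0, 7.0])
print(transformed)
```

The pseudocount keeps zero counts defined after the transformation.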

Kevin


Thanks a lot, Kevin. Sure, I will also try log-transformed normalised counts. In the DESeq2 tutorial I see two types of log-like transformation, rlog and vst. Which one should I use?

Reply written 5 months ago by Biologist

Hey, there is no real preference. I would start with vst and then also try with rlog after. Hopefully, results will be similar.

Reply written 5 months ago by Kevin Blighe

It also depends on the number of samples. If you have many (say, > 50), rlog might take several hours to complete because it fits a shrinkage term for every sample, which vst does not.

Reply written 5 months ago by ATpoint

I have tried both ways: with normalised counts and with vst-transformed normalised values. In both cases I set the minimum module size to 50.

With normalised counts, I see 56 modules. With log-transformed values, I see 14 modules.

With the log-transformed data, I took soft power = 3, based on the following plot, where R-squared is > 0.8:

[Figure: scale-free topology fit index plotted against soft-threshold power, log-transformed (vst) data]

Among all the modules, I'm interested in the one that contains my gene of interest. In both cases, the module that my gene of interest belongs to has a similar number of genes.

But which way do you think is better: normalised counts only, or log-transformed normalised counts?

Reply written 5 months ago by Biologist

Well, this is one of the issues with network analysis approaches. Although the developer of WGCNA implies at one point that the form of your input data is not critical, so long as it is normalised and all samples are processed in the same way, in practice results are highly variable, and I frequently see people banging their heads against a brick wall trying to interpret the results from WGCNA. This is why I no longer use WGCNA (unless instructed).

In all honesty, I cannot answer your question. To make it easier, I would suggest using vst counts and taking the 14 modules. This matches the recommendation in the FAQ, which otherwise contradicts itself:

As far as WGCNA is concerned, working with (properly normalized) RNA-seq data isn't really any different from working with (properly normalized) microarray data. ... We then recommend a variance-stabilizing transformation.

...

Whether one uses RPKM, FPKM, or simply normalized counts doesn't make a whole lot of difference for WGCNA analysis as long as all samples were processed the same way. These normalization methods make a big difference if one wants to compare expression of gene A to expression of gene B; but WGCNA calculates correlations for which gene-wise scaling factors make no difference. (Sample-wise scaling factors of course do, so samples do need to be normalized.)

[source: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html]
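The FAQ's point about scaling factors can be checked numerically. The sketch below, with hypothetical toy values, computes a Pearson correlation between two genes, then rescales one gene by a constant (gene-wise factor) and, separately, rescales each sample by its own factor (sample-wise factors). The helper `pearson` is written out here for self-containment and is not a WGCNA function.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy expression of two genes across four samples (hypothetical values).
gene_a = [10.0, 20.0, 15.0, 30.0]
gene_b = [3.0, 5.0, 6.0, 8.0]

r_original = pearson(gene_a, gene_b)

# Gene-wise factor: every value of gene A is multiplied by the same constant.
r_gene_scaled = pearson([2.5 * v for v in gene_a], gene_b)

# Sample-wise factors: each sample has its own factor, applied to all genes.
sample_factors = [1.0, 0.5, 2.0, 0.8]
r_sample_scaled = pearson(
    [v * f for v, f in zip(gene_a, sample_factors)],
    [v * f for v, f in zip(gene_b, sample_factors)],
)

print(r_original, r_gene_scaled, r_sample_scaled)
```

The gene-wise rescaling leaves the correlation unchanged, while the sample-wise rescaling shifts it, which is why samples need to be normalised before correlations are computed.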

Reply written 5 months ago by Kevin Blighe

Thank you very much for the link.

Reply written 5 months ago by Biologist

In your figure, which power is the first above 0.9, though? It looks like it may be 6 or 7.

Reply written 5 months ago by Kevin Blighe

@Kevin Blighe In the second figure in my comments, I see that 3 is above R-squared 0.8. So, I took softPower 3 when using log-transformed values for WGCNA.

Do you think this is right? Or should I always use 6 or 7 as softPower?

In most of the tutorials, I see they are using 7 or 8.

In this tutorial, 5 is above 0.8 and they took 5 as softPower [https://github.com/hms-dbmi/scw/blob/master/scw2016/tutorials/wgcna/WGCNA.md]

In this one, they took 8 as softPower [https://hms-dbmi.github.io/scw/WGCNA.html]

In this one, 7 as softPower [http://pklab.med.harvard.edu/scw2014/WGCNA.html]

Reply written 4 months ago by Biologist

You should generally choose the first soft power that passes 0.9 (not 0.8). This is usually 6 or 7 in most datasets.
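A minimal sketch of this selection rule, assuming the fit index has already been computed for each candidate power. The function name `pick_soft_power` and the fit values are hypothetical (they are not read from the plots in this thread), and treating a value exactly equal to the threshold as passing is an assumption.

```python
def pick_soft_power(powers, fit_indices, threshold=0.9):
    """Return the first power whose scale-free fit index reaches the threshold.

    Returns None if no candidate power reaches it.
    """
    for power, fit in zip(powers, fit_indices):
        if fit >= threshold:
            return power
    return None


# Hypothetical fit indices for candidate powers 1..10.
powers = list(range(1, 11))
fits = [0.10, 0.35, 0.62, 0.78, 0.86, 0.91, 0.93, 0.94, 0.95, 0.95]

chosen = pick_soft_power(powers, fits)
print(chosen)
```

With these toy values the rule selects power 6 at a threshold of 0.9, whereas a threshold of 0.8 would select power 5, which shows how much the choice of cutoff can change the answer.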

Reply written 4 months ago by Kevin Blighe

Regarding 0.9: in the tutorial, at least, the threshold line is drawn at 0.9: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-02-networkConstr-man.pdf

Reply written 4 months ago by Kevin Blighe

Thanks a lot for the link, Kevin.

Reply written 4 months ago by Biologist

Sure. Thank you for the quick reply.

Reply written 4 months ago by Biologist