Scale-free topology fit index WGCNA
10 months ago
anon • 0

Hi!

I am struggling to perform a WGCNA analysis for the first time, since all the links to the "official tutorials" seem to be broken (e.g. https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/).

I want to perform a WGCNA analysis on a dataset of 3837 genes and 51 samples. To do so, I pre-filtered lowly expressed genes, obtained the VST (vsd) values from the normalized counts, batch-corrected the vsd values, and kept the genes whose standard deviation is above the 50th percentile (the median).

I am now applying the WGCNA pipeline, but at the pickSoftThreshold() step I don't understand whether I need to use networkType = "unsigned" or "signed". Can I calculate the threshold with "unsigned" but then run blockwiseModules() with "signed" (as is done in https://bioinformaticsworkbook.org/dataAnalysis/RNA-Seq/RNA-SeqIntro/wgcna.html#gsc.tab=0), or is that incorrect, and should the same networkType be used in both steps?

```r
library(WGCNA)

# Call the network topology analysis function
# input_mat: samples in rows, genes in columns
powers = c(1:10, 12)   # candidate soft-thresholding powers
sft = pickSoftThreshold(
  input_mat,             # <= Input data
  powerVector = powers,
  networkType = "unsigned",
  verbose = 5
)
```

With this setting, I get a power estimate of 9.

```r
# Call the network topology analysis function
# Candidate powers: 1-10, then 12 to 30 in steps of 2
powers = c(1:10, seq(from = 12, to = 30, by = 2))
sft = pickSoftThreshold(
  input_mat,             # <= Input data
  powerVector = powers,
  networkType = "signed",
  verbose = 5
)
```

With this setting, R^2 reaches 0.8 only at power 26, and it keeps increasing for the powers above 26 (up to 30), so I don't know whether there is a problem with my dataset or how to check for one.

Thank you so much

Tags: unsigned, signed, WGCNA • 1.9k views

---
10 months ago

> Can I calculate the threshold with "unsigned" but then run blockwiseModules() with "signed" (as is done in https://bioinformaticsworkbook.org/dataAnalysis/RNA-Seq/RNA-SeqIntro/wgcna.html#gsc.tab=0), or is that incorrect, and should the same networkType be used in both steps?

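That is incorrect. The networkType must be the same in both steps: the power returned by pickSoftThreshold() is only meaningful for the adjacency definition it was evaluated with, and signed and unsigned adjacencies respond very differently to the same power.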
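To see why a power chosen for one network type does not carry over to another, it helps to compare how each type turns a correlation r into an adjacency. The sketch below follows the standard WGCNA definitions (unsigned: |r|^β; signed: ((1 + r)/2)^β; signed hybrid: r^β for r > 0, otherwise 0). The helper name `adjacency_from_cor` is hypothetical; WGCNA computes this internally when you pass `networkType`.

```r
# Hypothetical helper: convert a correlation r into an adjacency for a given
# soft-thresholding power, following the WGCNA network-type definitions.
adjacency_from_cor <- function(r, power,
                               type = c("unsigned", "signed", "signed hybrid")) {
  type <- match.arg(type)
  switch(type,
    "unsigned"      = abs(r)^power,              # sign of r is ignored
    "signed"        = ((1 + r) / 2)^power,       # r = -1 -> 0, r = 0 -> 0.5^power
    "signed hybrid" = ifelse(r > 0, r^power, 0)  # negative correlations set to 0
  )
}

r <- c(-0.9, 0, 0.5, 0.9)
round(adjacency_from_cor(r, power = 6,  type = "unsigned"), 4)
round(adjacency_from_cor(r, power = 6,  type = "signed"), 4)
round(adjacency_from_cor(r, power = 12, type = "signed"), 4)
```

In a signed network an uncorrelated pair (r = 0) still gets adjacency 0.5^β, so higher powers are needed to suppress that background. For r close to 1, ((1 + r)/2)^(2β) is close to r^β, which is one way to see why a signed network typically needs about twice the power of an unsigned one.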

> I don't know whether there is a problem with my dataset or how to check for one.

Did you run a PCA after normalization to check whether you have outlier samples?
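A PCA check for outlier samples can be sketched in base R with `prcomp()`. This assumes `expr` is a samples × genes matrix of normalized (e.g. VST) values; the function name and the 3-SD cutoff on the first two PCs are illustrative choices, not a WGCNA rule, and the simulated matrix only makes the sketch runnable. Hierarchical clustering of the samples (`hclust(dist(expr))`) is another common way to spot outliers.

```r
# Hypothetical helper: flag samples that lie far from the centre on the first PCs.
# expr: samples in rows, genes in columns (the orientation WGCNA also expects).
flag_pca_outliers <- function(expr, n_pc = 2, n_sd = 3) {
  pca <- prcomp(expr, center = TRUE, scale. = FALSE)
  scores <- pca$x[, seq_len(n_pc), drop = FALSE]
  z <- scale(scores)                           # z-score each PC across samples
  flagged <- apply(abs(z) > n_sd, 1, any)      # beyond n_sd on any PC
  list(pca = pca, outliers = rownames(expr)[flagged])
}

set.seed(1)
expr <- matrix(rnorm(30 * 200), nrow = 30,
               dimnames = list(paste0("s", 1:30), paste0("g", 1:200)))
expr["s30", ] <- expr["s30", ] + 5             # simulate one aberrant sample
res <- flag_pca_outliers(expr)
res$outliers
# Proportion of variance explained by the first PCs, to read alongside the plot
summary(res$pca)$importance[2, 1:5]
```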

---

The dataset is composed of 51 samples from different strains (WT and mutants) cultured in 3 different media. I have around 7000 genes after (1) filtering lowly expressed genes, (2) normalizing counts, (3) variance stabilization, and (4) batch correction with removeBatchEffect(). This is the PCA of the VST, batch-corrected values (colored by medium):

[Figure: PCA of the VST, batch-corrected values (~7000 genes), colored by culture medium]

If I add a fifth step that keeps only the most variable genes (removing genes whose vsd variance is below the 50th percentile, i.e. the median), the dataset shrinks to around 3800 genes. This is the PCA:

[Figure: PCA of the VST, batch-corrected values after variance filtering (~3800 genes)]
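The variance filter of step 5 can be sketched as below. It assumes `vsd_mat` is a genes × samples matrix of VST values (the orientation returned by `assay()` on a DESeq2 `vst()` result) and transposes the output because WGCNA expects samples in rows. The helper name is hypothetical and the simulated matrix only makes the sketch runnable. Filtering on standard deviation or on variance keeps the same genes, since SD is a monotone function of variance.

```r
# Hypothetical helper: keep genes whose variance across samples is above
# the given quantile (median by default), then transpose for WGCNA.
filter_top_variance <- function(vsd_mat, prob = 0.5) {
  gene_var <- apply(vsd_mat, 1, var)           # one variance per gene (row)
  cutoff <- quantile(gene_var, probs = prob)   # 50th percentile by default
  keep <- gene_var > cutoff
  t(vsd_mat[keep, , drop = FALSE])             # samples x genes
}

set.seed(42)
# Simulated genes x samples matrix with a different SD for each gene
vsd_mat <- matrix(rnorm(1000 * 51, sd = rep(runif(1000, 0.1, 2), 51)),
                  nrow = 1000,
                  dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:51)))
input_mat <- filter_top_variance(vsd_mat)
dim(input_mat)   # 51 samples, half of the genes
```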

I have been reading similar questions, and I suspect that the samples from the glucose cultures are indeed outliers that are hindering the WGCNA analysis (could that be the reason for the high power values?). When generating the modules with power 26, I also get two large modules that are strongly correlated (negatively and positively) with the glucose vs. all-others contrast, which I suspect is also because those samples are so different from the rest. Should I perform the co-expression analysis without the glucose samples?

My concern is that we are mainly interested in the co-expression network of one specific gene that is significantly repressed in the glucose conditions (and therefore much more highly expressed in the non-glucose conditions). We want to detect which genes are co-expressed with this gene, so I initially thought it was important to keep both the glucose and non-glucose conditions in order to detect the genes that share the same repression/induction pattern. Won't I lose information if I only use the cellulose and no-carbon conditions, in which this gene is expressed at similar levels? Will I still be able to build a co-expression network and obtain a module that includes this gene?

Another idea was simply to calculate the Pearson correlation matrix of the genes from the normalized vsd values, select the genes with a correlation > 0.8 or < -0.8 to my gene of interest, and then run an enrichment analysis on these lists to get an idea of the pathways/terms correlated with the gene. However, I initially thought that WGCNA was a more informative approach.
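The correlation-based alternative can be sketched in base R. It assumes `expr` is a samples × genes matrix of vsd values; the helper name and the gene identifiers are hypothetical, the 0.8 cutoff is the one proposed above, and the simulated data only make the sketch runnable. Correlating the gene of interest against all genes avoids building the full genes × genes matrix.

```r
# Hypothetical helper: genes whose Pearson r with the gene of interest (goi)
# is above cutoff or below -cutoff. expr: samples in rows, genes in columns.
genes_correlated_with <- function(expr, goi, cutoff = 0.8) {
  r <- cor(expr[, goi], expr, method = "pearson")[1, ]  # named vector, one r per gene
  r <- r[names(r) != goi]                               # drop the self-correlation
  list(positive = names(r)[r > cutoff],
       negative = names(r)[r < -cutoff],
       r = r)
}

set.seed(7)
n <- 51
goi_expr <- rnorm(n)
expr <- cbind(
  GOI   = goi_expr,
  up1   = goi_expr + rnorm(n, sd = 0.2),    # strongly co-expressed
  down1 = -goi_expr + rnorm(n, sd = 0.2),   # strongly anti-correlated
  noise = rnorm(n)                          # unrelated
)
hits <- genes_correlated_with(expr, "GOI")
hits$positive
hits$negative
```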

Thank you so much.

---

> I have been reading similar questions, and I suspect that the samples from the glucose cultures are indeed outliers that are hindering the WGCNA analysis (could that be the reason for the high power values?).

This should not be the case. When you have a strong driver of variation (e.g. glucose), you typically reach a scale-free topology at a very low power but with a very high mean connectivity (you typically want a mean connectivity below 100). With more than 40 samples, you should reach a scale-free topology at a power of 6 for unsigned and signed hybrid networks, and at a power of 12 for signed networks (see the WGCNA FAQ: https://www.dropbox.com/scl/fo/4vqfiysan6rlurfo2pbnk/h?rlkey=thqg8wlpdn4spu3ihjuc1kmlu&e=1&dl=0).
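To make the two quantities above concrete, here is a simplified base-R approximation of what pickSoftThreshold() reports for an unsigned network: the scale-free fit R² of log10 p(k) against log10 k (k binned into 10 intervals), signed as -sign(slope) * R² the way the WGCNA tutorials plot it, and the mean connectivity. This is not WGCNA's exact code (for example, WGCNA handles empty bins differently), the helper name is hypothetical, and the random matrix only makes it runnable; with real data, call pickSoftThreshold() itself.

```r
# Hypothetical helper approximating one row of pickSoftThreshold()'s fit table
# for an unsigned network. expr: samples in rows, genes in columns.
sft_row <- function(expr, power, n_breaks = 10) {
  adj <- abs(cor(expr))^power                  # unsigned adjacency, genes x genes
  diag(adj) <- 0
  k <- rowSums(adj)                            # connectivity of each gene
  bins <- cut(k, n_breaks)
  mean_k_bin <- tapply(k, bins, mean)          # NA for empty bins
  p_bin <- tapply(k, bins, length) / length(k) # frequency of each bin
  ok <- !is.na(mean_k_bin)                     # drop empty bins (simplification)
  log_k <- log10(as.vector(mean_k_bin[ok]))
  log_p <- log10(as.vector(p_bin[ok]))
  fit <- lm(log_p ~ log_k)
  slope <- unname(coef(fit)[2])
  data.frame(Power = power,
             signed.R.sq = -sign(slope) * summary(fit)$r.squared,
             slope = slope,
             mean.k = mean(k))
}

set.seed(3)
expr <- matrix(rnorm(51 * 300), nrow = 51)     # 51 samples x 300 random genes
fit_table <- do.call(rbind, lapply(c(1:10, 12), function(b) sft_row(expr, b)))
fit_table
```

A power is usually chosen as the lowest one at which the signed R² is high (e.g. >= 0.8) while the mean connectivity is still moderate; mean.k always drops as the power rises, so pushing the power up only to gain R² can leave the network very sparse.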

What happens to the scale-free topology fit if you skip step 5 (selecting the most variable genes)? Can you provide the full output of pickSoftThreshold()?

---

With step 5, I get the following pickSoftThreshold() output for the signed and unsigned networks, respectively:

  • Signed

[Figure: pickSoftThreshold() output, signed network, with step 5]

  • Unsigned

[Figure: pickSoftThreshold() output, unsigned network, with step 5]

Without step 5, this is the result:

  • Signed

[Figure: pickSoftThreshold() output, signed network, without step 5]

  • Unsigned

[Figure: pickSoftThreshold() output, unsigned network, without step 5]

Thank you!

---

Without step 5, there is an improvement, at least for the unsigned network. The mean connectivity (mean.k.) is a bit too high, but not excessively so. So I would pick a power of 7 and go ahead with constructing an unsigned network.

