Question

Network cross-validation



Arindam Ghosh · 10 months ago

How can I validate that a network constructed from RNA-Seq data is robust when no additional data are available to construct a second network?

One approach might be to split the samples into five folds and, for each fold, build a network from the remaining four folds. The five resulting networks could then be compared to see how stable their properties are. But which properties would be worth comparing?
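A minimal sketch of that leave-one-fold-out scheme in Python, assuming an unsigned WGCNA-style adjacency (absolute Pearson correlation raised to a soft-thresholding power `beta`, here an arbitrary 6) and using agreement of per-gene connectivity as one example of a property to compare. The function names and toy data are hypothetical; this is not WGCNA's own code.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency |cor|**beta; expr is samples x genes."""
    corr = np.corrcoef(expr, rowvar=False)  # genes x genes Pearson correlation
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # no self-edges
    return adj

def leave_one_fold_out_networks(expr, n_folds=5, beta=6, seed=0):
    """Return one adjacency per fold, each built from the samples outside that fold."""
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(expr.shape[0]) % n_folds  # balanced random fold labels
    return [soft_threshold_adjacency(expr[fold_of != f], beta) for f in range(n_folds)]

def connectivity_agreement(networks):
    """Pearson correlation of per-gene connectivity between every pair of networks."""
    connectivity = np.vstack([net.sum(axis=1) for net in networks])  # folds x genes
    return np.corrcoef(connectivity)  # folds x folds

# Toy data (hypothetical): 50 samples x 30 genes, genes 0-9 share a latent factor
rng = np.random.default_rng(1)
expr = rng.normal(size=(50, 30))
expr[:, :10] += 2 * rng.normal(size=(50, 1))
nets = leave_one_fold_out_networks(expr)
agreement = connectivity_agreement(nets)
print(np.round(agreement, 2))
```

Off-diagonal values near 1 mean the same genes are hubs regardless of which fold was left out; any other per-node property can be plugged into `connectivity_agreement` in the same way.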

Given that I use WGCNA for network construction, would the average clustering coefficient of the network's nodes be a useful property to compare?
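For a weighted network, one option is the weighted generalisation of the clustering coefficient described by Zhang & Horvath (2005), as I understand it: C_i = sum over j, k (both different from i and from each other) of a_ij a_jk a_ki, divided by (k_i)^2 minus the sum of a_ij^2, where k_i is the connectivity of node i. The sketch below assumes a symmetric adjacency with entries in [0, 1]; setting C_i to 0 when the denominator is 0 is a choice of this sketch, and WGCNA's own implementation may differ in such details.

```python
import numpy as np

def weighted_clustering_coefficient(adj):
    """Per-node clustering coefficient for a weighted, symmetric adjacency in [0, 1]."""
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)  # ignore self-loops so j, k != i automatically
    numerator = np.diag(a @ a @ a)  # sum_{j,k} a_ij * a_jk * a_ki
    k = a.sum(axis=1)  # connectivity of each node
    denominator = k ** 2 - (a ** 2).sum(axis=1)
    coef = np.zeros_like(k)
    ok = denominator > 0  # nodes with fewer than two neighbours keep 0
    coef[ok] = numerator[ok] / denominator[ok]
    return coef

triangle = np.ones((3, 3))  # binary triangle: every node fully clustered
star = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])  # leaves are not connected
half = np.full((3, 3), 0.5)  # weighted triangle with uniform weights
print(weighted_clustering_coefficient(triangle))  # [1. 1. 1.]
print(weighted_clustering_coefficient(star))  # [0. 0. 0.]
print(weighted_clustering_coefficient(half))  # [0.5 0.5 0.5]
```

The average clustering coefficient of a fold network is then `weighted_clustering_coefficient(adj).mean()`, which could be computed per fold (or per module) and compared across folds.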

wgcna RNASeq Network CV

Updated 10 months ago by LChart

Answer (2023-06-03)

What do you mean by "robust" in this case? Which of the following is closest to what you want to estimate:

(1) Would I get a similar result with new data and the same analysis approach?

(2) Would I get a similar result with the same data and a different analysis approach?

(3) Would someone else get a similar result with the same data and the same analysis approach?

Assuming you mean (1), the core methodology is essentially that of Langfelder et al. (2011) (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1001057). If you have a very large dataset, you could compute the full set of network preservation metrics between subsets of your data. This is typically not the case, so the 80/20 holdout you propose would only be suited to those metrics in Table 1 where "Test netw. input" has "datX = yes". If your holdout has at least 10 samples, you could also include metrics for which "Test netw. input" has "Adj = yes".
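As a rough illustration of the expression-based side of that framework, the sketch below computes two statistics in the spirit of cor.cor (agreement of within-module pairwise correlations) and cor.kME (agreement of module membership) between an 80% reference set and a 20% holdout, following my reading of the paper. The WGCNA R function `modulePreservation` computes many more statistics plus permutation-based Z summaries, all omitted here; the module assignment, function names, and toy data are hypothetical.

```python
import numpy as np

def module_eigengene(x):
    """First principal component of the standardised module expression (samples x genes)."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0] * s[0]
    # SVD sign is arbitrary; orient the eigengene with average module expression
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

def kme(x):
    """Module membership: correlation of each gene with the module eigengene."""
    eigengene = module_eigengene(x)
    return np.array([np.corrcoef(x[:, g], eigengene)[0, 1] for g in range(x.shape[1])])

def preservation_stats(ref_expr, test_expr, module_genes):
    """cor.cor- and cor.kME-style agreement between reference and test data for one module."""
    ref, test = ref_expr[:, module_genes], test_expr[:, module_genes]
    upper = np.triu_indices(len(module_genes), k=1)  # each gene pair once
    ref_cor = np.corrcoef(ref, rowvar=False)[upper]
    test_cor = np.corrcoef(test, rowvar=False)[upper]
    return {
        "cor.cor": np.corrcoef(ref_cor, test_cor)[0, 1],
        "cor.kME": np.corrcoef(kme(ref), kme(test))[0, 1],
    }

# Toy data (hypothetical): 200 samples, genes 0-9 load on one factor with varying strength
rng = np.random.default_rng(2)
expr = rng.normal(size=(200, 30))
loadings = np.linspace(0.5, 3.0, 10)
expr[:, :10] += rng.normal(size=(200, 1)) * loadings
order = rng.permutation(200)
test_idx, ref_idx = order[:40], order[40:]  # 80/20 split of samples
stats = preservation_stats(expr[ref_idx], expr[test_idx], list(range(10)))
print({name: round(value, 2) for name, value in stats.items()})
```

With only a small holdout these correlations are noisy, which is one reason the permutation-based Z summaries in the real implementation matter.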

A general approach that is believed to improve these metrics (though I have yet to see an explicit test of this) is "robust WGCNA" (see the supplementary materials here: https://www.science.org/doi/10.1126/science.aad6469). The idea is essentially to build a bootstrapped estimate of the TOM before clustering. This has tended to work very well for me.
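A minimal sketch of the bootstrapped-TOM idea, assuming an unsigned adjacency, the standard unsigned topological overlap (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), and an element-wise median across resamples as the aggregation step. The number of resamples, the power, and the median are assumptions of this sketch, not necessarily what the robust WGCNA supplement used; function names and data are hypothetical.

```python
import numpy as np

def adjacency(expr, beta=6):
    """Unsigned adjacency |cor|**beta from a samples x genes matrix."""
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj

def tom_similarity(adj):
    """Unsigned topological overlap matrix."""
    shared = adj @ adj  # l_ij = sum_u a_iu a_uj; zero diagonal excludes u = i, j
    k = adj.sum(axis=1)
    denom = np.minimum.outer(k, k) + 1.0 - adj
    tom = (shared + adj) / denom
    np.fill_diagonal(tom, 1.0)
    return tom

def bootstrap_tom(expr, n_boot=50, beta=6, seed=0):
    """Element-wise median TOM over bootstrap resamples of the samples (rows)."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = expr.shape
    toms = np.empty((n_boot, n_genes, n_genes))
    for b in range(n_boot):
        idx = rng.integers(0, n_samples, size=n_samples)  # resample with replacement
        toms[b] = tom_similarity(adjacency(expr[idx], beta))
    return np.median(toms, axis=0)

# Toy data (hypothetical): 60 samples x 20 genes, genes 0-7 form one module
rng = np.random.default_rng(3)
expr = rng.normal(size=(60, 20))
expr[:, :8] += 2 * rng.normal(size=(60, 1))
consensus = bootstrap_tom(expr, n_boot=30)
dissimilarity = 1.0 - consensus  # input for hierarchical clustering
within = consensus[:8, :8][np.triu_indices(8, k=1)].mean()
between = consensus[:8, 8:].mean()
print(round(within, 3), round(between, 3))
```

Storing every resampled TOM is only feasible for small gene sets; for a genome-wide network one would need to aggregate incrementally (for example, a running mean instead of a median) or restrict the gene set.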