Question

What kind of data can be used directly for WGCNA?




a511512345 ▴ 190

Hello, I want to use WGCNA on TCGA RNA-seq data, but I do not know which form of the data to download. Should it be PFKM or FPKM-UQ? Or what kind of data can be used directly for WGCNA, and how do I get it? Is there a website where it can be downloaded directly? Looking forward to your answer, thank you very much.


Updated 6.4 years ago by Kevin Blighe 87k • written 6.4 years ago by a511512345 ▴ 190

score 1 · Answer 1 · 2017-11-23

Dear friend, I presume that you mean FPKM, not PFKM?

Firstly, the FAQ (frequently asked questions) written by the authors of WGCNA states that any type of properly normalised RNA-seq data can be used for WGCNA:

4. Can WGCNA be used to analyze RNA-Seq data?

Yes. As far as WGCNA is concerned, working with (properly normalized) RNA-seq data isn't really any different from working with (properly normalized) microarray data.

We suggest removing features whose counts are consistently low (for example, removing all features that have a count of less than say 10 in more than 90% of the samples) because such low-expressed features tend to reflect noise and correlations based on counts that are mostly zero aren't really meaningful. The actual thresholds should be based on experimental design, sequencing depth and sample counts.

We then recommend a variance-stabilizing transformation. For example, package DESeq2 implements the function varianceStabilizingTransformation which we have found useful, but one could also start with normalized counts (or RPKM/FPKM data) and log-transform them using log2(x+1). For highly expressed features, the differences between full variance stabilization and a simple log transformation are small.

Whether one uses RPKM, FPKM, or simply normalized counts doesn't make a whole lot of difference for WGCNA analysis as long as all samples were processed the same way. These normalization methods make a big difference if one wants to compare expression of gene A to expression of gene B; but WGCNA calculates correlations for which gene-wise scaling factors make no difference. (Sample-wise scaling factors of course do, so samples do need to be normalized.)

[source: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html]
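The preprocessing the FAQ recommends (drop consistently low-count features, normalize samples, then log2(x+1)) can be sketched in a few lines. This is a minimal numpy sketch, not the WGCNA authors' code: it assumes a genes × samples matrix of raw counts, uses counts per million as a simple stand-in for sample-wise normalization (it is not DESeq2's median-of-ratios or varianceStabilizingTransformation), and the thresholds 10 and 90% are only the FAQ's example values. The function names are hypothetical.

```python
import numpy as np

def filter_low_counts(counts, min_count=10, max_low_fraction=0.9):
    """Return a boolean mask of genes (rows) to keep.

    A gene is dropped when its count is below `min_count` in more than
    `max_low_fraction` of the samples (columns), following the FAQ's example.
    """
    low_fraction = (counts < min_count).mean(axis=1)
    return low_fraction <= max_low_fraction

def counts_per_million(counts):
    """Sample-wise normalization: scale each column by its library size."""
    library_sizes = counts.sum(axis=0, keepdims=True)
    return counts / library_sizes * 1e6

def prepare_for_wgcna(counts):
    """Filter, normalize and log2(x + 1)-transform a genes x samples matrix.

    Returns a samples x genes matrix (the orientation WGCNA expects for
    its expression input) plus the mask of retained genes.
    """
    counts = np.asarray(counts, dtype=float)
    keep = filter_low_counts(counts)
    # Library sizes are taken from all genes, before filtering.
    normalized = counts_per_million(counts)[keep]
    transformed = np.log2(normalized + 1.0)
    return transformed.T, keep

# Toy matrix: 3 genes x 5 samples (made-up numbers).
counts = np.array([
    [0, 1, 0, 2, 0],         # below 10 in every sample -> dropped
    [50, 60, 55, 40, 70],
    [100, 90, 120, 80, 110],
])
dat_expr, keep = prepare_for_wgcna(counts)
print(keep.tolist())   # [False, True, True]
print(dat_expr.shape)  # (5, 2)
```

The transposed result follows WGCNA's convention of samples in rows and genes in columns; in practice you would export such a matrix to R or do the same steps there.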

If you want to download expression data in the form of Z-scores (also suitable for WGCNA) for just a set of genes of interest, you can use cBioPortal, developed by my colleagues at MSKCC. Its R interface is the CGDSR package.
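Why Z-scores work, and why sample normalization still matters, follows from the FAQ's point about scaling factors: a per-gene Z-score is just a shift and a positive rescaling of each gene, which Pearson correlation (WGCNA's default) ignores, whereas a per-sample factor changes every gene at once. The numpy check below illustrates this on synthetic random data; the helper names are hypothetical, and it z-scores across all samples, whereas cBioPortal may compute Z-scores against a reference population (any per-gene shift and positive rescale leaves the correlations unchanged either way).

```python
import numpy as np

def gene_zscores(expr):
    """Z-score each gene (row) across samples (columns)."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    return (expr - mean) / sd

def gene_correlations(expr):
    """Pearson correlation between genes (rows) across samples."""
    return np.corrcoef(expr)

rng = np.random.default_rng(0)
# Synthetic log-scale expression: 4 genes x 20 samples.
expr = rng.normal(loc=8.0, scale=2.0, size=(4, 20))
r_raw = gene_correlations(expr)

# Gene-wise transformations: Z-scores and arbitrary per-gene scale factors.
r_z = gene_correlations(gene_zscores(expr))
gene_factors = np.array([[1.0], [5.0], [0.2], [3.0]])
r_scaled = gene_correlations(expr * gene_factors)

# Sample-wise scale factors, e.g. unnormalized differences in sequencing depth.
sample_factors = rng.uniform(0.5, 2.0, size=(1, 20))
r_sample = gene_correlations(expr * sample_factors)

print(np.allclose(r_raw, r_z))       # True: Z-scores keep correlations
print(np.allclose(r_raw, r_scaled))  # True: gene-wise factors do not matter
print(np.allclose(r_raw, r_sample))  # False: sample-wise factors do
```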

Best of luck, Kevin