Question

RNA Seq data for WGCNA

0

6.4 years ago

vrrani ▴ 10

Hi,

I am trying to find correlations between genes, based on the fold-change values of RNA-seq data, using WGCNA. My goal is module identification. Along the way I learned that the recommended minimum sample size for WGCNA is 15, and that the results are expected to be noisy with fewer than 15 samples. My final filtered data set is 1000 * 4 (1000 - genes and 4 - logFC value).

Questions:

Can I still use WGCNA, given that my focus is only on module identification?
If noise is unavoidable, to what extent would it bias the results?
Is it completely wrong to use WGCNA with such a small sample size?

Please let me know.

Thanks in advance.

Regards, Rani.

rna-seq • 4.3k views

ADD COMMENT • link updated 6.4 years ago by pixie@bioinfo ★ 1.5k • written 6.4 years ago by vrrani ▴ 10

1

If the manual says that you should not use fewer than 15 samples, then you are obviously running with some bias and forcing the model. However, if your data are solid and deep, with little noise, you could probably still find something. To be on the safer side: if you want to understand modules based on FC, that is, genes that cluster together while still preserving the separation between your phenotype conditions, you can also use methods like K-means clustering, especially when you have few samples. That way you avoid the risk of using fewer samples than the WGCNA manual recommends.
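A minimal sketch of the K-means idea, assuming a small table of logFC profiles per gene. The gene names and values are made up for illustration, and the `kmeans` helper is a plain Lloyd's algorithm written here for self-containment, not a function from any package mentioned in this thread.

```python
import random


def kmeans(rows, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on equal-length numeric rows; returns one label per row."""
    rng = random.Random(seed)
    # Initialise the centers from k distinct rows of the data.
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centers[c])))
            for row in rows
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels


# Hypothetical logFC values for 6 genes across 4 contrasts (illustration only).
logfc = {
    "geneA": [2.1, 1.8, 2.4, 2.0],
    "geneB": [1.9, 2.2, 2.0, 1.7],
    "geneC": [2.3, 2.0, 1.9, 2.2],
    "geneD": [-1.8, -2.1, -2.0, -1.9],
    "geneE": [-2.2, -1.7, -2.3, -2.0],
    "geneF": [-2.0, -1.9, -1.8, -2.2],
}
genes = list(logfc)
labels = kmeans([logfc[g] for g in genes], k=2)
clusters = dict(zip(genes, labels))
print(clusters)
```

On real data the number of clusters `k` has to be chosen by the analyst, and the result can depend on the random initialisation, so it is worth rerunning with several seeds.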

ADD REPLY • link 6.4 years ago by ivivek_ngs ★ 5.2k

0

When you say

1000 * 4 (1000 - genes and 4 - logFC value).

Does it mean you have filtered by logFC? You shouldn't do that. See the WGCNA FAQ. If you have just 4 samples, the results will be spurious. WGCNA relies on correlations, and you can quite easily get a high correlation with just 4 samples; that is why more samples are important.
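The point about spurious correlations can be checked with a small simulation. This is a sketch under stated assumptions: the "genes" are independent random normal vectors (so any correlation is pure chance), and the 0.8 cutoff, the number of pairs and the helper names are arbitrary choices for illustration. For independent normal data with 4 samples, Pearson's r is uniformly distributed on [-1, 1], so roughly one pair in five should exceed |r| > 0.8 by chance alone.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fraction_high(n_samples, n_pairs=20000, cutoff=0.8, seed=1):
    """Fraction of independent random pairs whose |r| exceeds the cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        x = [rng.gauss(0, 1) for _ in range(n_samples)]
        y = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(x, y)) > cutoff:
            hits += 1
    return hits / n_pairs


frac_4 = fraction_high(4)    # 4 samples, as in the question
frac_15 = fraction_high(15)  # the recommended minimum of 15 samples
print(f"|r| > 0.8 by chance with  4 samples: {frac_4:.3f}")
print(f"|r| > 0.8 by chance with 15 samples: {frac_15:.4f}")
```

With 1000 genes there are about half a million gene pairs, so at 4 samples a very large number of them would look strongly correlated even if no real co-expression existed.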

ADD REPLY • link 6.4 years ago by Lluís R. ★ 1.2k

0

Yes, I filtered my data by logFC. I also observed the tendency toward high correlations in the output. Is there any other way, apart from traditional K-means, to identify modules based on logFC such that the modules also make sense biologically?

ADD REPLY • link 6.4 years ago by vrrani ▴ 10

0

First, it doesn't make sense to look for biologically interesting modules and then restrict them to those that show a similar change. Start by running WGCNA on all samples that share the same biological state (don't mix condition A and controls), then evaluate whether those modules are coherent between the two conditions. Maybe a module is split in two in condition B, or two modules join to form a new one. Finally, use gene set enrichment tools to evaluate whether the modules make sense biologically.
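One simple way to look at splits and merges is to cross-tabulate the module labels obtained in each condition. The sketch below assumes the module assignment has already been done separately per condition; the gene names, module labels and helper functions are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter

# Hypothetical module assignments per gene, one dict per condition.
modules_a = {"g1": "blue", "g2": "blue", "g3": "blue", "g4": "blue",
             "g5": "red", "g6": "red"}
modules_b = {"g1": "M1", "g2": "M1", "g3": "M2", "g4": "M2",
             "g5": "M3", "g6": "M3"}


def module_overlap(a, b):
    """Count genes for each (module in A, module in B) pair, over shared genes."""
    shared = a.keys() & b.keys()
    return Counter((a[g], b[g]) for g in shared)


def split_modules(overlap):
    """Modules of condition A whose genes land in more than one module of B."""
    targets = {}
    for (mod_a, mod_b), _count in overlap.items():
        targets.setdefault(mod_a, set()).add(mod_b)
    return {m: sorted(t) for m, t in targets.items() if len(t) > 1}


overlap = module_overlap(modules_a, modules_b)
splits = split_modules(overlap)
print(dict(overlap))
print(splits)  # here the "blue" module of condition A is split across M1 and M2
```

Swapping the two arguments of `module_overlap` gives the reverse view, which highlights modules of condition A that merge into a single module in condition B.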

ADD REPLY • link 6.4 years ago by Lluís R. ★ 1.2k

0

WGCNA will not work, or rather it will give non-significant results that should not be trusted. With the metrics available for these samples and genes, the results will not be robust at all. GO might still provide some information, either if proper clustering is done based on GO categories or if the genes are first clustered into modules and module-specific GO analysis follows.

ADD REPLY • link 6.4 years ago by ivivek_ngs ★ 5.2k

0

Sorry, but I don't think I understood your comment. Would you mind clarifying why you say WGCNA will not work? Which metrics for samples and genes are you talking about? When I say gene set enrichment tools, I am including gene ontologies.

ADD REPLY • link 6.4 years ago by Lluís R. ★ 1.2k

0

From what the OP is saying, I think the OP doesn't have 15 samples in total to start with. That is why I am saying this. Traditional GO analysis, narrowing down to GO categories that cluster genes, is perfectly fine, depending on the processes it returns. But building co-expression modules with WGCNA from fewer than 15 samples might not be very trustworthy. This is what I wanted to say. If the OP says that these 4 FC columns are derived from more than 15 samples, then it should not be a problem.

ADD REPLY • link 6.4 years ago by ivivek_ngs ★ 5.2k

score 0 · Answer 1 · 2017-11-14

You could go for a standard mutual-rank-based correlation network or a Pearson correlation network, using a high threshold cutoff for the correlation values. You can then use any clustering algorithm or plugin in Cytoscape to cluster your networks. Functional/pathway enrichment tools can then be applied to the clusters to validate them biologically.
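A minimal sketch of the Pearson variant, assuming toy expression profiles with made-up gene names and values. Two simplifications are assumptions of this sketch rather than part of the answer: connected components stand in for a proper clustering step (such as a Cytoscape plugin), and the mutual-rank variant is not implemented. The 0.9 cutoff is arbitrary.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_network(profiles, cutoff=0.9):
    """Edges between gene pairs with |r| >= cutoff (anticorrelated pairs are kept too)."""
    edges = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= cutoff:
            edges.add((g1, g2))
    return edges


def connected_components(nodes, edges):
    """Group nodes into connected components with an iterative depth-first search."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(neighbors[node] - comp)
        seen |= comp
        components.append(sorted(comp))
    return components


# Hypothetical profiles: 6 genes measured in 6 samples (illustration only).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "g2": [1.1, 2.0, 3.2, 3.9, 5.1, 6.2],
    "g3": [2.0, 4.0, 6.1, 8.0, 9.8, 12.0],
    "g4": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    "g5": [5.2, 1.1, 3.9, 2.2, 6.1, 2.8],
    "g6": [3.0, 3.5, 1.0, 4.0, 2.0, 5.0],
}
edges = correlation_network(profiles, cutoff=0.9)
modules = connected_components(profiles.keys(), edges)
print(modules)
```

The resulting edge list could also be written out as a two-column table and imported into Cytoscape for clustering there. As noted elsewhere in this thread, correlations computed from only 4 values per gene are unreliable regardless of the cutoff.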