Question: WGCNA pearson or bicor? Also, should I adjust module correlation Pvalues by Bonferroni?
0
gravatar for jagoor93
13 months ago by
jagoor930
jagoor930 wrote:

Hello!

I am performing a network analysis on WGCNA. I had to separate the samples into 2 networks (depending on the tissue) due to the high variation between tissues. Each network is made by 15 samples (which is supposed to be the minimun acceptable).

In FAQ of WGCNA it is recommended to use bicor correlation rather than pearson correlation. However my data from the ovaries seems to perform better with pearson correlation. (There isnt much difference on the samples from the body).

I send attached the graphs of scale independence of the ovary network when using pearson and bicor.

Pearson

Pearson

Bicor

Bicor

In this case, should I use pearson? or use bicor even if the adjustment to the scale independence seems to be worse?

Also, from the module-trait graph I obtain certain Pvalues associated to each module and some of them are lower than 0.05. However if I adjust the Pvalue by bonferroni (or any other method) and the total number of modules, none of them becomes significant. Does it mean my data is useless? Can I use in further analysis the modules that are significantly associated to any of the traits with uncorrected Pvalues? I want to check for TF associated to the genes in each module to look for regulators of the associated traits (fecundity and lifespan). Should I modify any parameter to make them most significant? By now I have normalized the data (RNAseq data) according to what is said on FAQ on WGCNA page, I choose the soft threshold highlighted by the function picksoftthreshold and I am using signed hybrid networks. All the other parameters are the same than the default ones choosen on the tutorials.

Ovary network modules-trait

Thank you very much!

bicor network wgcna pearson • 1.8k views
ADD COMMENTlink modified 13 months ago by Kevin Blighe48k • written 13 months ago by jagoor930

Hello jagoor93,

The link you’ve added points to the page that contains the image, not the image itself. On ibb.co site, scroll down and look for a tab that says Embed codes. Click on this Embed codes tab. Copy the code in the HTML full image box. Post that line into your post here (instead of the link you've used) to parse the image in automatically.

I've corrected this for you on this occasion.

ADD REPLYlink written 13 months ago by andrew.j.skelton735.8k

Thank you very much!

ADD REPLYlink written 13 months ago by jagoor930
1
gravatar for Kevin Blighe
13 months ago by
Kevin Blighe48k
Kevin Blighe48k wrote:

I agree that Pearson looks better, particularly on the Scale Independence plot.

Also, from the module-trait graph I obtain certain Pvalues associated to each module and some of them are lower than 0.05. However if I adjust the Pvalue by bonferroni (or any other method) and the total number of modules, none of them becomes significant. Does it mean my data is useless? Can I use in further analysis the modules that are significantly associated to any of the traits with uncorrected Pvalues?

It does not mean that your data is useless. We faced the same issue using WGCNA in the lab in Boston (USA). We were eventually able to publish the data with the nominal (unadjusted) P values. Bonferroni correction is the most stringent P value adjustment, though - why not try with Benjamini-Hochberg?

I want to check for TF associated to the genes in each module to look for regulators of the associated traits (fecundity and lifespan). Should I modify any parameter to make them most significant?

You could modify the tree cut height, which will affect the final number of modules, which, in turn, will affect the P value adjustment. You can also filter out genes before performing WGCNA, like genes of low expression and/or genes of high variance.

Kevin

ADD COMMENTlink modified 13 months ago • written 13 months ago by Kevin Blighe48k

Thanks! That is really helpful. With BH the results are still not significant, but I checked the TF with enriched binding sites and foxo and some GATA genes associated with ageing were at the top when analysing the module associated with lifespan. So I feel confident at least about that module.

I modified tree cut height and there were less modules but less significant and I lost most of the TF associated with ageing, so I dont think I would modify that parameter.

And also, which program can I use to filter by high variance? By now I filtered by low expression and performed log2(fpkm +1) and quantile normalization.

Thank you very much

Javier

ADD REPLYlink written 13 months ago by jagoor930
2

Hey Javier, well, FPKM data is not great - logged FPKM data is neither great. From where did you obtain this data? For differential expression analysis, one should not use FPKM; however, for network analysis, its not the 'end of the World' to use this type of data because network analysis tools like WGCNA are based on correlation.

If you could obtain another form of normalised counts for your data, that may improve the situation. Otherwise, you can remove low variance data as follows:

# generate random data, and log (base 2) this
mat <- log2(matrix(rexp(200, rate=.1), ncol=20))
dim(mat)
[1] 10 20

# obtain the variance of each row (gene)
variances <- apply(mat, 1, function(x) var(x, na.rm=TRUE))

# determine the variance range 
rangeVar <- max(variances) - min(variances)

# keep only those genes whose variance is greater than one-third the variance range
keep <- variances > (rangeVar / 3)
dim(mat[keep,])
[1]  9 20

There are different ways of filtering on variance, though.

ADD REPLYlink modified 13 months ago • written 13 months ago by Kevin Blighe48k

Hi Kevin,

Oh, sorry, I meant I am using log FPKM for the network analysis (It is recommended on WGCNA webpage when working with RNAseq data).

For Diff Expression I am using raw counts as recommended by DESeq2 package

Anyway, I will select the genes with high variance for the network analysis and see if my data improves.

With the parameters I have been using I already obtained really interesting results. The top TF with enriched binding sites on the modules of lifepan are associated with ageing (foxo and some GATA TFs) and the top network bottlenecks and hubs I obtained are inhibitors of mTOR. So I am really exited about it.

Thank you very much!

Javier

ADD REPLYlink written 13 months ago by jagoor930

Okay, seems promising / me parece alentadora.

ADD REPLYlink written 13 months ago by Kevin Blighe48k

the object formed is logical , its a R question plus bit of simple maths why not go for std dev instead of variance ? like can i consider genes that falls within my first two Std deviation ?

i would be glad if you can bit explain more

when i see the object keep its says logical , how do you subset those genes that comes from keep from my main object..like mat?

ADD REPLYlink written 13 months ago by krushnach80580

Yes, keep will be TRUE or FALSE, relating to the original rows (genes) whose variances were greater than the lower third of the overall variance range.

I would encourage you to check the output of each of my commands (above) in order to be sure about what is happening (?)

ADD REPLYlink written 13 months ago by Kevin Blighe48k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1342 users visited in the last hour