Question

WGCNA pearson or bicor? Also, should I adjust module correlation Pvalues by Bonferroni?

3

Entering edit mode

5.7 years ago

jagoor93 ▴ 30

Hello!

I am performing a network analysis on WGCNA. I had to separate the samples into 2 networks (depending on the tissue) due to the high variation between tissues. Each network is made by 15 samples (which is supposed to be the minimun acceptable).

In FAQ of WGCNA it is recommended to use bicor correlation rather than pearson correlation. However my data from the ovaries seems to perform better with pearson correlation. (There isnt much difference on the samples from the body).

I send attached the graphs of scale independence of the ovary network when using pearson and bicor.

Pearson

Bicor

In this case, should I use pearson? or use bicor even if the adjustment to the scale independence seems to be worse?

Also, from the module-trait graph I obtain certain Pvalues associated to each module and some of them are lower than 0.05. However if I adjust the Pvalue by bonferroni (or any other method) and the total number of modules, none of them becomes significant. Does it mean my data is useless? Can I use in further analysis the modules that are significantly associated to any of the traits with uncorrected Pvalues? I want to check for TF associated to the genes in each module to look for regulators of the associated traits (fecundity and lifespan). Should I modify any parameter to make them most significant? By now I have normalized the data (RNAseq data) according to what is said on FAQ on WGCNA page, I choose the soft threshold highlighted by the function picksoftthreshold and I am using signed hybrid networks. All the other parameters are the same than the default ones choosen on the tutorials.

Ovary network modules-trait

Thank you very much!

WGCNA Network pearson bicor • 9.0k views

ADD COMMENT • link updated 15 months ago by rodd ▴ 230 • written 5.7 years ago by jagoor93 ▴ 30

0

Entering edit mode

Hello jagoor93,

The link you’ve added points to the page that contains the image, not the image itself. On ibb.co site, scroll down and look for a tab that says Embed codes. Click on this Embed codes tab. Copy the code in the HTML full image box. Post that line into your post here (instead of the link you've used) to parse the image in automatically.

I've corrected this for you on this occasion.

ADD REPLY • link 5.7 years ago by andrew.j.skelton73 6.5k

0

Entering edit mode

Thank you very much!

ADD REPLY • link 5.7 years ago by jagoor93 ▴ 30

score 3 · Answer 1 · 2018-08-15

3

Entering edit mode

5.7 years ago

Kevin Blighe 87k

I agree that Pearson looks better, particularly on the Scale Independence plot.

Also, from the module-trait graph I obtain certain Pvalues associated to each module and some of them are lower than 0.05. However if I adjust the Pvalue by bonferroni (or any other method) and the total number of modules, none of them becomes significant. Does it mean my data is useless? Can I use in further analysis the modules that are significantly associated to any of the traits with uncorrected Pvalues?

It does not mean that your data is useless. We faced the same issue using WGCNA in the lab in Boston (USA). We were eventually able to publish the data with the nominal (unadjusted) P values. Bonferroni correction is the most stringent P value adjustment, though - why not try with Benjamini-Hochberg?

I want to check for TF associated to the genes in each module to look for regulators of the associated traits (fecundity and lifespan). Should I modify any parameter to make them most significant?

You could modify the tree cut height, which will affect the final number of modules, which, in turn, will affect the P value adjustment. You can also filter out genes before performing WGCNA, like genes of low expression and/or genes of high variance.

Kevin

ADD COMMENT • link 5.7 years ago by Kevin Blighe 87k

0

Entering edit mode

Thanks! That is really helpful. With BH the results are still not significant, but I checked the TF with enriched binding sites and foxo and some GATA genes associated with ageing were at the top when analysing the module associated with lifespan. So I feel confident at least about that module.

I modified tree cut height and there were less modules but less significant and I lost most of the TF associated with ageing, so I dont think I would modify that parameter.

And also, which program can I use to filter by high variance? By now I filtered by low expression and performed log2(fpkm +1) and quantile normalization.

Thank you very much

Javier

ADD REPLY • link 5.7 years ago by jagoor93 ▴ 30

2

Entering edit mode

Hey Javier, well, FPKM data is not great - logged FPKM data is neither great. From where did you obtain this data? For differential expression analysis, one should not use FPKM; however, for network analysis, its not the 'end of the World' to use this type of data because network analysis tools like WGCNA are based on correlation.

If you could obtain another form of normalised counts for your data, that may improve the situation. Otherwise, you can remove low variance data as follows:

# generate random data, and log (base 2) this
mat <- log2(matrix(rexp(200, rate=.1), ncol=20))
dim(mat)
[1] 10 20

# obtain the variance of each row (gene)
variances <- apply(mat, 1, function(x) var(x, na.rm=TRUE))

# determine the variance range 
rangeVar <- max(variances) - min(variances)

# keep only those genes whose variance is greater than one-third the variance range
keep <- variances > (rangeVar / 3)
dim(mat[keep,])
[1]  9 20

There are different ways of filtering on variance, though.

ADD REPLY • link 5.7 years ago by Kevin Blighe 87k

0

Entering edit mode

Hi Kevin,

Oh, sorry, I meant I am using log FPKM for the network analysis (It is recommended on WGCNA webpage when working with RNAseq data).

For Diff Expression I am using raw counts as recommended by DESeq2 package

Anyway, I will select the genes with high variance for the network analysis and see if my data improves.

With the parameters I have been using I already obtained really interesting results. The top TF with enriched binding sites on the modules of lifepan are associated with ageing (foxo and some GATA TFs) and the top network bottlenecks and hubs I obtained are inhibitors of mTOR. So I am really exited about it.

Thank you very much!

Javier

ADD REPLY • link 5.7 years ago by jagoor93 ▴ 30

0

Entering edit mode

Okay, seems promising / me parece alentadora.

ADD REPLY • link 5.7 years ago by Kevin Blighe 87k

0

Entering edit mode

the object formed is logical , its a R question plus bit of simple maths why not go for std dev instead of variance ? like can i consider genes that falls within my first two Std deviation ?

i would be glad if you can bit explain more

when i see the object keep its says logical , how do you subset those genes that comes from keep from my main object..like mat?

ADD REPLY • link 5.7 years ago by 1769mkc ★ 1.2k

0

Entering edit mode

Yes, keep will be TRUE or FALSE, relating to the original rows (genes) whose variances were greater than the lower third of the overall variance range.

I would encourage you to check the output of each of my commands (above) in order to be sure about what is happening (?)

ADD REPLY • link 5.7 years ago by Kevin Blighe 87k

0

Entering edit mode

Hi folks, thanks for the discussion! Sorry for jumping in, but I wanted to ask (sorry for the super newbie question!).. How do I plot the corrected values? I added the following line to correct the correlation P values (from a multExpr object, from a consensus analysis):

moduleTraitCor = list();
moduleTraitPvalue = list();
moduleTraitPadj = list();

for (set in 1:nSets){
  moduleTraitCor[[set]] = cor(consMEs[[set]]$data, Traits[[set]]$data, use = "p");
  moduleTraitPvalue[[set]] = corPvalueFisher(moduleTraitCor[[set]], exprSize$nSamples[set]);
  moduleTraitPadj[[set]] = p.adjust(moduleTraitPvalue[[set]], method="fdr");
}

But the number of "dimensions" of moduleTraitPvalue[[1]], for example, is 2 lists containing 8x20 values, whereas moduleTraitPadj[[1]] contains 2 lists containing 160 values each. So when I try to plot moduleTraitPadj in the heatmap, it does not come up... Any ideas?

(sorry for the poor language!)

Also:

str(moduleTraitPvalue[[1]])  
num [1:20, 1:8] 0.479 0.713 0.865 0.605 0.153 ...
 - attr(*, "dimnames")=List of 2   ..$ : chr [1:20] "ME12" "ME13" "ME15" "ME19" ...   
    ..$ : chr [1:8] "Var1" "Var2" "Var3" "Var4" ...

str(moduleTraitPadj[[1]])  
num [1:160, 1] 0.543 0.748 0.882 0.663 0.206 ...

typeof(moduleTraitPvalue[[1]])
[1] "double"

typeof(moduleTraitPadj[[1]])
[1] "double"

ADD REPLY • link 4.4 years ago by rodd ▴ 230

0

Entering edit mode

Are you saying that the dimensions of moduleTraitPadj is different from moduleTraitPvalue? Have you actually viewed the moduleTraitPadj list variable on screen? Did you initialise it prior to the loop?

ADD REPLY • link 4.4 years ago by Kevin Blighe 87k

0

Entering edit mode

Hi Kevin! I did initialize them (I updated the code above now to show that). When I View these variables after running the loop, they show a list of 2 (because I have two sets), but the 'dimensions' of each sub-list are different.

The p.adjust function is merging all columns of moduleTraitPvalue (there are 8 columns/variables, and 20 expression modules) into a single column of 160 rows. So I guess my question is: how do I avoid that?

head(moduleTraitPadj[[1]])
               [,1]
  [1,] 5.432805e-01
  [2,] 7.475462e-01
  [3,] 8.816685e-01
  [4,] 6.632555e-01

head(moduleTraitPvalue[[1]])
        Var1          Var2          Var3          Var4          Var5
ME12 0.47876595 2.842498e-01 3.000056e-04 2.869558e-02 4.313020e-04
ME13 0.71254594 4.722252e-08 6.082748e-19 1.985103e-14 2.223507e-20
ME15 0.86452052 2.458712e-06 7.924784e-08 6.892059e-04 1.409194e-66
ME19 0.60522060 6.477622e-11 9.321274e-03 1.535493e-01 4.250861e-03

(Thanks, Kevin!)

ADD REPLY • link 4.4 years ago by rodd ▴ 230

1

Entering edit mode

oooooh, I just figured this out, I had to "force" the dimensions of the old table in the new table..

Sorry for the rookie's mistake, folks.

for (set in 1:nSets){
  moduleTraitCor[[set]] = cor(consMEs[[set]]$data, Traits[[set]]$data, use = "p");
  moduleTraitPvalue[[set]] = corPvalueFisher(moduleTraitCor[[set]], exprSize$nSamples[set]);
  moduleTraitPadj[[set]] = p.adjust(moduleTraitPvalue[[set]], method="fdr");
  dim(moduleTraitPadj[[set]]) <- dim(moduleTraitPvalue[[set]]);
  colnames(moduleTraitPadj[[set]]) <- colnames(moduleTraitPvalue[[set]])
  rownames(moduleTraitPadj[[set]]) <- rownames(moduleTraitPvalue[[set]])
  }

ADD REPLY • link 15 months ago by rodd ▴ 230

0

Entering edit mode

Hi Kevin,

Could I ask how you justified use of the nominal (unadjusted) P values in the end?

best wishes, Roy

ADD REPLY • link 16 months ago by kirby199 • 0

0

Entering edit mode

There is weight in a name - we published from Harvard Medical School.

ADD REPLY • link 16 months ago by Kevin Blighe 87k