Question: WGCNA modules and categorical traits relationship
19 months ago by
Brawni100
France
Brawni100 wrote:

Hi there,

I have looked around for an answer to this question but haven't managed to fully understand the ones I found. I have run a WGCNA analysis on my data, which identified several modules of co-expressed genes. Now I would like to calculate the significance of the correlation between the module eigengenes and the trait data, to narrow down which modules are most interesting. My trait data, however, has two categorical covariates: time point (3, 7 and 35 days) and virus strain (5 groups, control included). As far as I understand, it isn't valid to simply substitute an integer for each strain group in order to calculate a correlation, or is it? What other statistical analysis could I perform?

wgcna • 4.2k views
modified 4 weeks ago by ninabhatia30 • written 19 months ago by Brawni100
19 months ago by
Kevin Blighe46k
Kevin Blighe46k wrote:

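If these are the only traits in which you're interested, then you can either correlate the module values to these (with the traits encoded numerically as 1, 2, 3, etc.; note that this imposes an ordering, which is natural for time but arbitrary for strain), or, better, build a multinomial logistic regression model with the module values as x and time or strain as y (one model per module: Trait ~ Module1, Trait ~ Module2, et cetera). Note that base R's glm() fits binary rather than multinomial outcomes, so a trait with more than two levels needs a multinomial fitter such as multinom() from the nnet package.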
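
As a minimal sketch of the multinomial option, assuming the nnet package is installed and using simulated data (the names `strain` and `ME1` and all values are hypothetical), one module eigengene can be tested against a five-level strain factor with a likelihood-ratio test against an intercept-only model:

```r
library(nnet)  # provides multinom(); assumed to be installed

set.seed(1)
# Hypothetical data: 30 samples, 5 strain groups (control included), one eigengene
strain <- factor(rep(c("control", "A", "B", "C", "D"), each = 6),
                 levels = c("control", "A", "B", "C", "D"))
ME1 <- rnorm(30) + as.numeric(strain == "B")  # strain B shifted for illustration

# Multinomial models with and without the eigengene as predictor
full <- multinom(strain ~ ME1, trace = FALSE)
null <- multinom(strain ~ 1, trace = FALSE)

# Likelihood-ratio test: does the eigengene help to predict strain?
lr <- null$deviance - full$deviance
df <- full$edf - null$edf
p  <- pchisq(lr, df = df, lower.tail = FALSE)
p
```

Repeating this per module gives one p-value per module, which would then need multiple-testing correction (for example with `p.adjust()`).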

You may also have to consider dividing up the analysis into multiple analyses, and contrast/compare the results manually. For example, running WGCNA separately for the different time-points may be an idea, and then building the regression model predicting for virus strain each time.
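
The splitting step can be sketched as follows. This is only an illustration on simulated data: it reuses a single hypothetical eigengene (`ME1`) rather than rerunning WGCNA per time point, and it swaps in a Kruskal-Wallis test for the regression model simply because it needs nothing beyond base R.

```r
set.seed(7)
# Hypothetical design: 3 time points x 5 strains x 4 replicates = 60 samples
design <- expand.grid(rep    = 1:4,
                      strain = c("control", "s1", "s2", "s3", "s4"),
                      day    = c(3, 7, 35))
design$strain <- factor(design$strain)
design$ME1 <- rnorm(nrow(design))  # placeholder module eigengene values

# Test the eigengene against strain separately within each time point
p_by_day <- sapply(split(design, design$day), function(d) {
  kruskal.test(ME1 ~ strain, data = d)$p.value
})
p_by_day
```

The per-time-point p-values can then be compared manually, as suggested above.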

modified 9 months ago • written 19 months ago by Kevin Blighe46k

Thanks a million for your answer Kevin, that is what I wanted to know. I'll try both methods you suggested! I already separated my data by another factor (tissue), as this was the main source of variance. Dividing the data further is an interesting point too; I'll dig into it!

Cheers!

written 19 months ago by Brawni100

Hi Kevin,

I have a TPM RNA-seq file for 53 human stem cell samples, control vs lead (Pb) treatment on days 1-26, plus day 0 for control only. To correlate modules with traits in WGCNA, I put 0 for control and 1 for lead-treated samples, as in the screenshot. For example, for day 1 I put 0 for control and 1 for treatment. Is that correct? However, I don't know what to put for day 0.

The aim is a time-series study.

Thank you for any suggestion

https://ibb.co/gPtGVx

modified 17 months ago • written 17 months ago by F3.4k

Hello again Superstar. As I understand it, you have samples that have been treated with and without lead, the metal (Pb)? Moreover, you have looked at these samples over a time course of 0-26 days?

It is correct to encode these as 0 (control) and 1 (treatment). For Day0, although the treatment may have no effect, you should still use 0 and 1. Otherwise, you can choose to not include these.

written 17 months ago by Kevin Blighe46k

Thanks a lot, you are right about my experiment. Cells were harvested prior to treatment (day 0) and then daily, from day 1 to day 26, both after lead exposure and without treatment. Thus, I have Control_day1 to Control_day26 and Lead metal_day1 to Lead metal_day26, but only Control_day0, giving 53 samples in total. As I don't have Lead_day0, do you suggest leaving this column as all zeros?

Thank you for your patience

modified 17 months ago • written 17 months ago by F3.4k

Well, the 53 day0 samples are just the 'Baseline' samples, in that case.

The way in which you encode these should reflect how you want to use them in your statistical comparisons. I presume that most of your comparisons will be:

  • Lead day1 vs Control day1
  • Lead day2 vs Control day2
  • Lead day3 vs Control day3
  • et cetera

The control day0 samples, therefore, have no immediate use in these types of pairwise comparisons; however, they represent the fundamental baseline state of the cell-type.

Edit: if you encode the day0 samples as all zero, then they will not have any utility in module comparisons either, because you cannot correlate something to a vector of zeros.
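
A quick base R illustration of why an all-zero trait column is unusable, with a made-up eigengene vector (`ME` is a hypothetical name): the trait has zero variance, so the Pearson correlation is undefined.

```r
set.seed(5)
ME   <- rnorm(10)      # hypothetical module eigengene for 10 samples
day0 <- rep(0, 10)     # trait column encoded as all zeros

# cor() warns that the standard deviation is zero and returns NA
r <- suppressWarnings(cor(ME, day0))
is.na(r)
```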

modified 9 months ago • written 17 months ago by Kevin Blighe46k

Thanks a lot Kevin, a nice weekend ahead

written 17 months ago by F3.4k

Please excuse me, today I re-read your comment. Actually, I don't have 53 day0 samples; instead, I have 53 samples in total: Control day1 to Control day26 + Lead day1 to Lead day26 + Control day0 = 53 samples

I have put 0 for Controls and 1 for Lead, but I don't know whether to put 0 or 1 for Control day0

Thank you

written 17 months ago by F3.4k

Perhaps not completely accurate (if Pb has any effect on measurements by its mere presence), but you could use Control day 0 for both groups (making an even 54 samples, i.e. 27 pairs).
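
A minimal sketch of what such a trait table could look like, assuming one row per sample with a binary Treatment column and a numeric Day column (the column names and sample labels are hypothetical, and the real layout in the screenshot may differ):

```r
# 27 control samples (day 0 to 26) and 26 lead-treated samples (day 1 to 26)
control <- data.frame(Sample = paste0("Control_day", 0:26), Day = 0:26, Treatment = 0)
lead    <- data.frame(Sample = paste0("Lead_day", 1:26),    Day = 1:26, Treatment = 1)
datTraits <- rbind(control, lead)   # 53 samples in total

# Option described above: reuse Control day 0 as the baseline of the lead series too
lead_day0 <- data.frame(Sample = "Control_day0", Day = 0, Treatment = 1)
paired <- rbind(datTraits, lead_day0)

table(paired$Treatment)   # 27 rows per group
```

If the duplicated baseline is used, the corresponding expression column would have to be duplicated in the same way so that traits and expression stay aligned.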

written 17 months ago by genomax70k

Thanks a lot. As usual, this is not my own data but a pre-existing data set generated for another application; my PI has asked me to find genes related to each developmental stage in stem cells using WGCNA and time-series analysis. As I don't have a quantitative trait file for lead treatment, I have to make a binary trait file to relate the modules to each day. Anyway, thank you for paying attention.

modified 17 months ago • written 17 months ago by F3.4k

Thanks for helping, genomax. I was traveling overnight back to Europe.

written 17 months ago by Kevin Blighe46k

Sorry,

To correlate a binary trait file with module eigengenes, I use Pearson correlation, like

 `moduleTraitCor = cor(MsE, datTraits, use= "p")`

as I do for quantitative traits. Do you think this is correct?

written 17 months ago by F3.4k

Yes, that should be fine, if MsE contains your module values and datTraits contains your clinical variables / traits.
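
As a rough base R sketch with simulated data (the names `MEs`, `MEblue`, `MEbrown` and `Treatment` are hypothetical), the correlation and its Student p-value can be computed as below. The WGCNA package offers `corPvalueStudent()` for the p-value step; the arithmetic here is the standard t conversion. For a 0/1 trait, the Pearson p-value coincides with that of a pooled-variance two-sample t-test, which is one way to see why the approach is acceptable for binary traits.

```r
set.seed(3)
nSamples  <- 20
datTraits <- data.frame(Treatment = rep(c(0, 1), each = 10))   # binary trait
MEs <- data.frame(MEblue  = rnorm(nSamples) + datTraits$Treatment,
                  MEbrown = rnorm(nSamples))                   # fake eigengenes

# Module-trait correlations ("p" abbreviates "pairwise.complete.obs")
moduleTraitCor <- cor(MEs, datTraits, use = "p")

# Student t-based p-value for each correlation
tStat <- moduleTraitCor * sqrt((nSamples - 2) / (1 - moduleTraitCor^2))
moduleTraitP <- 2 * pt(abs(tStat), df = nSamples - 2, lower.tail = FALSE)

# Equivalent pooled-variance t-test for one module
pT <- t.test(MEs$MEblue ~ datTraits$Treatment, var.equal = TRUE)$p.value
c(correlation = moduleTraitP["MEblue", "Treatment"], t_test = pT)
```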

Here is an example that I did last year using my own CorLevelPlot code:

[link to example figure not preserved]

modified 9 months ago • written 17 months ago by Kevin Blighe46k

Thanks a lot Kevin,

written 17 months ago by F3.4k

Excuse me for asking so many questions.

I read that EBSeq can handle differential expression without replicates. I need differentially expressed genes to obtain principal components for WGCNA. What confuses me is this: I have a control cell line from day 1 to day 26 and lead (Pb) treatment from day 1 to day 26. The experimental design aims to capture the impact of lead (Pb) on the developmental process. I don't know whether I can treat days as replicates, or whether each day is in fact a distinct condition.

Thank you for your time

modified 17 months ago • written 17 months ago by F3.4k

I think that each day is a distinct 'condition'. The interest is in finding out the expression patterns that have changed on each day.

DESeq2 does not require replicates either, at least for normalisation, because a 'pseudo-reference' sample is used for the size factor calculation.
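
The 'pseudo-reference' idea can be illustrated with a minimal base R sketch of median-of-ratios size factors. This is a simplified reimplementation on a made-up count matrix, not DESeq2's own code, and the object names are hypothetical.

```r
# Toy count matrix: 4 genes x 3 samples; s2 and s3 are s1 sequenced 2x and 4x deeper
counts <- matrix(c( 10,  20,  40,
                   100, 200, 400,
                     5,  10,  20,
                    50, 100, 200),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Pseudo-reference: per-gene geometric mean across all samples (on the log scale)
log_geo_mean <- rowMeans(log(counts))

# Genes with a zero count in any sample have a -Inf log mean and are skipped
keep <- is.finite(log_geo_mean)

# Size factor: median ratio of each sample to the pseudo-reference
size_factors <- apply(counts[keep, , drop = FALSE], 2, function(col) {
  exp(median(log(col) - log_geo_mean[keep]))
})
size_factors
```

No replicate information enters this calculation; each sample is compared only with the pseudo-reference built from all samples.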

written 17 months ago by Kevin Blighe46k

Thank you very much for help

written 17 months ago by F3.4k

Excuse me Kevin,

I was given a trait file that contains both quantitative and categorical traits for WGCNA, as in this figure:

https://ibb.co/mdVudS

The Gender column is coded 1 = male, 2 = female, and the Biopsy_Taken column is either post- or pre-training.

I could not figure out how to handle these columns in WGCNA, so I changed them as in this figure:

https://ibb.co/cqCkk7

for which WGCNA gave me this heatmap:

https://ibb.co/mD1BQ7

As you can see, female vs male and pre-training vs post-training show the same correlation, only with opposite signs.

If you were me, how would you relate these traits to the principal components (module eigengenes)?

Here are two other encodings I tried and plotted:

https://ibb.co/fXuCF7

https://ibb.co/izWg8S

I think that, as gene expression has been measured pre-training vs post-training, there might be no need to include pre- or post-training in the trait file.

modified 17 months ago • written 17 months ago by F3.4k

Hello Fereshteh, you have technically already done this in the best way. For these 'binary' traits, such as male / female, case / control, pass / fail, et cetera, it is best to encode them simply as 0 and 1. If you find a statistically significant positive correlation, it immediately indicates that there is a relationship between the module and the binary trait. You do not have to split the binary traits into 2 further traits.

Looking at your figure, I can say the following:

  • Gender has a statistically significant influence on 4 different modules (at 5% alpha, i.e., p<0.05)
  • Biopsy site has a statistically significant influence on 2 modules

The direction of the correlation is not of immediate interest. For binary traits, we just want to see if there are any statistically significant ones.

This way of working with binary traits follows the recommendation of the chief WGCNA developer.
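
A small sketch of the 0/1 encoding on made-up data (the column names `Female` and `PostTraining`, the eigengene `ME`, and all values are hypothetical). It also shows that coding the opposite category as 1 only flips the sign of the correlation and leaves the p-value unchanged, which is why a single column per binary trait is enough.

```r
set.seed(11)
gender_raw <- rep(c(1, 2), times = 8)            # 1 = male, 2 = female
biopsy_raw <- rep(c("pre", "post"), each = 8)    # pre- or post-training

# Recode each binary trait as a single 0/1 column
datTraits <- data.frame(
  Female       = as.numeric(gender_raw == 2),
  PostTraining = as.numeric(biopsy_raw == "post")
)

ME <- rnorm(16) + datTraits$Female               # fake eigengene, shifted in females

# Correlation with the 'female' coding versus the mirrored 'male' coding
r_female <- cor(ME, datTraits$Female)
r_male   <- cor(ME, 1 - datTraits$Female)
p_female <- cor.test(ME, datTraits$Female)$p.value
p_male   <- cor.test(ME, 1 - datTraits$Female)$p.value

c(r_female = r_female, r_male = r_male)
```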

written 17 months ago by Kevin Blighe46k

Thank you, I will keep on then

written 17 months ago by F3.4k

Hello Kevin,

Could you please explain the meaning of a negative correlation in the module-trait heatmap? I understand from your comments above that the direction of the correlation is probably not important; I am just curious about it, and my traits are not binary. I have 105 samples in total, from 5 different tissues and 7 time points, with 3 replicates each. Just a guess: does it mean that the genes in these modules are underexpressed with respect to this particular trait?

This is my module-trait relationship plot:

https://ibb.co/5jrNYTZ

Thank you in advance

modified 9 weeks ago • written 9 weeks ago by linyao0

Hi Kevin,

How would you recommend we build the logistic regression model? I am relatively new to R and am having a lot of trouble trying to install nnet and follow this tutorial linked below for building a multinomial logistic regression model.

written 4 weeks ago by ninabhatia30

I cannot see what data you have in front of you; however, generally, it just involves regressing whatever traits you have on the module eigengenes (one model per module), e.g.:

glm(CaseControl ~ module1, family = binomial)
glm(CaseControl ~ module2, family = binomial)
...
glm(CaseControl ~ moduleX, family = binomial)
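
For a binary trait, the per-module models can be fitted in a loop. The following is a minimal base R sketch on simulated data; `CaseControl`, `module1` and `module2` are placeholder names taken from the lines above, and the values are made up.

```r
set.seed(42)
n <- 40
CaseControl <- rep(c(0, 1), each = n / 2)        # binary trait: 0 = control, 1 = case

# Hypothetical module eigengenes, one column per module
MEs <- data.frame(
  module1 = rnorm(n) + 1.5 * CaseControl,        # simulated to differ between groups
  module2 = rnorm(n)                             # simulated with no group difference
)

# One logistic regression per module; keep the Wald p-value of the module term
pvals <- sapply(names(MEs), function(m) {
  fit <- glm(CaseControl ~ MEs[[m]], family = binomial)
  coef(summary(fit))[2, "Pr(>|z|)"]
})
pvals
```

For a trait with more than two categories, `glm()` with `family = binomial` no longer applies and a multinomial fitter is needed instead.
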
written 4 weeks ago by Kevin Blighe46k