Question

Module trait association in wgcna


3.2 years ago

siu ▴ 160

Hi all, I am using WGCNA for co-expression analysis of time-series data. I have 48 samples in total (16 time points with 3 replicates each). I have identified modules based on gene correlation across all samples. Next, I want to associate these modules with my traits of interest: I have grouped the 16 time points into 5 time zones, and I want to see which modules are associated with each of these 5 time zones. My trait data are:

samples    timezone1  timezone2  timezone3  timezone4  timezone5
dark11     1          0          0          0          0
dark11_1   1          0          0          0          0
dark9      1          0          0          0          0
dark7      0          1          0          0          0
dark7_1    0          1          0          0          0
dark5      0          0          1          0          0
dark5_1    0          0          1          0          0

and so on
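A minimal sketch of building this indicator (one-hot) table in R, assuming a hypothetical sample sheet `sample_info` with columns `sample` and `timezone` that maps each sample to its zone. Only the rows shown above are included; the real table would list all 48 samples.

```r
# Hypothetical sample-to-zone mapping: only the rows shown in the table
# are included; the full data would list all 48 samples.
sample_info <- data.frame(
  sample   = c("dark11", "dark11_1", "dark9", "dark7", "dark7_1", "dark5", "dark5_1"),
  timezone = c("timezone1", "timezone1", "timezone1",
               "timezone2", "timezone2", "timezone3", "timezone3")
)

# Fix the levels so every zone gets a column, even zones absent from this subset
zone <- factor(sample_info$timezone, levels = paste0("timezone", 1:5))

# One 0/1 indicator column per zone (samples x zones)
traits <- sapply(levels(zone), function(z) as.integer(zone == z))
rownames(traits) <- sample_info$sample
traits
```

Each sample has exactly one 1 across the five columns, so every zone column can be tested against the module eigengenes on its own.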

Am I doing this correctly, or is there another way to associate modules with categorical traits? After identifying the modules related to each time zone, I want to compute intramodular connectivity and identify hub genes.

RNA-Seq rna-seq assembly alignment R

Updated 2.4 years ago by synat.keam ▴ 100 • written 3.2 years ago by siu ▴ 160

score 2 · Answer 1 · 2021-01-21


3.2 years ago

Kevin Blighe 87k

I think that you are doing it correctly. I would use a binary logistic regression model here, something like:

summary(glm(timezone1 ~ pink_module, data = mydata, family = binomial(link = 'logit')))
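A minimal sketch of running this model for several modules, one at a time, on simulated data. The data frame `mydata`, the 9-vs-39 split of samples and the three eigengene columns are hypothetical stand-ins for the real eigengenes and trait table.

```r
set.seed(1)
n <- 48
# Hypothetical data: a 0/1 trait (9 samples in timezone1) and three simulated eigengenes
mydata <- data.frame(timezone1 = rep(c(1, 0), times = c(9, 39)))
mydata$pink_module  <- rnorm(n, mean = 1.5 * mydata$timezone1)
mydata$blue_module  <- rnorm(n)
mydata$green_module <- rnorm(n)

modules <- c("pink_module", "blue_module", "green_module")

# One logistic regression per module: timezone1 ~ <module>
results <- t(sapply(modules, function(m) {
  fit <- glm(reformulate(m, response = "timezone1"), data = mydata,
             family = binomial(link = "logit"))
  coef(summary(fit))[m, ]  # Estimate, Std. Error, z value, Pr(>|z|)
}))
results
```

The same loop can be repeated for `timezone2` to `timezone5` by changing the `response` argument.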

You can also just compute a numerical correlation (with a 0/1 trait, this is the point-biserial correlation):

cor(as.numeric(mydata$pink_module), mydata$timezone1)
cor.test(as.numeric(mydata$pink_module), mydata$timezone1)
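For all modules and all zones at once, a sketch of a module-by-zone correlation matrix with Student-t p-values on simulated data. The zone sizes and the `MEs` eigengene table are hypothetical. To my understanding, WGCNA's tutorials do the same thing with `cor()` followed by `corPvalueStudent()`; the `tstat` line below writes that Student-t formula out in base R, and for each cell it gives the same p-value as `cor.test()`.

```r
set.seed(2)
n <- 48
# Hypothetical grouping of the 48 samples into 5 zones, and the 0/1 trait matrix
zone <- factor(rep(paste0("timezone", 1:5), times = c(9, 9, 12, 9, 9)))
traits <- sapply(levels(zone), function(z) as.integer(zone == z))

# Hypothetical module eigengenes (samples x modules)
MEs <- data.frame(MEpink  = rnorm(n) + traits[, "timezone1"],
                  MEblue  = rnorm(n),
                  MEgreen = rnorm(n))

# Pearson correlation of every module with every zone (modules x zones)
moduleTraitCor <- cor(MEs, traits)

# Two-sided Student-t p-values for each correlation, n - 2 degrees of freedom
tstat <- sqrt(n - 2) * moduleTraitCor / sqrt(1 - moduleTraitCor^2)
moduleTraitP <- 2 * pt(abs(tstat), df = n - 2, lower.tail = FALSE)
round(moduleTraitP, 4)
```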

Regarding hub genes, I believe the WGCNA package already includes functions for this, for example intramodularConnectivity(), signedKME() and chooseTopHubInEachModule().
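To show what module membership (kME) means, here is a base-R sketch on simulated data. It is not WGCNA's implementation; for real analyses I would use the WGCNA functions above and check their documentation for exact arguments. The gene-to-module assignment `colors`, the simulated `datExpr` and the helper `eigengene()` are all hypothetical. kME is the correlation of a gene's expression with a module eigengene, and a common hub-gene choice is the member gene with the highest kME for its own module.

```r
set.seed(3)
n_samples <- 48
# Hypothetical module colours for 30 genes (in WGCNA these would come from
# the module detection step)
colors <- rep(c("pink", "blue", "green"), each = 10)

# Simulated expression: each gene = its module's latent signal + gene-specific noise
latent <- matrix(rnorm(n_samples * 3), ncol = 3,
                 dimnames = list(NULL, c("pink", "blue", "green")))
datExpr <- sapply(seq_along(colors), function(g)
  latent[, colors[g]] + rnorm(n_samples, sd = 0.5 + (g %% 10) / 10))
colnames(datExpr) <- paste0("gene", seq_along(colors))

# Hypothetical helper: eigengene = first left singular vector of the scaled
# module expression, oriented to correlate positively with average expression
eigengene <- function(x) {
  xs <- scale(x)
  pc1 <- svd(xs)$u[, 1]
  if (cor(pc1, rowMeans(xs)) < 0) pc1 <- -pc1
  pc1
}
MEs <- sapply(unique(colors), function(m) eigengene(datExpr[, colors == m]))

# kME: correlation of each gene with each module eigengene (genes x modules)
kME <- cor(datExpr, MEs)

# Hub gene per module: the member gene with the highest kME for its own module
hubs <- sapply(colnames(MEs), function(m) {
  members <- colors == m
  names(which.max(kME[members, m]))
})
hubs
```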


Thanks for the reply, Kevin! Does pink_module in your command contain only the gene names, or the expression values of the genes? And does mydata correspond to the trait data that I showed in the table?

3.2 years ago by siu ▴ 160


In my code, pink_module would contain the module eigengene values returned by WGCNA, with one value per sample (not gene names or the expression values of individual genes).

Yes, mydata would have the form you have shown, but with extra columns for the modules (pink, blue, green, etc.).
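A minimal sketch of assembling such a `mydata`, assuming hypothetical inputs `traitdata` (the 0/1 table, first rows only) and `MEs` (one eigengene column per module, one row per sample). In WGCNA the eigengenes would typically come from `moduleEigengenes(datExpr, colors)$eigengenes`; here they are simulated. The key step is aligning rows by sample name before binding the columns.

```r
# Hypothetical inputs, one row per sample:
#   traitdata - the 0/1 zone table from the question (first rows only)
#   MEs       - simulated module eigengenes
samples <- c("dark11", "dark11_1", "dark9", "dark7", "dark7_1", "dark5", "dark5_1")
traitdata <- data.frame(timezone1 = c(1, 1, 1, 0, 0, 0, 0),
                        timezone2 = c(0, 0, 0, 1, 1, 0, 0),
                        timezone3 = c(0, 0, 0, 0, 0, 1, 1),
                        row.names = samples)
set.seed(4)
# Deliberately stored in a different sample order, to show the alignment step
MEs <- data.frame(pink_module = rnorm(7), blue_module = rnorm(7),
                  row.names = rev(samples))

# Align eigengene rows to the trait rows by sample name, then bind columns
mydata <- cbind(traitdata, MEs[rownames(traitdata), , drop = FALSE])
mydata
```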

3.2 years ago by Kevin Blighe 87k


Many thanks, Kevin, it worked perfectly. Sorry to bother you again, but I am getting negative values for scale independence (the y-axis of the soft-thresholding plot). I have also removed genes that were not suitable for the analysis. Do the negative scale-independence values show that my data do not follow scale-free topology, or do they indicate something else? Is something wrong with my data? I have used a soft-thresholding power of 12, following the WGCNA FAQ.
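For reference, the "Scale independence" plot in the WGCNA tutorials shows the signed fit index -sign(slope) * R^2, where the slope and R^2 come from regressing log10 p(k) on log10 k. A negative value therefore means the fitted slope is positive at that power. Below is a simplified base-R sketch of that calculation on random data. `scale_free_fit()` is a hypothetical helper that drops empty connectivity bins; WGCNA's own fit index handles them differently.

```r
# Hypothetical, simplified scale-free fit for a connectivity vector k:
# regress log10 p(k) on log10 k over binned connectivities (empty bins dropped)
scale_free_fit <- function(k, nBreaks = 10) {
  bins <- cut(k, nBreaks)
  dk   <- tapply(k, bins, mean)                           # mean k per bin
  p_dk <- as.vector(tapply(k, bins, length)) / length(k)  # fraction of genes per bin
  keep <- !is.na(dk)
  fit  <- lm(log10(p_dk[keep]) ~ log10(as.vector(dk)[keep]))
  slope <- unname(coef(fit)[2])
  r2    <- summary(fit)$r.squared
  # Signed R^2: positive only when the fitted slope is negative
  c(slope = slope, R2 = r2, signedR2 = -sign(slope) * r2)
}

set.seed(5)
datExpr <- matrix(rnorm(48 * 200), nrow = 48)  # hypothetical: 48 samples x 200 genes
absCor  <- abs(cor(datExpr))
for (beta in c(1, 6, 12)) {
  A <- absCor^beta      # unsigned adjacency at soft-thresholding power beta
  diag(A) <- 0
  k <- rowSums(A)       # whole-network connectivity of each gene
  print(c(beta = beta, round(scale_free_fit(k), 3)))
}
```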

3.2 years ago by siu ▴ 160


Dear Kevin,

I was searching for threads on associating modules with categorical traits and found your answers interesting. Sorry to jump in, but this is related to what I am doing. Regarding the code you suggested:

summary(glm(timezone1 ~ pink_module, data = mydata, family = binomial(link = 'logit')))

Is it possible to fit a multiple logistic regression with all the modules as covariates, or must we fit one module at a time because they are independent of each other? My multiple regression output was really strange: the z value was 0 and the p-value was 1 for all modules.
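One possible explanation for z values near 0 and p-values near 1 (an assumption, not something established in this thread) is complete separation: a module, or a combination of modules, perfectly splits the 1s from the 0s. The coefficients then diverge, the standard errors become huge and the Wald z statistics collapse toward 0. A sketch on simulated data, where `mydata` and a perfectly separating `pink_module` are hypothetical:

```r
set.seed(6)
n <- 48
timezone1 <- rep(c(1, 0), times = c(9, 39))   # hypothetical 0/1 trait
mydata <- data.frame(
  timezone1,
  # Hypothetical eigengene that is higher in every timezone1 sample than in
  # any other sample, i.e. it separates the two classes perfectly
  pink_module  = ifelse(timezone1 == 1, runif(n, 1, 2), runif(n, -2, 0.5)),
  blue_module  = rnorm(n),
  green_module = rnorm(n)
)

# Multiple logistic regression with all modules as covariates; with perfect
# separation R typically warns that fitted probabilities are numerically 0 or 1
fit <- glm(timezone1 ~ pink_module + blue_module + green_module,
           data = mydata, family = binomial(link = "logit"))
coef(summary(fit))   # look for huge standard errors, z near 0, p near 1

# Quick check for separation: do the eigengene ranges overlap between classes?
tapply(mydata$pink_module, mydata$timezone1, range)
```

With 3 replicates per time point and eigengenes built from the same samples, such separation is plausible, but it would need checking on the real data.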

Also, regarding the correlation test:

cor(as.numeric(mydata$pink_module), mydata$timezone1)
cor.test(as.numeric(mydata$pink_module), mydata$timezone1)

How robust or accurate is a numeric correlation, given that the response is binary (coded as 1 and 0)?

I fitted a logistic regression with one module at a time and also computed the numeric correlation as you suggested, but the p-values differed. None of the modules was significant in the binary logistic regression; however, with the numeric correlation, one of my modules had a p-value below 0.05.
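A sketch that may help compare the two approaches on simulated data (`timezone1` and `pink_module` are hypothetical). A Pearson correlation test between an eigengene and a 0/1 trait gives exactly the same t statistic and p-value as Student's two-sample t-test (equal variances) comparing the eigengene between the zone and all other samples. The logistic regression Wald test comes from a different model and a different test, so its p-value need not match.

```r
set.seed(7)
n <- 48
timezone1   <- rep(c(1, 0), times = c(9, 39))   # hypothetical 0/1 trait
pink_module <- rnorm(n, mean = 0.8 * timezone1) # hypothetical eigengene

# Pearson correlation with a 0/1 variable (point-biserial correlation)
ct <- cor.test(pink_module, timezone1)

# Student's two-sample t-test (equal variances): eigengene in timezone1 vs the rest
tt <- t.test(pink_module[timezone1 == 1], pink_module[timezone1 == 0],
             var.equal = TRUE)

# Logistic regression Wald test for the same eigengene
lr <- glm(timezone1 ~ pink_module, family = binomial(link = "logit"))

c(cor_test      = ct$p.value,
  t_test        = tt$p.value,
  logistic_wald = coef(summary(lr))["pink_module", "Pr(>|z|)"])
```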

I am stuck with this. I hope you can provide some insight into my problem.

Kind Regards,

Synat

2.4 years ago by synat.keam ▴ 100