WGCNA - Module-trait association
1
0
Entering edit mode
4.0 years ago

Hi,

I am doing the WGCNA with my data (it’s an experiment of GSE, GSE75173) and now I want to perform the module-trait association but I need the datTraits variable that must contain the fenotipic information about all the samples. Someone know how can I extract this fenotipic information from the GSE75173 and put it in the datTrait?

If someone can help me, I would appreciate!

Thanks,

Silvia

wgcna datTrait module-trait association • 1.8k views
ADD COMMENT
1
Entering edit mode
4.0 years ago

Hello Silvia,

You can automatically obtain the phenotype data with the following code:

library(Biobase)
library(GEOquery)
gset <- getGEO("GSE75173", GSEMatrix =TRUE, getGPL=FALSE)
if (length(gset) > 1) idx <- grep("GPL15034", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
pData(gset)

You are probably interested in 1 or more of the following columns:

cols <- c('title', 'geo_accession', 'source_name_ch1',
  'characteristics_ch1.1', 'characteristics_ch1.2',
  'characteristics_ch1.3', 'timepoint:ch1')

pData(gset)[,cols]
                                     title geo_accession
GSM1944594           Patient 001, baseline    GSM1944594
GSM1944595 Patient 001, 16 weeks tadalafil    GSM1944595
GSM1944596           Patient 002, baseline    GSM1944596
GSM1944597 Patient 002, 16 weeks tadalafil    GSM1944597
GSM1944598           Patient 003, baseline    GSM1944598
                              source_name_ch1
GSM1944594           PAH-SSc patient_baseline
GSM1944595 PAH-SSc patient_16 weeks tadalafil
GSM1944596           PAH-SSc patient_baseline
GSM1944597 PAH-SSc patient_16 weeks tadalafil
GSM1944598           PAH-SSc patient_baseline
                                                                  characteristics_ch1.1
GSM1944594 patient condition: pulmonary arterial hypertension and scleroderma (PAH-SSc)
GSM1944595 patient condition: pulmonary arterial hypertension and scleroderma (PAH-SSc)
GSM1944596 patient condition: pulmonary arterial hypertension and scleroderma (PAH-SSc)
GSM1944597 patient condition: pulmonary arterial hypertension and scleroderma (PAH-SSc)
GSM1944598 patient condition: pulmonary arterial hypertension and scleroderma (PAH-SSc)
                                      characteristics_ch1.2
GSM1944594                     timepoint: 0 weeks; baseline
GSM1944595 timepoint: after 16 weeks of tadalafil treatment
GSM1944596                     timepoint: 0 weeks; baseline
GSM1944597 timepoint: after 16 weeks of tadalafil treatment
GSM1944598                     timepoint: 0 weeks; baseline
                 characteristics_ch1.3                         timepoint:ch1
GSM1944594 clinical outcome: unchanged                     0 weeks; baseline
GSM1944595 clinical outcome: unchanged after 16 weeks of tadalafil treatment
GSM1944596  clinical outcome: negative                     0 weeks; baseline
GSM1944597  clinical outcome: negative after 16 weeks of tadalafil treatment
GSM1944598  clinical outcome: positive                     0 weeks; baseline

I trust that you can do the remaining part of connecting this to your WGCNA data.

Kevin

ADD COMMENT
0
Entering edit mode

Hi Kevin,

Thank you for show me how see my fenotipic information! I have written it down in an csv file with the 6 columns (title, geo_accession, source_name_ch1, characteristics_ch1.1, characteristics_ch1.2, characteristics_ch1.3) like in the Peter Langfelder tutorial. So then I have continued following Peter's tutorial but when I have ran the code, it return me back an error:

“Error in .rowNamesDF<-(x, value = value) : duplicate 'row.names' are not allowed. In addition: Warning message: non-unique values when setting 'row.names'”

The code I have put it’s the following:

traitData = read.csv("clinical.csv", row.names = NULL)
dim(traitData)
names(traitData)
Samples = rownames(DatExprs)
traitRows = match(Samples, traitData$Title)
rownames(traitData) = traitData[traitRows, 1]

I have added row.names = NULL trying to avoid the error but it’s still appearing.

Perhaps it isn’t correct the information that I have written in the csv file, I don’t know

If you can tell me what’s wrong it would be a great help.

Once again, thanks for your attention,

Silvia

ADD REPLY
0
Entering edit mode

Hello again. What are the values in Samples? Why did you choose traitData$Title for the purpose of matching?

Thanks!

ADD REPLY
0
Entering edit mode

Hi,

Samples variable contain the name of the 20 samples on the experiment, it look like that:

[1] "GSM1944594_HF-A1-1-BASE.CEL.gz"    "GSM1944595_HF-A2-1-W16.CEL.gz"    
 [3] "GSM1944596_HF-A3-2-BASE.CEL.gz"    "GSM1944597_HF-A4-2-W16.CEL.gz"    
 [5] "GSM1944598_HF-A5-3-BASE.CEL.gz"    "GSM1944599_HF-A6-3-W16.CEL.gz"    
 [7] "GSM1944600_HF-A7-4-BASE.CEL.gz"    "GSM1944601_HF-A8-4-W16.CEL.gz"    
 [9] "GSM1944602_HF-B1-5-BASE.CEL.gz"    "GSM1944603_HF-B2-5-W16.CEL.gz"    
[11] "GSM1944604_HF-B3-6-BASE.CEL.gz"    "GSM1944605_HF-B4-6-W16.CEL.gz"    
[13] "GSM1944606_HF-B5-7-BASE.CEL.gz"    "GSM1944607_HF-B6-7-W16.CEL.gz"    
[15] "GSM1944608_HF-B7-8-BASE.CEL.gz"    "GSM1944609_HF-B8-8-W16.CEL.gz"    
[17] "GSM1944610_HF-C1-9-BASE.CEL.gz"    "GSM1944611_HF-C2-9-PIONEER.CEL.gz"
[19] "GSM1944612_HF-C3-10-BASE.CEL.gz"   "GSM1944613_HF-C4-10-W16.CEL.gz"

Well, I have matched with “Title” because I have seen other examples and they always match with the column that contains the samples names. However, I have tried to match with the other columns of my traitData but the results are the same that when I match with “Title”.

traitRows = match(Samples, traitData$Title)
traitRows

And it shows me this:

[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Thank you

ADD REPLY
0
Entering edit mode

Ah, but, if you output [to screen] the values of both Samples and traitData$Title, you will see that they do not match. That is why NA is returned.

If the following command returns TRUE for you, then it means the the phenotype data is already aligned with your samples variable:

all(pData(gset) == unlist(lapply(strsplit(samples, '_'), function(x) x[[1]])))

This ^^ command is not universal, though, and is just specific to this case.

ADD REPLY
0
Entering edit mode

OK, but the problem is that it doesn't match with any column (title, geo_accession, source_name_ch1, characteristics_ch1.1, characteristics_ch1.2, characteristics_ch1.3) of my traitData and always return me NA. What can I do?

I have put your code and it gives an error:

Error in base::strsplit(x, ...) : non-character argument

Thank you and sorry for my unknowledge

ADD REPLY
0
Entering edit mode

This may work:

all(rownames(pData(gset)) == unlist(lapply(strsplit(as.character(samples), '_'), function(x) x[[1]])))

If not, what is the output of:

typeof(samples)
str(samples)
ADD REPLY
0
Entering edit mode

No, the code that you have given me it doesn’t work. The error is the following:

Error in as.vector(x, "character") : cannot coerce type 'closure' to vector of type 'character'

The outputs of the other codes are the next one, for:

typeof(samples)

It returns me:

[1] "closure"

And for the other one:

str(samples)

It returns me:

Formal class 'standardGeneric' [package "methods"] with 8 slots
  ..@ .Data     :function (object)  
  ..@ generic   : chr "samples"
  .. ..- attr(*, "package")= chr "Biobase"
  ..@ package   : chr "Biobase"
  ..@ group     : list()
  ..@ valueClass: chr(0) 
  ..@ signature : chr "object"
  ..@ default   : NULL
  ..@ skeleton  : language (function (object)  stop("invalid call in method dispatch to 'samples' (no default method)", domain = NA))(object)

Thanks

ADD REPLY
0
Entering edit mode

Hey again, but, earlier, you implied that samples contains:

[1] "GSM1944594_HF-A1-1-BASE.CEL.gz"    "GSM1944595_HF-A2-1-W16.CEL.gz"    
 [3] "GSM1944596_HF-A3-2-BASE.CEL.gz"    "GSM1944597_HF-A4-2-W16.CEL.gz"    
 [5] "GSM1944598_HF-A5-3-BASE.CEL.gz"    "GSM1944599_HF-A6-3-W16.CEL.gz"    
 [7] "GSM1944600_HF-A7-4-BASE.CEL.gz"    "GSM1944601_HF-A8-4-W16.CEL.gz"    
 [9] "GSM1944602_HF-B1-5-BASE.CEL.gz"    "GSM1944603_HF-B2-5-W16.CEL.gz"    
[11] "GSM1944604_HF-B3-6-BASE.CEL.gz"    "GSM1944605_HF-B4-6-W16.CEL.gz"

...or was that 'Samples', with an upper-case 'S'?

Did you not create 'samples' via this sequence of commands?

traitData = read.csv("clinical.csv", row.names = NULL)
dim(traitData)
names(traitData)
Samples = rownames(DatExprs)
traitRows = match(Samples, traitData$Title)
rownames(traitData) = traitData[traitRows, 1]

if 'yes', then it should just be a character vector.

ADD REPLY
0
Entering edit mode

Oh sorry, it’s my fault! Yes, earlier when I have put Samples variable I would mean with upper-case S and is a variable that contain all the samples names. And yes, I have created it with those sequences of commands following Peter Langfelder’s tutorial.

typeof(Samples)

Output:

[1] "character"

So, yes it’s a character vector

str(Samples)

Output:

chr [1:20] "GSM1944594_HF-A1-1-BASE.CEL.gz" "GSM1944595_HF-A2-1-W16.CEL.gz" ...

So, do you know what the problem is?

I’m so sorry for not been more careful with the upper-case S…

Thanks for all

ADD REPLY
0
Entering edit mode

Okay, no problem. This command should now just return TRUE:

all(pData(gset) == unlist(lapply(strsplit(as.character(Samples), '_'), function(x) x[[1]])))

If it returns TRUE, it indicates that the phenotype data is already aligned with your DatExprs data.

ADD REPLY
0
Entering edit mode

Yes, I have tried an returns me FALSE. What does it means? That I can’t match any column of my traitData with the Samples variable?

Thank

ADD REPLY
0
Entering edit mode

Sorry, that should be rownames(pData(gset)) - please try this:

all(rownames(pData(gset)) == unlist(lapply(strsplit(Samples, '_'), function(x) x[[1]])))

If you print, to your screen, the following variables, I am confident that you will begin to understand the next step and why we are doing this:

rownames(pData(gset))
Samples
unlist(lapply(strsplit(Samples, '_'), function(x) x[[1]]))

All that are aiming to ensure is that the phenotype data is aligned to our DatExprs object.

ADD REPLY
0
Entering edit mode

Yes, this new sequence command works and it returns me TRUE. I think that I start understand and I have a problem with my clinical trait file because it doesn’t contain any numeric value. For this reason when I have ran the following code it has given the module-trait relationship’s graphic with NA because there isn’t any numeric value in the clinical trait file to compare rows (modules) with columns (traits), it isn’t?

The code I have ran it's the following:

traitData = read.csv("clinical.csv", row.names = NULL)
dim(traitData)
names(traitData)

Samples = rownames(DatExprs)
traitRows = match(Samples, traitData$Title)
datTraits = traitData[traitRows, -1]
all(rownames(pData(gset)) == unlist(lapply(strsplit(Samples, '_'), function(x) x[[1]])))

nGenes = ncol(DatExprs)
nSamples = nrow(DatExprs)
MEs0 = moduleEigengenes(DatExprs, dynamicColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(MEs, datTraits, use = "p")
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)

textMatrix = paste(signif(moduleTraitCor, 2), "\n(",
                   signif(moduleTraitPvalue, 1), ")", sep = "")

labeledHeatmap(Matrix = moduleTraitCor,
               xLabels = names(datTraits),
               yLabels = names(MEs),
               ySymbols = names(MEs),
               colorLabels = FALSE,
               colors = greenWhiteRed(50),
               textMatrix = textMatrix,
               setStdMargins = FALSE,
               cex.text = 0.5,
               zlim = c(-1,1),
               main = paste("Module-trait relationships"))

The output is this one:

module-trait-relationships

ADD REPLY

Login before adding your answer.

Traffic: 1945 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6