Hello Silvia,
You can automatically obtain the phenotype data with the following code:
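library(Biobase)
library(GEOquery)
# Download the series matrix for GSE75173 (getGEO returns a list of ExpressionSets)
gset <- getGEO("GSE75173", GSEMatrix = TRUE, getGPL = FALSE)
# If the series contains more than one platform, keep the GPL15034 entry
if (length(gset) > 1) idx <- grep("GPL15034", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
# Sample-level (phenotype) annotation, one row per GSM sample
pData(gset)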
You are probably interested in 1 or more of the following columns:
cols <- c('title', 'geo_accession', 'source_name_ch1',
'characteristics_ch1.1', 'characteristics_ch1.2',
'characteristics_ch1.3', 'timepoint:ch1')
pData(gset)[,cols]
           title                            geo_accession
GSM1944594 Patient 001, baseline            GSM1944594
GSM1944595 Patient 001, 16 weeks tadalafil  GSM1944595
GSM1944596 Patient 002, baseline            GSM1944596
GSM1944597 Patient 002, 16 weeks tadalafil  GSM1944597
GSM1944598 Patient 003, baseline            GSM1944598
           source_name_ch1
GSM1944594 PAH-SSc patient_baseline
GSM1944595 PAH-SSc patient_16 weeks tadalafil
GSM1944596 PAH-SSc patient_baseline
GSM1944597 PAH-SSc patient_16 weeks tadalafil
GSM1944598 PAH-SSc patient_baseline
           characteristics_ch1.1
GSM1944594 patient condition: pulmonary arterial hypertension and scleroderma (PAH-SSc)
GSM1944595 patient condition: pulmonary arterial hypertension and scleroderma (PAH-SSc)
GSM1944596 patient condition: pulmonary arterial hypertension and scleroderma (PAH-SSc)
GSM1944597 patient condition: pulmonary arterial hypertension and scleroderma (PAH-SSc)
GSM1944598 patient condition: pulmonary arterial hypertension and scleroderma (PAH-SSc)
           characteristics_ch1.2
GSM1944594 timepoint: 0 weeks; baseline
GSM1944595 timepoint: after 16 weeks of tadalafil treatment
GSM1944596 timepoint: 0 weeks; baseline
GSM1944597 timepoint: after 16 weeks of tadalafil treatment
GSM1944598 timepoint: 0 weeks; baseline
           characteristics_ch1.3        timepoint:ch1
GSM1944594 clinical outcome: unchanged  0 weeks; baseline
GSM1944595 clinical outcome: unchanged  after 16 weeks of tadalafil treatment
GSM1944596 clinical outcome: negative   0 weeks; baseline
GSM1944597 clinical outcome: negative   after 16 weeks of tadalafil treatment
GSM1944598 clinical outcome: positive   0 weeks; baseline
I trust that you can do the remaining part of connecting this to your WGCNA data.
Kevin
Hi Kevin,
Thank you for showing me how to view my phenotypic information! I have written it to a CSV file with the 6 columns (title, geo_accession, source_name_ch1, characteristics_ch1.1, characteristics_ch1.2, characteristics_ch1.3), as in Peter Langfelder's tutorial. I then continued following Peter's tutorial, but when I ran the code, it returned an error:
Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed
In addition: Warning message: non-unique values when setting 'row.names'
The code I used is the following:
I added row.names = NULL to try to avoid the error, but it still appears.
Perhaps the information that I wrote in the CSV file is not correct; I don't know.
If you can tell me what is wrong, it would be a great help.
Once again, thanks for your attention,
Silvia
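For readers who hit the same message: in general, R raises this error when the values assigned as row names of a data frame are not unique. The sketch below reproduces it on a made-up table and shows one way to list the offending values. It is not a reconstruction of the code in the post above; the table contents and the helper names dupValues and res are assumptions for illustration only.

```r
# Toy trait table; the values are invented and NOT taken from the real CSV file
traitData <- data.frame(
  Title = c("Patient 001", "Patient 001", "Patient 002"),
  outcome = c("unchanged", "unchanged", "negative"),
  stringsAsFactors = FALSE
)

# Which values would collide if this column were used as row names?
dupValues <- unique(traitData$Title[duplicated(traitData$Title)])
print(dupValues)

# Assigning non-unique row names raises the error; it is caught here
# so that the script keeps running and the message can be inspected
res <- tryCatch({
  rownames(traitData) <- traitData$Title
  "ok"
}, error = function(e) conditionMessage(e))
print(res)
```

If dupValues is empty for the column being used, the duplicates are coming from somewhere else, for example from a different column being picked up as row names when the file is read.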
Hello again. What are the values in Samples? Why did you choose traitData$Title for the purpose of matching?
Thanks!
Hi,
The Samples variable contains the names of the 20 samples in the experiment; it looks like this:
Well, I matched on “Title” because in the other examples I have seen, they always match on the column that contains the sample names. However, I have also tried matching on the other columns of my traitData, and the results are the same as when I match on “Title”.
And it shows me this:
Thank you
Ah, but if you print [to screen] the values of both Samples and traitData$Title, you will see that they do not match. That is why NA is returned.
If the following command returns TRUE for you, then it means that the phenotype data is already aligned with your samples variable:
This ^^ command is not universal, though, and is just specific to this case.
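As a rough illustration of why match() gives NA and what "aligned" means, here is a minimal sketch on made-up vectors. It assumes, purely for the example, that the sample names are GEO accessions; the thread does not confirm what Samples actually holds, and the names noMatch, idx and traitAligned are hypothetical.

```r
# Hypothetical sample names, in the order used by the expression data
Samples <- c("GSM1944594", "GSM1944595", "GSM1944596")

# Small trait table whose rows are in a different order
traitData <- data.frame(
  Title = c("Patient 002, baseline",
            "Patient 001, baseline",
            "Patient 001, 16 weeks tadalafil"),
  geo_accession = c("GSM1944596", "GSM1944594", "GSM1944595"),
  stringsAsFactors = FALSE
)

# None of the sample names occurs in Title, so every position is NA
noMatch <- match(Samples, traitData$Title)
print(noMatch)

# geo_accession holds the same identifiers, just in another order
idx <- match(Samples, traitData$geo_accession)
traitAligned <- traitData[idx, ]

# TRUE only when both vectors hold the same values in the same order
print(all(traitAligned$geo_accession == Samples))
```

The point of the final check is that matching must be done on a column whose values are literally the same strings as the sample names; a column that merely describes the same samples is not enough.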
OK, but the problem is that it doesn't match any column (title, geo_accession, source_name_ch1, characteristics_ch1.1, characteristics_ch1.2, characteristics_ch1.3) of my traitData and always returns NA. What can I do?
I ran your code and it gives an error:
Thank you, and sorry for my lack of knowledge
This may work:
If not, what is the output of:
No, the code that you gave me doesn't work. The error is the following:
The outputs of the other commands are as follows. For:
It returns:
And for the other one:
It returns:
Thanks
Hey again, but, earlier, you implied that samples contains:
...or was that 'Samples', with an upper-case 'S'?
Did you not create 'samples' via this sequence of commands?
If 'yes', then it should just be a character vector.
Oh sorry, it's my fault! Yes, earlier when I wrote the Samples variable I meant it with an upper-case S; it is a variable that contains all the sample names. And yes, I created it with that sequence of commands, following Peter Langfelder's tutorial.
Output:
So, yes, it's a character vector.
Output:
So, do you know what the problem is?
I'm so sorry for not being more careful with the upper-case S…
Thanks for everything
Okay, no problem. This command should now just return TRUE:
If it returns TRUE, it indicates that the phenotype data is already aligned with your DatExprs data.
Yes, I have tried it and it returns FALSE. What does that mean? That I can't match any column of my traitData with the Samples variable?
Thanks
Sorry, that should be rownames(pData(gset)) - please try this:
If you print, to your screen, the following variables, I am confident that you will begin to understand the next step and why we are doing this:
All that we are aiming to ensure is that the phenotype data is aligned to our DatExprs object.
Yes, this new sequence of commands works and it returns TRUE. I think I am starting to understand: I have a problem with my clinical trait file because it doesn't contain any numeric values. That is why, when I ran the following code, the module-trait relationship plot came out with NA: there are no numeric values in the clinical trait file to compare the rows (modules) with the columns (traits). Is that right?
The code I ran is the following:
The output is this one:
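On the point about the trait file having no numeric values, one possible way to derive numeric traits from the text columns is sketched below. It uses the characteristics_ch1.2 and characteristics_ch1.3 values printed at the top of the thread for three samples. The coding scheme (the columns treated, outcome_negative and outcome_positive) and the names pheno and datTraits are assumptions made for this example, not something prescribed by the thread or by WGCNA.

```r
# Toy phenotype table built from values shown earlier in the thread
pheno <- data.frame(
  characteristics_ch1.2 = c("timepoint: 0 weeks; baseline",
                            "timepoint: after 16 weeks of tadalafil treatment",
                            "timepoint: 0 weeks; baseline"),
  characteristics_ch1.3 = c("clinical outcome: unchanged",
                            "clinical outcome: unchanged",
                            "clinical outcome: negative"),
  row.names = c("GSM1944594", "GSM1944595", "GSM1944596"),
  stringsAsFactors = FALSE
)

# Remove the "key: " prefix so that only the value remains
timepoint <- sub("^timepoint: ", "", pheno$characteristics_ch1.2)
outcome <- sub("^clinical outcome: ", "", pheno$characteristics_ch1.3)

# Hypothetical numeric coding: 0/1 indicator columns
datTraits <- data.frame(
  treated = as.numeric(grepl("tadalafil", timepoint)),
  outcome_negative = as.numeric(outcome == "negative"),
  outcome_positive = as.numeric(outcome == "positive"),
  row.names = rownames(pheno)
)
print(datTraits)
```

With every column numeric and the row names equal to the sample identifiers, a table like this can be correlated with module eigengenes. Whether 0/1 indicators are the right encoding for clinical outcome is a judgment call that depends on the question being asked.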