Question: Extracting information of interest from R
4 weeks ago by
silsie64510
silsie64510 wrote:

So I tried extracting information from a GEO dataset into R using:

idx <- which(colnames(pData(gset[[1]])) %in%
  c('AgeAtDiagnosis:ch1', 'Death:ch1', 'Gender:ch1',
    'Grading:ch1', 'LymphNodesInvaded:ch1', 'LymphNodesRemoved:ch1',
    'OverallSurvival_months:ch1', 'pM:ch', 'pN:ch', 'pT:ch',
    'RectumOrColon:ch1 ', 'Rezidiv:ch1',
    'TumorFreeSurvival_months:ch1', 'TumorLocalization:ch1'))

metadata <- data.frame(pData(gset[[1]])[, idx],
  row.names = rownames(pData(gset[[1]])))

I needed to extract 14 columns from my dataset, but R only extracted 8. I have retried this a number of times but get the same result. How do I obtain the remaining 4 columns? Also, instead of getting the actual values for 'Grading', I obtained the value 'character'. How do I solve this?

R • 139 views
ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by silsie64510
colnames(pData(gset[[1]]))
 [1] "title"                       
 [2] "geo_accession"               
 [3] "status"                      
 [4] "submission_date"             
 [5] "last_update_date"            
 [6] "type"                        
 [7] "channel_count"               
 [8] "source_name_ch1"             
 [9] "organism_ch1"                
[10] "characteristics_ch1"         
[11] "characteristics_ch1.1"       
[12] "characteristics_ch1.2"       
[13] "characteristics_ch1.3"       
[14] "characteristics_ch1.4"       
[15] "characteristics_ch1.5"       
[16] "characteristics_ch1.6"       
[17] "characteristics_ch1.7"       
[18] "characteristics_ch1.8"       
[19] "characteristics_ch1.9"       
[20] "characteristics_ch1.10"      
[21] "characteristics_ch1.11"      
[22] "characteristics_ch1.12"      
[23] "characteristics_ch1.13"      
[24] "characteristics_ch1.14"      
[25] "molecule_ch1"                
[26] "extract_protocol_ch1"        
[27] "extract_protocol_ch1.1"      
[28] "label_ch1"                   
[29] "label_protocol_ch1"          
[30] "taxid_ch1"                   
[31] "hyb_protocol"                
[32] "scan_protocol"               
[33] "description"                 
[34] "data_processing"             
[35] "platform_id"                 
[36] "contact_name"                
[37] "contact_email"               
[38] "contact_phone"               
[39] "contact_department"          
[40] "contact_institute"           
[41] "contact_address"             
[42] "contact_city"                
[43] "contact_zip/postal_code"     
[44] "contact_country"             
[45] "supplementary_file"          
[46] "data_row_count"              
[47] "AgeAtDiagnosis:ch1"          
[48] "Death:ch1"                   
[49] "Gender:ch1"                  
[50] "Grading:ch1"                 
[51] "LymphNodesInvaded:ch1"       
[52] "LymphNodesRemoved:ch1"       
[53] "OverallSurvival_months:ch1"  
[54] "pM:ch1"                      
[55] "pN:ch1"                      
[56] "pT:ch1"                      
[57] "RectumOrColon:ch1"           
[58] "Rezidiv:ch1"                 
[59] "TumorFreeSurvival_months:ch1"
[60] "TumorLocalization:ch1"       
[61] "UICC:ch1"                    
> idx
 [1] 47 48 49 50 51 52 53 57
 [9] 58 59 60
For pM etc., I tried with 'ch1' but got an error message instead.
ADD REPLYlink modified 4 weeks ago by genomax85k • written 4 weeks ago by silsie64510

Please use ADD COMMENT/ADD REPLY when responding to existing comments to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

ADD REPLYlink written 4 weeks ago by genomax85k

As you are not helping me, I can no longer help you further at this moment. Each time that you encounter an error, you need to show the error and the commands that you are running, and also the contents of your data.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Kevin Blighe61k

Sorry. I hope to explain myself better in future. I was wondering why the following code does not remove the rows with NA values from the dataset:

discard <- apply(metadata, 1, function(x) any(is.na(x)))
metadata <- metadata[!discard, ]

This is the output I had (I selected the columns of concern):

Rezidiv.ch1

GSM324715           0
GSM324716           0
GSM324717          NA
GSM324718           0
GSM324719           0
GSM324720          NA
GSM324721           0
GSM324722           0
GSM324723           0
GSM324724           0
GSM324725           0
GSM324726           0
GSM324727           0
GSM324728           0
GSM324729           1
GSM324730           0
GSM324731           0
GSM324732           0
GSM324733           0
GSM324734           0
GSM324735           0
GSM324736           0
GSM324737           1
GSM324738           0
GSM324739           1
GSM324740           0
GSM324741           0
GSM324742           1
GSM324743           0
GSM324744           0
GSM324745           0
GSM324746           0
GSM324747           0
GSM324748           0
GSM324749           0
GSM324750          NA
GSM324751          NA
GSM324752          NA
GSM324753          NA
GSM324754           1
GSM324755           0
GSM324756           0
GSM324757           0
GSM324758           1
GSM324759           0
GSM324760           0
GSM324761           0
GSM324762           0
GSM324763           0
GSM324764           0
GSM324765          NA
GSM324766           0
GSM324767           0
GSM324768           1
GSM324769           0
GSM324770           0
GSM324771           0
GSM324772           1
GSM324773           0
GSM324774           0
GSM324775           0
GSM324776           0

 TumorFreeSurvival_months.ch1

GSM324715                           32
GSM324716                           60
GSM324717                           NA
GSM324718                           51
GSM324719                           22
GSM324720                           NA
GSM324721                           47
GSM324722                           35
GSM324723                           64
GSM324724                           49
GSM324725                           41
GSM324726                           50
GSM324727                           33
GSM324728                           37
GSM324729                           14
GSM324730                           38
GSM324731                           43
GSM324732                           39
GSM324733                           43
GSM324734                           43
GSM324735                           60
GSM324736                           59
GSM324737                           10
GSM324738                           59
GSM324739                           36
GSM324740                           NA
GSM324741                           58
GSM324742                           NA
GSM324743                           54
GSM324744                           54
GSM324745                           54
GSM324746                           37
GSM324747                           52
GSM324748                           61
GSM324749                           52
GSM324750                           NA
GSM324751                           NA
GSM324752                           NA
GSM324753                           NA
GSM324754                           30
GSM324755                           55
GSM324756                           55
GSM324757                           13
GSM324758                           NA
GSM324759                           52
GSM324760                           50
GSM324761                           44
GSM324762                           64
GSM324763                           50
GSM324764                           46
GSM324765                           NA
GSM324766                           53
GSM324767                           44
GSM324768                            3
GSM324769                           54
GSM324770                           61
GSM324771                           36
GSM324772                           NA
GSM324773                           47
GSM324774                           47
GSM324775                           54
GSM324776                           59
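
One possible explanation, offered as a guess rather than a diagnosis: if these columns are stored as text, the entries shown as NA may be the literal string "NA" rather than a real missing value (R usually prints a real missing value in a character column as <NA>), and is.na() does not flag the string. The sketch below uses a small made-up data frame (the sample names s1 to s4 and all values are hypothetical) to show how such strings could be converted before filtering:

```r
# Toy metadata with character columns; the values and sample names are invented.
metadata <- data.frame(
  Rezidiv.ch1 = c("0", "NA", "1", "0"),
  TumorFreeSurvival_months.ch1 = c("32", "NA", "NA", "51"),
  row.names = c("s1", "s2", "s3", "s4"),
  stringsAsFactors = FALSE
)

# is.na() is FALSE for the text "NA", so nothing is flagged at this point.
n_flagged_before <- sum(is.na(metadata))
print(n_flagged_before)

# Turn the literal text "NA" into real missing values, then convert to numeric.
metadata[metadata == "NA"] <- NA
metadata[] <- lapply(metadata, as.numeric)

# Keep only the rows without any missing value.
metadata <- metadata[complete.cases(metadata), , drop = FALSE]
print(metadata)
```

Running str(metadata) on the real data before and after the conversion would show whether the columns are character or numeric.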

Thanks

ADD REPLYlink modified 4 weeks ago by Kevin Blighe61k • written 4 weeks ago by silsie64510

I do not know what you are now trying to do. Originally, you were trying to select columns from the pData (metadata).

ADD REPLYlink written 4 weeks ago by Kevin Blighe61k
4 weeks ago by
Kevin Blighe61k
University College London
Kevin Blighe61k wrote:

Please check the values of

colnames(pData(gset[[1]]))
idx

Then you should be able to solve the problem.
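
As a rough illustration of that check, the sketch below compares a list of requested names against the available column names using setdiff(). The vector `available` is a shortened, hand-typed stand-in for colnames(pData(gset[[1]])), and `wanted` is a subset of the names from the question; both are assumptions made for the example.

```r
# Shortened stand-in for colnames(pData(gset[[1]])).
available <- c("title", "Gender:ch1", "pM:ch1", "pN:ch1", "pT:ch1", "Rezidiv:ch1")

# A few of the requested names, including the ones without the trailing "1".
wanted <- c("Gender:ch1", "pM:ch", "pN:ch", "pT:ch", "Rezidiv:ch1")

# Requested names that have no exact match among the available names.
missing_cols <- setdiff(wanted, available)
print(missing_cols)

# %in% matches whole strings only, so partially matching names are skipped.
idx <- which(available %in% wanted)
print(available[idx])
```

Any name listed in missing_cols differs from the real column name in some way (a missing character, a stray space, different case) and is silently dropped by %in%.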

Kevin

ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by Kevin Blighe61k

Ok will try this and see

ADD REPLYlink written 4 weeks ago by silsie64510

I tried this and it seems it was supposed to generate the columns I requested (I am new to R, so I am not too sure). I wish I could share the image.

ADD REPLYlink written 4 weeks ago by silsie64510

Why don't you copy and paste the output of the two lines of code that Kevin posted?

ADD REPLYlink written 4 weeks ago by Friederike5.8k

OK... So idx gave this response:

idx
 [1] 47 48 49 50 51 52 53 57
 [9] 58 59 60

And colnames(pData(gset[[1]])) yielded this:

[46] "data_row_count"
[47] "AgeAtDiagnosis:ch1"
[48] "Death:ch1"
[49] "Gender:ch1"
[50] "Grading:ch1"
[51] "LymphNodesInvaded:ch1"
[52] "LymphNodesRemoved:ch1"
[53] "OverallSurvival_months:ch1"
[54] "pM:ch1"
[55] "pN:ch1"
[56] "pT:ch1"
[57] "RectumOrColon:ch1"
[58] "Rezidiv:ch1"
[59] "TumorFreeSurvival_months:ch1"
[60] "TumorLocalization:ch1"
[61] "UICC:ch1"

ADD REPLYlink written 4 weeks ago by silsie64510
colnames(pData(gset[[1]])) also yielded this:
[10] "characteristics_ch1"
[11] "characteristics_ch1.1"       
[12] "characteristics_ch1.2"       
[13] "characteristics_ch1.3"       
[14] "characteristics_ch1.4"       
[15] "characteristics_ch1.5"       
[16] "characteristics_ch1.6"       
[17] "characteristics_ch1.7"       
[18] "characteristics_ch1.8"       
[19] "characteristics_ch1.9"       
[20] "characteristics_ch1.10"      
[21] "characteristics_ch1.11"      
[22] "characteristics_ch1.12"      
[23] "characteristics_ch1.13"      
[24] "characteristics_ch1.14"  
(Sorry, the output is long, so I only included the salient parts.)

Also, regarding my second question: the "Grading" column shows 'character' as its values in the metadata: character character character character character character character
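
A guess at what may be happening, not a confirmed cause: 'character' is what R reports as the class (storage type) of a text column, so a call that summarises column types would show it in place of the values. The sketch below uses an invented data frame (the grading and gender values are hypothetical) to contrast looking at the class of a column with looking at its contents:

```r
# Invented example data; the real values come from pData(gset[[1]]).
metadata <- data.frame(
  Grading.ch1 = c("2", "3", "2"),
  Gender.ch1 = c("m", "f", "m"),
  stringsAsFactors = FALSE
)

# Class of each column: reports the type, not the values.
col_classes <- sapply(metadata, class)
print(col_classes)

# Actual contents of the column.
print(head(metadata$Grading.ch1))

# Frequency of each value, including missing ones if present.
print(table(metadata$Grading.ch1, useNA = "ifany"))
```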

Thirdly, I noticed that not all the rows containing NA were discarded using:

discard <- apply(metadata, 1, function(x) any(is.na(x)))
metadata <- metadata[!discard, ]

Thanks.

ADD REPLYlink modified 4 weeks ago by genomax85k • written 4 weeks ago by silsie64510

The output of

colnames(pData(gset[[1]]))

does not seem correct. Can you show the unfiltered version?

Also, these three will not match:

'pM:ch', 'pN:ch', 'pT:ch'

You need to add a '1' to each, i.e. 'pM:ch1', 'pN:ch1', 'pT:ch1'.
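
A minimal sketch of the selection with the full names, assuming a toy data frame `pdat` in place of pData(gset[[1]]) (the sample names and values are made up). It selects columns by name and stops early if any requested name is absent, which is one way to avoid silently losing columns:

```r
# Toy stand-in for pData(gset[[1]]); check.names = FALSE keeps the ':' in names.
pdat <- data.frame(
  "title" = c("sample 1", "sample 2"),
  "Gender:ch1" = c("m", "f"),
  "pM:ch1" = c("0", "1"),
  "pN:ch1" = c("1", "0"),
  "pT:ch1" = c("3", "2"),
  row.names = c("sampleA", "sampleB"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Requested columns, written with the complete ':ch1' suffix.
wanted <- c("Gender:ch1", "pM:ch1", "pN:ch1", "pT:ch1")

# Fail loudly if a requested name is not present.
stopifnot(all(wanted %in% colnames(pdat)))

# Select by name; drop = FALSE keeps a data frame even for a single column.
metadata <- pdat[, wanted, drop = FALSE]
print(metadata)
```

Note that wrapping the result in data.frame() with default settings, as in the question, rewrites names such as 'pM:ch1' to 'pM.ch1'.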

ADD REPLYlink written 4 weeks ago by Kevin Blighe61k

colnames(pData(gset[[1]]))
[1] "title"
[2] "geo_accession"
[3] "status"
[4] "submission_date"
[5] "last_update_date"
[6] "type"
[7] "channel_count"
[8] "source_name_ch1"
[9] "organism_ch1"
[10] "characteristics_ch1"
[11] "characteristics_ch1.1"
[12] "characteristics_ch1.2"
[13] "characteristics_ch1.3"
[14] "characteristics_ch1.4"
[15] "characteristics_ch1.5"
[16] "characteristics_ch1.6"
[17] "characteristics_ch1.7"
[18] "characteristics_ch1.8"
[19] "characteristics_ch1.9"
[20] "characteristics_ch1.10"
[21] "characteristics_ch1.11"
[22] "characteristics_ch1.12"
[23] "characteristics_ch1.13"
[24] "characteristics_ch1.14"
[25] "molecule_ch1"
[26] "extract_protocol_ch1"
[27] "extract_protocol_ch1.1"
[28] "label_ch1"
[29] "label_protocol_ch1"
[30] "taxid_ch1"
[31] "hyb_protocol"
[32] "scan_protocol"
[33] "description"
[34] "data_processing"
[35] "platform_id"
[36] "contact_name"
[37] "contact_email"
[38] "contact_phone"
[39] "contact_department"
[40] "contact_institute"
[41] "contact_address"
[42] "contact_city"
[43] "contact_zip/postal_code"
[44] "contact_country"
[45] "supplementary_file"
[46] "data_row_count"
[47] "AgeAtDiagnosis:ch1"
[48] "Death:ch1"
[49] "Gender:ch1"
[50] "Grading:ch1"
[51] "LymphNodesInvaded:ch1"
[52] "LymphNodesRemoved:ch1"
[53] "OverallSurvival_months:ch1"
[54] "pM:ch1"
[55] "pN:ch1"
[56] "pT:ch1"
[57] "RectumOrColon:ch1"
[58] "Rezidiv:ch1"
[59] "TumorFreeSurvival_months:ch1"
[60] "TumorLocalization:ch1"
[61] "UICC:ch1"

idx
 [1] 47 48 49 50 51 52 53 57
 [9] 58 59 60

For pM etc., I tried with 'ch1' but got an error. However, 'ch' was accepted. Thanks.

ADD REPLYlink written 4 weeks ago by silsie64510