Question

Extracting information of interest in R

3.9 years ago

silsie645 ▴ 20

I tried extracting information from a GEO dataset into R using:

idx <- which(colnames(pData(gset[[1]])) %in%
  c('AgeAtDiagnosis:ch1', 'Death:ch1', 'Gender:ch1',
    'Grading:ch1', 'LymphNodesInvaded:ch1', 'LymphNodesRemoved:ch1',
    'OverallSurvival_months:ch1', 'pM:ch', 'pN:ch', 'pT:ch',
    'RectumOrColon:ch1 ', 'Rezidiv:ch1',
    'TumorFreeSurvival_months:ch1', 'TumorLocalization:ch1'))

metadata <- data.frame(pData(gset[[1]])[, idx],
  row.names = rownames(pData(gset[[1]])))

I needed to extract 14 columns from my dataset, but R only extracted 8. I have retried this a number of times but get the same result. How do I obtain the remaining 4 columns? Also, instead of the actual values for 'Grading', I obtained the value 'character'. How do I solve this?

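A side note on the data.frame() call in the question: later output in this thread shows names such as Rezidiv.ch1 rather than Rezidiv:ch1. That is expected, because data.frame() defaults to check.names = TRUE and passes the names through make.names(), which replaces ':' with '.'. Later code must therefore use the dotted names, or the call can pass check.names = FALSE. A minimal sketch on a made-up toy data frame standing in for pData(gset[[1]]):

```r
# Hypothetical stand-in for pData(gset[[1]]); values are made up
pd <- data.frame(
  `Grading:ch1` = c("G2", "G3"),
  `Rezidiv:ch1` = c("0", "1"),
  check.names = FALSE,          # keep the colons, as in the GEO names
  stringsAsFactors = FALSE,
  row.names = c("GSM324715", "GSM324716")
)

# Default check.names = TRUE runs names through make.names(): ':' becomes '.'
metadata <- data.frame(pd, row.names = rownames(pd))
colnames(metadata)
# [1] "Grading.ch1" "Rezidiv.ch1"

# check.names = FALSE keeps the original GEO column names
metadata_raw <- data.frame(pd, row.names = rownames(pd), check.names = FALSE)
colnames(metadata_raw)
# [1] "Grading:ch1" "Rezidiv:ch1"
```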

colnames(pData(gset[[1]]))
 [1] "title"                       
 [2] "geo_accession"               
 [3] "status"                      
 [4] "submission_date"             
 [5] "last_update_date"            
 [6] "type"                        
 [7] "channel_count"               
 [8] "source_name_ch1"             
 [9] "organism_ch1"                
[10] "characteristics_ch1"         
[11] "characteristics_ch1.1"       
[12] "characteristics_ch1.2"       
[13] "characteristics_ch1.3"       
[14] "characteristics_ch1.4"       
[15] "characteristics_ch1.5"       
[16] "characteristics_ch1.6"       
[17] "characteristics_ch1.7"       
[18] "characteristics_ch1.8"       
[19] "characteristics_ch1.9"       
[20] "characteristics_ch1.10"      
[21] "characteristics_ch1.11"      
[22] "characteristics_ch1.12"      
[23] "characteristics_ch1.13"      
[24] "characteristics_ch1.14"      
[25] "molecule_ch1"                
[26] "extract_protocol_ch1"        
[27] "extract_protocol_ch1.1"      
[28] "label_ch1"                   
[29] "label_protocol_ch1"          
[30] "taxid_ch1"                   
[31] "hyb_protocol"                
[32] "scan_protocol"               
[33] "description"                 
[34] "data_processing"             
[35] "platform_id"                 
[36] "contact_name"                
[37] "contact_email"               
[38] "contact_phone"               
[39] "contact_department"          
[40] "contact_institute"           
[41] "contact_address"             
[42] "contact_city"                
[43] "contact_zip/postal_code"     
[44] "contact_country"             
[45] "supplementary_file"          
[46] "data_row_count"              
[47] "AgeAtDiagnosis:ch1"          
[48] "Death:ch1"                   
[49] "Gender:ch1"                  
[50] "Grading:ch1"                 
[51] "LymphNodesInvaded:ch1"       
[52] "LymphNodesRemoved:ch1"       
[53] "OverallSurvival_months:ch1"  
[54] "pM:ch1"                      
[55] "pN:ch1"                      
[56] "pT:ch1"                      
[57] "RectumOrColon:ch1"           
[58] "Rezidiv:ch1"                 
[59] "TumorFreeSurvival_months:ch1"
[60] "TumorLocalization:ch1"       
[61] "UICC:ch1"                    
> idx
 [1] 47 48 49 50 51 52 53 57
 [9] 58 59 60
For pM etc., I tried 'ch1' but got an error message instead.

Reply by silsie645, 3.9 years ago (updated by GenoMax)

Please use ADD COMMENT/ADD REPLY when responding to existing comments to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

Reply by GenoMax, 3.9 years ago

As you are not helping me, I can no longer help you further at this moment. Each time you encounter an error, you need to show the error, the commands you are running, and the contents of your data.

Reply by Kevin Blighe, 3.9 years ago
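One way to show the contents of a data object as text rather than as an image: str() summarises column types and values, and dput() prints R code that recreates the object exactly, so others can paste it into their own session. A minimal sketch on a small, hypothetical stand-in for metadata (the values are made up):

```r
# Hypothetical stand-in for the metadata object built in the question
metadata <- data.frame(
  Rezidiv.ch1 = c("0", "NA", "1"),
  stringsAsFactors = FALSE,
  row.names = c("GSM324715", "GSM324717", "GSM324729")
)

# Compact text summary: column types plus the first few values
str(metadata)

# R code that rebuilds the first rows exactly; this text can be pasted into a post
dput(head(metadata, 3))

# Sanity check: the pasted text recreates the same object
shared <- capture.output(dput(head(metadata, 3)))
recreated <- eval(parse(text = shared))
identical(recreated, head(metadata, 3))
```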

Sorry, I hope to explain myself better in future. I was wondering why the discard step does not remove the NA values from the dataset:

discard <- apply(metadata, 1, function(x) any(is.na(x)))
metadata <- metadata[!discard, ]

This is the output I got (showing only the relevant columns):

Rezidiv.ch1

GSM324715           0
GSM324716           0
GSM324717          NA
GSM324718           0
GSM324719           0
GSM324720          NA
GSM324721           0
GSM324722           0
GSM324723           0
GSM324724           0
GSM324725           0
GSM324726           0
GSM324727           0
GSM324728           0
GSM324729           1
GSM324730           0
GSM324731           0
GSM324732           0
GSM324733           0
GSM324734           0
GSM324735           0
GSM324736           0
GSM324737           1
GSM324738           0
GSM324739           1
GSM324740           0
GSM324741           0
GSM324742           1
GSM324743           0
GSM324744           0
GSM324745           0
GSM324746           0
GSM324747           0
GSM324748           0
GSM324749           0
GSM324750          NA
GSM324751          NA
GSM324752          NA
GSM324753          NA
GSM324754           1
GSM324755           0
GSM324756           0
GSM324757           0
GSM324758           1
GSM324759           0
GSM324760           0
GSM324761           0
GSM324762           0
GSM324763           0
GSM324764           0
GSM324765          NA
GSM324766           0
GSM324767           0
GSM324768           1
GSM324769           0
GSM324770           0
GSM324771           0
GSM324772           1
GSM324773           0
GSM324774           0
GSM324775           0
GSM324776           0

 TumorFreeSurvival_months.ch1

GSM324715                           32
GSM324716                           60
GSM324717                           NA
GSM324718                           51
GSM324719                           22
GSM324720                           NA
GSM324721                           47
GSM324722                           35
GSM324723                           64
GSM324724                           49
GSM324725                           41
GSM324726                           50
GSM324727                           33
GSM324728                           37
GSM324729                           14
GSM324730                           38
GSM324731                           43
GSM324732                           39
GSM324733                           43
GSM324734                           43
GSM324735                           60
GSM324736                           59
GSM324737                           10
GSM324738                           59
GSM324739                           36
GSM324740                           NA
GSM324741                           58
GSM324742                           NA
GSM324743                           54
GSM324744                           54
GSM324745                           54
GSM324746                           37
GSM324747                           52
GSM324748                           61
GSM324749                           52
GSM324750                           NA
GSM324751                           NA
GSM324752                           NA
GSM324753                           NA
GSM324754                           30
GSM324755                           55
GSM324756                           55
GSM324757                           13
GSM324758                           NA
GSM324759                           52
GSM324760                           50
GSM324761                           44
GSM324762                           64
GSM324763                           50
GSM324764                           46
GSM324765                           NA
GSM324766                           53
GSM324767                           44
GSM324768                            3
GSM324769                           54
GSM324770                           61
GSM324771                           36
GSM324772                           NA
GSM324773                           47
GSM324774                           47
GSM324775                           54
GSM324776                           59

Thanks

Reply by silsie645, 3.9 years ago (updated by Kevin Blighe)
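The thread does not settle why those NA rows survive the discard filter. One common cause, offered as an assumption to check on the real data rather than a confirmed diagnosis, is that GEO characteristics are stored as text, so a missing value can be the string "NA" instead of R's NA. Since is.na("NA") is FALSE, the apply() filter keeps the row. On the real object, is.na(metadata$Rezidiv.ch1) would show which case applies. A minimal sketch with made-up values:

```r
# Hypothetical stand-in for metadata; "NA" here is a text string (assumption)
metadata <- data.frame(
  Rezidiv.ch1 = c("0", "NA", "1"),
  TumorFreeSurvival_months.ch1 = c("32", "NA", "14"),
  stringsAsFactors = FALSE,
  row.names = c("GSM324715", "GSM324717", "GSM324729")
)

# is.na() is FALSE for the string "NA", so the original filter discards nothing
discard <- apply(metadata, 1, function(x) any(is.na(x)))
sum(discard)
# [1] 0

# Turn "NA" and empty strings into real NA values in every column
metadata[] <- lapply(metadata, function(x) {
  x[x %in% c("NA", "")] <- NA
  x
})

# Keep only rows with no missing values
metadata <- metadata[complete.cases(metadata), , drop = FALSE]
rownames(metadata)
# [1] "GSM324715" "GSM324729"
```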

I do not know what you are now trying to do. Originally, you were trying to select columns from the pData (metadata).

Reply by Kevin Blighe, 3.9 years ago

GenoMax · Answer 1 · 2020-06-05

3.9 years ago

Kevin Blighe 87k

Please check the values of

colnames(pData(gset[[1]]))
idx

Then you should be able to solve the problem.

Kevin

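The comparison Kevin suggests can also be done programmatically: setdiff() lists the requested names that have no exact match among the column names. A minimal sketch using shortened, hand-copied vectors (a hypothetical subset, not the real object):

```r
# Hypothetical, shortened copy of colnames(pData(gset[[1]]))
available <- c("AgeAtDiagnosis:ch1", "Grading:ch1", "pM:ch1", "pN:ch1",
               "pT:ch1", "RectumOrColon:ch1")

# Subset of the names as typed in the question
wanted <- c("AgeAtDiagnosis:ch1", "Grading:ch1", "pM:ch", "pN:ch", "pT:ch",
            "RectumOrColon:ch1")

# Same idea as the question's idx: positions of available names that were requested
idx <- which(available %in% wanted)
length(idx)
# [1] 3

# Requested names with no exact match in the data
setdiff(wanted, available)
# [1] "pM:ch" "pN:ch" "pT:ch"
```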
OK, I will try this and see.

Reply by silsie645, 3.9 years ago

I tried this, and it seems it should have generated the columns I requested (I am new to R, so I am not too sure). I wish I could share the image.

Reply by silsie645, 3.9 years ago

Why don't you copy and paste the output of the two lines of code that Kevin posted?

Reply by Friederike, 3.9 years ago

OK, so idx gave this response:

idx
 [1] 47 48 49 50 51 52 53 57
 [9] 58 59 60

And colnames(pData(gset[[1]])) yielded this:

[46] "data_row_count"
[47] "AgeAtDiagnosis:ch1"
[48] "Death:ch1"
[49] "Gender:ch1"
[50] "Grading:ch1"
[51] "LymphNodesInvaded:ch1"
[52] "LymphNodesRemoved:ch1"
[53] "OverallSurvival_months:ch1"
[54] "pM:ch1"
[55] "pN:ch1"
[56] "pT:ch1"
[57] "RectumOrColon:ch1"
[58] "Rezidiv:ch1"
[59] "TumorFreeSurvival_months:ch1"
[60] "TumorLocalization:ch1"
[61] "UICC:ch1"

Reply by silsie645, 3.9 years ago

colnames(pData(gset[[1]])) also yielded this:

[10] "characteristics_ch1"
[11] "characteristics_ch1.1"
[12] "characteristics_ch1.2"
[13] "characteristics_ch1.3"
[14] "characteristics_ch1.4"
[15] "characteristics_ch1.5"
[16] "characteristics_ch1.6"
[17] "characteristics_ch1.7"
[18] "characteristics_ch1.8"
[19] "characteristics_ch1.9"
[20] "characteristics_ch1.10"
[21] "characteristics_ch1.11"
[22] "characteristics_ch1.12"
[23] "characteristics_ch1.13"
[24] "characteristics_ch1.14"

(Sorry, the output is long, so I only copied the salient parts.)

Regarding my second question: in the metadata, the "Grading" column shows the word 'character' instead of the actual values:

character character character character character character character

Thirdly, I noticed that not all rows with NA values were discarded by:

discard <- apply(metadata, 1, function(x) any(is.na(x)))
metadata <- metadata[!discard, ]

Thanks.

Reply by silsie645, 3.9 years ago (updated by GenoMax)
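The thread does not explain the 'character' values. One possibility, an assumption rather than a diagnosis, is that the printed output came from a call that reports each column's type (such as class() or sapply(metadata, class)) rather than its contents. A minimal sketch contrasting the two on made-up values:

```r
# Hypothetical stand-in for the Grading column (values made up)
metadata <- data.frame(
  Grading.ch1 = c("G2", "G3", "G2"),
  stringsAsFactors = FALSE,
  row.names = c("GSM324715", "GSM324716", "GSM324717")
)

# These describe the column's type, so they print the word "character"
class(metadata$Grading.ch1)
# [1] "character"
sapply(metadata, class)

# These show the stored values themselves
head(metadata$Grading.ch1)
table(metadata$Grading.ch1, useNA = "ifany")
```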

The output of

colnames(pData(gset[[1]]))

does not seem correct. Can you show the unfiltered version?

Also, these three will not match:

'pM:ch', 'pN:ch', 'pT:ch'

You need to add a '1' to each of them (e.g. 'pM:ch1').

Reply by Kevin Blighe, 3.9 years ago
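With the '1' added, the names match exactly. Below is a minimal sketch of selecting columns by name that stops with an informative error when a name is absent, instead of silently returning fewer columns. The toy data frame is a hypothetical stand-in for pData(gset[[1]]):

```r
# Hypothetical stand-in for pData(gset[[1]]); values are made up
pd <- data.frame(
  `pM:ch1` = c("0", "1"), `pN:ch1` = c("1", "2"), `pT:ch1` = c("3", "4"),
  `UICC:ch1` = c("III", "IV"),
  check.names = FALSE, stringsAsFactors = FALSE,
  row.names = c("GSM324715", "GSM324716")
)

# Exact column names, including the trailing '1'
wanted <- c("pM:ch1", "pN:ch1", "pT:ch1")

# Fail loudly if any requested name is not present
missing_cols <- setdiff(wanted, colnames(pd))
if (length(missing_cols) > 0) {
  stop("Columns not found: ", paste(missing_cols, collapse = ", "))
}

# Index by name; drop = FALSE keeps a data frame even for a single column,
# and check.names = FALSE keeps the ':ch1' names unchanged
metadata <- data.frame(pd[, wanted, drop = FALSE], check.names = FALSE)
colnames(metadata)
# [1] "pM:ch1" "pN:ch1" "pT:ch1"
```

On the real data, wanted would hold the full list of names from the question, with 'pM:ch1', 'pN:ch1' and 'pT:ch1' corrected and without the trailing space shown after 'RectumOrColon:ch1'.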

colnames(pData(gset[[1]]))
[1] "title"
[2] "geo_accession"
[3] "status"
[4] "submission_date"
[5] "last_update_date"
[6] "type"
[7] "channel_count"
[8] "source_name_ch1"
[9] "organism_ch1"
[10] "characteristics_ch1"
[11] "characteristics_ch1.1"
[12] "characteristics_ch1.2"
[13] "characteristics_ch1.3"
[14] "characteristics_ch1.4"
[15] "characteristics_ch1.5"
[16] "characteristics_ch1.6"
[17] "characteristics_ch1.7"
[18] "characteristics_ch1.8"
[19] "characteristics_ch1.9"
[20] "characteristics_ch1.10"
[21] "characteristics_ch1.11"
[22] "characteristics_ch1.12"
[23] "characteristics_ch1.13"
[24] "characteristics_ch1.14"
[25] "molecule_ch1"
[26] "extract_protocol_ch1"
[27] "extract_protocol_ch1.1"
[28] "label_ch1"
[29] "label_protocol_ch1"
[30] "taxid_ch1"
[31] "hyb_protocol"
[32] "scan_protocol"
[33] "description"
[34] "data_processing"
[35] "platform_id"
[36] "contact_name"
[37] "contact_email"
[38] "contact_phone"
[39] "contact_department"
[40] "contact_institute"
[41] "contact_address"
[42] "contact_city"
[43] "contact_zip/postal_code"
[44] "contact_country"
[45] "supplementary_file"
[46] "data_row_count"
[47] "AgeAtDiagnosis:ch1"
[48] "Death:ch1"
[49] "Gender:ch1"
[50] "Grading:ch1"
[51] "LymphNodesInvaded:ch1"
[52] "LymphNodesRemoved:ch1"
[53] "OverallSurvival_months:ch1"
[54] "pM:ch1"
[55] "pN:ch1"
[56] "pT:ch1"
[57] "RectumOrColon:ch1"
[58] "Rezidiv:ch1"
[59] "TumorFreeSurvival_months:ch1"
[60] "TumorLocalization:ch1"
[61] "UICC:ch1"

idx
 [1] 47 48 49 50 51 52 53 57
 [9] 58 59 60

For pM etc., I tried 'ch1' but got an error. With 'ch', however, the command ran without an error. Thanks.

Reply by silsie645, 3.9 years ago
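A likely reading, not confirmed in the thread: 'ch' ran without an error only because %in% never complains about names it cannot find. It quietly returns FALSE, so those columns are simply left out of idx. The error reported with 'ch1' is not shown, so its cause cannot be determined here. A minimal sketch, where match() makes the missing names visible as NA:

```r
# Hypothetical subset of the real column names
available <- c("pM:ch1", "pN:ch1", "pT:ch1")

# %in% does not error on a non-matching name; it just returns FALSE
"pM:ch" %in% available
# [1] FALSE
"pM:ch1" %in% available
# [1] TRUE

# match() returns NA for names it cannot find, which is easy to check for
match(c("pM:ch", "pM:ch1"), available)
# [1] NA  1
```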