Question: How to retrieve the genes associated with an RNA PAXgene gene expression dataset from GEO?
Davide Chicco (Canada) wrote, 12 months ago:

In the past I worked with a gene expression dataset generated with Affymetrix, and I was able to use the getBM() function from the Bioconductor biomaRt package to retrieve the genes associated with it.

This is the R code I used:

# Gene list
library(biomaRt)
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)

# Map the probe IDs (row names of the expression matrix) to Ensembl genes
thisAnnotLookup <- getBM(mart = mart,
                         attributes = c("affy_hugene_1_0_st_v1", "ensembl_gene_id", "gene_biotype", "external_gene_name"),
                         filters = "affy_hugene_1_0_st_v1",
                         values = rownames(thisGSetExprss), uniqueRows = TRUE)

Everything worked. Now I am working on another microarray dataset, generated with PAXgene, and I am trying to work out how to retrieve the genes associated with it. The platform used is RNG-MRC_HU25k_STRASBOURG, which I could not find in BioMart.

What can I do?

Thanks!

-- Davide

EDIT: These are the fields present in my GEO variable in R

> str(gset)
Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
  ..@ experimentData   :Formal class 'MIAME' [package "Biobase"] with 13 slots
  .. .. ..@ name             : chr "Yvan,,Devaux"
  .. .. ..@ lab              : chr ""
  .. .. ..@ contact          : chr "yvan.devaux@lih.lu"
  .. .. ..@ title            : chr "Integrated Network and Microarray Analysis to Identify New Biomarkers in Ischemic Heart Disease"
  .. .. ..@ abstract         : chr "A significant proportion of acute myocardial infarction (MI) patients develop heart failure (HF). Early identif"| __truncated__
  .. .. ..@ url              : chr "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11947"
  .. .. ..@ pubMedIds        : chr "20462429\n20414696\n20300185"
  .. .. ..@ samples          : list()
  .. .. ..@ hybridizations   : list()
  .. .. ..@ normControls     : list()
  .. .. ..@ preprocessing    : list()
  .. .. ..@ other            :List of 23
  .. .. .. ..$ contact_address        : chr "120 route d'Arlon"
  .. .. .. ..$ contact_city           : chr "Luxembourg"
  .. .. .. ..$ contact_country        : chr "Luxembourg"
  .. .. .. ..$ contact_email          : chr "yvan.devaux@lih.lu"
  .. .. .. ..$ contact_institute      : chr "LIH"
  .. .. .. ..$ contact_laboratory     : chr "Cardiovascular Research Unit"
  .. .. .. ..$ contact_name           : chr "Yvan,,Devaux"
  .. .. .. ..$ contact_zip/postal_code: chr "1150"
  .. .. .. ..$ geo_accession          : chr "GSE11947"
  .. .. .. ..$ last_update_date       : chr "Mar 19 2012"
  .. .. .. ..$ overall_design         : chr "The 32 patients of this study were divided in 2 groups corresponding to the extreme quartiles of FE values. The"| __truncated__
  .. .. .. ..$ platform_id            : chr "GPL1947"
  .. .. .. ..$ platform_taxid         : chr "9606"
  .. .. .. ..$ pubmed_id              : chr "20462429\n20414696\n20300185"
  .. .. .. ..$ relation               : chr "BioProject: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA105803"
  .. .. .. ..$ sample_id              : chr "GSM302309 GSM302310 GSM302311 GSM302312 GSM302313 GSM302314 GSM302315 GSM302316 GSM302317 GSM302318 GSM302319 G"| __truncated__
  .. .. .. ..$ sample_taxid           : chr "9606"
  .. .. .. ..$ status                 : chr "Public on May 25 2010"
  .. .. .. ..$ submission_date        : chr "Jul 01 2008"
  .. .. .. ..$ summary                : chr "A significant proportion of acute myocardial infarction (MI) patients develop heart failure (HF). Early identif"| __truncated__
  .. .. .. ..$ supplementary_file     : chr "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE11nnn/GSE11947/suppl/GSE11947_RAW.tar"
  .. .. .. ..$ title                  : chr "Integrated Network and Microarray Analysis to Identify New Biomarkers in Ischemic Heart Disease"
  .. .. .. ..$ type                   : chr "Expression profiling by array"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 2
  .. .. .. .. .. ..$ : int [1:3] 1 0 0
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ assayData        :<environment: 0x562675095e10> 
  ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 69 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr [1:69] NA NA NA NA ...
  .. .. ..@ data             :'data.frame': 32 obs. of  69 variables:
  .. .. .. ..$ title                   : Factor w/ 32 levels "BL 708","DA 706",..: 27 26 18 6 11 28 7 2 3 17 ...
  .. .. .. ..$ geo_accession           : chr [1:32] "GSM302309" "GSM302310" "GSM302311" "GSM302312" ...
  .. .. .. ..$ status                  : Factor w/ 1 level "Public on May 25 2010": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ submission_date         : Factor w/ 1 level "Jul 01 2008": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ last_update_date        : Factor w/ 1 level "May 25 2010": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ type                    : Factor w/ 1 level "RNA": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ channel_count           : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ source_name_ch1         : Factor w/ 12 levels "BL 708","HJ687",..: 11 12 6 12 12 12 12 12 12 5 ...
  .. .. .. ..$ organism_ch1            : Factor w/ 1 level "Homo sapiens": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ characteristics_ch1     : Factor w/ 12 levels "Labeling_reference:BL 708",..: 11 12 6 12 12 12 12 12 12 5 ...
  .. .. .. ..$ characteristics_ch1.1   : Factor w/ 4 levels "Extraction_reference: PAXgene",..: 1 4 1 4 4 4 4 4 4 1 ...
  .. .. .. ..$ characteristics_ch1.2   : Factor w/ 15 levels "Sample_reference: BL 708",..: 11 13 6 14 13 15 14 15 14 5 ...
  .. .. .. ..$ characteristics_ch1.3   : Factor w/ 13 levels "Subject_reference: BL 708",..: 11 12 6 12 12 13 12 13 12 5 ...
  .. .. .. ..$ characteristics_ch1.4   : Factor w/ 4 levels "","Tissue: blood",..: 3 2 3 2 2 2 2 2 2 3 ...
  .. .. .. ..$ characteristics_ch1.5   : Factor w/ 3 levels "","Extraction_amount: 10.0",..: 3 3 3 3 3 2 3 2 3 3 ...
  .. .. .. ..$ characteristics_ch1.6   : Factor w/ 2 levels "","Extraction_amount: 10.0": 2 2 2 2 2 1 2 1 2 2 ...
  .. .. .. ..$ molecule_ch1            : Factor w/ 1 level "total RNA": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ extract_protocol_ch1    : Factor w/ 2 levels "Qiagen","Trizol": 1 2 1 2 2 2 2 2 2 1 ...
  .. .. .. ..$ label_ch1               : Factor w/ 1 level "Cy3, Cy5": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ label_protocol_ch1      : Factor w/ 1 level "Ambion": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ taxid_ch1               : Factor w/ 1 level "9606": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ source_name_ch2         : Factor w/ 22 levels "DA 706","FC 732",..: 19 16 19 4 8 17 5 1 2 19 ...
  .. .. .. ..$ organism_ch2            : Factor w/ 1 level "Homo sapiens": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ characteristics_ch2     : Factor w/ 22 levels "Labeling_reference:DA 706",..: 19 16 19 4 8 17 5 1 2 19 ...
  .. .. .. ..$ characteristics_ch2.1   : Factor w/ 3 levels "Extraction_reference: L62 VN",..: 3 2 3 2 2 2 2 2 2 3 ...
  .. .. .. ..$ characteristics_ch2.2   : Factor w/ 25 levels "Sample_reference: DA 706",..: 21 16 21 4 8 17 5 1 2 20 ...
  .. .. .. ..$ characteristics_ch2.3   : Factor w/ 23 levels "Subject_reference: DA 706",..: 19 16 19 4 8 17 5 1 2 19 ...
  .. .. .. ..$ characteristics_ch2.4   : Factor w/ 2 levels "Tissue: blood",..: 1 1 1 2 1 2 2 2 2 2 ...
  .. .. .. ..$ characteristics_ch2.5   : Factor w/ 2 levels "Extraction_amount: 10.0",..: 2 2 2 2 2 1 2 1 2 2 ...
  .. .. .. ..$ characteristics_ch2.6   : Factor w/ 2 levels "","Extraction_amount: 10.0": 2 2 2 2 2 1 2 1 2 2 ...
  .. .. .. ..$ molecule_ch2            : Factor w/ 1 level "total RNA": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ extract_protocol_ch2    : Factor w/ 2 levels "Qiagen","Trizol": 2 1 2 1 1 1 1 1 1 1 ...
  .. .. .. ..$ label_ch2               : Factor w/ 1 level "Cy3, Cy5": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ label_protocol_ch2      : Factor w/ 1 level "Ambion": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ taxid_ch2               : Factor w/ 1 level "9606": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ hyb_protocol            : Factor w/ 1 level "Agilent : 750.0 ng at 60 degree_C during 17 hours": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ scan_protocol           : Factor w/ 1 level "Scanned on an GenePix 4000B fluorescent scanner.": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ scan_protocol.1         : Factor w/ 1 level "Image intensity data were extracted with GenePix Pro 6.0 analysis software.": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ description             : Factor w/ 18 levels "ejection fraction (EF): 20",..: 18 18 17 16 16 15 15 14 14 13 ...
  .. .. .. ..$ description.1           : Factor w/ 3 levels "group:  B","group: A",..: 3 3 3 3 3 3 3 3 3 3 ...
  .. .. .. ..$ data_processing         : Factor w/ 1 level "Lowess non linear normalization": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ platform_id             : Factor w/ 1 level "GPL1947": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_name            : Factor w/ 1 level "Yvan,,Devaux": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_email           : Factor w/ 1 level "yvan.devaux@lih.lu": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_laboratory      : Factor w/ 1 level "Cardiovascular Research Unit": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_institute       : Factor w/ 1 level "LIH": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_address         : Factor w/ 1 level "120 route d'Arlon": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_city            : Factor w/ 1 level "Luxembourg": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_zip/postal_code : Factor w/ 1 level "1150": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ contact_country         : Factor w/ 1 level "Luxembourg": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ supplementary_file      : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L29921.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. ..$ supplementary_file.1    : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L29923.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. ..$ supplementary_file.2    : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L30105.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. ..$ supplementary_file.3    : Factor w/ 32 levels "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM302nnn/GSM302309/suppl/GSM302309_L30107.gpr.gz",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. ..$ data_row_count          : Factor w/ 1 level "16238": 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. ..$ Extraction_amount:ch1   : chr [1:32] "10.0" "10.0" "10.0" "10.0" ...
  .. .. .. ..$ Extraction_amount:ch2   : chr [1:32] "10.0" "10.0" "10.0" "10.0" ...
  .. .. .. ..$ Extraction_reference:ch1: chr [1:32] "PAXgene" "Trizol" "PAXgene" "Trizol" ...
  .. .. .. ..$ Extraction_reference:ch2: chr [1:32] "Trizol" "PAXgene" "Trizol" "PAXgene" ...
  .. .. .. ..$ Labeling_reference:ch1  : chr [1:32] "L88-TG" "Ref" "L38 DP" "Ref" ...
  .. .. .. ..$ Labeling_reference:ch2  : chr [1:32] "Ref" "L67-SR" "Ref" "KF 692" ...
  .. .. .. ..$ RNA_quality:ch1         : chr [1:32] "null" "null" "null" "null" ...
  .. .. .. ..$ RNA_quality:ch2         : chr [1:32] "null" "null" "null" "null" ...
  .. .. .. ..$ Sample_reference:ch1    : chr [1:32] "L88-TG" "Ref" "L38 DP" "REF" ...
  .. .. .. ..$ Sample_reference:ch2    : chr [1:32] "REF" "L67-SR" "REF" "KF 692" ...
  .. .. .. ..$ Subject_reference:ch1   : chr [1:32] "L88-TG" "Ref" "L38 DP" "Ref" ...
  .. .. .. ..$ Subject_reference:ch2   : chr [1:32] "Ref" "L67-SR" "Ref" "KF 692" ...
  .. .. .. ..$ Tissue:ch1              : chr [1:32] "Blood" "blood" "Blood" "blood" ...
  .. .. .. ..$ Tissue:ch2              : chr [1:32] "blood" "blood" "blood" "Blood" ...
  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ featureData      :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 0 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr(0) 
  .. .. ..@ data             :'data.frame': 16238 obs. of  0 variables
  .. .. ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ annotation       : chr "GPL1947"
  ..@ protocolData     :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 0 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr(0) 
  .. .. ..@ data             :'data.frame': 32 obs. of  0 variables
  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. ..@ .Data:List of 4
  .. .. .. ..$ : int [1:3] 3 6 0
  .. .. .. ..$ : int [1:3] 2 44 0
  .. .. .. ..$ : int [1:3] 1 3 0
  .. .. .. ..$ : int [1:3] 1 0 0

Hey Davide, I have never heard of that array, but perhaps you can get the annotation you need from the GEO page for the platform (GPL1947)? It is the main page for this array on GEO.

-- Kevin Blighe, 12 months ago

Thanks Kevin. I saw that page but I cannot understand how to access those data header fields. Do you know how I can do that?

-- Davide Chicco, 12 months ago
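
One possible way to read those platform header fields from R is sketched below. It assumes the GEOquery package is installed and that network access is available. The helper name fetch_platform_table is hypothetical; getGEO(), Columns() and Table() are GEOquery functions. The column names of the GPL1947 table are not assumed here, so inspect the output before relying on any particular field.

```r
# Hypothetical helper (not part of GEOquery): download a GEO platform record
# and return its annotation table plus the descriptions of its columns.
fetch_platform_table <- function(gpl_id) {
  gpl <- GEOquery::getGEO(gpl_id)        # a GPL accession yields a GPL object
  list(
    columns = GEOquery::Columns(gpl),    # data frame describing each header field
    table   = GEOquery::Table(gpl)       # data frame with one row per probe
  )
}

# Needs network access, so it only runs in an interactive session
if (interactive()) {
  platform <- fetch_platform_table("GPL1947")
  print(platform$columns)
  print(head(platform$table))
}
```

The download is wrapped in a function so that nothing is fetched until it is called explicitly.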
Answer by Davide Chicco, 12 months ago:

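I solved my own problem by checking the documentation of the getGEO() function: the getGPL argument must be set to TRUE, so that the platform (GPL) annotation is downloaded and attached to the feature data of the returned object.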

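# Install BiocManager and GEOquery if they are not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()

BiocManager::install("GEOquery")
library(GEOquery)

GSE_code <- "GSE11947"
# Download the supplementary files into a GSE11947 folder in the working directory
getGEOSuppFiles(GSE_code)
# getGPL = TRUE also downloads the platform annotation for the probes
gset <- getGEO(GSE_code, GSEMatrix = TRUE, getGPL = TRUE)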
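
Once the annotation has been downloaded, the probe-level values still have to be matched to it. The following is a minimal sketch of that step in base R, not the exact code used for this dataset. The helper annotate_expression and the toy column GENE_SYMBOL are hypothetical; it only assumes that the annotation table has a probe identifier column (GEO platform tables provide one called ID).

```r
# Hypothetical helper: attach probe annotation to an expression matrix.
# `expr` is a probes x samples matrix whose row names are probe IDs;
# `annot` is a data frame whose column `id_col` holds the same probe IDs.
annotate_expression <- function(expr, annot, id_col = "ID") {
  stopifnot(id_col %in% colnames(annot))
  # Position of each expression row in the annotation table (NA if absent)
  idx <- match(rownames(expr), as.character(annot[[id_col]]))
  out <- cbind(annot[idx, , drop = FALSE], as.data.frame(expr))
  rownames(out) <- rownames(expr)
  out
}

# Toy data for illustration only (not taken from GSE11947)
toy_expr <- matrix(c(1.2, -0.4, 0.3, 0.8), nrow = 2,
                   dimnames = list(c("p2", "p1"), c("s1", "s2")))
toy_annot <- data.frame(ID = c("p1", "p2"),
                        GENE_SYMBOL = c("GENE_A", "GENE_B"),
                        stringsAsFactors = FALSE)
annotated <- annotate_expression(toy_expr, toy_annot)
print(annotated)

# With the object downloaded above (requires Biobase; getGEO() returns a list):
# eset      <- gset[[1]]
# annotated <- annotate_expression(Biobase::exprs(eset), Biobase::fData(eset))
```

Using match() rather than merge() keeps the rows in the same order as the expression matrix, and probes without annotation stay in the result with NA in the annotation columns.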