Question: Merging expression data from multiple platforms
mforde84 wrote (3.3 years ago):

I'm looking to do a meta-analysis of expression arrays in GEO for a particular cell line. I've been able to determine which GSE series my samples are associated with, but they span a wide variety of GPL platforms. At the moment I'm using GEOquery to retrieve the GSE ExpressionSets, and I'm curious whether there is a way to match probe IDs across all of the ExpressionSets to a gene identifier such as HGNC symbol, Entrez ID, or Ensembl ID.

I've tried merging ExpressionSets with inSilicoMerging, but the package merges on each object's featureNames, which are probe IDs. Different platforms mean different probe names, so merging only works when the GSEs share the same GPL. I have all of the GPL annotations and have mapped each probe ID to a gene name, but I'm not sure how to use these mappings to change the featureNames in my ExpressionSet objects to the corresponding gene names.

Any suggestions are appreciated.

Marty

expression array microarray • 1.9k views
> library(GEOquery)
> library(inSilicoMerging)
> eset1 <- getGEO("GSE49962") #[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array
> eset2 <- getGEO("GSE53494") #[HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array [transcript (gene) version]
> eset1 = eset1[[1]]
> eset2 = eset2[[1]]
> eset1
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 6 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM1210881 GSM1210882 ... GSM1210886 (6 total)
  varLabels: title geo_accession ... data_row_count (31 total)
  varMetadata: labelDescription
featureData
  featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675 total)
  fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL570 
> eset2
ExpressionSet (storageMode: lockedEnvironment)
assayData: 32321 features, 24 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM1294905 GSM1294906 ... GSM1294928 (24 total)
  varLabels: title geo_accession ... data_row_count (34 total)
  varMetadata: labelDescription
featureData
  featureNames: 7892501 7892502 ... 8180418 (32321 total)
  fvarLabels: ID GB_LIST ... category (12 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL6244 
> linker <- list(eset1, eset2)
> merged_data <- merge(linker)
  INSILICOMERGING: Run with no additional merging technique...
  INSILICOMERGING:  ! WARNING ! Number of common genes < 1%
> merged_data
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 30 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM1210881 GSM1210882 ... GSM1294928 (30 total)
  varLabels: channel_count characteristics_ch1 ... type (34 total)
  varMetadata: labelDescription
featureData
  featureNames:
  fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL570 GPL6244 
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] inSilicoMerging_1.15.0 GEOquery_2.38.4        Biobase_2.32.0        
[4] BiocGenerics_0.18.0   

loaded via a namespace (and not attached):
 [1] lattice_0.20-33      IRanges_2.6.1        XML_3.98-1.4        
 [4] bitops_1.0-6         R6_2.1.3             grid_3.3.1          
 [7] xtable_1.8-2         DBI_0.5              stats4_3.3.1        
[10] DESeq_1.24.0         RSQLite_1.0.0        httr_1.2.1          
[13] genefilter_1.54.2    annotate_1.50.0      S4Vectors_0.10.3    
[16] Matrix_1.2-6         splines_3.3.1        RColorBrewer_1.1-2  
[19] geneplotter_1.50.0   RCurl_1.95-4.8       survival_2.39-5     
[22] AnnotationDbi_1.34.4
Manvendra Singh (Berlin, Germany) wrote (3.3 years ago):

Maybe assign the mean of all probes to their target gene, then keep only the genes that are detectable on all platforms. Once the datasets share the same set of genes as rows, you can easily merge them with the "merge" function in R.
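A minimal sketch of that suggestion in base R, using toy matrices in place of real expression data. The helper name `collapse_by_gene`, the probe IDs, and the gene names are all hypothetical; the probe-to-gene mapping is assumed to be a named character vector (names are probe IDs, values are gene names), and sample names are assumed to be distinct across datasets.

```r
# Average all probes that map to the same gene (hypothetical helper).
collapse_by_gene <- function(expr, probe2gene) {
  genes <- probe2gene[rownames(expr)]
  keep <- !is.na(genes) & genes != ""          # drop unmapped probes
  sums <- rowsum(expr[keep, , drop = FALSE], group = genes[keep])
  counts <- as.vector(table(genes[keep])[rownames(sums)])
  sums / counts                                # per-gene mean for each sample
}

# Toy data standing in for two platforms with different probe IDs.
expr_a <- matrix(c(2, 4, 6, 8, 10, 12), nrow = 3, byrow = TRUE,
                 dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
genes_a <- c(p1 = "GENE1", p2 = "GENE1", p3 = "GENE2")

expr_b <- matrix(c(1, 3, 5, 7, 9, 11), nrow = 3, byrow = TRUE,
                 dimnames = list(c("q1", "q2", "q3"), c("t1", "t2")))
genes_b <- c(q1 = "GENE1", q2 = "GENE2", q3 = "GENE3")

gene_a <- collapse_by_gene(expr_a, genes_a)
gene_b <- collapse_by_gene(expr_b, genes_b)

# Keep only genes present on both platforms, then bind the samples together.
common <- intersect(rownames(gene_a), rownames(gene_b))
merged <- cbind(gene_a[common, , drop = FALSE],
                gene_b[common, , drop = FALSE])
```

With real data, `expr_a` would be something like `exprs(eset1)` and `genes_a` would be built from the matching GPL annotation. Binding columns with `cbind` on the shared gene names is used here instead of `merge` because the inputs are matrices that already share row names.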

mforde84 wrote (3.3 years ago):

Figured it out:

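# Annotation table: column V1 holds probe IDs, column 2 holds gene names
gpl_annotation <- read.delim("~/gpl_annotation.txt")
count <- 1
for (name in featureNames(eset1)) {
    lookup_index <- which(gpl_annotation$V1 == name)
    # Silent try(): if the assignment fails (e.g. the probe has no match
    # in the annotation), the probe keeps its original ID
    try({featureNames(eset1)[[count]] <- as.character(gpl_annotation[lookup_index, 2])}, TRUE)
    count <- count + 1
}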

Then I'll take the mean of the probes for each target gene and merge.
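As a possible alternative to the lookup loop above, the same probe-to-gene mapping can be sketched in vectorized form with `match()`. The annotation table here is toy data with placeholder gene names (`GENE_A`, `GENE_B`); only the probe IDs are taken from the session output, and the column names `V1`/`V2` are assumptions mirroring the loop.

```r
# Probe IDs as they would come from featureNames(eset1)
probe_ids <- c("1007_s_at", "1053_at", "AFFX-TrpnX-M_at")

# Toy annotation table: V1 = probe ID, V2 = gene name (placeholder values)
annotation <- data.frame(V1 = c("1053_at", "1007_s_at"),
                         V2 = c("GENE_B", "GENE_A"),
                         stringsAsFactors = FALSE)

# match() returns, for each probe, its row in the annotation (NA if absent)
idx <- match(probe_ids, annotation$V1)
gene_names <- annotation$V2[idx]

# Probes without a mapping keep their original ID
unmapped <- is.na(gene_names)
gene_names[unmapped] <- probe_ids[unmapped]
```

The result is a plain character vector aligned with the probes. It may contain the same gene name more than once, so it is better suited as the grouping key for averaging probes per gene than as a direct replacement for the feature names.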
