Question: How To Convert A GEO Dataset With Multiple Platforms Into An Expression Set Annotated With Entrez IDs?
8.8 years ago by
Zhi20 wrote:

How can I convert GSE121, which spans multiple platforms (GPL80, GPL98, GPL99, GPL100, GPL101), into an expression set whose feature names are Entrez IDs instead of the original probe IDs? I want to do this in R/Bioconductor. I've tried using GEOquery but couldn't get it to work, and I'm not sure whether my method is wrong. Any help would be appreciated.

geo gene bioconductor R annotation • 4.6k views
modified 8.7 years ago by Chris Evelo 10.0k • written 8.8 years ago by Zhi 20

Unless you explain your method or show how you were attempting to use GEOquery, there's not much anyone can do to help you. GEOquery has extensive documentation on how to get GSEs into expression sets, so is your question really just about identifier mapping? If so, you will want to take a look at the biomaRt package for Bioconductor. There are already many threads on ID conversion on BioStar.

written 8.8 years ago by Daniel Swan 13k

I'm new to R/Bioconductor, and sorry if my question was not clear enough. What I want to do is create an expression set with the GSMs as the sampleNames and the Entrez IDs as the featureNames. I obtained the dataset using gse121 <- getGEO("GSE121", GSEMatrix = TRUE) and got 5 different expression sets, one representing each GPL. Because the expression sets use probe IDs, I tried to annotate them to Entrez IDs using the annotate package, but it produces NAs for some of the probes, which leads to an error. My goal is to annotate the expression sets and combine them into a single set. Can someone help?
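
The probe-to-Entrez step described above can be sketched in base R on a toy matrix. This is only an illustration under stated assumptions: the probe IDs, sample names, values, the `probe2entrez` vector and the `collapse_to_entrez` helper are all invented here, and averaging probes that share an Entrez ID is just one possible way to handle duplicates. With real data, the matrix would presumably come from `exprs()` on each expression set returned by `getGEO`, and the mapping from whichever annotation source is used.

```r
# Toy expression matrix standing in for exprs(eset): rows are probes, columns are GSMs.
# Probe IDs, sample names and values are invented for illustration.
expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3", "p4"), c("GSM_a", "GSM_b"))
)

# Hypothetical probe -> Entrez mapping: p3 has no Entrez ID, p1 and p2 share one.
probe2entrez <- c(p1 = "1001", p2 = "1001", p3 = NA, p4 = "2002")

# Hypothetical helper: drop unmapped probes, then average probes per Entrez ID.
collapse_to_entrez <- function(expr, probe2entrez) {
  ids <- probe2entrez[rownames(expr)]
  keep <- !is.na(ids)                 # probes without an Entrez ID are removed
  expr <- expr[keep, , drop = FALSE]
  ids <- ids[keep]
  sums <- rowsum(expr, group = ids)   # per-ID column sums, rows sorted by ID
  counts <- as.vector(table(ids)[rownames(sums)])
  sums / counts                       # per-ID means
}

by_entrez <- collapse_to_entrez(expr, probe2entrez)
print(by_entrez)
```

Removing the NA rows before setting feature names avoids the error that missing or duplicated feature names would otherwise cause. The resulting matrix could then be wrapped again with `Biobase::ExpressionSet(assayData = by_entrez)`.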

written 8.8 years ago by Zhi 20

I've also looked through the threads, but I couldn't find any that are similar to my situation, since the dataset spans multiple platforms.

written 8.8 years ago by Zhi 20

Sorry, but that is always going to be a problem: identifier mapping is never going to be perfect or ideal, and it will almost certainly produce NAs. I'd consider how much value there is in keeping these probesets in the downstream analysis. Annotating to Entrez or Ensembl IDs is the way to go. Not all platforms contain the same genes, and the chip design may have used sequences that have since been merged or deprecated.
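
Because the platforms do not all cover the same genes, combining them means choosing between the genes shared by every platform and the union of all genes, padded with NA. A minimal base R sketch of both choices follows. The two matrices, their Entrez IDs and GSM names, and the `combine_platforms` helper are invented for illustration, and the sketch assumes each platform holds a different set of samples.

```r
# Two toy Entrez-level matrices, one per platform; all IDs and values are invented.
plat_a <- matrix(c(2, 3, 7, 8), nrow = 2, byrow = TRUE,
                 dimnames = list(c("1001", "2002"), c("GSM_a", "GSM_b")))
plat_b <- matrix(c(5, 6, 9, 1), nrow = 2, byrow = TRUE,
                 dimnames = list(c("2002", "3003"), c("GSM_c", "GSM_d")))

# Hypothetical helper: align every matrix to one set of Entrez IDs, then cbind.
combine_platforms <- function(mats, shared_only = TRUE) {
  id_sets <- lapply(mats, rownames)
  ids <- if (shared_only) Reduce(intersect, id_sets) else Reduce(union, id_sets)
  aligned <- lapply(mats, function(m) {
    # Start from an all-NA matrix so genes missing on this platform stay NA.
    out <- matrix(NA_real_, nrow = length(ids), ncol = ncol(m),
                  dimnames = list(ids, colnames(m)))
    present <- intersect(ids, rownames(m))
    out[present, ] <- m[present, , drop = FALSE]
    out
  })
  do.call(cbind, aligned)
}

shared <- combine_platforms(list(plat_a, plat_b), shared_only = TRUE)
all_genes <- combine_platforms(list(plat_a, plat_b), shared_only = FALSE)
print(shared)
print(all_genes)
```

Keeping only shared genes gives a complete matrix but can discard most of the data when the platforms overlap little. The union keeps everything but leaves NAs that downstream methods must tolerate. If the platforms instead measured the same samples on complementary chips, stacking rows rather than binding columns would be the more natural choice.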

written 8.8 years ago by Daniel Swan 13k
8.7 years ago by
Chris Evelo10.0k
Maastricht, The Netherlands
Chris Evelo10.0k wrote:

This is really an identifier mapping problem. You could use the BridgeDB web service, which you can call from R.

written 8.7 years ago by Chris Evelo 10.0k
8.7 years ago by
Daniel Swan13k
Aberdeen, UK
Daniel Swan13k wrote:

You might be interested in this recent paper. Not a BioC solution, but seemingly aimed at what you want to do:

"MADGene is a software environment comprising a web-based database and a java application. This platform aims at unifying gene identifiers (ids) and performing gene set analysis. MADGene allows the user to perform inter-conversion of clone and gene ids over a large range of nomenclatures relative to 17 species. We propose a set of 23 functions to facilitate the analysis of gene sets and we give two microarray applications to show how MADGene can be used to conduct meta-analyses."

written 8.7 years ago by Daniel Swan 13k