Question

How To Convert A GEO Dataset With Multiple Platforms Into An Expression Set Annotated With Entrez IDs?

2

13.2 years ago

Zhi ▴ 20

How can I convert GSE121, which spans multiple platforms (GPL80, GPL98, GPL99, GPL100, GPL101), into an expression set whose feature names are Entrez IDs instead of the original probe IDs? I want to do this in R/Bioconductor. I've tried using GEOquery but couldn't get it to work, and I'm not sure whether my method is wrong. Please help.

geo gene r bioconductor annotation • 5.8k views

updated 13.1 years ago by Chris Evelo 10k • written 13.2 years ago by Zhi ▴ 20

0

Unless you explain your method, or show how you were attempting to use GEOquery, there's not much anyone can do to help you. GEOquery has extensive documentation on how to get GSEs into expression sets, so is your question really just about identifier mapping? If so, you will want to take a look at the biomaRt package for Bioconductor. There are already many threads on ID conversion on BioStar.

13.2 years ago by User 59 13k

0

I'm new to R/Bioconductor, and sorry if my question wasn't clear enough. What I want to do is create an expression set with the GSMs as the sampleNames and the Entrez IDs as the featureNames. I obtained the dataset using gse121 <- getGEO("GSE121", GSEMatrix = TRUE) and got 5 different expression sets, one representing each GPL. Because the expression sets use probe IDs, I tried to annotate them to Entrez IDs using the annotate package, but it produces NAs for some of the probes, which leads to an error. My goal is to annotate the expression sets and combine them into a single set. Can someone help?

13.2 years ago by Zhi ▴ 20

0

I've also looked around the threads, but I couldn't find any that are similar to my situation, since the dataset spans multiple platforms.

13.2 years ago by Zhi ▴ 20

0

Sorry, but that is always going to be a problem: identifier mapping is never going to be perfect or ideal, and it will almost certainly lead to NAs. I'd consider how much value there is in maintaining these probesets in the downstream analysis. Annotating to Entrez or Ensembl IDs is the way to go. Not all platforms contain the same genes, and the chip design may have used sequences that have since been merged or deprecated.
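
One way to deal with the unmapped probes and still end up with a single Entrez-level matrix can be sketched in base R: drop probes that have no Entrez ID, average probes that share an Entrez ID, then align the per-platform matrices on the union of Entrez IDs. This is only a sketch under assumptions. The toy matrices, the probe-to-Entrez vectors, the sample names, and the helper names collapse_to_entrez and merge_platforms are all hypothetical; with real data the values would come from exprs() on each expression set returned by getGEO, and the mapping from whatever annotation source you use. Averaging duplicate probes is just one possible choice.

```r
# Hypothetical helper: collapse a probe-level matrix to one row per Entrez ID.
# expr:   numeric matrix, rows = probes, columns = samples (GSMs)
# entrez: character vector aligned with rows of expr; NA or "" means unmapped
collapse_to_entrez <- function(expr, entrez) {
  keep <- !is.na(entrez) & entrez != ""
  expr <- expr[keep, , drop = FALSE]
  entrez <- entrez[keep]
  # Sum per Entrez ID, then divide by the number of probes to get the mean
  sums <- rowsum(expr, group = entrez)
  counts <- as.vector(table(entrez)[rownames(sums)])
  sums / counts
}

# Hypothetical helper: align Entrez-level matrices on the union of IDs.
# Genes that a platform does not measure are filled with NA for its samples.
merge_platforms <- function(mats) {
  all_ids <- sort(unique(unlist(lapply(mats, rownames))))
  aligned <- lapply(mats, function(m) {
    out <- matrix(NA_real_, nrow = length(all_ids), ncol = ncol(m),
                  dimnames = list(all_ids, colnames(m)))
    out[rownames(m), ] <- m
    out
  })
  do.call(cbind, aligned)
}

# Toy data standing in for two platforms (all names and values are made up)
gpl_a <- matrix(c(1, 3, 5, 7,
                  2, 4, 6, 8), nrow = 4,
                dimnames = list(c("p1", "p2", "p3", "p4"),
                                c("GSM_A1", "GSM_A2")))
map_a <- c("10", "10", NA, "20")   # p3 has no Entrez ID and is dropped

gpl_b <- matrix(c(10, 30), nrow = 2,
                dimnames = list(c("q1", "q2"), "GSM_B1"))
map_b <- c("20", "30")

combined <- merge_platforms(list(
  collapse_to_entrez(gpl_a, map_a),
  collapse_to_entrez(gpl_b, map_b)
))
print(combined)
```

The result has Entrez IDs as row names and all samples as columns, so it could then be wrapped in an ExpressionSet. Because the platforms do not cover the same genes, the NAs move from the feature names into the expression values; restricting to genes present on every platform is the alternative if downstream tools cannot handle missing values.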

13.2 years ago by User 59 13k

score 2 · Answer 1 · 2011-02-27

2

13.2 years ago

Chris Evelo 10k

This is really an identifier mapping problem. You could use the BridgeDb web service, which you can call from R. See: http://www.bridgedb.org/wiki/BridgeWebservice

13.2 years ago by Chris Evelo 10k

score 0 · Answer 2 · 2011-02-21

You might be interested in this recent paper. Not a BioC solution, but seemingly aimed at what you want to do:

"MADGene is a software environment comprising a web-based database and a java application. This platform aims at unifying gene identifiers (ids) and performing gene set analysis. MADGene allows the user to perform inter-conversion of clone and gene ids over a large range of nomenclatures relative to 17 species. We propose a set of 23 functions to facilitate the analysis of gene sets and we give two microarray applications to show how MADGene can be used to conduct meta-analyses."

http://cardioserve.nantes.inserm.fr/madtools/home/