How To Convert A GEO Dataset With Multiple Platforms Into An Expression Set Annotated With Entrez IDs?
13.2 years ago
Zhi ▴ 20

How can I convert GSE121, which spans multiple platforms (GPL80, GPL98, GPL99, GPL100, GPL101), into an expression set whose feature names are Entrez IDs instead of the original probe IDs? I want to do this in R/Bioconductor. I've tried using GEOquery but couldn't get it to work, and I'm not sure whether my method is wrong. Any help would be appreciated.

geo gene r bioconductor annotation • 5.8k views

Without an explanation of your method, or of how you were attempting to use GEOquery, there's not much anyone can do to help you. GEOquery has extensive documentation on how to get GSEs into expression sets, so is your question really just about identifier mapping? If so, you will want to take a look at the biomaRt package for Bioconductor. There are already many threads on ID conversion on BioStar.

I'm new to R/Bioconductor, and sorry if my question was not clear enough. What I want to do is create an expression set with the GSMs as the sampleNames and Entrez IDs as the featureNames. I obtained the dataset using gse121 <- getGEO("GSE121", GSEMatrix = TRUE) and got 5 different expression sets, one representing each GPL. Because the expression sets use probe IDs, I tried to annotate them to Entrez IDs using the annotate package, but it produces NAs for some of the probes, which leads to an error. My goal is to annotate the expression sets and combine them into a single set. Can someone help?
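
For illustration, the probe-to-Entrez step described here could be sketched roughly as follows. This is a minimal base-R sketch on toy data, not a tested recipe for GSE121: the function name collapse_to_entrez, the probe names (p1..p4), the sample names and the Entrez IDs are all made up, and the probe-to-Entrez lookup vector is assumed to have been obtained already from whatever annotation source is used. Unmapped probes (NA) are dropped before the features are renamed, and probes that hit the same gene are averaged, which is only one of several possible ways to resolve duplicates.

```r
# Collapse a probe-level expression matrix to Entrez Gene IDs.
#   expr         : numeric matrix, rows = probe IDs, columns = GSM samples
#   probe2entrez : named character vector (names = probe IDs,
#                  values = Entrez IDs, NA where no mapping exists)
# NOTE: collapse_to_entrez is a hypothetical helper, not a Bioconductor function.
collapse_to_entrez <- function(expr, probe2entrez) {
  ids  <- probe2entrez[rownames(expr)]   # NA for unmapped or unknown probes
  keep <- !is.na(ids)
  expr <- expr[keep, , drop = FALSE]     # drop probes without an Entrez ID
  ids  <- unname(ids[keep])

  # Average all probes that map to the same gene
  sums   <- rowsum(expr, group = ids)
  counts <- as.vector(table(ids)[rownames(sums)])
  sums / counts
}

# Toy data (all identifiers are invented for the example)
expr <- matrix(c(1, 3, 5, 7,
                 2, 4, 6, 8),
               nrow = 4,
               dimnames = list(c("p1", "p2", "p3", "p4"),
                               c("GSM1", "GSM2")))
probe2entrez <- c(p1 = "100", p2 = "100", p3 = NA, p4 = "200")

gene_expr <- collapse_to_entrez(expr, probe2entrez)
print(gene_expr)
```

With a real ExpressionSet, the input matrix would be what exprs() returns for one platform, and the function would be applied to each of the five sets separately.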

I've also looked around the threads, but I couldn't find any that are similar to my situation, since the dataset spans multiple platforms.

Sorry, but that is always going to be a problem: the identifier mapping is never going to be perfect or ideal, and will almost certainly lead to NAs. I'd consider how much value there is in maintaining these probesets in the downstream analysis. Annotating to Entrez or Ensembl IDs is the way to go. Not all platforms contain the same genes, and the chip design may have used sequences that have since been merged or deprecated.
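
Since not all platforms contain the same genes, one simple way to end up with a single matrix is to keep only the genes present on every platform. The sketch below assumes each platform has already been reduced to a matrix with Entrez IDs as row names. The function name combine_platforms, the toy matrices, the sample names and the IDs are invented for illustration, and it does nothing about differences in scale between platforms, which would still need to be handled separately.

```r
# Combine gene-level matrices from several platforms, keeping only genes
# that are present on every platform.
#   mats : list of matrices, rows = Entrez IDs, columns = GSM samples
# NOTE: combine_platforms is a hypothetical helper name.
combine_platforms <- function(mats) {
  shared <- Reduce(intersect, lapply(mats, rownames))
  do.call(cbind, lapply(mats, function(m) m[shared, , drop = FALSE]))
}

# Toy per-platform matrices (all identifiers are invented)
platform_a <- matrix(1:6, nrow = 3,
                     dimnames = list(c("100", "200", "300"),
                                     c("GSM1", "GSM2")))
platform_b <- matrix(7:9, nrow = 3,
                     dimnames = list(c("200", "300", "400"), "GSM3"))

combined <- combine_platforms(list(platform_a, platform_b))
print(combined)
```

Taking the intersection discards genes that are missing from any one chip. Keeping the union and filling the gaps with NA is the alternative, at the cost of having to handle missing values downstream.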

13.2 years ago

This is really an identifier mapping problem. You could use the BridgeDB web service, which you can call from R. See: http://www.bridgedb.org/wiki/BridgeWebservice

13.2 years ago
User 59 13k

You might be interested in this recent paper. Not a BioC solution, but seemingly aimed at what you want to do:

"MADGene is a software environment comprising a web-based database and a java application. This platform aims at unifying gene identifiers (ids) and performing gene set analysis. MADGene allows the user to perform inter-conversion of clone and gene ids over a large range of nomenclatures relative to 17 species. We propose a set of 23 functions to facilitate the analysis of gene sets and we give two microarray applications to show how MADGene can be used to conduct meta-analyses."

http://cardioserve.nantes.inserm.fr/madtools/home/
