Converting probe IDs to Entrez IDs
5.9 years ago
jasl

I am trying to convert probe IDs to Entrez IDs directly with the annotation package, but I realized that of the ~55,000 probe IDs in a given dataset, ~10,000 have no probe-to-Entrez mapping. I am not sure what to do with these unmapped probe IDs, since I cannot convert them without a mapping. I have also tried other conversion tools, such as DAVID and BioMart, but neither could map them.

What do people generally do with these unmapped IDs?

For example, this is the platform for one of the datasets I am looking at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL570

I also looked at the GPL file and saw that many probe IDs are missing the corresponding Entrez ID/gene symbol. Any ideas what these are?

Also of note are the control probe IDs listed toward the end of the file; apart from those, there are still probe IDs without Entrez IDs, which I assume are not controls. Again, any ideas on how to handle these unmapped IDs? Should I just delete them from the file?
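Before deciding whether to delete anything, it can help to see how the probes break down. The following is a minimal Python sketch, not a standard tool: it assumes the probe-to-Entrez mapping has already been exported into a plain dict, and that control probes are the ones carrying Affymetrix's `AFFX-` prefix. The function name `partition_probes`, the probe `example_mapped_at`, and the Entrez value `12345` are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Optional


def partition_probes(
    probe_ids: Iterable[str],
    probe_to_entrez: Dict[str, Optional[str]],
) -> Dict[str, List[str]]:
    """Split probe IDs into Affymetrix controls, mapped and unmapped probes.

    Assumes control probes carry the 'AFFX-' prefix and that a missing,
    None or empty Entrez value means the probe has no Entrez mapping.
    """
    groups: Dict[str, List[str]] = {"control": [], "mapped": [], "unmapped": []}
    for probe in probe_ids:
        if probe.startswith("AFFX-"):
            groups["control"].append(probe)
        elif probe_to_entrez.get(probe):
            groups["mapped"].append(probe)
        else:
            groups["unmapped"].append(probe)
    return groups


# Toy mapping: "example_mapped_at" and its Entrez value "12345" are placeholders.
mapping = {"example_mapped_at": "12345", "244791_at": None}
probes = ["example_mapped_at", "244791_at", "244480_at", "AFFX-BioB-5_at"]
groups = partition_probes(probes, mapping)
for name, members in groups.items():
    print(name, len(members), members)
```

Keeping the unmapped probes in their own list, rather than deleting them, leaves the option of annotating them later from the platform table.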

Tags: R, gene

It would help if you posted some example problematic probe IDs here.

Sure. The annotation package is hgu133plus2.db; some example problematic probe IDs are:

244791_at
244480_at
244481_at
244482_at
244483_at
244484_at

Look at this CDF: https://www.ebi.ac.uk/arrayexpress/files/A-GEOD-18121/A-GEOD-18121.adf.txt. I could not get a match for even the first probe using biomaRt (in R), hgu133plus2.db (in R), or bioDBnet.

How does this help with converting to Entrez IDs, though? The GPL file I linked earlier already has a full table of the associated GB_ACC for all probe IDs; my main question is whether (and how) I can convert them to Entrez IDs, and what to do with the unmapped IDs.

It is an alternate CDF for that chip.

Sorry, can you elaborate on what this means? I'm a bit new to this sort of thing.

Hi,

I faced a similar situation a few months back and found the following link. I haven't verified its reliability or how many genes it covers, but you could still check it out. It worked fine for the genes I was interested in.

https://biodbnet-abcc.ncifcrf.gov/db/db2db.php

Hi, this website does not seem to support conversion to Entrez IDs, but I tested it anyway to see whether it could convert some of my unmapped IDs to other ID types, and it could not.

5.9 years ago

Do not be so shocked that ~10,000 do not have Entrez IDs. I have been working with Affymetrix microarrays for almost 10 years, and their expression array probes target more mRNAs than are annotated in the databases. In fact, I should put 'mRNA' in quotation marks, because a large proportion of what they target consists of hypothetical transcripts.

I easily mapped all of the probes you listed by downloading the annotation table provided on the GEO record page: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL570

Just click on 'Download full table...'.

None of them had known Entrez IDs, which is quite common. Use that downloaded table to annotate your probes manually; the GEO page describes what each column contains.
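The manual annotation described above can be scripted. This is a minimal Python sketch under several assumptions you should check against the column descriptions on the GEO page: the 'Download full table...' file is tab-separated, any column-description lines before the header start with `#`, the probe column is named `ID`, the Entrez column is `ENTREZ_GENE_ID`, and probes mapping to several genes separate the values with `///`. The helper names `load_gpl_table` and `entrez_ids` are hypothetical, and the toy rows are placeholders, not real GPL570 annotations.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_gpl_table(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Read a GEO platform table into {probe_id: row}.

    Assumes tab-separated columns, optional '#'-prefixed description lines
    before the header, and a probe-ID column named 'ID'.
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    return {row["ID"]: row for row in reader}


def entrez_ids(row: Dict[str, str], column: str = "ENTREZ_GENE_ID") -> List[str]:
    """Return the Entrez IDs in a row, split on the assumed '///' separator."""
    value = (row.get(column) or "").strip()
    return [part.strip() for part in value.split("///") if part.strip()]


# Toy stand-in for the downloaded table; all values are placeholders.
toy = io.StringIO(
    "#ID = probe set ID\n"
    "ID\tGB_ACC\tGene Symbol\tENTREZ_GENE_ID\n"
    "example_a_at\tACC0001\tGENE1\t111\n"
    "example_b_at\tACC0002\tGENE2 /// GENE3\t222 /// 333\n"
    "example_c_at\tACC0003\t\t\n"
)
table = load_gpl_table(toy)

for probe in ["example_a_at", "example_b_at", "example_c_at", "not_in_table_at"]:
    row = table.get(probe)
    if row is None:
        print(probe, "not in platform table")
    else:
        # Probes without an Entrez ID can still be annotated via GB_ACC.
        print(probe, row["GB_ACC"], entrez_ids(row) or "no Entrez ID")
```

For the real download, pass `open(path, newline="")` in place of the toy buffer, where `path` is wherever you saved the file. Probes that come back with an empty Entrez list can still be annotated from other columns, such as `GB_ACC`.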

Kevin
