Question: annotation issue converting Ensembl IDs to gene names
7 months ago by Learner 100


I asked a question earlier and got an answer that I liked (this is the question). The problem I am facing is that there are about 3000 genes that I cannot annotate. I am using the same method as described there, and I also tried converting them via UniProt. I have been looking for a solution without success. Does anybody know how to convert them to gene names? I have posted a few of the ones that I cannot convert.

If there is no solution, then can you please explain why?

rna-seq genome • 307 views
modified 7 months ago • written 7 months ago by Learner 100

The problem is that these are retired gene identifiers. If you look them up HERE, you can map them. See the examples below.

ENSG00000166748 = AGBL1
ENSG00000170803 = OR2AG1

modified 7 months ago • written 7 months ago by genomax 54k

@genomax are you aware of any way to annotate them programmatically? It is very hard to annotate 3000 genes one by one.

modified 7 months ago • written 7 months ago by Learner 100

Why are you using old annotations? Did you align your data against hg19/GRCh37?

written 7 months ago by genomax 54k

@genomax they are data downloaded from TCGA. I did not align them; I just downloaded the htseq-count files.

modified 7 months ago • written 7 months ago by Learner 100

I am not sure what your ultimate aim is, but you would be taking a leap of faith by assuming that results from data aligned to an old genome build will translate to the current genome build. For any new work you end up doing, you will likely need to use GRCh38 to be able to publish.

There are REST API endpoints for the Ensembl archives. You may want to open a help ticket with Ensembl support if you need help using that API. There may also be past threads on Biostars related to this topic.
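As a rough illustration of a programmatic lookup against the archive endpoints, here is a minimal sketch. It assumes the GRCh37 archive REST server is reachable at grch37.rest.ensembl.org and uses its batch POST `lookup/id` endpoint. The function names are made up for this example, and the server caps how many IDs one request may carry, so 3000 IDs would need to be sent in chunks.

```python
import json
import urllib.request

# Assumption: host of the GRCh37 archive of the Ensembl REST API.
GRCH37_SERVER = "https://grch37.rest.ensembl.org"


def build_lookup_request(gene_ids, server=GRCH37_SERVER):
    """Build (without sending) a batch POST request for the lookup/id endpoint."""
    payload = json.dumps({"ids": list(gene_ids)}).encode("utf-8")
    return urllib.request.Request(
        server + "/lookup/id",
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def extract_gene_names(lookup_response):
    """Map each ID to its display_name; IDs the server does not know map to None."""
    return {
        gene_id: (record or {}).get("display_name")
        for gene_id, record in lookup_response.items()
    }


def lookup_gene_names(gene_ids):
    """Send the request (needs network access) and return {id: name or None}."""
    with urllib.request.urlopen(build_lookup_request(gene_ids)) as response:
        return extract_gene_names(json.load(response))
```

Calling `lookup_gene_names(["ENSG00000166748", "ENSG00000170803"])` would query the two IDs from the examples above. Any ID that comes back as `None` is unknown to that archive and needs another route.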

modified 7 months ago • written 7 months ago by genomax 54k

I had a similar issue last year. I spoke with Tomas at EBI and he also directed me to the REST API. Basically, you get the coordinates of the retired ENSG and then, using those coordinates, grab the new ENSG from the latest reference genome.

He highlighted one likely problem: some old IDs may overlap two new IDs, so which one to choose may be an issue.
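One possible way to handle the "two new IDs" case is to keep the gene that shares the most bases with the remapped region and to flag ties for manual review. This heuristic is an assumption for illustration only, not something Tomas or Ensembl recommended. The candidate dictionaries with keys `id`, `start` and `end` are a hypothetical input format.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of bases shared by two inclusive intervals (0 if they are disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def pick_best_gene(region_start, region_end, candidates):
    """Return the ID of the candidate with the largest overlap.

    Returns None when there is no overlapping candidate or when the top two
    candidates tie, so that those cases can be checked by hand.
    """
    scored = sorted(
        (
            (overlap_length(region_start, region_end, c["start"], c["end"]), c["id"])
            for c in candidates
        ),
        reverse=True,
    )
    if not scored or scored[0][0] == 0:
        return None  # nothing overlaps the region
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # ambiguous: equal overlap with two genes
    return scored[0][1]
```

Other tie-breakers, such as matching strand or biotype, may suit some data better. Whichever rule is used, keeping a list of the ambiguous IDs makes the choice auditable.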

written 7 months ago by YaGalbi 1.3k

@kennethcondon2007 can you please share with me the way you did it? I am really confused and I don't know what to do to get their gene names :-(

written 6 months ago by Learner 100

Unfortunately I never had a chance to implement his advice, but here are the steps I wrote down so I knew where to start when I got back to it:


REST API: MAPPING --> converts the coordinates of one assembly to another

REST API: OVERLAP --> retrieves features (e.g. gene IDs) that overlap a given region (warning: you may get more than one object for a region, but it should be rare)
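The two steps above can be sketched roughly as follows. This is an untested outline under stated assumptions: it uses the public Ensembl REST server at rest.ensembl.org with its `map` and `overlap/region` endpoints, the helper names are invented for the example, and the old GRCh37 coordinates of each retired ID have to be obtained separately.

```python
import json
import urllib.request

# Assumption: current-release Ensembl REST server (GRCh38 for human).
SERVER = "https://rest.ensembl.org"


def map_url(chrom, start, end, source="GRCh37", target="GRCh38", species="human"):
    """MAPPING step: URL that converts a region from one assembly to another."""
    return (f"{SERVER}/map/{species}/{source}/{chrom}:{start}..{end}/{target}"
            "?content-type=application/json")


def overlap_url(chrom, start, end, species="human"):
    """OVERLAP step: URL that lists the genes overlapping a region."""
    return (f"{SERVER}/overlap/region/{species}/{chrom}:{start}-{end}"
            "?feature=gene;content-type=application/json")


def mapped_regions(map_response):
    """Extract (chrom, start, end) on the target assembly from a map response."""
    return [(m["mapped"]["seq_region_name"], m["mapped"]["start"], m["mapped"]["end"])
            for m in map_response.get("mappings", [])]


def gene_candidates(overlap_response):
    """Extract (gene ID, gene name) pairs; the name may be missing for some genes."""
    return [(g.get("id"), g.get("external_name")) for g in overlap_response]


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def remap_region(chrom, start, end):
    """Run both steps for one old region; may return zero, one, or several genes."""
    genes = []
    for new_chrom, new_start, new_end in mapped_regions(fetch_json(map_url(chrom, start, end))):
        genes.extend(gene_candidates(fetch_json(overlap_url(new_chrom, new_start, new_end))))
    return genes
```

`remap_region` returns a list because a region can map to several pieces or overlap more than one gene, which is the ambiguity warned about above. For thousands of IDs, the requests would also need to be paced to respect the server's rate limits.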

Sorry I can't be more help.

modified 6 months ago • written 6 months ago by YaGalbi 1.3k