Question: annotation issue from Ensembl ID to gene name
Learner wrote (10 months ago):

Hello,

I asked a question earlier and got an answer that I liked (this is the question: https://www.biostars.org/p/293965/#294344). The problem I am facing is that there are some genes (about 3000) that I cannot annotate. I am using the same method as described there, and I also tried converting them via UniProt. I have been trying to find a solution but could not. Does anybody know how to convert them to gene names? I have posted a few of the ones that I cannot convert.

If there is no solution, then can you please explain why?

ENSG00000122718
ENSG00000130201
ENSG00000150076
ENSG00000150526
ENSG00000155640
ENSG00000166748
ENSG00000168260
ENSG00000168787
ENSG00000170590
ENSG00000170803
ENSG00000171484
ENSG00000172381
ENSG00000172774
Tags: rna-seq, genome

The problem is that these are retired gene identifiers. If you look them up HERE, you can map them. See the examples below.

ENSG00000166748 = AGBL1
ENSG00000170803 = OR2AG1

written 10 months ago by genomax
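
For a few thousand IDs, the manual lookup described above could be scripted. The following is only a minimal sketch: it assumes the retired IDs are still annotated on the Ensembl GRCh37 REST mirror (grch37.rest.ensembl.org), that its POST /lookup/id endpoint returns one record per ID with a `display_name` field (and null for unknown IDs), and that 500 IDs per request is within the server's limit. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: the GRCh37 mirror still annotates the retired IDs in question.
GRCH37_SERVER = "https://grch37.rest.ensembl.org"


def chunked(ids, size):
    """Yield successive lists of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_lookup_request(ids, server=GRCH37_SERVER):
    """Build (without sending) a POST request for the /lookup/id endpoint."""
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(
        server + "/lookup/id", data=body, headers=headers, method="POST"
    )


def names_from_lookup(response):
    """Map each ID to its display_name (None when the server has no record)."""
    return {
        gene_id: (record or {}).get("display_name")
        for gene_id, record in response.items()
    }


def lookup_names(ids, batch_size=500):
    """Query the server batch by batch. Needs network access."""
    names = {}
    for batch in chunked(ids, batch_size):
        with urllib.request.urlopen(build_lookup_request(batch)) as handle:
            names.update(names_from_lookup(json.load(handle)))
    return names
```

Calling `lookup_names(list_of_ensg_ids)` would then return a dictionary from ID to gene name. Any ID that comes back as `None` is not known to that server and needs a different route.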

@genomax are you aware of any way to annotate them programmatically? It is very hard to annotate 3000 genes one by one.

written 10 months ago by Learner

Why are you using old annotations? Did you align your data against hg19/GRCh37?

written 10 months ago by genomax

@genomax they are data downloaded from TCGA. I did not align them; I just downloaded the htseq-count output.

written 10 months ago by Learner

I am not sure what your ultimate aim is, but you would be taking a leap of faith by assuming that results from data aligned to an old genome build will translate to the current genome build. For any new work you end up doing, you will likely need to use GRCh38 to be able to publish.

There are REST API endpoints for the Ensembl archives. You may want to create a help ticket with Ensembl support if you want help using that API. There may also be past threads on Biostars related to this topic.

written 10 months ago by genomax
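
One way to confirm that an ID is retired is the archive endpoint of the Ensembl REST API (GET /archive/id/:id). The sketch below is hedged: it assumes the response carries `is_current`, `release` and `assembly` fields, with `is_current` equal to "1" for live IDs. The function names are illustrative, not part of any library.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def archive_url(stable_id, server=SERVER):
    """URL of the archive record for one stable ID."""
    return f"{server}/archive/id/{stable_id}?content-type=application/json"


def summarize_archive(record):
    """Reduce an archive record to the fields relevant to this question."""
    # Assumption: is_current is "1" (or 1) for IDs in the current release.
    is_current = record.get("is_current") in ("1", 1)
    return {
        "id": record.get("id"),
        "retired": not is_current,
        "last_release": record.get("release"),
        "assembly": record.get("assembly"),
    }


def fetch_archive(stable_id):
    """Fetch and summarize one archive record. Needs network access."""
    with urllib.request.urlopen(archive_url(stable_id)) as handle:
        return summarize_archive(json.load(handle))
```

The `last_release` value indicates which Ensembl release last contained the ID, which helps in picking an archive to query for the gene name.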

I had a similar issue last year. I spoke with Tomas at EBI and he also directed me to the REST API. Basically, it gets the coordinates of the retired ENSG and then, using those coordinates, grabs the new ENSG from the latest reference genome.

He highlighted one likely problem: some old IDs may overlap two new IDs, so which one to choose may be an issue.

written 10 months ago by YaGalbi
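
For the case where an old ID overlaps two new IDs, one possible tie-break (an assumption for illustration, not something Ensembl prescribes) is to keep the new gene that shares the most bases with the old interval and to flag exact ties for manual review. The sketch assumes both intervals are on the same chromosome and assembly, and that each candidate is a dict with `start` and `end` keys.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def best_overlap(old_start, old_end, candidates):
    """Return (best_candidate, is_ambiguous) for an old gene interval.

    best_candidate is None when nothing overlaps; is_ambiguous is True
    when the two best candidates share the same number of bases.
    """
    scored = [
        (overlap_length(old_start, old_end, gene["start"], gene["end"]), gene)
        for gene in candidates
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    # Sort on the overlap length only, so the dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if not scored:
        return None, False
    ambiguous = len(scored) > 1 and scored[0][0] == scored[1][0]
    return scored[0][1], ambiguous
```

Strand and biotype are ignored here. Requiring the same strand would be a reasonable extra filter before scoring.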

@kennethcondon2007 can you please share how you did it? I am really confused and I don't know what to do to get their gene names :-(

written 10 months ago by Learner

Unfortunately I never had a chance to implement his advice, but here are the steps I wrote down so I knew where to start when I got back to it:

ENSEMBL REST API

REST API: MAPPING --> converts coordinates of one assembly to another

REST API: OVERLAP --> retrieves features (e.g. gene IDs) that overlap a given region (warning: you may get more than one object for a region, but it should be rare)

Sorry I can't be more help.

written 10 months ago by YaGalbi
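
The two steps noted above (MAPPING, then OVERLAP) could be chained roughly as follows. This is a sketch under assumptions, not a tested pipeline: it uses the GET /map/:species/:asm_one/:region/:asm_two and GET /overlap/region/:species/:region endpoints of rest.ensembl.org, assumes the old coordinates are already known (for example from an archive lookup), and assumes the response fields `mappings`, `mapped`, `id` and `external_name`. The function names are hypothetical.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def map_url(chrom, start, end, strand=1, source="GRCh37", target="GRCh38",
            species="human"):
    """URL that converts a region from one assembly to another."""
    region = f"{chrom}:{start}..{end}:{strand}"
    return (f"{SERVER}/map/{species}/{source}/{region}/{target}"
            "?content-type=application/json")


def overlap_url(chrom, start, end, species="human"):
    """URL that lists genes overlapping a region on the current assembly."""
    return (f"{SERVER}/overlap/region/{species}/{chrom}:{start}-{end}"
            "?feature=gene;content-type=application/json")


def mapped_regions(map_response):
    """Extract (chromosome, start, end) for every mapped segment."""
    return [
        (m["mapped"]["seq_region_name"], m["mapped"]["start"], m["mapped"]["end"])
        for m in map_response.get("mappings", [])
    ]


def gene_names(overlap_response):
    """Extract (gene ID, gene name) pairs; the name may be missing."""
    return [(g["id"], g.get("external_name")) for g in overlap_response]


def fetch_json(url):
    """GET a URL and decode the JSON body. Needs network access."""
    with urllib.request.urlopen(url) as handle:
        return json.load(handle)


def remap(chrom, start, end):
    """Old coordinates -> new coordinates -> overlapping current genes."""
    hits = []
    for new_chrom, new_start, new_end in mapped_regions(
            fetch_json(map_url(chrom, start, end))):
        hits.extend(gene_names(fetch_json(overlap_url(new_chrom, new_start, new_end))))
    return hits
```

`remap` can return more than one gene for a region, which is the situation the warning above describes, so the result should be treated as a list of candidates and not as a single answer.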