Problem When Converting Aliases to Entrez Gene ID
5.6 years ago

UPDATE: I posted another question, related to this problem, with more details. The Biostars link can be found here: link

I am using

library(org.Hs.eg.db)

I have a list of human gene symbols like this

mysymbols<-c("AXUD1", "CENTG1", "DGCR14", "ERBB2IP", "KIAA1627", "NARG1L", "NOS2A", "O00366", "Q6ZP10")

I want to get the Entrez IDs for these genes. Since these are all aliases, I am using

info4gene <- select(org.Hs.eg.db, keys = mysymbols, columns = "ENTREZID", keytype = "ALIAS")

to get the Entrez IDs. This returns Entrez IDs for only some of the gene symbols. The output I get looks like this

    ALIAS ENTREZID
1    AXUD1    64651
2   CENTG1   116986
3   DGCR14     8220
4  ERBB2IP    55914
5 KIAA1627     <NA>
6   NARG1L    79612
7    NOS2A     4843
8   O00366     <NA>
9   Q6ZP10     <NA>

So, for example, KIAA1627 is an alias for the METTL14 gene, and the result should be 57721 instead of NA. I have tried other things, like the mygene Python library and the bitr function from the clusterProfiler package. I would also like to do this automatically, so I need a script or code rather than the NCBI website or the online DAVID tool.
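Before trying other tools, it may help to split the unmapped inputs by what they look like. The sketch below uses only base R and a copy of the result table from the question. It assumes the standard UniProt accession pattern as a heuristic; the variable names are illustrative and not part of any package.

```r
# Result table in the shape returned above (values copied from the question).
res <- data.frame(
  ALIAS = c("AXUD1", "CENTG1", "DGCR14", "ERBB2IP", "KIAA1627",
            "NARG1L", "NOS2A", "O00366", "Q6ZP10"),
  ENTREZID = c("64651", "116986", "8220", "55914", NA,
               "79612", "4843", NA, NA),
  stringsAsFactors = FALSE
)

# Heuristic: the UniProt accession number format (assumption: a key that
# matches this pattern is an accession rather than a gene alias).
uniprot_pattern <- paste0(
  "^([OPQ][0-9][A-Z0-9]{3}[0-9]|",
  "[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Keys that did not map, split by whether they look like accessions.
unmapped <- res$ALIAS[is.na(res$ENTREZID)]
looks_like_uniprot <- grepl(uniprot_pattern, unmapped)

uniprot_like <- unmapped[looks_like_uniprot]   # candidates for a UniProt lookup
symbol_like  <- unmapped[!looks_like_uniprot]  # candidates for an alias/symbol fix
```

The accession-like keys could then be retried with keytype = "UNIPROT" in select(), while the symbol-like ones need a different alias source. Whether either group resolves depends on the annotation version, so this only narrows down where to look.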

gene entrez • 2.5k views

What is select? What is that library?


I am using library(org.Hs.eg.db)


Not sure if this is applicable. Worth a mention: mapping between gene symbol and entrez ID


Unfortunately not. I am using the same function mentioned in that post, but as my output shows, I can't get the gene ID for all of the gene symbols with this method. I also tried "SYMBOL" instead of "ALIAS", with no result.
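Trying one key type after another can be automated. The sketch below is a generic fallback loop in base R; map_with_fallback, toy_tables, and toy_lookup are hypothetical names, and the toy tables hold only IDs quoted in this thread. In real use, the lookup argument would wrap a call into org.Hs.eg.db (for example via select()), which is an assumption and not shown here. Keys are assumed to be unique.

```r
# Try each key type in turn, only for keys that are still unmapped.
# `lookup(keys, keytype)` must return a character vector named by `keys`,
# with NA where no ID was found.
map_with_fallback <- function(keys, keytypes, lookup) {
  ids <- setNames(rep(NA_character_, length(keys)), keys)
  for (kt in keytypes) {
    todo <- names(ids)[is.na(ids)]
    if (length(todo) == 0) break
    found <- lookup(todo, kt)
    ids[todo] <- found[todo]
  }
  ids
}

# Toy stand-in for the annotation database (illustrative only).
toy_tables <- list(
  ALIAS  = c(AXUD1 = "64651", NOS2A = "4843"),
  SYMBOL = c(METTL14 = "57721")
)
toy_lookup <- function(keys, keytype) {
  setNames(unname(toy_tables[[keytype]][keys]), keys)
}

ids <- map_with_fallback(c("AXUD1", "METTL14", "Q6ZP10"),
                         keytypes = c("ALIAS", "SYMBOL"),
                         lookup = toy_lookup)
```

Keys that no key type can resolve stay NA, so they remain easy to list for manual follow-up.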

5.6 years ago

Confirmed: the MyGene API yields the same result. Where did these symbols originally come from? It might be easier to go back to your original mapping method and alter how you annotate there. Alternatively, keep what you're doing and grab the official gene symbol for each of the missing values using the method described in this other answer, then find the Entrez ID with your current method.
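The second suggestion could look roughly like the following: keep the existing result and fill only the missing rows from a separate alias table. This is a base R sketch; fill_missing_ids and the manual table are hypothetical, and the single row in manual is the KIAA1627/METTL14/57721 example from the question. How that table gets built (another database, a curated file) is left open.

```r
# Fill NA Entrez IDs in `res` from a second alias table, leaving
# IDs that were already mapped untouched.
fill_missing_ids <- function(res, manual) {
  idx <- match(res$ALIAS, manual$ALIAS)
  fill <- is.na(res$ENTREZID) & !is.na(idx)
  res$ENTREZID[fill] <- manual$ENTREZID[idx[fill]]
  res
}

# Part of the original result, including two unmapped keys.
res <- data.frame(
  ALIAS = c("ERBB2IP", "KIAA1627", "O00366"),
  ENTREZID = c("55914", NA, NA),
  stringsAsFactors = FALSE
)

# Hand-built alias -> official symbol -> Entrez ID table (illustrative).
manual <- data.frame(
  ALIAS = "KIAA1627",
  SYMBOL = "METTL14",
  ENTREZID = "57721",
  stringsAsFactors = FALSE
)

patched <- fill_missing_ids(res, manual)
```

Rows with no entry in the second table keep their NA, so the remaining gaps stay visible.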


My starting point is the Gene Ontology database. I get the genes annotated to a GO term from the database with an SQL query, like this. That query doesn't return an Entrez ID but rather a gene symbol. The GO database also gives an accession ID, but that accession ID can come from one of many different databases (UniProt, Swiss-Prot, FlyBase, etc.), so I decided it would be easier to work with the gene symbol provided by GO. It seems there is no alternative route for gene annotation when I start with GO. As for your other suggestion, that seems logical. Thank you for the answer. I will update my comment after I try that.


UPDATE: None of the solutions in that link helped. I tried the one that uses R and SQL, but it could not get the official gene symbols for my list. Similarly, the solution that uses a website to look up gene symbols and aliases did not help, as the website did not contain everything I was looking for either. I will try to find other ways to convert my gene symbols into official gene symbols.


That's rough. You could also try this Python package, though I kind of expect it will have the same issue. It wouldn't be a problem if they were consistent IDs, but going from UniProt to all sorts of random other accessions is tough.
