Question

Map gene name to gene ID

9.8 years ago

yliueagle ▴ 290

I have a file containing a list of human gene names of interest, for example PRKAR1A and PRKCB (some are aliases). How can I get the IDs of these genes so that they are uniquely identified? I am currently using http://www.ncbi.nlm.nih.gov/gene/?term=PRKAR1A

I am new to biology. Thanks.

gene • 11k views

Ram · Answer 1 · 2014-07-15

9.8 years ago

poisonAlien ★ 3.2k

Assuming you are looking for Entrez Gene IDs, you can use the org.Hs.eg.db Bioconductor (R) package to map gene symbols and aliases to IDs.

library("org.Hs.eg.db")
gene <- c("PRKAR1A", "PRKCB")
# Look up Entrez Gene IDs by official symbol or alias
unlist(mget(x = gene, envir = org.Hs.egALIAS2EG))
# PRKAR1A   PRKCB
#  "5573"  "5579"

If you are not comfortable with R, you can use Ensembl BioMart.

Many thanks! I tried it and it works.

But how can I deal with names that map to multiple IDs? For example, when I query 'NAT1', it returns 4 IDs: "9", "1982", "6530", "10991".

Also, are gene names case sensitive? When I try 'naT1' instead of 'NAT1', no ID is found.

Reply written 9.8 years ago by yliueagle (updated 2.4 years ago by Ram)

Regarding gene names: human gene symbols are written in uppercase (except open reading frames like this one), and mouse gene symbols are written with only the first letter capitalized (for example, see BRCA1 for different species and how gene names are represented). So if you are sure that your list is from human, you should probably convert the names to uppercase (toupper() in R) beforehand.
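
A note on the upper-casing suggestion: calling toupper() on every name would also alter the lowercase letters in open-reading-frame style symbols, so one alternative is to match case-insensitively against the list of known names and return the stored spelling. Below is a minimal base-R sketch under stated assumptions. `match_ignore_case` is a hypothetical helper, not part of any package, and `known` is a toy vector standing in for the real name list (with org.Hs.eg.db loaded, `keys(org.Hs.eg.db, keytype = "ALIAS")` should provide one). "C1orf112" is included only as an example of an ORF-style name.

```r
# Hypothetical helper: return the stored spelling of each query name,
# matching case-insensitively against a vector of known names.
match_ignore_case <- function(query, known) {
  idx <- match(toupper(query), toupper(known))
  known[idx]  # NA where nothing matches
}

# Toy stand-in for the real list of symbols and aliases (an assumption).
known <- c("NAT1", "PRKAR1A", "PRKCB", "C1orf112")

fixed <- match_ignore_case(c("naT1", "prkcb", "c1ORF112", "notagene"), known)
print(fixed)
```

If two known names differ only by case, this returns the first one, so it may be worth checking for such collisions on the real list.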

Reply written 9.8 years ago by poisonAlien

You can map the IDs back to their official symbols and keep the one whose symbol matches your query; for example, keep the ID paired with NAT1 and discard the others.

Reply written 9.8 years ago by juncheng

To some extent, your multi-ID problem simply reflects the annotation. In your example, NAT1 really is "9", but "1982", which you also found, is EIF4G2, which is annotated as "also known as NAT1". So it depends on your question whether you want to use just the main one (which you could indeed find by mapping back) or all of them.
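
The "map back and keep the main one" idea could be sketched roughly as follows. `primary_hits` is a hypothetical helper. It assumes a data frame with ALIAS, ENTREZID and SYMBOL columns, which is the shape one would expect from `AnnotationDbi::select()` on org.Hs.eg.db with `keytype = "ALIAS"`. The toy table holds only the two NAT1 hits discussed in this thread.

```r
# Hypothetical helper: keep only the rows where the queried name is the
# official symbol of the gene it mapped to (drops alias-only matches).
primary_hits <- function(hits) {
  hits[!is.na(hits$SYMBOL) & hits$ALIAS == hits$SYMBOL, , drop = FALSE]
}

# Toy table with the two NAT1 hits discussed in this thread.
hits <- data.frame(
  ALIAS    = c("NAT1", "NAT1"),
  ENTREZID = c("9", "1982"),
  SYMBOL   = c("NAT1", "EIF4G2"),
  stringsAsFactors = FALSE
)

primary <- primary_hits(hits)
print(primary$ENTREZID)

# With the annotation package installed, the table could be built as:
# library(org.Hs.eg.db)
# hits <- AnnotationDbi::select(org.Hs.eg.db, keys = "NAT1", keytype = "ALIAS",
#                               columns = c("ENTREZID", "SYMBOL"))
```

A name that is only ever an alias ends up with no rows after filtering, so such names would still need to be resolved by hand or kept with all of their IDs.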

Reply written 9.8 years ago by Chris Evelo (updated 2.4 years ago by Ram)

Ram · Answer 2 · 2014-07-15

9.8 years ago

Jonathan Crowther ▴ 210

You could try here, alternatively here.

Ram · Answer 3 · 2014-07-15

Using BioMart, as mentioned by @poisonAlien, is probably your best bet.

That said, you could also use BridgeDb. If I understand you correctly, what you essentially want to do is map between two different database identifiers for human genes, where the first is the HGNC symbol (used by NCBI as the gene name) and the second is the NCBI Gene ID.

You could do that with a BridgeDb webservice call like:

http://webservice.bridgedb.org/Human/xrefs/H/PRKAR1A?dataSource=L

It asks for the ID in dataSource L (Entrez Gene) for a human gene name from HGNC (identified as H).

Note that you would normally want to install the BridgeDb webservice locally; this only serves as an example.
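
For a whole list of names, the same URL pattern could be generated from R. This is only a sketch: `bridgedb_xrefs_url` is a hypothetical helper that follows the URL shown above, and the `base` argument is there so that a local installation can be swapped in. The actual fetch is left commented out because it needs network access, and the response format is an assumption.

```r
# Hypothetical helper: build a BridgeDb xrefs URL following the pattern
# <base>/<organism>/xrefs/<source code>/<identifier>?dataSource=<target code>
bridgedb_xrefs_url <- function(id, from = "H", to = "L", organism = "Human",
                               base = "http://webservice.bridgedb.org") {
  paste0(base, "/", organism, "/xrefs/", from, "/",
         utils::URLencode(id, reserved = TRUE), "?dataSource=", to)
}

# One URL per gene name; the result is a character vector named by gene.
urls <- vapply(c("PRKAR1A", "PRKCB"), bridgedb_xrefs_url, character(1))
print(urls)

# Fetching needs network access; the response is assumed to be plain text
# with one cross-reference per line.
# readLines(urls[["PRKAR1A"]])
```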

Further information at http://webservice.bridgedb.org (also for a link to dataSource codes) and in this paper.

Ram · Answer 4 · 2014-07-15

9.8 years ago

juncheng ▴ 220

I also recommend "org.Hs.eg.db" in R.

Otherwise you can try http://biodbnet.abcc.ncifcrf.gov/db/db2db.php
