Map gene name to gene ID
8.0 years ago
yliueagle

I have a file containing a list of human gene names I am interested in, for example PRKAR1A and PRKCB (some of them are aliases). How can I get the IDs of these genes so that they are uniquely identified? Currently I am using http://www.ncbi.nlm.nih.gov/gene/?term=PRKAR1A

I am new to biology. Thanks

8.0 years ago
poisonAlien

Assuming you are looking for Entrez Gene IDs, you can use the org.Hs.eg.db Bioconductor (R) package to extract this and other annotation.

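```r
library("org.Hs.eg.db")
gene <- c("PRKAR1A", "PRKCB")
# org.Hs.egALIAS2EG maps official symbols *and* aliases to Entrez Gene IDs.
# ifnotfound = NA returns NA for unknown names instead of stopping with an error.
# A name with several IDs (e.g. NAT1) is expanded by unlist() into NAT11, NAT12, ...
unlist(mget(x = gene, envir = org.Hs.egALIAS2EG, ifnotfound = NA))
#  PRKAR1A   PRKCB
#   "5573"  "5579"
```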


If you are not comfortable with R, you can use Ensembl BioMart.


Many thanks! I tried and it works.

But how can I deal with names that map to multiple IDs? For example, when I query 'NAT1', it returns 4 IDs: "9" "1982" "6530" "10991".

Also, are gene names case sensitive? When I try 'naT1' instead of 'NAT1', no ID can be found.


Regarding gene names: by convention, human gene symbols are written in uppercase (except open reading frame genes, whose symbols contain a lowercase "orf"), while mouse gene symbols have only the first letter capitalised (for example, compare BRCA1 in human with Brca1 in mouse). So if you are sure your list is from human, you could convert the names to uppercase (toupper() in R) beforehand.
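A minimal sketch of that normalisation in R, assuming your list holds human symbols only. `normalize_human_symbol` is a hypothetical helper: it upper-cases each name but keeps the lowercase "orf" in open reading frame symbols, because a plain toupper() would turn them into strings that no longer match the annotation.

```r
# Hypothetical helper: normalise user-typed human gene symbols before lookup.
# - trims surrounding whitespace
# - upper-cases ordinary symbols (e.g. "naT1" -> "NAT1")
# - rewrites open reading frame symbols as C<chr>orf<n> (e.g. "cxORF21" -> "CXorf21");
#   toupper() alone would give "CXORF21", which does not match the annotation
normalize_human_symbol <- function(x) {
  x <- trimws(x)
  orf_pattern <- "^c([0-9xy]+)orf([0-9]+)$"
  is_orf <- grepl(orf_pattern, x, ignore.case = TRUE)
  # \\U ... \\E upper-cases the chromosome part (requires perl = TRUE)
  orf_fixed <- sub(orf_pattern, "C\\U\\1\\Eorf\\2", x, ignore.case = TRUE, perl = TRUE)
  ifelse(is_orf, orf_fixed, toupper(x))
}

normalize_human_symbol(c("naT1", " prkcb ", "c1ORF112", "cxorf21"))
# -> "NAT1" "PRKCB" "C1orf112" "CXorf21"
```

The normalised names can then be passed to the alias lookup as usual; anything still unmatched is worth checking by hand rather than guessing.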


You can map the IDs back to their official symbols and keep the one that pairs with your query; for example, keep the ID whose official symbol is NAT1 and drop the others.
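A minimal sketch of that map-back step in R. `pick_primary` is a hypothetical helper; it takes the alias-to-ID and ID-to-symbol tables as plain R objects so it can be tried on toy data, and the org.Hs.eg.db part only runs if that package is installed. It keeps a single candidate as-is, otherwise keeps the one ID whose official symbol equals the query, and returns NA when there is still no unique answer.

```r
# Hypothetical helper: resolve each query name to one Entrez Gene ID.
# alias2eg:  named list, name -> character vector of candidate Entrez IDs
# eg2symbol: named character vector, Entrez ID -> official symbol
pick_primary <- function(queries, alias2eg, eg2symbol) {
  vapply(queries, function(q) {
    ids <- alias2eg[[q]]
    if (is.null(ids)) return(NA_character_)   # name not in the annotation
    if (length(ids) == 1) return(ids)         # unambiguous alias or symbol
    hit <- ids[eg2symbol[ids] %in% q]         # map back: official symbol == query
    if (length(hit) == 1) hit else NA_character_
  }, character(1))
}

# Real lookup tables from org.Hs.eg.db (skipped if the package is missing)
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(org.Hs.eg.db))
  alias2eg  <- as.list(org.Hs.egALIAS2EG)
  eg2symbol <- unlist(as.list(org.Hs.egSYMBOL))
  print(pick_primary(c("NAT1", "PRKCB"), alias2eg, eg2symbol))
}
```

Returning NA for unresolved names (rather than an arbitrary first ID) keeps ambiguous cases visible so you can decide on them explicitly.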


To some extent, your multi-ID problem is real as far as the annotation goes. In your example, NAT1 really is "9", but "1982", which you also found, is EIF4G2, which is annotated as "also known as NAT1". So whether you use just the main ID (which you could indeed find by mapping back) or all of them depends on your question.
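If you want to keep all candidates, one option (a sketch; `all_candidates` is a hypothetical helper) is a long table with one row per name/ID pair plus that ID's official symbol, so you can decide case by case. The lookup tables are plain R objects; with org.Hs.eg.db they can be built with `as.list(org.Hs.egALIAS2EG)` and `unlist(as.list(org.Hs.egSYMBOL))`.

```r
# Hypothetical helper: list every candidate Entrez ID for each query name.
# alias2eg:  named list, name -> character vector of candidate Entrez IDs
# eg2symbol: named character vector, Entrez ID -> official symbol
all_candidates <- function(queries, alias2eg, eg2symbol) {
  rows <- lapply(queries, function(q) {
    ids <- alias2eg[[q]]
    if (is.null(ids)) ids <- NA_character_    # keep unmatched names visible
    sym <- unname(eg2symbol[ids])
    data.frame(query = q,
               entrez_id = ids,
               official_symbol = sym,
               is_official = sym %in% q,      # TRUE for the "main" ID
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

With the resulting table `tab`, `subset(tab, is_official)` keeps only the main ID per name, while the full table keeps entries such as EIF4G2 for "NAT1".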

8.0 years ago

You could try here, alternatively here.

8.0 years ago

Using BioMart, as mentioned by @poisonAlien, is probably your best bet.

That said, you could also use BridgeDb. If I understand you correctly, what you essentially want is to map between two identifier types for human genes: the HGNC symbol (which NCBI uses as the gene name) and the NCBI (Entrez) Gene ID.

You could do that with a BridgeDb webservice call like:

http://webservice.bridgedb.org/Human/xrefs/H/PRKAR1A?dataSource=L

It asks for the ID in data source L (Entrez Gene) for a human gene symbol from HGNC (data source H).

Note that you would normally want to install the BridgeDb webservice locally; this URL only serves as an example.
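For a whole list of names, the same call can be generated in R. This is a sketch: `bridgedb_xrefs_url` is a hypothetical helper that only reproduces the URL pattern of the example call, and `base` can be pointed at a local installation. The response format is not described here, so inspect it (e.g. with `readLines()`) before parsing.

```r
# Hypothetical helper: build BridgeDb "xrefs" URLs following the pattern
#   <base>/<organism>/xrefs/<source code>/<id>?dataSource=<target code>
# Defaults: human HGNC symbols (H) to Entrez Gene IDs (L).
bridgedb_xrefs_url <- function(ids, organism = "Human", source = "H", target = "L",
                               base = "http://webservice.bridgedb.org") {
  # percent-encode each ID so unusual characters cannot break the path
  enc <- vapply(ids, utils::URLencode, character(1), reserved = TRUE, USE.NAMES = FALSE)
  sprintf("%s/%s/xrefs/%s/%s?dataSource=%s", base, organism, source, enc, target)
}

urls <- bridgedb_xrefs_url(c("PRKAR1A", "PRKCB"))
# Fetching needs network access (or a local BridgeDb instance via `base`):
# responses <- lapply(urls, readLines)
```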

Further information, including a link to the list of data source codes, is available at http://webservice.bridgedb.org and in this paper.

8.0 years ago
juncheng

I also recommend "org.Hs.eg.db" in R.

Alternatively, you can try bioDBnet's db2db tool: http://biodbnet.abcc.ncifcrf.gov/db/db2db.php