I have a file containing a list of human gene names of interest, for example PRKAR1A and PRKCB (some are aliases). How can I get the IDs of these genes, so that each one is uniquely identified? I am using http://www.ncbi.nlm.nih.gov/gene/?term=PRKAR1A
Using BioMart, as mentioned by @poisonAlien, is probably your best bet.
That said, you could also use BridgeDb. If I understand you correctly, what you essentially want is to map between two identifier systems for human genes, where the first is the HGNC symbol (used by NCBI as the gene name) and the other is the NCBI gene ID.
You could do that with a BridgeDb webservice call like:
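A minimal Python sketch of such a call. Several details here are assumptions rather than something confirmed in this thread: the public endpoint `https://webservice.bridgedb.org`, the path layout `/{organism}/xrefs/{systemCode}/{identifier}`, the system codes `H` (HGNC) and `L` (Entrez Gene), and a tab-separated response with the identifier in the first column. The function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed public BridgeDb REST endpoint; check the BridgeDb docs before relying on it.
BASE = "https://webservice.bridgedb.org"


def xrefs_url(symbol, organism="Human", source="H", target="L"):
    """Build the xrefs URL (assumed codes: H = HGNC symbol, L = Entrez Gene)."""
    return f"{BASE}/{quote(organism)}/xrefs/{source}/{quote(symbol)}?dataSource={target}"


def parse_xrefs(text):
    """Collect the first column of an (assumed) tab-separated xrefs response."""
    ids = []
    for line in text.splitlines():
        first = line.split("\t")[0].strip()
        if first:
            ids.append(first)
    return ids


def fetch_entrez_ids(symbol):
    """Query the service for one symbol; requires network access."""
    with urlopen(xrefs_url(symbol), timeout=30) as resp:
        return parse_xrefs(resp.read().decode("utf-8"))
```

Calling `fetch_entrez_ids("PRKAR1A")` would then return whatever Entrez Gene IDs the service maps to that symbol, provided the assumptions above hold.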
Many thanks! I tried and it works.
But how can I deal with the multi-ID problem? For example, when I query 'NAT1', it returns 4 IDs: "9", "1982", "6530" and "10991".
Also, are gene names case-sensitive? When I try 'naT1' instead of 'NAT1', no ID is found.
Regarding gene names: human gene symbols are written in uppercase (except open reading frame symbols, which contain a lowercase "orf"), while mouse gene symbols are written with only the first letter capitalized (for example, compare how BRCA1 is written across species). So if you are sure that your list is from human, you should probably convert the names to uppercase (toupper() in R) beforehand.
You can map the IDs back to symbols and keep only the one paired with your query; for example, keep the ID paired with NAT1 and discard the others.
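The uppercase conversion suggested above can be sketched in Python, where `str.upper()` plays the role of R's `toupper()`. The helper name and the regular expression are hypothetical; the pattern only covers chromosome-style open reading frame symbols of the form `C<chromosome>orf<number>`, which would otherwise lose their lowercase "orf".

```python
import re

# Matches an uppercased chromosome ORF symbol such as "C9ORF72" or "CXORF21".
_ORF = re.compile(r"^C([0-9]{1,2}|X|Y)ORF([0-9]+)$")


def normalize_human_symbol(raw):
    """Uppercase a human gene symbol, restoring the lowercase 'orf' if present."""
    symbol = raw.strip().upper()
    match = _ORF.match(symbol)
    if match:
        return f"C{match.group(1)}orf{match.group(2)}"
    return symbol


print(normalize_human_symbol("naT1"))
print(normalize_human_symbol("c9ORF72"))
```

Normalizing the whole list once, before querying, avoids the empty result seen with 'naT1'.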
To some extent your multi-ID problem is simply real as far as annotation goes. In your example, NAT1 really is "9", but "1982", which you also found, is EIF4G2, which is annotated as "also known as NAT1". So it depends on your question whether you want to use just the main one (which you could indeed find by linking back) or all of them.
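The "linking back" idea can be sketched as follows. This is only an illustration: the function name is hypothetical, and the ID-to-symbol table is written by hand from the two pairs named in this thread. In practice you would obtain it by mapping each returned ID back to its official symbol.

```python
def split_primary_and_alias(query, id_to_symbol):
    """Separate IDs whose official symbol equals the query from alias-only hits."""
    wanted = query.strip().upper()
    primary = [gid for gid, sym in id_to_symbol.items() if sym.upper() == wanted]
    alias_only = [gid for gid in id_to_symbol if gid not in primary]
    return primary, alias_only


# Hand-written example data: Entrez Gene ID -> official symbol (from the thread).
candidates = {"9": "NAT1", "1982": "EIF4G2"}

primary, alias_only = split_primary_and_alias("NAT1", candidates)
print(primary)     # IDs whose official symbol is NAT1
print(alias_only)  # IDs that only list NAT1 as an alias
```

Keeping both lists, rather than discarding the alias hits outright, leaves the choice between "main ID only" and "all IDs" to the analysis at hand.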