Question: How To Convert List Of Entrez Ids Into Gene Name
grosy70 wrote:

Hi Friends,

I have a list of 10,000 Entrez IDs and I want to convert them into their respective gene names. Could someone suggest a way to do this?
With the Bioconductor packages "annotate" and "org.Hs.eg.db", we can do this for an individual gene, like:

> library(org.Hs.eg.db)
> library(annotate)
> lookUp('3815', 'org.Hs.eg', 'SYMBOL') 
   $`3815` 
   [1] "KIT"
> lookUp('3815', 'org.Hs.eg', 'REFSEQ') 
   $`3815`
   [1] "NM_000222" "NM_001093772" "NP_000213" "NP_001087241"

I got this answer from SEQanswers, but is there a way to do this for multiple Entrez IDs?

Thanks in advance..

modified 6.5 years ago by Jordan1.1k • written 6.5 years ago by grosy70

I think this may be one of the easiest ways to do this task. You can convert Entrez IDs into gene names using a website called "MatchMiner" (http://discover.nci.nih.gov/matchminer/MatchMinerLookup.jsp). All you need to do is upload a file that contains all your Entrez IDs, and the website will convert them into HUGO gene names.

written 6.5 years ago by hojoon.compbio20

Thanks @hojoon.compbio it worked... :o)

written 6.5 years ago by grosy70

What is the library "annotate" and how can I install it, please?

Thanks.

written 3.0 years ago by moxu440

It's a Bioconductor package; details and installation instructions are here:

http://bioconductor.org/packages/release/bioc/html/annotate.html

written 3.0 years ago by Neilfws48k

Great! Which function converts gene symbols to entrez gene ids, please?

Thanks.

written 3.0 years ago by moxu440

Time for you to read some documentation I think :)

written 3.0 years ago by Neilfws48k
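
For readers with the same question, one option that should work for the reverse direction (gene symbols to Entrez IDs) is mapIds() from AnnotationDbi, which org.Hs.eg.db depends on. This is only a sketch: the two symbols are example input, multiVals = "first" is one possible way to handle symbols that map to several IDs, and the lookup is skipped when the annotation package is not installed.

```r
# Example input; any character vector of gene symbols would do.
symbols <- c("KIT", "KLK1")

# Guarded so the script still runs where the annotation package is missing.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # mapIds() returns a named character vector with one Entrez ID per symbol.
  entrez <- AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db,
                                  keys = symbols,
                                  column = "ENTREZID",
                                  keytype = "SYMBOL",
                                  multiVals = "first")
  print(entrez)
}
```

Swapping column and keytype ("SYMBOL" and "ENTREZID") should give the Entrez-ID-to-symbol direction asked about in the original question.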

Thanks for your question, this is what I need.

modified 12 months ago • written 12 months ago by sara_wasl0
New Zealand
David W4.7k wrote:

This is an easy one - just pass a character vector that has more than one value:

getSYMBOL(c('3815', '3816', '2341'), data='org.Hs.eg')
    3815     3816     2341 
   "KIT"   "KLK1" "FNTAP2"

written 6.5 years ago by David W4.7k

Yeah, thanks a lot :) But it doesn't seem to work for more than about 100 gene IDs. This is what I am running now:

a <- read.csv("entrez ids.csv", header = TRUE)
library(org.Hs.eg.db)
library(annotate)
d = getSYMBOL(a, data='org.Hs.eg')
Error in .checkKeysAreWellFormed(keys) : keys must be supplied in a character vector with no NAs

This is the error I get.

modified 6.5 years ago • written 6.5 years ago by grosy70

When you read data into an R session with read.csv, you get a data frame containing rows and columns. In this case you probably have all your IDs in one column, which you can select with $, something like a$EntrezIDs. If you are new to R, you should probably read some intro tutorials.

modified 6.5 years ago • written 6.5 years ago by David W4.7k
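
A minimal sketch of this suggestion, assuming the CSV has a column named EntrezIDs (the name used in the comment; yours may differ). The helper clean_ids is hypothetical, and the inline data frame stands in for the result of read.csv. The lookup itself is guarded so it only runs when annotate and org.Hs.eg.db are installed.

```r
# Hypothetical helper: turn one data-frame column into clean character keys.
clean_ids <- function(x) {
  ids <- trimws(as.character(x))
  ids <- ids[!is.na(ids) & nzchar(ids)]  # drop NA and empty entries
  unique(ids)                            # drop duplicates
}

# Stand-in for read.csv("entrez ids.csv", colClasses = "character");
# reading the IDs as character avoids numeric formatting surprises.
a <- data.frame(EntrezIDs = c("3815", "3816", NA, "2341", "3815"),
                stringsAsFactors = FALSE)

# Select the single column with $ so a character vector is passed on,
# not the whole data frame.
ids <- clean_ids(a$EntrezIDs)

if (requireNamespace("annotate", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library(org.Hs.eg.db)
  symbols <- annotate::getSYMBOL(ids, data = "org.Hs.eg")
  print(symbols)
}
```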

I don't think the issue is number of IDs. I've retrieved tens of thousands of attributes (slowly) in one go using biomaRt.

written 6.5 years ago by Neilfws48k

In a loop, can you pass in a vector of 100 elements at a time? (Or perhaps you need to filter out bad/NA entries?)

modified 6.5 years ago • written 6.5 years ago by Alex Reynolds29k
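
The two ideas in this comment (filter out bad entries, then look up 100 IDs at a time) could be sketched as follows. Nothing in the thread shows that batching is actually required, so treat it as optional. The IDs are dummy values, and lookup_in_batches and its lookup_fun argument are hypothetical names; by default the placeholder just echoes its input so the sketch runs without Bioconductor.

```r
# Dummy input: 250 made-up IDs plus two bad entries.
ids <- c(as.character(1:250), NA, "")

# Filter out NA and empty entries before any lookup.
ids <- ids[!is.na(ids) & nzchar(ids)]

# Split into batches of at most 100 IDs each.
batch_size <- 100
batches <- split(ids, ceiling(seq_along(ids) / batch_size))

# Apply a lookup function to each batch and stitch the results together.
# lookup_fun is a placeholder that returns the IDs named by themselves.
lookup_in_batches <- function(batches,
                              lookup_fun = function(x) setNames(x, x)) {
  unlist(lapply(unname(batches), lookup_fun))
}

result <- lookup_in_batches(batches)
```

With the packages from the question installed, the placeholder could be replaced by something like function(x) annotate::getSYMBOL(x, data = "org.Hs.eg").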

Actually, I think the problem could be solved if I take the CSV file and list its values in a variable, as shown in the Bioconductor package documentation:

http://stuff.mit.edu/afs/athena/software/r_v2.14.1/lib/R/library/org.Hs.eg.db/html/org.Hs.egSYMBOL.html

But the only problem I am facing now is how to list each value from the CSV file.

written 6.5 years ago by grosy70

d= getSYMBOL(na.omit(a), data='org.Hs.eg')

written 2.6 years ago by Adam20

Your answer helped me a lot, thanks +1

written 12 months ago by sara_wasl0
David730 wrote:

You have gene IDs that are NA.

Use mget with ifnotfound=NA:

a <- read.csv("entrez ids.csv", header = TRUE)
a.symbol <- as.vector(unlist(mget(a, envir=org.Hs.egSYMBOL, ifnotfound=NA)))

written 6.5 years ago by David730

I am sorry, I did that, but it still shows the same problem:

a <- read.csv("C:/Users/Desktop/entrez ids _row.csv", header = TRUE)
a.symbol <- as.vector(unlist(mget(a, envir=org.Hs.egSYMBOL, ifnotfound=NA)))

Error in .checkKeysAreWellFormed(keys) : keys must be supplied in a character vector with no NAs

written 6.5 years ago by grosy70

R tells you what is wrong: "keys must be supplied in a character vector with no NAs"

Do this after read.csv:

a <- as.character(a[[1]])  # assumes the IDs are in the first column
a <- a[!is.na(a)]          # drop NA entries

modified 6.5 years ago • written 6.5 years ago by David730
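
Putting this answer's pieces together, a sketch under stated assumptions: the inline data frame stands in for the result of read.csv, the IDs are assumed to sit in its first column, and the mget call only runs when org.Hs.eg.db is installed. The key point is that mget is given a character vector with the NAs removed, not the data frame itself.

```r
# Stand-in for the data frame returned by read.csv(); first column holds IDs.
a <- data.frame(id = c("3815", NA, "3816"), stringsAsFactors = FALSE)

# mget() wants a character vector with no NAs, not a data frame.
keys <- as.character(a[[1]])
keys <- keys[!is.na(keys)]

if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library(org.Hs.eg.db)
  # ifnotfound = NA makes unmatched keys return NA instead of an error.
  hits <- mget(keys, envir = org.Hs.egSYMBOL, ifnotfound = NA)
  a.symbol <- as.vector(unlist(hits))
  print(a.symbol)
}
```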
Pittsburgh
Jordan1.1k wrote:

Another way to do this without coding is to use the ID Mapping tool in UniProt. You can just upload a list of Entrez IDs and then map them.

written 6.5 years ago by Jordan1.1k

Yes, I tried this, but what I want is to go from Entrez ID to gene name. Could you suggest which options to choose to convert from Entrez ID to gene name?

written 6.5 years ago by grosy70

One silly way of doing it is mapping the IDs to UniProt IDs and then to your required gene names. But I think you already got the answer. I usually download the ID mapping file from UniProt and write my own code for the mapping in Perl.

written 6.5 years ago by Jordan1.1k
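
The two-step mapping described here (Entrez ID to UniProt accession, then accession to gene name) could look roughly like this. It is written in R to match the rest of the thread, not in Perl as the answer mentions. The three-column tab-separated layout (accession, ID type, ID) and the type labels "GeneID" and "Gene_Name" are assumptions about the downloaded mapping file, and the four rows are toy data for illustration only.

```r
# Toy stand-in for lines of a downloaded ID-mapping file (layout assumed).
lines <- c("P10721\tGeneID\t3815",
           "P10721\tGene_Name\tKIT",
           "P06870\tGeneID\t3816",
           "P06870\tGene_Name\tKLK1")

# Read everything as character so IDs are never reformatted as numbers.
map <- read.delim(text = lines, header = FALSE,
                  col.names = c("acc", "type", "id"),
                  colClasses = "character")

# Step 1: accession -> Entrez ID; step 2: accession -> gene name.
gene_ids   <- map[map$type == "GeneID",    c("acc", "id")]
gene_names <- map[map$type == "Gene_Name", c("acc", "id")]
names(gene_ids)[2]   <- "entrez"
names(gene_names)[2] <- "symbol"

# Join the two tables on the shared accession.
entrez_to_name <- merge(gene_ids, gene_names, by = "acc")
print(entrez_to_name[, c("entrez", "symbol")])
```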