Help with converting gene symbols to gene IDs
24 months ago
nattzy94 ▴ 30

I am trying to convert gene symbols in one column to gene IDs. I am using the following code to achieve this:

for (i in 1:nrow(data_gsea2)) {
  data_gsea2$ID[i] <- mapIds(org.Hs.eg.db, data_gsea2$ID[i], "ENTREZID", "SYMBOL")
}


I have a data frame of 7386 rows/genes and this is taking forever to complete. I'm sure there is a smarter way to do this, but I'm not sure how. Can anyone help?

Thanks very much!

R gsea • 808 views

If you've verified this command works as you expect it to (by, say, running it on ten rows instead of the entire data.frame), you'll probably just have to wait for the process to complete.

Yes, it works but takes a long time.

Try using apply (or even better, mclapply) instead of using a loop.

24 months ago
Ahill ★ 1.9k

The mapIds manual indicates that it accepts multiple keys in a single call, rather than one key at a time. Using a different test case that I have on hand here, the single call appears to be much faster:

library(AnnotationDbi)
library(hgu95av2.db)

# Assumption: the post does not show how `keys` was built; any 100 Entrez IDs will do
keys <- head(keys(hgu95av2.db, keytype="ENTREZID"), 100)

# sapply - one key at a time, 100 mapIds() calls
system.time(sapply(keys, function(z) mapIds(hgu95av2.db, keys=z, column="ALIAS", keytype="ENTREZID")))
<snip>
   user  system elapsed
   5.35    0.64    6.11

# send all keys at once, 1 mapIds() call
system.time(mapIds(hgu95av2.db, keys, column="ALIAS", keytype="ENTREZID"))
<snip>
   user  system elapsed
   0.05    0.01    0.06
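
Applied to the data frame in the question, the same idea might look like the sketch below. It is only an illustration: the four-row data_gsea2 is a hypothetical stand-in for the real 7386-row data frame, and the ENTREZID column name is my own choice, not something from the original post.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# Hypothetical stand-in for the poster's 7386-row data frame
data_gsea2 <- data.frame(
  ID = c("TP53", "BRCA1", "EGFR", "NOT_A_GENE"),
  stringsAsFactors = FALSE
)

# One vectorised call: pass the whole symbol column as `keys`
entrez <- mapIds(org.Hs.eg.db,
                 keys = as.character(data_gsea2$ID),
                 column = "ENTREZID",
                 keytype = "SYMBOL",
                 multiVals = "first")

# Store in a new column (name is illustrative); symbols with no match become NA
data_gsea2$ENTREZID <- unname(entrez)
```

Writing to a new column rather than overwriting ID keeps the original symbols around, which makes it easier to spot the rows that did not map. The as.character() call guards against the symbol column having been read in as a factor.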

Good catch, Ahill! Can't believe I missed this - OP is submitting keys one by one in a loop instead of submitting a vector of keys!