Question: Quickest Way To Convert/Update Gene Ids In A Table
7.3 years ago by
enricoferrero800
United Kingdom
enricoferrero800 wrote:

Hi,

I have a number of tab-delimited files containing various types of information about specific genes. One or more of the columns may contain gene symbol aliases that I need to update to the current official gene symbols.

I'm using Bioconductor's org.Hs.eg.db library to do so (the org.Hs.egALIAS2EG and org.Hs.egSYMBOL objects in particular).

The code below does the job but is very slow, I guess because the nested for loops query the org.Hs.eg.db database at each iteration. Is there a quicker/simpler/smarter way to achieve the same result?

library(org.Hs.eg.db)

myTable <- read.table("tab_delimited_file.txt", header=TRUE, sep="\t", as.is=TRUE)

for (i in 1:nrow(myTable)) {
    for (j in 1:ncol(myTable)) {
        repl <- org.Hs.egALIAS2EG[[myTable[i,j]]][1]
        if (!is.null(repl)) {
            repl <- org.Hs.egSYMBOL[[repl]][1]
            if (!is.null(repl)) {
                myTable[i,j] <- repl
            }
        }
    }
}

write.table(myTable, file="new_tab_delimited_file", quote=FALSE, sep="\t", row.names=FALSE, col.names=TRUE)

I'm thinking of using one of the apply functions, but bear in mind that org.Hs.egALIAS2EG and org.Hs.egSYMBOL are objects, not functions.

Thank you!

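A possible vectorised take on the apply idea from the question, sketched in base R: build the alias-to-symbol lookup once as a named character vector, then index it with a whole column at a time instead of querying per cell. The `alias2symbol` values, the example table and the `update_symbols` helper are all hypothetical stand-ins. With org.Hs.eg.db the lookup could presumably be built in one call such as `AnnotationDbi::mapIds(org.Hs.eg.db, keys, column = "SYMBOL", keytype = "ALIAS", multiVals = "first")`, which may not pick exactly the same ID as the loop when an alias maps to several genes.

```r
# Toy alias -> current symbol lookup (hypothetical values, for illustration only).
# In practice this would be built once from org.Hs.eg.db rather than typed by hand.
alias2symbol <- c(MLL = "KMT2A", p53 = "TP53", TP53 = "TP53")

# Toy table standing in for the tab-delimited file; all columns are character.
myTable <- data.frame(
  geneA = c("MLL", "TP53", "notAGene"),
  geneB = c("p53", "MLL", "TP53"),
  stringsAsFactors = FALSE
)

# Replace every value that has an entry in `lookup`; leave the rest untouched.
# Assumes every column of `tab` is a character column of gene identifiers.
update_symbols <- function(tab, lookup) {
  tab[] <- lapply(tab, function(column) {
    hits <- unname(lookup[column])      # NA where the value is not in the lookup
    ifelse(is.na(hits), column, hits)   # keep the original value when unmapped
  })
  tab
}

result <- update_symbols(myTable, alias2symbol)
print(result)
```

Because the lookup is a plain vector, the work is one subsetting operation per column, and values without a mapping are left unchanged, as in the original loop.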

For this kind of task awk is very convenient and extremely fast. However, you need a mapping file with information about each gene to parse your file against. Otherwise, try DAVID (there are many other online gene ID conversion tools).

written 7.3 years ago by Bharat Iyengar

Thanks, but I was looking for a faster way to do the same in R. One of the advantages of using Bioconductor is that I don't have to worry about keeping mappings between IDs up to date locally.

written 7.3 years ago by enricoferrero
7.3 years ago by
SimonD0
SimonD0 wrote:

I think this might work.

    repl <- org.Hs.egALIAS2EG[[ myTable[1:nrow(myTable),1:ncol(myTable)] ]][1]
    myNewTable <- matrix(nrow=nrow(myTable), repl, byrow=T)

Thanks, that's a very cool yet simple approach. Unfortunately it fails (possibly because of gene symbols mapped to more than one Entrez Gene ID?):

> repl <- org.Hs.egALIAS2EG[[ myTable[1:nrow(myTable),1:ncol(myTable)] ]][1]
Error in .doubleBracketSub(x, i, j, ...) : 
attempt to select more than one element

Any idea on how to fix it?

modified 7.3 years ago • written 7.3 years ago by enricoferrero
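A note on the error above: it probably has less to do with multi-mapped symbols than with `[[`, which accepts exactly one key, being handed a whole table. `mget()` takes a vector of keys and, as far as I know, also works on the org.Hs.eg.db map objects (e.g. `mget(keys, org.Hs.egALIAS2EG, ifnotfound = NA)`). The sketch below uses plain environments with toy values as stand-ins for the two maps so that it runs without Bioconductor; `alias2eg`, `eg2symbol` and `first_or_na` are hypothetical names.

```r
# Toy stand-ins (hypothetical values) for org.Hs.egALIAS2EG and org.Hs.egSYMBOL.
# Plain environments are used here only to show the `[[` versus mget() behaviour.
alias2eg  <- list2env(list(MLL = "4297", p53 = "7157", TP53 = "7157"))
eg2symbol <- list2env(list("4297" = "KMT2A", "7157" = "TP53"))

keys <- c("MLL", "p53", "notAGene")

# `[[` expects a single key, so passing several keys at once fails.
multi_key <- try(alias2eg[[keys]], silent = TRUE)

# Take the first mapped value, or NA when the key was not found.
first_or_na <- function(x) as.character(x[1])

# Step 1: alias -> Entrez Gene ID for all keys in one call.
entrez <- vapply(mget(keys, envir = alias2eg, ifnotfound = NA),
                 first_or_na, character(1))

# Step 2: Entrez Gene ID -> symbol, only for the keys that were found.
found <- !is.na(entrez)
symbols <- keys
symbols[found] <- vapply(mget(entrez[found], envir = eg2symbol, ifnotfound = NA),
                         first_or_na, character(1))

# Fall back to the original value if the second lookup returned nothing.
symbols[is.na(symbols)] <- keys[is.na(symbols)]
print(symbols)
```

Running this over `unlist(myTable)` and writing the result back column by column would avoid the per-cell queries of the original loop.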