Gene ID mapping of Ensembl IDs
5.4 years ago

I have a list of IDs that appear to be Ensembl transcript IDs. I want to map these IDs to gene names, but when I use the BioMart view on Ensembl it only gives me the transcript IDs, without gene names. The other issue is that my IDs contain decimal points, whereas the IDs I download from Ensembl using martview never include a decimal point.

Example of the IDs I have:

ENST00000576171.1   ENSG00000273172.1
ENST00000338094.6   ENSG00000273173.1
ENST00000338327.4   ENSG00000273173.1
ENST00000577949.1   ENSG00000273173.1
ENST00000580062.1   ENSG00000273173.1
RNA-Seq gene ID ensembl • 6.3k views
5.4 years ago

You likely just need to change the attributes that you request from BioMart. Here's an example query with your example values that returns the gene symbol and name.
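The query linked in this reply is not reproduced here. As a rough R equivalent, here is a minimal sketch that assumes the Bioconductor `biomaRt` package is installed and that the Ensembl servers are reachable. The helper name `lookup_gene_names` is made up for illustration, and the attribute names are the ones I believe the human gene dataset exposes; check `listAttributes()` on your mart if they differ.

```r
# Hypothetical helper (not from the thread): fetch gene IDs, symbols and
# descriptions for a vector of Ensembl transcript IDs via biomaRt.
# Assumes the IDs carry no ".N" version suffix, because the plain
# "ensembl_transcript_id" filter matches unversioned IDs.
lookup_gene_names <- function(transcript_ids) {
  # Connect to the Ensembl genes mart for human (needs network access).
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "hsapiens_gene_ensembl")

  # Request gene-level attributes alongside the transcript ID.
  biomaRt::getBM(
    attributes = c("ensembl_transcript_id", "ensembl_gene_id",
                   "external_gene_name", "description"),
    filters    = "ensembl_transcript_id",
    values     = unique(transcript_ids),
    mart       = mart
  )
}

# Example call, commented out because it contacts the Ensembl servers:
# lookup_gene_names(c("ENST00000576171", "ENST00000338094"))
```

The point is the `attributes` argument: adding `external_gene_name` there is what brings the gene symbol into the result.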


I figured that out after some digging, but thank you for the insight. The other issue is the decimal points in the IDs. I see these on Ensembl, but the BioMart browser does not appear to be able to provide Ensembl IDs with the decimal point. If I am just trying to add gene names, would it be best to remove the decimal points and assign the names using that list?


I think the decimal parts of the ENSGs can be dropped. I have done so in the past and have still been able to convert. One thing to note is that some ENSGs (depending on which reference you are using) are old and have been retired. As such, they won't map to anything.
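For what it's worth, the part after the decimal point is the version number of the Ensembl stable ID, so dropping it leaves the stable ID itself. A minimal base R sketch, using IDs from the question; the helper name `strip_version` is just an illustrative choice:

```r
# Example IDs taken from the question.
versioned <- c("ENST00000576171.1", "ENSG00000273172.1", "ENST00000338094.6")

# Remove a trailing ".<digits>" version suffix.
# IDs that have no suffix are returned unchanged.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

unversioned <- strip_version(versioned)
print(unversioned)
```

Anchoring the pattern with `$` means only the suffix at the end of the ID is removed.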

Here is some R code I often use (not mine originally; I can't remember where I found it):

convertIDs <- function(ids, from, to, db, ifMultiple = c("putNA", "useFirst")) {
  stopifnot(inherits(db, "AnnotationDb"))
  ifMultiple <- match.arg(ifMultiple)

  # Query each distinct ID once; the result has one row per ID/target pair
  suppressWarnings(selRes <- AnnotationDbi::select(
    db, keys = unique(ids), keytype = from, columns = c(from, to)))

  if (ifMultiple == "putNA") {
    # Drop every ID that maps to more than one target, so it comes back as NA
    duplicatedIds <- selRes[duplicated(selRes[, 1]), 1]
    selRes <- selRes[!selRes[, 1] %in% duplicatedIds, ]
  }

  # match() takes the first hit per ID and keeps the input order and length
  return(selRes[match(ids, selRes[, 1]), 2])
}

It requires library(org.Hs.eg.db) to work for human genes. Here is an example call:

results$entrez <- convertIDs(my_list_of_ensgs, "ENSEMBL", "ENTREZID", org.Hs.eg.db)
results$symbol <- convertIDs(my_list_of_ensgs, "ENSEMBL", "SYMBOL", org.Hs.eg.db)

Do note that it will return NAs for genes with no mapping (to keep the result the same length as the input list). You can either filter these out or look them up manually.
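A minimal base R sketch of both options. The `results` data frame here is a made-up stand-in with placeholder IDs and symbols, not real annotation output:

```r
# Placeholder table standing in for results after calling convertIDs();
# the NA marks an ID that had no mapping. All values are invented.
results <- data.frame(
  ensembl = c("ENSG_PLACEHOLDER_1", "ENSG_PLACEHOLDER_2", "ENSG_PLACEHOLDER_3"),
  symbol  = c("GENE_A", NA, "GENE_C"),
  stringsAsFactors = FALSE
)

# IDs without a symbol, collected for manual lookup.
unmapped <- results$ensembl[is.na(results$symbol)]

# Alternatively, keep only the rows that did map.
mapped_only <- results[!is.na(results$symbol), ]

print(unmapped)
print(mapped_only)
```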