Question

Gene ID mapping of Ensembl IDs

3

8.2 years ago

onspotproductions ▴ 150

I have a list of IDs that appear to be Ensembl transcript IDs. I want to map these IDs to gene names, but when I use the BioMart view on Ensembl it only gives me the transcript IDs, without gene names. The other issue is that the IDs in my data have decimal points, whereas the IDs I download from Ensembl using martview have none.

Example of the IDs I have:

ENST00000576171.1   ENSG00000273172.1
ENST00000338094.6   ENSG00000273173.1
ENST00000338327.4   ENSG00000273173.1
ENST00000577949.1   ENSG00000273173.1
ENST00000580062.1   ENSG00000273173.1

RNA-Seq gene ID ensembl • 8.5k views

Updated 22 months ago by Laura Luebbert ▴ 450 • written 8.2 years ago by onspotproductions ▴ 150

score 4 · Answer 1 · 2022-05-30

23 months ago

Laura Luebbert ▴ 450

You can pass these IDs directly to gget info, with or without the version number (the number after the decimal point). gget works from the command line or in a Python environment such as JupyterLab.

pip install gget, then simply:

# Command-line
gget info ENST00000576171.1 ENSG00000273172.1

# Python
import gget
gget.info(["ENST00000576171.1", "ENSG00000273172.1"])

1

If you're gonna plug your tool on all the appropriate questions (which is fine), you might consider making a tool post to advertise it more broadly.

Reply • 23 months ago by jared.andrews07 ★ 16k

1

I thought I might as well do both: Efficient querying of genomic reference databases with gget

Reply • 22 months ago by Laura Luebbert ▴ 450

0

Oh, good look, I didn't see it originally.

Reply • 22 months ago by jared.andrews07 ★ 16k

Ram · Answer 2 · 2016-02-24

2

8.2 years ago

Devon Ryan 104k

You likely just need to change the attributes that you want to get from biomart. Here's an example query with your example values that returns the gene symbol and name.
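
For readers who prefer to script this, here is a minimal Python sketch of what "changing the attributes" looks like in a BioMart XML query. It only builds the query text and does not contact Ensembl. The dataset name (`hsapiens_gene_ensembl`), the filter name (`ensembl_transcript_id`) and the attribute names are assumptions based on common Ensembl BioMart usage, so check them against the attribute list shown in martview; `build_biomart_query` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(transcript_ids, attributes, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query string for the given transcript IDs.

    Dataset, filter and attribute names are assumptions; verify them in martview.
    """
    # The filter assumed here takes IDs without the ".N" version suffix.
    unversioned = [tid.split(".")[0] for tid in transcript_ids]

    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # One filter holding all IDs as a comma-separated list.
    ET.SubElement(ds, "Filter", {
        "name": "ensembl_transcript_id",
        "value": ",".join(unversioned),
    })
    # Each requested output column is one Attribute element.
    for attr in attributes:
        ET.SubElement(ds, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


query_xml = build_biomart_query(
    ["ENST00000576171.1", "ENST00000338094.6"],
    ["ensembl_transcript_id", "ensembl_gene_id", "external_gene_name", "description"],
)
print(query_xml)
```

Adding or removing entries in the attribute list is the scripted equivalent of ticking different boxes under "Attributes" in the web interface.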

Updated 5.6 years ago by Ram 43k • written 8.2 years ago by Devon Ryan 104k

0

I figured that out after some digging, but thank you for the insight. The other issue is with the decimal points in the IDs. I see these in Ensembl, but the BioMart browser does not appear to be able to provide Ensembl IDs with the decimal point. If I am just trying to insert gene names, would it be best to remove the decimal points and assign the IDs using that list?

Reply • updated 5.6 years ago by Ram 43k • written 8.2 years ago by onspotproductions ▴ 150

1

I think the decimal parts of the ENSG IDs can be dropped. I have done so in the past and have still been able to convert. One thing to note is that some ENSG IDs (depending on which reference you are using) are old and have been retired. As such, they won't map to anything.
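
As a small illustration of dropping the decimal part, here is a Python sketch. `strip_version` is a hypothetical helper name, and the rule it applies (remove the text after the last dot only when it is purely numeric) is an assumption about how these IDs are formatted.

```python
def strip_version(ensembl_id):
    """Return the ID without its trailing ".N" version, if it has one."""
    base, sep, version = ensembl_id.rpartition(".")
    # Only strip when there is a dot and the part after it is purely numeric.
    if sep and version.isdigit():
        return base
    return ensembl_id


ids = ["ENST00000576171.1", "ENSG00000273172.1", "ENSG00000273173"]
print([strip_version(i) for i in ids])
```

IDs that already lack a version are returned unchanged, so the function is safe to run on a mixed list.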

Here is some R code I often use (not mine originally; I can't remember where I found it):

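# Map `ids` from one key type to another using an AnnotationDb object (e.g. org.Hs.eg.db).
# ifMultiple: "putNA" returns NA for IDs with several matches; "useFirst" keeps the first match.
convertIDs <- function( ids, from, to, db, ifMultiple=c("putNA", "useFirst")) {
  stopifnot( inherits( db, "AnnotationDb" ) )
  ifMultiple <- match.arg( ifMultiple )
  # Look up all (from, to) pairs for the requested keys
  suppressWarnings( selRes <- AnnotationDbi::select(
    db, keys=ids, keytype=from, columns=c(from,to) ) )

  if ( ifMultiple == "putNA" ) {
    # Drop every ID that appears in more than one row, so it comes back as NA
    duplicatedIds <- selRes[ duplicated( selRes[,1] ), 1 ]
    selRes <- selRes[ ! selRes[,1] %in% duplicatedIds, ]
  }

  # Return results in the same order and length as `ids` (NA where nothing matched)
  return( selRes[ match( ids, selRes[,1] ), 2 ] )
}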

It requires library(org.Hs.eg.db) to work for human genes. Here is an example call:

results$entrez <- convertIDs(my_list_of_ensgs, "ENSEMBL", "ENTREZID", org.Hs.eg.db)
results$symbol <- convertIDs(my_list_of_ensgs, "ENSEMBL", "SYMBOL", org.Hs.eg.db)

Do note that it will return NAs for genes with no mapping (to keep the result the same length as the input list). You can either filter these out or look them up manually.
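
To make the NA behaviour concrete, here is a rough Python analogue of the same logic, working on an in-memory list of (source, target) pairs instead of an annotation database. It is a sketch, not a port of AnnotationDbi; `convert_ids` is a hypothetical name and the IDs and symbols are toy placeholders, not real genes.

```python
from collections import defaultdict


def convert_ids(ids, pairs, if_multiple="putNA"):
    """Map each ID via (source, target) pairs, keeping input order and length."""
    if if_multiple not in ("putNA", "useFirst"):
        raise ValueError("if_multiple must be 'putNA' or 'useFirst'")

    # Collect every target seen for each source ID.
    targets = defaultdict(list)
    for source, target in pairs:
        targets[source].append(target)

    result = []
    for source in ids:
        hits = targets.get(source, [])
        if not hits or (len(hits) > 1 and if_multiple == "putNA"):
            result.append(None)  # no mapping, or ambiguous under "putNA"
        else:
            result.append(hits[0])
    return result


# Toy placeholder data: geneB maps to two symbols, geneC maps to nothing.
pairs = [("geneA", "SYM1"), ("geneB", "SYM2"), ("geneB", "SYM3")]
ids = ["geneA", "geneB", "geneC"]

symbols = convert_ids(ids, pairs)
unmapped = [i for i, s in zip(ids, symbols) if s is None]
print(symbols)
print(unmapped)
```

Because the output always lines up with the input, the unmapped IDs can be pulled out with a simple `zip`, then filtered or looked up by hand.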

Reply • updated 5.6 years ago by Ram 43k • written 8.2 years ago by mbio.kyle ▴ 380