Question: Gene ID mapping of Ensembl IDs
Asked 4.7 years ago by onspotproductions (United States):

I have a list of IDs that appear to be Ensembl transcript IDs. I want to map these IDs to gene names, but when I use BioMart on Ensembl it only returns the transcript IDs, without gene names. The other issue is that the IDs in my data have decimal points, whereas the IDs I download from Ensembl using MartView have no decimal points.

Example of the IDs I have:

ENST00000576171.1   ENSG00000273172.1
ENST00000338094.6   ENSG00000273173.1
ENST00000338327.4   ENSG00000273173.1
ENST00000577949.1   ENSG00000273173.1
ENST00000580062.1   ENSG00000273173.1
Tags: rna-seq, id, ensembl, gene • 5.7k views
Answer by Devon Ryan (Freiburg, Germany), 4.7 years ago:

You likely just need to change the attributes that you want to get from biomart. Here's an example query with your example values that returns the gene symbol and name.
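The same kind of query can also be run from R with the Bioconductor biomaRt package. This is a minimal sketch, assuming biomaRt is installed and the Ensembl BioMart server is reachable; the function name map_transcripts is hypothetical, and the attributes requested are the standard human Ensembl BioMart ones for transcript ID, gene ID, gene symbol and description:

```r
# Sketch: map Ensembl transcript IDs to gene IDs, symbols and descriptions
# using biomaRt (assumed installed; the query itself needs internet access).
# map_transcripts is a hypothetical helper name.
map_transcripts <- function(tx_ids) {
  # tx_ids should be unversioned, e.g. "ENST00000576171" rather than
  # "ENST00000576171.1", because the filter below uses unversioned IDs.
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("ensembl_transcript_id", "ensembl_gene_id",
                   "external_gene_name", "description"),
    filters    = "ensembl_transcript_id",
    values     = tx_ids,
    mart       = mart
  )
}

# Example usage with IDs from the question (requires network access):
# map_transcripts(c("ENST00000576171", "ENST00000338094"))
```

getBM() returns a data frame with one row per transcript/gene pair, so transcripts that are not found in the current Ensembl release simply have no row.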

onspotproductions replied:

I figured that out after some digging, but thank you for the insight. The other issue is the decimal points in the IDs. I see them on Ensembl, but the BioMart browser does not appear to return Ensembl IDs with the decimal point. If I just want to add gene names, would it be best to remove the decimal points and map the IDs using that list?

mbio.kyle replied:

I think the decimal parts of the ENSG IDs (the version numbers) can be dropped. I have done so in the past and was still able to convert them. One thing to note is that some ENSG IDs (depending on which reference you are using) are old and have been retired, so they won't map to anything.
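Dropping the version suffix is a one-line regular-expression substitution in base R. A minimal sketch; strip_version is a hypothetical helper name, and it only removes a trailing dot followed by digits, leaving IDs without a suffix unchanged:

```r
# Ensembl IDs from the question, with version suffixes after the dot.
ids <- c("ENST00000576171.1", "ENST00000338094.6", "ENSG00000273173.1")

# Hypothetical helper: remove a trailing ".<digits>" version suffix.
strip_version <- function(x) sub("\\.[0-9]+$", "", x)

unversioned <- strip_version(ids)
print(unversioned)
# [1] "ENST00000576171" "ENST00000338094" "ENSG00000273173"
```

Keep in mind that an unversioned ID is matched against whatever version is present in the annotation you query, which may differ from the version in your data.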

Here is some R code I often use (not originally mine; I can't remember where I found it):

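```r
# Map a vector of IDs from one keytype to another using an AnnotationDb
# object (e.g. org.Hs.eg.db). Returns a vector the same length as ids.
convertIDs <- function( ids, from, to, db, ifMultiple=c("putNA", "useFirst")) {
  stopifnot( inherits( db, "AnnotationDb" ) )
  ifMultiple <- match.arg( ifMultiple )
  # select() returns one row per (key, value) pair; warnings are suppressed.
  suppressWarnings( selRes <- AnnotationDbi::select(
    db, keys=ids, keytype=from, columns=c(from,to) ) )

  if ( ifMultiple == "putNA" ) {
    # Drop every ID that maps to more than one value, so it becomes NA below.
    duplicatedIds <- selRes[ duplicated( selRes[,1] ), 1 ]
    selRes <- selRes[ ! selRes[,1] %in% duplicatedIds, ]
  }

  # match() takes the first row for each ID, and NA for IDs with no row.
  return( selRes[ match( ids, selRes[,1] ), 2 ] )
}
```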

For human genes it requires the Bioconductor package org.Hs.eg.db, loaded with library(org.Hs.eg.db). Here is an example call:

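```r
library(org.Hs.eg.db)
# my_list_of_ensgs: unversioned ENSG IDs; results: a data frame with one row per ID
results$entrez <- convertIDs(my_list_of_ensgs, "ENSEMBL", "ENTREZID", org.Hs.eg.db)
results$symbol <- convertIDs(my_list_of_ensgs, "ENSEMBL", "SYMBOL", org.Hs.eg.db)
```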

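Note that it returns NA for genes with no mapping (so the output keeps the same length as the input list). With the default ifMultiple = "putNA", IDs that map to more than one value also become NA; ifMultiple = "useFirst" keeps the first match instead. You can either filter out the NAs or look these genes up manually.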
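The NA behaviour of convertIDs comes from its duplicate-handling and match() steps. A toy sketch in base R reproduces those two steps on a hand-made data frame standing in for a real AnnotationDbi::select() result; the IDs (ENSG_A, ENSG_B, ENSG_C) and symbols (GENE_X, GENE_Y, GENE_Z) are hypothetical placeholders, with ENSG_C playing the role of a retired ID that has no symbol:

```r
# Hypothetical stand-in for a select() result: one row per (ID, value) pair.
selRes <- data.frame(
  ENSEMBL = c("ENSG_A", "ENSG_B", "ENSG_B", "ENSG_C"),
  SYMBOL  = c("GENE_X", "GENE_Y", "GENE_Z", NA),
  stringsAsFactors = FALSE
)
ids <- c("ENSG_A", "ENSG_B", "ENSG_C")

# "useFirst": match() picks the first row for each ID.
use_first <- selRes[match(ids, selRes[, 1]), 2]

# "putNA": drop all rows whose ID occurs more than once, then match.
dup_ids     <- selRes[duplicated(selRes[, 1]), 1]
unique_rows <- selRes[!selRes[, 1] %in% dup_ids, ]
put_na      <- unique_rows[match(ids, unique_rows[, 1]), 2]

print(use_first)  # [1] "GENE_X" "GENE_Y" NA
print(put_na)     # [1] "GENE_X" NA       NA

# Keep only the IDs that received a symbol.
mapped <- data.frame(id = ids, symbol = put_na, stringsAsFactors = FALSE)
kept <- mapped[!is.na(mapped$symbol), ]
print(kept)
```

ENSG_B maps to two symbols, so "putNA" turns it into NA while "useFirst" keeps GENE_Y; which is preferable depends on whether an arbitrary first match is acceptable for your analysis.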
