Question: Converting Gene Symbol to Ensembl ID in R
Harry_Wat wrote (3 months ago):

When I converted my list of gene symbols to Ensembl IDs, I found that the relationship between gene symbols and Ensembl IDs is not one-to-one. For example, my original list has 798 genes, but after conversion there are 831 Ensembl IDs with their corresponding gene symbols. I do not know why this happens; could you explain it? Thank you very much!

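library("biomaRt")
listMarts()                                   # list available BioMart databases
ensembl <- useMart("ensembl")
datasets <- listDatasets(ensembl)             # inspect the available species datasets
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)
Gene <- read.csv("HEK293_Baltz2012_4SURIC.csv", header = TRUE)
options(max.print = 1000000)
# Map each gene symbol to its Ensembl gene ID(s); a symbol may return several rows
mapping <- getBM(attributes = c('external_gene_name', 'ensembl_gene_id'),
                 filters = 'external_gene_name',
                 values = Gene$Gene_name,
                 mart = ensembl)
mapping                                       # print the result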
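To see where the extra rows come from, it helps to count how many Ensembl IDs each symbol returned and which input symbols returned none. The sketch below assumes the getBM() result is stored in a data frame called `mapping`; the symbols and IDs are hypothetical placeholders, not real query output.

```r
# Hypothetical stand-ins: in practice `symbols` would be Gene$Gene_name and
# `mapping` the data frame returned by getBM().
symbols <- c("GENE1", "GENE2", "GENE3", "GENE4")
mapping <- data.frame(
  external_gene_name = c("GENE1", "GENE2", "GENE2", "GENE3", "GENE3", "GENE3"),
  ensembl_gene_id    = c("ID_A",  "ID_B",  "ID_C",  "ID_D",  "ID_E",  "ID_F"),
  stringsAsFactors = FALSE
)

# Number of Ensembl IDs returned for each symbol
ids_per_symbol <- table(mapping$external_gene_name)

# Symbols that map to more than one Ensembl gene (these inflate the row count)
multi_mapped <- names(ids_per_symbol)[ids_per_symbol > 1]

# Input symbols that returned no Ensembl ID at all (these reduce it)
unmapped <- setdiff(unique(symbols), mapping$external_gene_name)

cat("input symbols:", length(unique(symbols)), "\n")
cat("rows returned:", nrow(mapping), "\n")
cat("symbols with >1 ID:", paste(multi_mapped, collapse = ", "), "\n")
cat("symbols with no ID:", paste(unmapped, collapse = ", "), "\n")
```

The difference between the input count and the row count is the net effect of both groups, so a symbol list can gain rows from multi-mapped symbols while silently losing symbols that did not match.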
Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote (3 months ago):

The fundamental reason one HGNC gene symbol can map to several Ensembl genes is a mismatch between how HGNC and Ensembl define a gene. Ensembl's gene definition is locus-based, because it is tied to the annotation of a reference genome, whereas HGNC uses this definition:

A gene is defined as: "a DNA segment that contributes to phenotype/function. In the absence of demonstrated function a gene may be characterized by sequence, transcription or homology".

Note that the HGNC definition can apply to multiple loci in the genome, and hence to multiple Ensembl genes.
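In practice, one common source of extra loci is that Ensembl also annotates genes on alternative haplotype and patch sequences (the MHC region is a well-known example), and those copies get their own Ensembl gene IDs. A minimal sketch, assuming you add the 'chromosome_name' attribute to the getBM() call and only want genes on the primary assembly chromosomes; the data frame and the contig name `ALT_CONTIG_1` are hypothetical placeholders.

```r
# Sketch: keep only mappings on the primary assembly chromosomes.
# Assumes getBM() was also asked for 'chromosome_name', e.g.
#   getBM(attributes = c('external_gene_name', 'ensembl_gene_id', 'chromosome_name'), ...)
# The data frame below is a hypothetical stand-in for that result.
mapping <- data.frame(
  external_gene_name = c("GENE1", "GENE1", "GENE2"),
  ensembl_gene_id    = c("ID_A",  "ID_B",  "ID_C"),
  chromosome_name    = c("6", "ALT_CONTIG_1", "X"),  # hypothetical alt-contig name
  stringsAsFactors = FALSE
)

# Primary human chromosomes as named in Ensembl (1-22, X, Y, MT)
primary <- c(as.character(1:22), "X", "Y", "MT")

# Drop rows whose gene sits on an alternative haplotype, patch or other non-primary sequence
mapping_primary <- mapping[mapping$chromosome_name %in% primary, ]
print(mapping_primary)
```

Symbols that still have several IDs after this filter map to distinct loci on the primary assembly and need to be inspected individually.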

The one-to-many mapping can also occasionally be caused by symbol ambiguity, where a symbol has historically been used for several different genes and still shows up as an alias for them.
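Whatever the cause, if a downstream step needs exactly one row per input symbol, one option is to collapse all matching IDs into a single field rather than silently keeping an arbitrary one. This is a sketch on a hypothetical `mapping` data frame with the same columns as the getBM() output; the ";" separator is an arbitrary choice.

```r
# Sketch: collapse a one-to-many mapping into one row per symbol,
# listing all matching Ensembl IDs separated by ";".
# `mapping` is a hypothetical stand-in for the getBM() result.
mapping <- data.frame(
  external_gene_name = c("GENE1", "GENE2", "GENE2"),
  ensembl_gene_id    = c("ID_A",  "ID_B",  "ID_C"),
  stringsAsFactors = FALSE
)

collapsed <- aggregate(ensembl_gene_id ~ external_gene_name, data = mapping,
                       FUN = function(ids) paste(sort(unique(ids)), collapse = ";"))
print(collapsed)
```

Keeping every ID makes ambiguous symbols visible, so they can be resolved by hand (for example by chromosome or biotype) instead of being dropped.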


Harry_Wat replied: Thank you very much!