Question: Converting Gene Symbol to Ensembl ID in R
5 weeks ago, Harry_Wat wrote:

When I converted my list of gene symbols to Ensembl IDs, I found that the relationship between gene symbols and Ensembl IDs is not one-to-one. For example, the original list has 798 genes, but after the conversion there are 831 Ensembl IDs with their corresponding gene symbols. I do not know why this happens. Could you explain it? Thank you very much!

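library("biomaRt")
# List the available marts, then connect to the Ensembl mart
listMarts()
ensembl <- useMart("ensembl")
# List the available datasets and select the human gene dataset
datasets <- listDatasets(ensembl)
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)
# Read the gene symbol list (the symbols are in the Gene_name column)
Gene <- read.csv("HEK293_Baltz2012_4SURIC.csv", header = TRUE)
options(max.print = 1000000)
# Retrieve gene symbol / Ensembl gene ID pairs for the symbols in the list
getBM(attributes = c('external_gene_name', 'ensembl_gene_id'), filters = 'external_gene_name', values = Gene$Gene_name, mart = ensembl)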

5 weeks ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

The fundamental reason one HGNC gene symbol can map to many Ensembl genes is the mismatch between the gene definitions used by HGNC and Ensembl. The Ensembl definition is locus-based because it is tied to the annotation of a reference genome, whereas HGNC uses this definition:

A gene is defined as: "a DNA segment that contributes to phenotype/function. In the absence of demonstrated function a gene may be characterized by sequence, transcription or homology".

Note that the HGNC definition can apply to multiple loci in the genome, and hence to multiple Ensembl genes.

The one-to-many mapping can also occasionally be caused by symbol ambiguity, where a symbol has historically been used for several different genes and still shows up as an alias for these genes.
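
To see which symbols are responsible for the extra rows, one option is to count the distinct Ensembl IDs per symbol in the table returned by getBM(). The following is only a minimal sketch in base R: the `mapping` data frame is a hypothetical stand-in for the getBM() result, and its symbols and IDs are made up for illustration.

```r
# Hypothetical stand-in for the data frame returned by getBM();
# the symbols and IDs below are invented, not real Ensembl records.
mapping <- data.frame(
  external_gene_name = c("GENE_A", "GENE_B", "GENE_B",
                         "GENE_C", "GENE_C", "GENE_C"),
  ensembl_gene_id    = c("ID_001", "ID_002", "ID_003",
                         "ID_004", "ID_005", "ID_006"),
  stringsAsFactors   = FALSE
)

# Count the distinct Ensembl IDs attached to each gene symbol.
ids_per_symbol <- tapply(mapping$ensembl_gene_id,
                         mapping$external_gene_name,
                         function(x) length(unique(x)))

# Symbols that map to more than one Ensembl gene.
multi_symbols <- names(ids_per_symbol)[ids_per_symbol > 1]

# All rows for those symbols, for manual inspection.
multi_rows <- mapping[mapping$external_gene_name %in% multi_symbols, ]

# Rows in excess of one per distinct symbol.
extra_rows <- nrow(mapping) - length(unique(mapping$external_gene_name))

print(multi_rows)
```

With a real result, replacing `mapping` by the output of getBM() should list the symbols behind the surplus rows. Note that symbols in the input list that return no match at all would reduce the row count, so the surplus alone does not give the exact number of duplicated symbols.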


4 weeks ago, Harry_Wat replied:

Thank you very much!