Converting Gene Symbol to Ensembl ID in R
3.8 years ago
Harry_Wat • 0

When I converted my gene symbol list to Ensembl IDs, I found that the relationship between gene symbols and Ensembl IDs is not one-to-one. For example, my original list has 798 genes, but after the conversion there are 831 Ensembl IDs with corresponding gene symbols. I do not know why this happens; could you explain it? Thank you very much!

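library("biomaRt")
# Optional: list the available marts and datasets
listMarts()
ensembl <- useMart("ensembl")
datasets <- listDatasets(ensembl)
# Select the human gene dataset
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)
# Read the symbol list (expects a column named Gene_name)
Gene <- read.csv("HEK293_Baltz2012_4SURIC.csv", header = TRUE)
options(max.print = 1000000)
# Map symbols to Ensembl gene IDs and keep the result for inspection
mapping <- getBM(attributes = c("external_gene_name", "ensembl_gene_id"),
                 filters = "external_gene_name",
                 values = Gene$Gene_name,
                 mart = ensembl)
nrow(mapping)  # one row per distinct symbol/ID pair, so it can exceed the number of symbols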
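A minimal sketch for finding which input symbols account for the extra rows. It assumes the getBM() result has been stored in a data frame (called mapping here) with the columns external_gene_name and ensembl_gene_id; the helper find_multi_mapped() and the toy rows are hypothetical placeholders, not real Ensembl IDs.

```r
# Hypothetical helper: which symbols map to more than one Ensembl gene ID?
# 'mapping' is assumed to have the columns external_gene_name and
# ensembl_gene_id, as returned by the getBM() call above.
find_multi_mapped <- function(mapping) {
  # Number of distinct Ensembl IDs per symbol (named integer vector)
  n_ids <- vapply(split(mapping$ensembl_gene_id, mapping$external_gene_name),
                  function(ids) length(unique(ids)), integer(1))
  # Keep only symbols with two or more IDs, most ambiguous first
  sort(n_ids[n_ids > 1], decreasing = TRUE)
}

# Hypothetical toy result standing in for the real getBM() output
mapping <- data.frame(
  external_gene_name = c("GENE1", "GENE2", "GENE2", "GENE3", "GENE3", "GENE3"),
  ensembl_gene_id    = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E", "ENSG_F"),
  stringsAsFactors = FALSE
)

multi <- find_multi_mapped(mapping)
print(multi)
# Extra rows contributed by the multi-mapped symbols
extra_rows <- sum(multi - 1)
```

Symbols that getBM() cannot match are simply absent from its output, so the net difference between rows and input symbols (831 - 798 in the question) can understate the number of extra rows when some symbols failed to map.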
3.8 years ago

The fundamental reason one HGNC gene symbol can map to several Ensembl genes is a mismatch between how HGNC and Ensembl define a gene. Ensembl gene definitions are locus-based because they are tied to the annotation of a reference genome, whereas HGNC uses this definition:

A gene is defined as: "a DNA segment that contributes to phenotype/function. In the absence of demonstrated function a gene may be characterized by sequence, transcription or homology".

Note that the HGNC definition can apply to multiple loci in the genome, and hence to multiple Ensembl genes.
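To see the separate loci behind a single symbol, one option is to request location attributes alongside the IDs. Besides distinct loci on the primary assembly, Ensembl also annotates genes on alternative haplotype and patch sequences, and those receive their own gene IDs. The sketch below assumes a data frame with the columns external_gene_name, ensembl_gene_id, chromosome_name, start_position and end_position (the attributes in the commented getBM() call); the helper describe_loci(), the list of primary chromosome names and the toy rows, including the placeholder sequence name CHR_ALT_EXAMPLE, are hypothetical.

```r
# Hypothetical helper: show every locus behind symbols that map to more than
# one Ensembl gene ID. With network access, 'loci' could come from e.g.:
#   loci <- getBM(attributes = c("external_gene_name", "ensembl_gene_id",
#                                "chromosome_name", "start_position",
#                                "end_position"),
#                 filters = "external_gene_name", values = Gene$Gene_name,
#                 mart = ensembl)

# Assumed names of the primary human chromosomes
primary_chroms <- c(as.character(1:22), "X", "Y", "MT")

describe_loci <- function(loci) {
  # Count distinct Ensembl IDs per symbol
  pairs  <- unique(loci[, c("external_gene_name", "ensembl_gene_id")])
  counts <- table(pairs$external_gene_name)
  multi_symbols <- names(counts)[counts > 1]
  multi <- loci[loci$external_gene_name %in% multi_symbols, ]
  # Flag IDs placed on sequences other than the primary chromosomes
  multi$on_primary <- multi$chromosome_name %in% primary_chroms
  multi[order(multi$external_gene_name, multi$chromosome_name), ]
}

# Hypothetical toy input; the IDs and CHR_ALT_EXAMPLE are placeholders
loci <- data.frame(
  external_gene_name = c("GENE1", "GENE2", "GENE2", "GENE3", "GENE3"),
  ensembl_gene_id    = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D", "ENSG_E"),
  chromosome_name    = c("1", "6", "CHR_ALT_EXAMPLE", "X", "12"),
  start_position     = c(1000L, 5000L, 5200L, 300L, 9000L),
  end_position       = c(2000L, 6000L, 6200L, 800L, 9900L),
  stringsAsFactors = FALSE
)

res <- describe_loci(loci)
print(res)
```

Symbols whose extra IDs all have on_primary = FALSE point to haplotype or patch copies; symbols with several primary-assembly loci need a closer look.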

The one-to-many mapping can also occasionally be caused by symbol ambiguity, where a symbol has historically been used for several different genes and still shows up as an alias for them.
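Whatever the cause, a symbol-level analysis often needs one row per input symbol again. A minimal sketch, assuming the getBM() output is in a data frame with the columns external_gene_name and ensembl_gene_id: rather than silently keeping one ID per symbol, it joins all IDs for a symbol into one semicolon-separated string and aligns the result with the original symbol order, leaving NA for symbols that found no match. The function name collapse_ids() and the toy data are hypothetical.

```r
# Hypothetical helper: one entry per input symbol, with all matching
# Ensembl IDs joined by ";" (NA when the symbol was not found)
collapse_ids <- function(mapping, input_symbols) {
  # read.csv() may return factors (R < 4.0); indexing must use the names
  input_symbols <- as.character(input_symbols)
  ids_by_symbol <- split(mapping$ensembl_gene_id, mapping$external_gene_name)
  joined <- vapply(ids_by_symbol,
                   function(ids) paste(sort(unique(ids)), collapse = ";"),
                   character(1))
  # Index by symbol so the output follows the input order
  data.frame(symbol = input_symbols,
             ensembl_ids = unname(joined[input_symbols]),
             stringsAsFactors = FALSE)
}

# Hypothetical toy data standing in for the symbol list and getBM() output
input_symbols <- c("GENE3", "GENE1", "GENE4", "GENE2")
mapping <- data.frame(
  external_gene_name = c("GENE1", "GENE2", "GENE2", "GENE3", "GENE3"),
  ensembl_gene_id    = c("ENSG_A", "ENSG_C", "ENSG_B", "ENSG_D", "ENSG_E"),
  stringsAsFactors = FALSE
)

aligned <- collapse_ids(mapping, input_symbols)
print(aligned)
```

With the real data this would be called as collapse_ids(mapping, Gene$Gene_name), giving one row per symbol in the original list.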


Thank you very much!
