Question: biomaRt getBM function reports some NA values
Asked 7 months ago by Bastien Hervé (Limoges, CBRS, France):

I'm trying to get Entrez gene IDs from gene names using biomaRt.

In the example below I have two Mus musculus genes, Igha and Mlc1.

My biomaRt version is 2.38.0.

library("biomaRt")
mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
getBM(attributes = c("ensembl_gene_id", "mgi_symbol", "entrezgene"), values = c("Igha", "Mlc1"), bmHeader = TRUE, filters = "mgi_symbol", mart = mart)
#      Gene stable ID MGI symbol NCBI gene ID
#1 ENSMUSG00000095079       Igha           NA
#2 ENSMUSG00000035805       Mlc1       170790
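
To see at a glance which symbols came back without an Entrez ID, the result can be filtered on the NA column. This is only a sketch: the data frame below is typed in by hand from the output above rather than fetched from BioMart, and the short column names are an assumption made for the example.

```r
# Data frame mirroring the getBM() result above (values copied from that
# output; the short column names are chosen for this sketch).
res <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000095079", "ENSMUSG00000035805"),
  mgi_symbol      = c("Igha", "Mlc1"),
  entrezgene      = c(NA, 170790L),
  stringsAsFactors = FALSE
)

# Symbols for which BioMart returned no Entrez gene ID
missing_symbols <- res$mgi_symbol[is.na(res$entrezgene)]
print(missing_symbols)
```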

I know that some Ensembl gene IDs do not have an Entrez gene ID (see the post "ENSEMBL IDs 2 Entrez Gene IDs - what to do if no match?").

But in this case, I'm able to find the Entrez gene ID on the Ensembl gene page:

http://www.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000095079;r=12:113254830-113260236

It is listed there as RefSeq Gene ID 238447.

Maybe Emily_Ensembl will have an answer :)

Answer, 7 months ago by Emily_Ensembl (EMBL-EBI):

The gene listed on the gene page is not actually a proper external reference and doesn't come through BioMart. It comes through a different pipeline, which is not reliable, and we're getting rid of it.


So... no enrichment analysis for immunoglobulin genes in Mus musculus? I use enrichPathway, which needs Entrez gene IDs.

Reply written 7 months ago by Bastien Hervé

Use Entrez Direct (EDirect).

$ esearch -db gene -query "Mlc1 [gene] and 10090 [taxID]" | efetch 

1. Mlc1
Official Symbol: Mlc1 and Name: megalencephalic leukoencephalopathy with subcortical cysts 1 homolog (human) [Mus musculus (house mouse)]
Other Aliases: AW048630, BB074274, Kiaa0027-hp, LVM, MLC, VL, WKL1
Other Designations: membrane protein MLC1
Chromosome: 15; Location: 15 E3
Annotation: Chromosome 15 NC_000081.6 (88955884..88982691, complement)
ID: **170790**

$ esearch -db gene -query "Igha [gene] and 10090 [taxID]" | efetch 

1. Igha
Official Symbol: Igha and Name: immunoglobulin heavy constant alpha [Mus musculus (house mouse)]
Other Aliases: IgA, Igh-2
Other Designations: immunoglobulin heavy chain 2 (serum IgA)
Chromosome: 12; Location: 12 62.09 cM
Annotation: Chromosome 12 NC_000078.6 (113256204..113260236, complement)
ID: **238447**

Reply written 7 months ago by genomax

Answer, 7 months ago by Bastien Hervé (Limoges, CBRS, France):

Following genomax's comment, here is my trick using esearch:

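library("biomaRt")
library("reutils")  # assumption: esearch() and uid() come from the reutils package

# gene_list is assumed to be a character vector of MGI symbols, e.g. c("Igha", "Mlc1")
biomart_genes <- getBM(attributes = c("external_gene_name", "entrezgene"), filters = "mgi_symbol", values = gene_list, bmHeader = TRUE, mart = mart)
colnames(biomart_genes) <- c("hgnc_symbol", "entrez_gene_id")

# For each gene without an Entrez ID in BioMart, query NCBI Gene (taxon 10090 = mouse)
for (i in seq_len(nrow(biomart_genes))) {
    if (is.na(biomart_genes[i, "entrez_gene_id"])) {
        term <- paste0(biomart_genes[i, "hgnc_symbol"], " [gene] and 10090 [taxID]")
        biomart_genes[i, "entrez_gene_id"] <- uid(esearch(term, db = "gene", rettype = "uilist", retmode = "xml", retmax = 1))
    }
}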
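
The loop above writes the search result straight into the table, so a symbol with no hit at NCBI could cause trouble. Here is a slightly more defensive sketch of the same idea. The names fill_missing_entrez and mock_lookup are hypothetical helpers made up for this example (they are not part of biomaRt or any Entrez client). The lookup function here is an offline stand-in built from the two IDs reported in this thread, so the example runs without network access.

```r
# fill_missing_entrez(): hypothetical helper, not part of biomaRt.
# `lookup` is any function mapping one gene symbol to an Entrez ID (or NA).
fill_missing_entrez <- function(df, lookup,
                                symbol_col = "hgnc_symbol",
                                id_col = "entrez_gene_id") {
  # Work on character IDs so that looked-up values do not change the type
  ids <- as.character(df[[id_col]])
  for (i in which(is.na(ids))) {
    hit <- lookup(df[[symbol_col]][i])
    # Keep NA when the lookup returns nothing usable
    if (length(hit) >= 1 && !is.na(hit[1])) ids[i] <- as.character(hit[1])
  }
  df[[id_col]] <- ids
  df
}

# Offline stand-in for the NCBI query, using the IDs reported in this thread
known <- c(Igha = "238447", Mlc1 = "170790")
mock_lookup <- function(symbol) unname(known[symbol])

genes <- data.frame(hgnc_symbol = c("Igha", "Mlc1", "NotAGene"),
                    entrez_gene_id = c(NA, 170790L, NA),
                    stringsAsFactors = FALSE)
filled <- fill_missing_entrez(genes, mock_lookup)
print(filled)
```

In real use, mock_lookup would be swapped for a function wrapping the esearch call from the loop above. That variant is untested here.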