Question: biomaRt not finding all genes
Asked 11 weeks ago by danielcgingerich:

I need to align a dataset mapped to GRCh38.p2 (Ensembl 79) with a dataset mapped to GRCh38.p13 (Ensembl 98). The first dataset (Ensembl 79) has gene names and Entrez IDs. The second dataset (Ensembl 98) has gene names and ENSG IDs. I want to convert the Entrez IDs in the Ensembl 79 dataset to ENSG IDs. When I query biomaRt, almost half of the genes are not found. I have tried both "external_gene_name" and "entrezgene" as filters, and both the current mart and archived marts (Ensembl 77-80).

FYI: approximately 25000 genes were not found, and about 10000 of these are pseudogenes.

Code below:

library(biomaRt)
listEnsemblArchives()  # lists archive hosts and their Ensembl release numbers
biomart <- useMart("ensembl", host = "https://oct2014.archive.ensembl.org",  # Ensembl 77
                   dataset = "hsapiens_gene_ensembl")
filters <- listFilters(biomart)
attributes <- listAttributes(biomart)

# Query with the Entrez IDs from the Ensembl 79 dataset
m1.biomart <- getBM(filters = "entrezgene", attributes = c("ensembl_gene_id", "entrezgene", "external_gene_name", "hgnc_symbol"),
                    values = m1.entrez.ids$entrez_id, mart = biomart)

length(unique(m1.entrez.ids$entrez_id))
[1] 50281

length(unique(m1.biomart$entrezgene))
[1] 25987

length(unique(m1.biomart$ensembl_gene_id))
[1] 28701
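For the alignment step itself, a minimal sketch: once a mapping table with Entrez and ENSG columns is available (shaped like a getBM() result), the two datasets can be joined with base R merge(). The data frames ds79, ds98 and map79 and the IDs in them (1, 2, 3, "ENSG_A", ...) are hypothetical placeholders, not real annotations.

```r
# Hypothetical toy inputs (placeholder IDs, not real annotations)
ds79 <- data.frame(entrez_id = c(1L, 2L, 3L), value79 = c(0.5, 1.2, 3.4))
ds98 <- data.frame(ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_D"),
                   value98 = c(10, 20, 40))
# Entrez -> ENSG mapping, shaped like a getBM() result
map79 <- data.frame(entrezgene = c(1L, 2L),
                    ensembl_gene_id = c("ENSG_A", "ENSG_B"))

# Attach ENSG IDs to the Ensembl 79 data; all.x = TRUE keeps unmapped rows as NA
ds79_mapped <- merge(ds79, map79, by.x = "entrez_id", by.y = "entrezgene",
                     all.x = TRUE)

# Inner join on ENSG ID keeps only genes present in both datasets
aligned <- merge(ds79_mapped, ds98, by = "ensembl_gene_id")
print(aligned)
```

Keeping all.x = TRUE in the first merge makes the unmapped Entrez IDs visible as NA instead of silently dropping them.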

It would be helpful if you could give some examples of Entrez IDs that are missing from the BioMart response.

Written 11 weeks ago by Mike Smith.
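A minimal sketch of how such examples could be extracted: setdiff() lists queried IDs that getBM() did not return, and duplicated() finds Entrez IDs mapped to several ENSG IDs, which would explain more unique ENSG IDs (28701) than Entrez IDs (25987) in the output above. The vectors below are hypothetical stand-ins; with the real objects, use m1.entrez.ids$entrez_id and m1.biomart.

```r
# Hypothetical stand-ins; with real data use m1.entrez.ids$entrez_id and m1.biomart
query_ids <- c(101L, 102L, 103L, 104L)
bm <- data.frame(entrezgene = c(101L, 102L, 102L),
                 ensembl_gene_id = c("ENSG_X", "ENSG_Y", "ENSG_Z"))

# Entrez IDs that were queried but not returned by getBM()
missing_ids <- setdiff(unique(query_ids), bm$entrezgene)
print(missing_ids)

# Entrez IDs that map to more than one ENSG ID (one-to-many mappings)
multi <- unique(bm$entrezgene[duplicated(bm$entrezgene)])
print(bm[bm$entrezgene %in% multi, ])
```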

I have a problem similar to danielcgingerich's, but in the other direction: mapping Ensembl IDs to Entrez IDs. My code does not return Entrez IDs for some genes that do have them when searched at https://www.ncbi.nlm.nih.gov/gene/. I'm not sure whether this warrants its own post, so I'm asking here.

In the example below, ENSMUSG00000000031 comes back without an Entrez ID, but searching NCBI shows that it corresponds to H19.

library(biomaRt)
# A sample of the Ensembl IDs being mapped
ensemblIDs <- c('ENSMUSG00000000001', 'ENSMUSG00000000003', 'ENSMUSG00000000028', 'ENSMUSG00000000031', 'ENSMUSG00000000037')
mmusmart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
mapping <- getBM(
   attributes = c('ensembl_gene_id', 'entrezgene_id', 'entrezgene_accession'),
   filters = 'ensembl_gene_id',
   values = ensemblIDs,
   mart = mmusmart)

Output of head(mapping):

  ensembl_gene_id     entrezgene_id  entrezgene_accession
  <chr>               <int>          <chr>
1 ENSMUSG00000000001  14679          Gnai3
2 ENSMUSG00000000003  54192          Pbsn
3 ENSMUSG00000000028  12544          Cdc45
4 ENSMUSG00000000031  NA
5 ENSMUSG00000000037  107815         Scml2
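One way to collect the Ensembl IDs that came back without an Entrez ID, so they can be checked by hand at NCBI Gene, is to filter on is.na(). This sketch rebuilds the five rows printed above as a local data frame instead of querying Ensembl, so it runs offline; treating the blank accession in row 4 as an empty string is an assumption.

```r
# The five rows printed above, rebuilt locally (no network access needed)
mapping <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000000001", "ENSMUSG00000000003",
                      "ENSMUSG00000000028", "ENSMUSG00000000031",
                      "ENSMUSG00000000037"),
  entrezgene_id = c(14679L, 54192L, 12544L, NA, 107815L),
  entrezgene_accession = c("Gnai3", "Pbsn", "Cdc45", "", "Scml2"),  # blank assumed ""
  stringsAsFactors = FALSE
)

# Ensembl genes for which BioMart returned no Entrez ID
unmapped <- mapping$ensembl_gene_id[is.na(mapping$entrezgene_id)]
print(unmapped)
```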
Written 7 weeks ago by AndRewster (modified 7 weeks ago).
Powered by Biostar version 2.3.0