Missing ensembl_ids in biomaRt uniprot query
7.5 years ago
A. Domingues ★ 2.6k

[This question has been cross-posted in the bioconductor forum. I am reposting due to little feedback]

I am trying to do a "classical" match of UniProt IDs to Ensembl gene IDs, starting from protein IDs identified in a zebrafish mass-spec experiment. However, for several proteins my biomaRt query retrieves no information, although they are present in the UniProt database and have an Ensembl gene ID assigned. Some UniProt IDs (e.g. F1QCB4) belong to entries that have been deleted from UniProt, but this is not the case for all of them.

Am I missing something?

Here is my code:

library(biomaRt)

# UniProt accessions from the mass-spec experiment
prot_ids <- c("F1QCB4", "F1R8H7", "A0JMF6", "F1QU18", "A0JMK7", "A0MTA1")

uniProt <- useMart("unimart", dataset = "uniprot")

getBM(
        attributes = c("accession", "name", "ensembl_id", "gene_name"),
        filters = "accession",
        values = prot_ids,
        mart = uniProt)

[1] accession  name       ensembl_id gene_name
<0 rows> (or 0-length row.names)
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] biomaRt_2.22.0     VennDiagram_1.6.9  RColorBrewer_1.1-2

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.28.1 Biobase_2.26.0       BiocGenerics_0.12.1
 [4] bitops_1.0-6         DBI_0.3.1            GenomeInfoDb_1.2.3
 [7] IRanges_2.0.0        parallel_3.1.2       RCurl_1.95-4.5
[10] RSQLite_1.0.0        S4Vectors_0.4.0      stats4_3.1.2
[13] tcltk_3.1.2          tools_3.1.2          XML_3.98-1.1
biomaRt bioconductor uniprot ensembl • 2.8k views
7.5 years ago
Neilfws 49k

Of the values that you list in prot_ids, only three have not been deleted and are still valid:

A0JMF6, A0JMK7, A0MTA1

If you search at the Ensembl website you'll see that only one of those, A0JMK7, maps to an Ensembl Gene ID.

You can confirm this by using Ensembl as your Mart and using uniprot_sptrembl as the filter & attribute, because this is a TrEMBL ID:

mart.dr <- useMart("ensembl", "drerio_gene_ensembl")
getBM(
        attributes = c("uniprot_genename", "uniprot_sptrembl", "ensembl_gene_id"),
        filters = "uniprot_sptrembl",
        values = prot_ids,
        mart = mart.dr)

#   uniprot_genename uniprot_sptrembl    ensembl_gene_id
# 1             CD99           A0JMK7 ENSDARG00000051975
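
Because getBM() silently drops IDs that have no match, it can help to list explicitly which accessions came back empty. The following is only a minimal sketch in base R: `res` is a hand-built stand-in for the data frame returned by the query above (copied from the output shown), and the names `unmapped`, `queried` and `mapped` are hypothetical, chosen for illustration.

```r
# IDs submitted in the original query
prot_ids <- c("F1QCB4", "F1R8H7", "A0JMF6", "F1QU18", "A0JMK7", "A0MTA1")

# Stand-in for the getBM() result shown above (assumption: built by hand,
# not fetched from Ensembl, so the sketch runs offline)
res <- data.frame(
  uniprot_genename = "CD99",
  uniprot_sptrembl = "A0JMK7",
  ensembl_gene_id  = "ENSDARG00000051975",
  stringsAsFactors = FALSE
)

# Accessions for which the query returned no row at all
unmapped <- setdiff(prot_ids, res$uniprot_sptrembl)

# Left join so that every queried accession appears once,
# with NA in the Ensembl columns where nothing was found
queried <- data.frame(uniprot_sptrembl = prot_ids, stringsAsFactors = FALSE)
mapped <- merge(queried, res, by = "uniprot_sptrembl", all.x = TRUE)

print(unmapped)
print(mapped)
```

The accessions in `unmapped` are the ones worth checking by hand at UniProt, since an empty result does not distinguish between a deleted entry and a valid entry without an Ensembl cross-reference.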

Thank you. It is interesting that querying the UniProt mart, as I did, and querying the Ensembl mart, as in your example, give different results, with the Ensembl mart seemingly giving the better one.
