Biomart doesn't output ALL uniprot IDs
8 months ago
sgupt46 • 0

Hi, I am trying to use biomart to get uniprot IDs, but I only get a partial list.

library(biomaRt)
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")
protein_names <- biomaRt::getBM(attributes = c("uniprotswissprot"), mart = ensembl)
any(protein_names$uniprotswissprot == 'O60397')
FALSE

However, when I look at the UniProt website, I do see this ID (https://www.uniprot.org/uniprotkb/O60397). Why do I not get IDs like this one from BioMart?

biomart Uniprot • 725 views
8 months ago

Your BioMart query is querying the hsapiens_gene_ensembl dataset, which is gene-centric (i.e. DNA-centric). For your query, BioMart goes through each gene/transcript in the Ensembl database and returns the linked UniProt ID.

However, not all UniProt proteins are linked to Ensembl transcripts, so a UniProt entry that isn't recorded as the product of an Ensembl transcript will not be returned. If you look at the example record you posted, you'll notice that the genomic coordinates link is greyed out. If you trace back the sequence of this UniProt record, you'll find it was generated by translation of predicted open reading frames in this EMBL record: https://www.ebi.ac.uk/ena/browser/api/embl/AC004544.1

That EMBL record was deposited in GenBank before the human genome was sequenced. Its exact sequence is not found anywhere in the completed human genome, so it has never had any Ensembl transcripts associated with it.
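
If you want to see which of your accessions fall into this category, a small helper along these lines may be useful. It is only a sketch: the function name ids_missing_from_mart and the toy data frame are made up for illustration, and the gene IDs in the toy data are placeholders, not real Ensembl links. It assumes the biomaRt result has a uniprotswissprot column, as in the query above.

```r
# Hypothetical helper: report which accessions of interest do not appear
# in a biomaRt result, i.e. have no Ensembl gene/transcript linked to them.
ids_missing_from_mart <- function(wanted_ids, mart_result,
                                  column = "uniprotswissprot") {
  mart_ids <- unique(mart_result[[column]])
  # Genes without a linked Swiss-Prot entry typically come back as empty
  # strings (or NA), so drop those before comparing.
  mart_ids <- mart_ids[!is.na(mart_ids) & nzchar(mart_ids)]
  setdiff(wanted_ids, mart_ids)
}

# Toy stand-in for a getBM() result (placeholder gene IDs, illustration only).
toy_result <- data.frame(
  ensembl_gene_id  = c("gene_A", "gene_B", "gene_C"),
  uniprotswissprot = c("P04637", "", "P38398"),
  stringsAsFactors = FALSE
)

missing_ids <- ids_missing_from_mart(c("P04637", "O60397"), toy_result)
print(missing_ids)
```

With a live connection, the protein_names data frame from the question could be passed in place of toy_result. Any accession reported as missing would then need to be looked up in UniProt directly rather than through the gene-centric Ensembl mart.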
