biomaRt doesn't report uniprot id
7 months ago
H.Hasani ▴ 990

Hi all,

I'm working with UniProtKB and would like to get the genomic coordinates of a list of proteins. To my knowledge, the first step is to retrieve the mapping between UniProt and Ensembl via biomaRt:

# `mart` is assumed to be an Ensembl mart object created beforehand
annotLookup <- getBM(
  mart = mart,
  attributes = c(
    'ensembl_gene_id',
    'uniprot_gn_symbol',
    'uniprot_gn_id',
    'chromosome_name'),
  uniqueRows = TRUE)

However, the UniProt ID is not always returned. As an alternative, I searched by gene name. The protein was then found, but with a different UniProt ID.

As a control, I used the ID mapping tool from UniProt to double-check: biomaRt and ID mapping agreed on the same Ensembl ID. This indicates that there must be a correct mapping between the two.

My question is: how can I map the UniProt ID correctly to the Ensembl ID?

Thank you

R Uniprot biomaRt • 1.1k views

Please provide examples when referring to any kind of IDs.


My question obviously is, how can I map the uniprot id correctly to the ensembl id?

How about: Using EBI protein API with uniprot isoforms

This should still work: How To Convert Uniprot Ids To Ensemble Gene/Transcript Ids


Thank you. I'm working with a list of genes, but take P19544 as an example.

The method above doesn't retrieve it either:

curl -X GET --header 'Accept:application/json' 'https://www.ebi.ac.uk/proteins/api/coordinates/location/P19544:1-100'

{"requestedURL":"https://www.ebi.ac.uk/proteins/api/coordinates/location/P19544:1-100","errorMessage":["Can not find coordinates for accession: {P19544}; the accession may be obsolete, please check https://www.uniprot.org/ to verify it."]}

The following works:

https://www.ebi.ac.uk/proteins/api/genecentric/P19544

OR

curl -X GET --header 'Accept:application/xml' 'https://www.ebi.ac.uk/proteins/api/genecentric/P19544'

If you would rather have JSON then:

curl -X GET --header 'Accept:application/json' 'https://www.ebi.ac.uk/proteins/api/genecentric/P19544'
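
For anyone who would rather call the same genecentric endpoint from R than from curl, here is a minimal sketch. It assumes the httr and jsonlite packages are installed; `genecentric_url` and `fetch_genecentric` are names made up for this illustration and are not part of any package. The structure of the returned JSON is not described in this thread, so the sketch only parses it and leaves inspection to the reader.

```r
# Base URL taken from the curl example above
genecentric_base <- "https://www.ebi.ac.uk/proteins/api/genecentric/"

# Hypothetical helper: build the request URL for one UniProt accession
genecentric_url <- function(accession) {
  paste0(genecentric_base, utils::URLencode(accession, reserved = TRUE))
}

# Hypothetical helper: request JSON and parse it into an R list.
# Needs network access, so it is only defined here, not called.
fetch_genecentric <- function(accession) {
  resp <- httr::GET(genecentric_url(accession), httr::accept_json())
  httr::stop_for_status(resp)  # raise an R error on HTTP failure
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

genecentric_url("P19544")
# With a live connection: str(fetch_genecentric("P19544"), max.level = 1)
```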

A couple of notes:

  1. It's biomaRt, not biomaRT
  2. It's ensembl as in ensEMBL, not the word ensemble.

I've fixed those for you but please be mindful of what important tools in the field are called.

7 months ago
Mike Smith ★ 2.0k

I wonder if the attribute you actually need is uniprotswissprot rather than uniprot_gn_id. Without some more examples of the IDs you're trying to convert, all the answers are a bit of a guess, but that seems to work for P19544:

library(biomaRt)
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

protein_id <- "P19544"

getBM(
  attributes = c(
    'uniprotswissprot', 
    'ensembl_gene_id',
    'chromosome_name', 
    'start_position', 
    'end_position'
  ),
  filters = "uniprotswissprot",
  values = protein_id,
  mart = ensembl
)
#>   uniprotswissprot ensembl_gene_id chromosome_name start_position end_position
#> 1           P19544 ENSG00000184937              11       32387775     32435564
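
Since `values` accepts a vector, the same query should extend to a whole list of accessions. The sketch below wraps the call and adds a check for accessions that come back without a row. `map_swissprot` and `find_unmapped` are hypothetical helper names introduced here, and the toy data frame only stands in for real getBM() output so that the check can be tried offline.

```r
# Hypothetical helper: the query above, for a vector of Swiss-Prot accessions.
# `mart` is assumed to come from useEnsembl(), as in the code above.
map_swissprot <- function(ids, mart) {
  biomaRt::getBM(
    attributes = c("uniprotswissprot", "ensembl_gene_id",
                   "chromosome_name", "start_position", "end_position"),
    filters = "uniprotswissprot",
    values = unique(ids),
    mart = mart
  )
}

# Hypothetical helper: accessions that were requested but are missing from the result
find_unmapped <- function(ids, result, column = "uniprotswissprot") {
  setdiff(unique(ids), unique(result[[column]]))
}

# Offline illustration: a toy data frame standing in for getBM() output.
# "NOT_AN_ID" is a made-up placeholder, not a real accession.
toy_result <- data.frame(
  uniprotswissprot = "P19544",
  ensembl_gene_id = "ENSG00000184937",
  stringsAsFactors = FALSE
)
find_unmapped(c("P19544", "NOT_AN_ID"), toy_result)

# With a live connection one would instead run something like:
# my_ids <- c("P19544")  # extend with your own accessions
# result <- map_swissprot(my_ids, ensembl)
# find_unmapped(my_ids, result)
```

Any accession reported by `find_unmapped` would be a candidate for a manual check, or for a second query against another attribute such as uniprot_gn_id.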

Thanks, uniprotswissprot seems to be the correct attribute: it retrieved the right mapping for every protein in my list (e.g. Q8WXI7 and Q9BSC4). Any idea what the difference between the two attributes is? Is it possible that with new releases, or with other protein IDs, I would need to check them all again? Many thanks again.
