There are several ways to do this with R and Bioconductor packages. Here are three: annotation packages, the biomaRt package, and the UniProt.ws package. In each case you need to specify the target species, which makes these approaches less convenient if you have multi-species mapping in mind.
Annotation package approach
Here I use the annotation package org.Hs.eg.db through the interface provided by AnnotationDbi. This method returns all associated RefSeq ids together, peptide and nucleotide alike, which may or may not be what you want. Advantage: no access to an online server is needed.
    id <- "P62195"  # vector of ids to map
    library(org.Hs.eg.db)
    # columns(org.Hs.eg.db)   # check other columns to be returned
    # keytypes(org.Hs.eg.db)  # check other keys for query
    select(org.Hs.eg.db, id, "REFSEQ", "UNIPROT")
    # 'select()' returned 1:many mapping between keys and columns
    #   UNIPROT       REFSEQ
    # 1  P62195 NM_001199163
    # 2  P62195    NM_002805
    # 3  P62195 NP_001186092
    # 4  P62195    NP_002796
    # 5  P62195    XR_934508
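If you only want one kind of id from this mixed result, one option is to split on the RefSeq accession prefix. This is a minimal base R sketch, not part of AnnotationDbi: the data frame `res` is a hand-typed stand-in for the `select()` output above, and the rule "a two-letter prefix ending in P means protein" is my assumption about the accessions you will encounter (NP_, XP_, YP_ and so on). Everything else is treated as nucleotide.

```r
# Hand-typed stand-in for the data frame returned by select() above.
res <- data.frame(
  UNIPROT = "P62195",
  REFSEQ  = c("NM_001199163", "NM_002805", "NP_001186092",
              "NP_002796", "XR_934508"),
  stringsAsFactors = FALSE
)

# Assumption: protein accessions have a two-letter prefix ending in "P"
# (NP_, XP_, YP_, ...); anything else is treated as a nucleotide id.
is_protein <- grepl("^[A-Z]P_", res$REFSEQ)

refseq_protein    <- res$REFSEQ[is_protein]
refseq_nucleotide <- res$REFSEQ[!is_protein]
```

With the ids above, `refseq_protein` holds the two NP_ accessions and `refseq_nucleotide` holds the NM_ and XR_ ones.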
biomaRt package approach
The biomaRt package connects to the BioMart resource at Ensembl and runs queries based on different filters. You can retrieve nucleotide and peptide ids separately, if desired. Check also the other attributes that can be returned. Advantage: access to the latest information in Ensembl.
    library(biomaRt)
    mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
    # listAttributes(mart)  # check other mappings
    # listFilters(mart)     # check other filters

    # Peptide ids (the description column is not shown in the output below)
    getBM(
      attributes = c("refseq_peptide", "external_gene_name", "description"),
      filters = "uniprot_swissprot",
      values = id,
      mart = mart
    )
    #   refseq_peptide external_gene_name
    # 1      NP_002796              PSMC5
    # 2                             PSMC5
    # 3   NP_001186092              PSMC5

    # mRNA ids
    getBM(
      attributes = c("refseq_mrna", "external_gene_name"),
      filters = "uniprot_swissprot",
      values = id,
      mart = mart
    )
    #    refseq_mrna external_gene_name
    # 1    NM_002805              PSMC5
    # 2                             PSMC5
    # 3 NM_001199163              PSMC5
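Note that both results above contain a row where the RefSeq column is empty. A minimal base R sketch for dropping such rows follows. The data frame `bm` is a hand-typed stand-in for the first `getBM()` result, so it runs without a connection to Ensembl, and the name `bm_clean` is just for illustration.

```r
# Hand-typed stand-in for the first getBM() result above.
bm <- data.frame(
  refseq_peptide     = c("NP_002796", "", "NP_001186092"),
  external_gene_name = "PSMC5",
  stringsAsFactors   = FALSE
)

# Keep only rows with a non-missing, non-empty RefSeq id.
# nzchar(NA) is TRUE, so NA is checked separately.
keep <- !is.na(bm$refseq_peptide) & nzchar(bm$refseq_peptide)
bm_clean <- bm[keep, , drop = FALSE]
```

The same filter should work for the `refseq_mrna` column of the second query.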
UniProt.ws package approach
UniProt.ws works like something between the other two, in the sense that you have to establish a connection first and then make queries with the select() interface. Advantage: it seems the most natural way to make queries about UniProt ids.
    library(UniProt.ws)
    up <- UniProt.ws(taxId = 9606)  # taxid for Homo sapiens
    # columns(up)   # check other columns to be returned
    # keytypes(up)  # check other keys for query

    select(up, id, columns = "REFSEQ_PROTEIN", keytype = "UNIPROTKB")
    # Getting mapping data for P62195 ... and P_REFSEQ_AC
    # 'select()' returned 1:many mapping between keys and columns
    #   UNIPROTKB REFSEQ_PROTEIN
    # 1    P62195 NP_001186092.1
    # 2    P62195    NP_002796.4

    select(up, id, columns = c("REFSEQ_NUCLEOTIDE"), keytype = "UNIPROTKB")
    # Getting mapping data for P62195 ... and REFSEQ_NT_ID
    # 'select()' returned 1:many mapping between keys and columns
    #   UNIPROTKB REFSEQ_NUCLEOTIDE
    # 1    P62195    NM_001199163.1
    # 2    P62195       NM_002805.5
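Unlike the other two methods, the ids returned here carry a version suffix (".1", ".4"). If you want to compare or merge results across the three approaches, you could strip that suffix first. This is a minimal base R sketch. The vectors are hand-typed from the outputs above, and the variable names are only illustrative.

```r
# Versioned ids as returned by UniProt.ws above (hand-typed).
versioned <- c("NP_001186092.1", "NP_002796.4")

# Remove a trailing ".<digits>" version suffix, if present.
unversioned <- sub("\\.[0-9]+$", "", versioned)

# Protein ids from the annotation package approach (hand-typed).
from_orgdb <- c("NP_001186092", "NP_002796")

# TRUE when both methods report the same set of protein ids.
same_ids <- setequal(unversioned, from_orgdb)
```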