How to track which protein ID is linked to which gene ID with rentrez
6.9 years ago
s_herrera ▴ 10

I have a set of protein IDs and I want to fetch the corresponding coding sequences (CDSs) without losing track of which protein ID each one belongs to. I have managed to download the corresponding CDSs, but unfortunately CDS IDs are very different from protein IDs in NCBI.

I have the following R code:

library(rentrez)
Prot_ids <- c("XP_012370245.1","XP_004866438.1","XP_013359583.1")
links <- entrez_link(dbfrom="protein", db="nuccore", id=Prot_ids, by_id = TRUE)

And then, I used this command to "match" protein IDs with CDS IDs:

lapply(links, function(x) x$links$protein_nuccore_mrna)

[[1]]
[1] "820968283"

[[2]]
[1] "861491027"

[[3]]
[1] "918634580"

However, as you can see, the argument 'by_id = TRUE' just makes a list of three elink objects, and the protein IDs are no longer attached to the results.

I would like something like:

Protein ID   XP_012370245.1   XP_004866438.1   XP_013359583.1
CDS ID       XM_004866381.2   XM_012514791.1   XM_013504129.1

Any suggestion is very welcome, thanks!!
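
For reference, here is a minimal sketch of one way the pairing could be kept. It assumes that entrez_link() with by_id = TRUE returns one elink object per input ID, in the same order as the input, so the query IDs can be re-attached by position. The helper names pair_links and protein_to_cds are hypothetical (not part of rentrez). The sketch also assumes that the nuccore summary exposes the accession in a field called "accessionversion" and that the summary list is named by UID. Treat it as a starting point rather than a verified solution.

```r
# Hypothetical helper: pair each query protein ID with its linked mRNA UIDs.
# Assumes `links` holds one elink object per ID, in the same order as `ids`.
pair_links <- function(ids, links) {
  uids <- lapply(links, function(x) x$links$protein_nuccore_mrna)
  # Proteins without an mRNA link get NA so that no query ID is dropped
  uids <- lapply(uids, function(u) if (is.null(u)) NA_character_ else u)
  data.frame(
    protein_id = rep(ids, lengths(uids)),
    mrna_uid   = unlist(uids, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper that queries NCBI (needs rentrez and network access).
protein_to_cds <- function(prot_ids) {
  links <- rentrez::entrez_link(dbfrom = "protein", db = "nuccore",
                                id = prot_ids, by_id = TRUE)
  out <- pair_links(prot_ids, links)
  out$cds_id <- NA_character_

  found <- !is.na(out$mrna_uid)
  if (any(found)) {
    # Translate numeric UIDs into accession.version strings (e.g. XM_...)
    summ <- rentrez::entrez_summary(db = "nuccore",
                                    id = unique(out$mrna_uid[found]),
                                    always_return_list = TRUE)
    acc <- rentrez::extract_from_esummary(summ, "accessionversion")
    # Assumption: `acc` is named by UID, so look up by name, not by position
    out$cds_id[found] <- unname(acc[out$mrna_uid[found]])
  }
  out
}

# Usage (commented out because it contacts NCBI):
# Prot_ids <- c("XP_012370245.1", "XP_004866438.1", "XP_013359583.1")
# protein_to_cds(Prot_ids)
```

The result is a data frame with one row per protein/mRNA pair, which can be transposed with t() if the wide layout shown above is preferred. Splitting the pairing step from the network calls makes it possible to check the pairing logic on a hand-built list without contacting NCBI.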

R rentrez NCBI • 1.5k views