Using E-Utilities/Entrez/Elink to access identical proteins
7.2 years ago
pawlowac ▴ 80

I'm trying to access identical protein information for a list of protein accession numbers. My goal is to take the protein IDs, get the identical protein list for each, and use the nucleotide coordinates from that list to download the upstream and downstream regions of a given gene.
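The last step of that goal, turning gene coordinates into flanking windows, is plain coordinate arithmetic. Below is a minimal sketch. It assumes 1-based inclusive coordinates with start <= stop, and a strand given as "+" or "-". The helper name `flank_windows` is hypothetical, not part of Biopython. On the minus strand, "upstream" lies at higher coordinates.

```python
def flank_windows(start: int, stop: int, strand: str, flank: int) -> dict:
    """Return (seq_start, seq_stop, efetch_strand) tuples for the regions
    upstream and downstream of a gene.

    Assumes 1-based inclusive coordinates with start <= stop and a strand
    of "+" or "-". Windows are clamped at position 1 but not at the end of
    the sequence, because its length is not known here. A window can be
    empty (seq_start > seq_stop) if the gene starts at position 1.
    """
    if start > stop:
        raise ValueError("expected start <= stop")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    before = (max(1, start - flank), start - 1)  # lower coordinates
    after = (stop + 1, stop + flank)             # higher coordinates
    if strand == "+":
        # plus strand: upstream (5') lies at lower coordinates
        up, down, efetch_strand = before, after, 1
    else:
        # minus strand: upstream lies at higher coordinates
        up, down, efetch_strand = after, before, 2
    return {
        "upstream": (up[0], up[1], efetch_strand),
        "downstream": (down[0], down[1], efetch_strand),
    }
```

Each tuple could then be passed as `seq_start`, `seq_stop` and `strand` to `Entrez.efetch(db="nuccore", id=accession, rettype="fasta", retmode="text", ...)`. With `strand=2`, NCBI returns the minus strand, so both flanks read 5' to 3' relative to the gene.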

I was hoping to use the Biopython Entrez module.

I've used the code below:

from Bio import Entrez
Entrez.email = "you@example.com"  # placeholder: NCBI asks for a contact email
epost_1 = Entrez.read(Entrez.epost("protein", id=",".join(Prot_ID_list)))
webenv = epost_1["WebEnv"]
query_key = epost_1["QueryKey"]
prot_link = Entrez.read(Entrez.elink(dbfrom="protein", db="protein", LinkName="protein_protein_identical", webenv=webenv, query_key=query_key))

However, this doesn't work: it just returns the list of protein IDs that I originally posted.
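One way to see exactly what elink returned is to collect the IDs listed under a given link name. Below is a minimal sketch. It assumes the parsed result of `Entrez.read(Entrez.elink(...))` is a list of LinkSet records. Each record is assumed to carry the queried IDs in `"IdList"` and its results in `"LinkSetDb"`, a list of dicts with `"LinkName"` and `"Link"` (a list of `{"Id": ...}` dicts). The helper `linked_ids` is hypothetical.

```python
def linked_ids(elink_records: list, link_name: str) -> dict:
    """Map each LinkSet's input IDs (as a tuple) to the IDs listed
    under `link_name` in that LinkSet's LinkSetDb entries."""
    result = {}
    for record in elink_records:
        key = tuple(str(i) for i in record.get("IdList", []))
        ids = []
        for linkset_db in record.get("LinkSetDb", []):
            if linkset_db.get("LinkName") == link_name:
                ids.extend(str(link["Id"]) for link in linkset_db.get("Link", []))
        result[key] = ids
    return result
```

Comparing each value against its key shows whether the link added any new IDs or only echoed the input.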

I've also tried the Entrez Direct (EDirect) command-line tools on Ubuntu:

epost -db protein -id Prot_ID_List | elink -related -name protein_protein_identical | efetch -format text

but it just gives the error "QueryKey value not found in fetch input". I tried variations on this, to no avail. I am probably doing something very wrong with EDirect, but I'm not sure what.

Any help is greatly appreciated.

Tags: NCBI, biopython, genbank, entrez

Also, I am aware of the thread "How to Elink identical proteins from protein id?". However, I need to make thousands of requests, and that approach doesn't seem the most succinct way to do so; it also takes longer.
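For thousands of IDs, one common pattern is to split the list into batches and make one request per batch, with a pause between requests. Below is a minimal sketch. The batch size of 200 is an arbitrary assumption, and `batches`, `fetch_all` and `fetch_batch` are hypothetical names. `fetch_batch` is any callable you supply, so the batching logic can be tested without network access.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(ids: Sequence[str], size: int = 200) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for i in range(0, len(ids), size):
        yield list(ids[i:i + size])


def fetch_all(ids: Sequence[str],
              fetch_batch: Callable[[List[str]], object],
              size: int = 200,
              delay: float = 0.34) -> List[object]:
    """Call fetch_batch once per batch and collect the results,
    sleeping `delay` seconds between consecutive requests."""
    results = []
    for n, batch in enumerate(batches(ids, size)):
        if n and delay > 0:
            time.sleep(delay)  # stay under NCBI's request-rate limit
        results.append(fetch_batch(batch))
    return results
```

For example, `fetch_batch` could post a batch with `Entrez.epost` and return the text of an `Entrez.efetch` call on the resulting WebEnv/QueryKey. NCBI allows about 3 requests per second without an API key (10 with one), which is where the default 0.34 s delay comes from.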

7.2 years ago
pawlowac ▴ 80

It appears I was approaching the problem the wrong way. You don't need to 'link' to the identical protein list, despite the 'protein_protein_identical' link name and its description on the NCBI website.

Instead, the identical protein list is information intrinsic to each accession number, so you can fetch it directly:

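from Bio import Entrez
Entrez.email = "you@example.com"  # placeholder: NCBI asks for a contact email
epost_1 = Entrez.read(Entrez.epost("protein", id=",".join(prot_list)))
webenv = epost_1["WebEnv"]
query_key = epost_1["QueryKey"]
# rettype="ipg" returns the identical protein report as tab-delimited text
iden_prots = Entrez.efetch(db="protein", rettype="ipg", retmode="text", webenv=webenv, query_key=query_key).read()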

I used the code above to get the identical protein list: rettype='ipg' downloads it in tab-delimited format. Hopefully this helps anyone else who gets stuck the way I did.
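To get from that report to nucleotide coordinates, the text can be read as a tab-separated table. Below is a minimal sketch. It assumes the first line is a header containing the columns "Id", "Nucleotide Accession", "Start", "Stop", "Strand" and "Protein", so check it against the header you actually receive. `IpgHit` and `parse_ipg` are hypothetical names. Rows with no nucleotide coordinates are skipped.

```python
import csv
import io
from typing import List, NamedTuple


class IpgHit(NamedTuple):
    ipg_id: str
    nucleotide: str
    start: int
    stop: int
    strand: str
    protein: str


def parse_ipg(text: str) -> List[IpgHit]:
    """Parse the tab-delimited report from efetch(db="protein", rettype="ipg")."""
    # QUOTE_NONE: treat quote characters in protein names as plain text
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    hits = []
    for row in reader:
        nuc = (row.get("Nucleotide Accession") or "").strip()
        start = (row.get("Start") or "").strip()
        stop = (row.get("Stop") or "").strip()
        if not nuc or not start.isdigit() or not stop.isdigit():
            continue  # no usable nucleotide coordinates in this row
        hits.append(IpgHit(
            ipg_id=(row.get("Id") or "").strip(),
            nucleotide=nuc,
            start=int(start),
            stop=int(stop),
            strand=(row.get("Strand") or "").strip(),
            protein=(row.get("Protein") or "").strip(),
        ))
    return hits
```

Each `IpgHit` then carries the accession, coordinates and strand needed for a follow-up `efetch` on the `nuccore` database.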
