Question: Using E-Utilities/Entrez/Elink to access identical proteins
4 months ago by
pawlowac60
Canada
pawlowac60 wrote:

I'm trying to access identical protein information for a list of protein accession numbers. My goal is to take the protein IDs, get the identical protein list for each, and use the nucleotide coordinates from that list to download the regions upstream and downstream of a given gene.

I was hoping to use the biopython Entrez module.

I've used the below code

epost_1 = Entrez.read(Entrez.epost("protein", id=",".join(Prot_ID_list)))
webenv = epost_1["WebEnv"]
query_key = epost_1["QueryKey"]
prot_link = Entrez.read(Entrez.elink(dbfrom="protein", db="protein", LinkName="protein_protein_identical", webenv=epost_1["WebEnv"], query_key=epost_1["QueryKey"]))

However, this doesn't work. It just returns me a list of protein IDs that I originally posted.

I've also tried using eutils on ubuntu

epost -db protein -id Prot_ID_List | elink -related -name protein_protein_identical | efetch -format text

but it just gives an error "QueryKey value not found in fetch input". I tried variations on this to no avail. I am probably doing something very wrong with eutils, but I am not sure what.

Any help is greatly appreciated.

entrez genbank biopython ncbi • 314 views

Also, I am aware of the thread "How to Elink identical proteins from protein id?". However, I need to make thousands of requests, and that approach doesn't seem like the most succinct way to do so; it also takes longer.

pawlowac60 wrote:

It appears I was approaching the question the wrong way. You don't need to 'link' to the identical protein list, despite the 'protein_protein_identical' link name and its description on the NCBI website.

Instead, it is information intrinsic to each accession number.

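from Bio import Entrez

# Upload the accession list to the NCBI history server.
epost_1 = Entrez.read(Entrez.epost("protein", id=",".join(prot_list)))
webenv = epost_1["WebEnv"]
query_key = epost_1["QueryKey"]
# rettype="ipg" requests the identical protein report for every posted record.
iden_prots = Entrez.efetch(db="protein", rettype="ipg", retmode="text", webenv=webenv, query_key=query_key)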

I used the above to get the identical protein list. rettype=ipg downloads the identical protein list in tab-delimited format. Hopefully other people come across this if they get stuck like I did.
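
For anyone who wants to go from that report to upstream/downstream coordinates, here is a rough sketch of how the tab-delimited text could be parsed. The column names (Nucleotide Accession, Start, Stop, Strand) are my assumption about the IPG header, so check them against what efetch actually returns. The sample rows are made-up placeholders rather than real records, and parse_ipg and flanking_window are just helper names I invented for illustration.

```python
import csv
import io

# Assumed header of the tab-delimited IPG report; verify against real output.
# The rows below are made-up placeholders, not real records.
SAMPLE_IPG = (
    "Id\tSource\tNucleotide Accession\tStart\tStop\tStrand\tProtein\n"
    "1\tRefSeq\tNC_EXAMPLE.1\t1500\t2400\t+\tWP_EXAMPLE.1\n"
    "1\tINSDC\tXX_EXAMPLE.1\t300\t1200\t-\tYY_EXAMPLE.1\n"
)


def parse_ipg(text):
    """Return one dict per row of a tab-delimited IPG report."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def flanking_window(row, flank=500):
    """Return (accession, seq_start, seq_stop, strand) widened by `flank` bases."""
    start, stop = sorted((int(row["Start"]), int(row["Stop"])))
    seq_start = max(1, start - flank)  # coordinates are 1-based
    seq_stop = stop + flank
    strand = 1 if row["Strand"] == "+" else 2  # efetch convention: 1 = plus, 2 = minus
    return row["Nucleotide Accession"], seq_start, seq_stop, strand


rows = parse_ipg(SAMPLE_IPG)
windows = [flanking_window(r) for r in rows]
for window in windows:
    print(window)
```

Each tuple could then be handed to something like Entrez.efetch(db="nuccore", id=accession, seq_start=seq_start, seq_stop=seq_stop, strand=strand, rettype="fasta", retmode="text") to pull the gene plus its flanking sequence. I have not checked how a seq_stop past the end of the record is handled, so treat the upper bound with care.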
