Question: Using E-Utilities/Entrez/Elink to access identical proteins
2.6 years ago, pawlowac wrote:

I'm trying to access identical protein information for a list of protein accession numbers. My goal is to take the protein IDs, get the identical protein list for each, and use the nucleotide coordinates from that list to download the regions upstream and downstream of a given gene.

I was hoping to use the biopython Entrez module.

I've used the code below:

epost_1 = Entrez.read(Entrez.epost("protein", id=",".join(Prot_ID_list)))
webenv = epost_1["WebEnv"]
query_key = epost_1["QueryKey"]
prot_link = Entrez.read(Entrez.elink(dbfrom="protein", db="protein", LinkName="protein_protein_identical", webenv=epost_1["WebEnv"], query_key=epost_1["QueryKey"]))

However, this doesn't work. It just returns the list of protein IDs that I originally posted.

I've also tried the E-utilities command-line tools on Ubuntu:

epost -db protein -id Prot_ID_List | elink -related -name protein_protein_identical | efetch -format text

but it just gives an error "QueryKey value not found in fetch input". I tried variations on this to no avail. I am probably doing something very wrong with eutils, but I am not sure what.

Any help is greatly appreciated.


Also, I am aware of the thread "How to Elink identical proteins from protein id?". However, I need to make thousands of requests, and the method described there is neither succinct nor fast enough for that.
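For thousands of accessions, one common approach is to send the IDs in batches so that each request carries many accessions, pausing between requests to stay under NCBI's limit of three requests per second without an API key. A minimal sketch of that pattern: the names chunked, run_in_batches and fetch are hypothetical, the batch size of 200 is an arbitrary assumption, and fetch stands in for whatever function actually queries NCBI.

```python
import time


def chunked(ids, size):
    """Yield successive lists of at most `size` IDs."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def run_in_batches(ids, fetch, size=200, pause=0.34):
    """Call fetch(batch) once per batch and collect the results.

    `fetch` is a caller-supplied function (hypothetical here) that takes a
    list of IDs and returns whatever the request yields.
    `pause` is the delay in seconds between requests.
    """
    results = []
    for batch in chunked(ids, size):
        results.append(fetch(batch))
        time.sleep(pause)  # throttle so requests are spaced out
    return results
```

With a stand-in fetch such as lambda batch: ",".join(batch), five IDs and size=2 produce three requests, the last one carrying a single ID.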

2.5 years ago, pawlowac wrote:

It appears I was approaching the problem the wrong way. You don't need to 'link' to the identical protein list, despite the 'protein_protein_identical' link name and its description on the NCBI website.

Instead, the identical protein list is information intrinsic to each accession number, so it can be fetched directly.

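    from Bio import Entrez

    # Upload the accession list to the Entrez history server.
    epost_1 = Entrez.read(Entrez.epost("protein", id=",".join(prot_list)))
    webenv = epost_1["WebEnv"]
    query_key = epost_1["QueryKey"]
    # Fetch the identical protein report for the posted accessions.
    iden_prots = Entrez.efetch(db="protein", rettype="ipg", retmode="text", webenv=webenv, query_key=query_key)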

I used the above to get the identical protein list. Setting rettype="ipg" downloads the identical protein list in tab-delimited format. Hopefully other people come across this if they get stuck like I did.
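The tab-delimited text can then be parsed with the standard library to pull out the nucleotide coordinates and widen them to cover the flanking regions. A minimal sketch, assuming the header row uses the column names "Nucleotide Accession", "Start", "Stop", "Strand" and "Protein" (check these against the actual output). The function names, the 1000-base flank and the sample row with its accessions are made up for illustration.

```python
import csv
import io


def parse_ipg(text):
    """Parse tab-delimited IPG text into dicts with integer coordinates."""
    rows = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        start = row.get("Start") or ""
        stop = row.get("Stop") or ""
        # Skip records without numeric coordinates (e.g. missing values
        # or a header line repeated in the middle of the text).
        if not (start.isdigit() and stop.isdigit()):
            continue
        rows.append({
            "nucleotide": row["Nucleotide Accession"],
            "start": int(start),
            "stop": int(stop),
            "strand": row["Strand"],
            "protein": row["Protein"],
        })
    return rows


def flank_window(start, stop, flank=1000):
    """Return 1-based (seq_start, seq_stop) widened by `flank` on each side."""
    lo, hi = min(start, stop), max(start, stop)
    # Clamp at 1; the upper bound is NOT clipped to the sequence length.
    return max(1, lo - flank), hi + flank


# Made-up sample in the assumed column layout, for illustration only.
SAMPLE = (
    "Id\tSource\tNucleotide Accession\tStart\tStop\tStrand\tProtein\n"
    "1\tRefSeq\tNC_000000.1\t500\t1400\t+\tWP_000000000.1\n"
)

for rec in parse_ipg(SAMPLE):
    print(rec["nucleotide"], flank_window(rec["start"], rec["stop"]))
```

Each window could then be passed to Entrez.efetch against the nucleotide database through the seq_start and seq_stop parameters to download the gene together with its upstream and downstream regions.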
