I'm trying to download some protein sequences using Entrez from the command line. However, although the 'esearch' command finds my proteins (searching by ID), 'efetch' doesn't return anything. I've tried both the gp and fasta formats.
Here are some example IDs:
GCB61038,
GCB69151,
GCC23899,
GCC32047,
MXQ86236
I have a list of 457 proteins whose sequences I can't download.
I noticed later that I could download the FASTA using the IPG database, but what I really need is the GP file, because later I'll need the taxonomic information and the CDS IDs. But I think these sequences do have some problems: I tried taking some CDS IDs by hand and downloading the nucleotide FASTA, and it did not work, even though the information is in the database.
When using the web API, it looks like it automatically searches against ALL protein databases.
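For anyone who wants to see what a direct web API request looks like, here is a minimal sketch that builds E-utilities efetch URLs for a list of protein accessions in batches. The function name `build_efetch_urls` and the batch size are my own choices for illustration, and it assumes the records are retrievable from `db=protein`; I can't promise that holds for these particular IDs. It only builds the URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_urls(ids, rettype="gp", batch_size=100):
    """Return one efetch URL per batch of protein accessions.

    rettype="gp" asks for GenPept flat files, rettype="fasta" for FASTA.
    batch_size is an arbitrary choice, not an NCBI requirement.
    """
    urls = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        params = {
            "db": "protein",          # assumption: records live in the protein database
            "id": ",".join(batch),    # comma-separated accession list
            "rettype": rettype,
            "retmode": "text",
        }
        # safe="," keeps the commas readable instead of percent-encoding them.
        urls.append(f"{EFETCH_URL}?{urlencode(params, safe=',')}")
    return urls


# The example IDs from the question, split into batches of three.
example_ids = ["GCB61038", "GCB69151", "GCC23899", "GCC32047", "MXQ86236"]
urls = build_efetch_urls(example_ids, rettype="gp", batch_size=3)
for url in urls:
    print(url)
```

Each printed URL can be opened in a browser or fetched with any HTTP client. For a list of 457 IDs, batching keeps each request short.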
Can you download the GP this way?
Yes. Change the command to this:
Oh, you saved my life! If you allow me, one last question. Some of my proteins are encoded by CDSs within a whole genome. In the GP file it appears as "/coded_by="join(WAAD01021866.1:<231469..231602, etc ". When I download this file (FASTA) using efetch, I get the whole genome split into the proteins it codes for. Will this work in the way you taught me?
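One way to work with such a `/coded_by` value is to pull out the nucleotide accession and the coordinates of each segment, so that only that region needs to be requested rather than the whole record. The sketch below is a rough illustration: `parse_coded_by` is a hypothetical helper, the regular expression covers only simple `accession:start..end` segments (it ignores strand, so `complement(...)` is not handled), and the input is just the truncated fragment quoted above.

```python
import re

# One segment looks like ACCESSION.VERSION:start..end, where either coordinate
# may carry a "<" or ">" marker for a partial feature.
SEGMENT = re.compile(r"([A-Za-z_0-9]+\.\d+):[<>]?(\d+)\.\.[<>]?(\d+)")


def parse_coded_by(value):
    """Return (accession, start, end) tuples found in a /coded_by value.

    Simplification: strand information such as complement(...) is ignored.
    """
    return [(acc, int(start), int(end)) for acc, start, end in SEGMENT.findall(value)]


# Only the fragment quoted in the post; the rest of the join(...) is not shown there.
fragment = "join(WAAD01021866.1:<231469..231602"
segments = parse_coded_by(fragment)
print(segments)  # [('WAAD01021866.1', 231469, 231602)]
```

The resulting coordinates could then be passed to efetch on the nucleotide database through its `seq_start` and `seq_stop` parameters, although I have not checked that against these specific records.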