I'm trying to learn how to use the Entrez Direct E-utilities to extract FASTA sequences from NCBI using this guide (https://www.ncbi.nlm.nih.gov/books/NBK179288/), but I can't get the example code for discovery by navigation to work:
esearch -db pubmed -query 'lycopene cyclase' | elink -related -target protein | efilter -organism mouse -source refseq | efetch -format fasta
I've broken it down step by step. The commands up to elink return results, but efilter reduces the count to 0 proteins, so efetch has nothing to fetch. How should I modify this code to produce the intended output, which is the FASTA file for the enzyme that converts beta-carotene to vitamin A in mice?
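One way to see exactly where the record count drops to zero is to run each prefix of the pipeline on its own and print the Count field of the ENTREZ_DIRECT message that EDirect passes between commands, using xtract. This is a minimal bash sketch, assuming EDirect (including xtract) is on PATH; the stages array and the pipeline_prefix and count_each_stage functions are hypothetical helpers, not part of EDirect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the navigation example split into its stages
# (efetch is left out because only the counts are of interest here).
stages=(
  "esearch -db pubmed -query 'lycopene cyclase'"
  "elink -related -target protein"
  "efilter -organism mouse -source refseq"
)

# Print the pipeline made of the first N stages, joined with " | ".
pipeline_prefix() {
  local n=$1 cmd="" i
  for (( i = 0; i < n; i++ )); do
    if [[ -z $cmd ]]; then
      cmd=${stages[i]}
    else
      cmd+=" | ${stages[i]}"
    fi
  done
  printf '%s\n' "$cmd"
}

# Run every prefix and print the record count EDirect reports after it.
# Assumes network access to NCBI; each prefix issues real requests.
count_each_stage() {
  local n
  for (( n = 1; n <= ${#stages[@]}; n++ )); do
    printf 'stage %d: ' "$n"
    eval "$(pipeline_prefix "$n") | xtract -pattern ENTREZ_DIRECT -element Count"
  done
}
```

Calling count_each_stage prints one count per stage, so the first stage that reports 0 (or fails outright) is the one to investigate.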
Thank you very much for your kind advice! Your command does generate a .fa file; however, an error occurs while it is being filled with data. The error is as follows:
It also prints a long string of IDs, which I assume are the results requested from PubMed. This isn't the first time I've encountered this error, which is part of the reason I'm asking for advice: I want to find out whether it's a code issue or some other problem. Do you have any idea what could be causing it?
It is odd that your file is empty. (I repeated the command today and it gives the same results as before.) What is your ENTREZ version? Mine is:
Moreover, the FASTA file is created by the > test.fa redirection itself, so its existence says nothing about the success or failure of the ENTREZ chain. You can try to trace the issue by replacing efetch with esummary and truncating the individual steps from the end to find the last one that works.

As for your specific error: I haven't encountered it before, but ENTREZ relies on a remote resource, so you may be experiencing network issues. If this happens all the time, even for queries that have worked before, you may need to contact NCBI's support. Also, ENTREZ is subject to a requests-per-second limit (https://www.ncbi.nlm.nih.gov/books/NBK25497/), so make sure you don't hit it.
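Because the shell creates test.fa before any command in the pipeline runs, an empty file is left behind even when a step fails. A minimal bash sketch of a safer pattern, assuming bash: it runs the pipeline with pipefail into a temporary file, moves the result into place only if every stage succeeded and produced output, and retries a few times with a pause in between so repeated attempts stay well below NCBI's limit (three requests per second without an API key). The run_to_file function and its retry/pause parameters are hypothetical, not part of EDirect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run_to_file PIPELINE OUTFILE [TRIES] [PAUSE_SECONDS]
# Runs PIPELINE (a string) with pipefail, writing stdout to a temp file.
# OUTFILE is only created if all stages exit 0 and the output is non-empty.
run_to_file() {
  local pipeline=$1 out=$2 tries=${3:-3} pause=${4:-2}
  local tmp attempt
  tmp=$(mktemp) || return 1
  for (( attempt = 1; attempt <= tries; attempt++ )); do
    # pipefail makes a failure in any stage fail the whole pipeline.
    if ( set -o pipefail; eval "$pipeline" ) > "$tmp" && [[ -s $tmp ]]; then
      mv "$tmp" "$out"
      return 0
    fi
    echo "attempt $attempt failed or returned no data" >&2
    # Pause before retrying to avoid hammering the remote service.
    if (( attempt < tries )); then
      sleep "$pause"
    fi
  done
  rm -f "$tmp"
  return 1
}
```

For example, run_to_file "esearch -db pubmed -query 'lycopene cyclase' | elink -related -target protein | efilter -organism mouse -source refseq | efetch -format fasta" test.fa leaves no test.fa at all when the chain fails, so an empty file can no longer be mistaken for a partial result.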
Thanks for your reply; my current version of ENTREZ is 18.7. I have run the code step by step before, and the last productive step is elink -related, which returns a list of IDs. The error I mentioned above starts occurring at elink -target protein.

I've managed to replicate the error with the current (18.7) version of ENTREZ-DIRECT. I suggest that you report this issue directly to NCBI, as either the command itself is no longer valid or there is something wrong with version 18.7 of ENTREZ-DIRECT.
In addition to my version (13.9), I've also tested the latest Bioconda version (16.2), which also produces the FASTA file with sequences.
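Until the 18.7 problem is resolved, one possible workaround is to run the pipeline with the Bioconda build reported working above. A minimal sketch, assuming conda is installed and that Bioconda's entrez-direct package still offers version 16.2; the environment name and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical environment name for the pinned EDirect build.
EDIRECT_ENV=edirect-16.2

# Create a conda environment pinned to the Bioconda entrez-direct 16.2 build
# (assumes that version is still available on the bioconda channel).
make_pinned_env() {
  conda create -y -n "$EDIRECT_ENV" -c conda-forge -c bioconda "entrez-direct=16.2"
}

# Run the navigation example with the EDirect tools from that environment.
run_example_pinned() {
  conda run -n "$EDIRECT_ENV" bash -c \
    "esearch -db pubmed -query 'lycopene cyclase' | elink -related -target protein | efilter -organism mouse -source refseq | efetch -format fasta"
}
```

Running make_pinned_env once and then run_example_pinned > test.fa keeps the older version isolated from a system-wide 18.7 installation, which also makes it easy to compare the two when reporting the issue to NCBI.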