Issues with using Entrez Direct discovery by navigation
9 weeks ago

I'm trying to learn how to use Entrez Direct e-utilities to extract FASTAs from NCBI using this guide (https://www.ncbi.nlm.nih.gov/books/NBK179288/), but I can't get the example code for discovery by navigation to work:

 esearch -db pubmed -query 'lycopene cyclase' | elink -related -target protein | efilter -organism mouse -source refseq | efetch -format fasta


I've broken it down step by step, and while the functions up to elink seem to garner results, efilter causes 0 proteins to be found, which makes efetch ineffective as well. How should I modify this code to produce the intended output, which is the FASTA file for the enzyme converting beta-carotene to vitamin A in mice?

unix entrezdirect • 571 views
9 weeks ago

Hi, the NCBI manual that you've linked uses 2 elink statements while you are using only one:

elink -related | elink -target protein

# not a single "elink -related -target protein" as in the question



When I follow the NCBI guide, I get 10 results, the second being the one referenced in the manual.

(my whole command based on NCBI's manual: esearch -db pubmed -query "lycopene cyclase" | elink -related | elink -target protein | efilter -organism mouse -source refseq | efetch -format fasta > test.fa)
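One caveat when judging a command like this by its output file: the shell creates test.fa as soon as the redirection is set up, whether or not the pipeline delivers any sequences. A minimal sketch of a sanity check follows. The helper name check_fasta is hypothetical (it is not part of EDirect), and `false` stands in for a failing EDirect stage so the example runs offline.

```shell
# check_fasta is a hypothetical helper, not part of EDirect.
# It reports whether a file actually contains FASTA records.
check_fasta() {
    # -s is true only if the file exists and is non-empty
    if [ ! -s "$1" ]; then
        echo "empty"
        return 1
    fi
    # count FASTA header lines (lines starting with '>')
    n=$(grep -c '^>' "$1" || true)
    echo "$n record(s)"
    [ "$n" -gt 0 ]
}

# Offline demonstration: the first stage fails, yet the file is still created.
tmp=$(mktemp -d)
false | cat > "$tmp/test.fa"
check_fasta "$tmp/test.fa" || echo "pipeline produced no sequences"
```

Running check_fasta test.fa after the real command would distinguish an empty file from one that holds sequences.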


Thank you very much for your kind advice! Your command definitely generates a .fa file; however, there seems to be an error while trying to fill it with data, which is as follows:

<ERROR>NCBI C++ Exception:
Error: TXCLIENT(CException::eUnknown) "/pubmed_gen/rbuild/version/20221011/entrez/2.16.1/src/internal/txclient/TxClient.cpp", line 1045: ncbi::CTxRawClientImpl::readAll() --- Read failed: EOF (the other side has unexpectedly closed connection), peer: 130.14.18.59:8064</ERROR>


It also produces a long string of IDs, which I assume are the results requested from PubMed. This isn't the first time I've encountered this error, which is part of the reason I'm asking for advice to see if it's a code issue or some other problem. Do you have any idea what this could be caused by?


It is odd that your file is empty. (I repeated the command today and it gives the same results as before.) What is your ENTREZ version? Mine is

$ esearch --help
esearch 13.9


Moreover, the FASTA file is produced by the > test.fa redirection, so its existence is not related to the success or failure of the ENTREZ chain. You can try to trace the issue by swapping efetch for esummary and removing the individual steps one by one from the end to see which is the last one that works.
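The truncation idea can be sketched as a small loop that runs progressively longer prefixes of a pipeline and reports how much output each one produces. This is only an illustration: trace_stages is a hypothetical helper (not part of EDirect), it builds the pipeline up from the front rather than cutting from the end, and the demonstration uses stand-in commands so it runs without network access.

```shell
# trace_stages is a hypothetical helper, not part of EDirect.
# Each argument is one pipeline stage given as a string. The function runs
# stage 1, then stages 1-2, then 1-3, and so on, and prints the number of
# bytes each truncated pipeline wrote to stdout.
trace_stages() {
    pipeline=""
    for stage in "$@"; do
        if [ -z "$pipeline" ]; then
            pipeline=$stage
        else
            pipeline="$pipeline | $stage"
        fi
        # wc -c counts bytes; $(( )) strips any padding wc may add
        bytes=$(( $(eval "$pipeline" 2>/dev/null | wc -c) ))
        printf '%s bytes <- %s\n' "$bytes" "$pipeline"
    done
}

# Offline demonstration with stand-in commands; the last stage matches nothing.
trace_stages "printf 'a\nb\n'" "grep a" "grep zzz"

# With EDirect installed, the stages from this thread would be passed as:
#   trace_stages "esearch -db pubmed -query 'lycopene cyclase'" \
#       "elink -related" "elink -target protein" \
#       "efilter -organism mouse -source refseq" "esummary"
```

The first prefix that reports 0 bytes points at the failing stage. Note that every prefix re-runs the earlier stages, so this sends more requests to NCBI than a single run of the full pipeline.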

As for your specific error: I haven't encountered it before, but ENTREZ relies on a remote resource, so you can experience network issues. If this happens all the time, even for tried-and-tested queries, you may need to contact NCBI's support. Also, there is a queries-per-second limit imposed on ENTREZ (https://www.ncbi.nlm.nih.gov/books/NBK25497/), so make sure that you don't hit it.
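For transient network failures, one common workaround is to retry a command after a pause, which also keeps the request rate low. The sketch below is an assumption-laden illustration: retry is a hypothetical helper (not part of EDirect), and it only helps if the failing command exits with a non-zero status, which has not been verified for the error in this thread. The linked NCBI page also describes API keys, which raise the permitted request rate.

```shell
# retry is a hypothetical helper, not part of EDirect.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # run the command; stop as soon as it succeeds
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example (requires EDirect and network access, so it is left commented out):
#   retry 3 5 esearch -db pubmed -query "lycopene cyclase" > step1.xml
```

Because retry runs a single command, a whole pipeline would need to be wrapped in a function or script first.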


Thanks for your reply; my current version of ENTREZ is 18.7. I have run the code step-by-step before, and the last productive step is elink -related, which returns a list of IDs. elink -target protein is where the error I mentioned above starts occurring.


I've managed to replicate the error with the current (18.7) version of ENTREZ-DIRECT. I suggest that you report this issue directly to NCBI, as either the command itself is no longer valid or there is something wrong with version 18.7 of ENTREZ-DIRECT.

In addition to my version (13.9), I've also tested the latest Bioconda version (16.2), which also produces a FASTA file with sequences.