News: NCBI Releases Entrez Direct, The Entrez Utilities On The Unix Command Line
8.5 years ago

http://www.ncbi.nlm.nih.gov/news/02-06-2014-entrez-direct-released/

NCBI has just released Entrez Direct, a new software suite that enables users to use the UNIX command line to directly access NCBI databases, as well as to parse and format the data to create customized downloads.

## Retrieve a set of PubMed abstracts

Oh, so happy to hear that :D

Documentation is here:

Documentation

FTP

I take it that it works on Linux as well, right?

8.5 years ago

I have been trying these utilities for the last few days. They work nicely, but the documentation is very obscure! There is not even a --help flag implemented, so you have to go back to the web page every time.

Anyway, here are a few examples not covered in the documentation:

Given a Gene ID, download the amino acid sequences of the corresponding proteins, keeping only the reviewed entries (i.e. no putative or predicted sequences):

esearch -db gene -query "1234[id]" | elink -target protein | efilter -query "REVIEWED[FILTER]" | efetch -format fasta
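If you run that pipeline for more than one gene, it may be handy to wrap it in a small shell function. The following is only a sketch: `build_gene_query` and `reviewed_proteins` are hypothetical helper names made up for this example (they are not part of Entrez Direct), and the pipeline itself is the one shown above, unchanged. The wrapper checks that the ID is numeric and that `esearch` is on the PATH before calling it.

```shell
#!/bin/sh
# Hypothetical helper: turn a numeric Gene ID into the query string used above.
build_gene_query() {
  case "$1" in
    ''|*[!0-9]*)
      echo "not a numeric Gene ID: $1" >&2
      return 1
      ;;
  esac
  printf '%s[id]\n' "$1"
}

# Hypothetical wrapper around the pipeline from the post.
reviewed_proteins() {
  query=$(build_gene_query "$1") || return 1
  # Bail out early if Entrez Direct does not seem to be installed.
  if ! command -v esearch >/dev/null 2>&1; then
    echo "esearch not found on PATH; query would have been: $query" >&2
    return 127
  fi
  esearch -db gene -query "$query" |
    elink -target protein |
    efilter -query "REVIEWED[FILTER]" |
    efetch -format fasta
}

# Example call (needs Entrez Direct and network access):
#   reviewed_proteins 1234 > gene1234.reviewed.fasta
```

Keeping the query construction in its own function means you can check it without touching the network.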


Given a file containing a list of Gene IDs (one per line), download all the entries in tabular format:

esearch -db gene -query "$(paste -s -d ',' mygenes.ids)" | efetch -format tabular > mygenes.details.txt
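The `paste -s -d ','` step just joins the lines of the file into one comma-separated string, which is then passed as the query. A minimal sketch of that step on its own is below. The IDs written to `mygenes.ids` are made-up placeholders, and stripping blank lines and carriage returns is my own precaution (an assumption about what a hand-edited ID file might contain), not something the tools require.

```shell
#!/bin/sh
# Placeholder input: made-up IDs, one per line, with a stray blank line.
printf '1234\n\n5678\n91011\n' > mygenes.ids

# Drop carriage returns and blank lines, then join the remaining lines with commas.
ids=$(tr -d '\r' < mygenes.ids | grep -v '^[[:space:]]*$' | paste -s -d ',' -)
echo "$ids"

# Only call Entrez Direct if it is actually installed.
if command -v esearch >/dev/null 2>&1; then
  esearch -db gene -query "$ids" | efetch -format tabular > mygenes.details.txt
else
  echo "esearch not found on PATH; query would have been: $ids" >&2
fi
```

Printing the joined string before running the query makes it easy to spot a malformed ID file.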


8.5 years ago

First thought: does this mean that my home grown, undocumented, hacky curl based shell script is obsolete?

Second thought: What took them so long?

I'm just happy they finally replaced eutils with something more user-friendly.

I couldn't agree more, e-utils was awful. I'm still not a total fan of relying on Entrez queries to obtain larger amounts of sequence data. I've not had a good experience getting the accuracy I need; there's always something in the results that I don't want.