4.2 years ago
sbd10 • 0

Hello all.

For the past week or so I have been trying to find a way to download a full set of RefSeq genomes as FASTA files for Enterobacterales, or for Gammaproteobacteria if Enterobacterales isn't possible. I've been attempting this through E-Utilities with little success. My most recent attempt to gather a list of FTP URLs was:

esearch -db assembly -query "Enterobacterales[organism] AND assembly_nuccore_refseq[filter]" |
esummary |
xtract -pattern DocumentSummary -element FtpPath

This returns far more results than I can feasibly sort through, when I really only expected 120 hits based on:

I'd prefer to remain in Unix and use E-Utilities if possible. Thanks for any advice/proposed modifications.
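
For context on the step after this one: each FtpPath value is a directory rather than a file, so it still has to be turned into a FASTA URL. Below is a minimal sketch of how I understand that conversion, assuming the usual NCBI layout in which the genomic FASTA is named after the last path component plus `_genomic.fna.gz`. The function names are my own placeholders, not part of E-Utilities.

```shell
# Turn one assembly FtpPath (a directory) into the URL of its genomic FASTA.
# Assumption: the file lives at <FtpPath>/<last path component>_genomic.fna.gz
ftp_path_to_fasta_url() {
    path="${1%/}"            # drop a single trailing slash, if present
    base="${path##*/}"       # last path component (the assembly directory name)
    printf '%s/%s_genomic.fna.gz\n' "$path" "$base"
}

# Read FtpPath values on stdin, one per line, and print one FASTA URL each.
# Blank lines (records with no FtpPath) are skipped.
ftp_paths_to_fasta_urls() {
    while IFS= read -r line; do
        if [ -n "$line" ]; then
            ftp_path_to_fasta_url "$line"
        fi
    done
}
```

The idea would be to pipe the output of the xtract step into `ftp_paths_to_fasta_urls` and save the result as a URL list for a downloader. This does nothing about the number of hits, which is the part I am stuck on.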

