Hello all.
For the past week or so I have been trying to find a way to download the full set of RefSeq genomes for Enterobacterales as FASTA files (or for Gammaproteobacteria, if Enterobacterales isn't possible). I've been attempting this through E-utilities with little success. My most recent attempt to gather a list of FTP URLs was:
esearch -db assembly -query "Enterobacterales[organism] AND assembly_nuccore_refseq[filter]" |
esummary |
xtract -pattern DocumentSummary -element FtpPath
This returns far more results than I can feasibly sort through, whereas I only expected about 120 hits based on the NCBI reference genome browser: https://www.ncbi.nlm.nih.gov/genome/browse/reference/#
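A likely cause of the mismatch is that `assembly_nuccore_refseq[filter]` appears to match every assembly linked to RefSeq nucleotide records, not only the curated reference genomes. The sketch below narrows the query instead. It assumes the Assembly database exposes filters named `"latest refseq"`, `"reference genome"` and `"representative genome"` (verify the exact names in the Assembly advanced search builder), and `build_assembly_query` is a hypothetical helper, not part of EDirect.

```shell
#!/usr/bin/env bash
# Sketch: build a narrower Assembly query for one taxon.
# ASSUMPTION: the filter names below ("latest refseq", "reference genome",
# "representative genome") match the Assembly database's filters; confirm
# them in the Assembly advanced search builder before relying on this.

# build_assembly_query TAXON [yes]
# Hypothetical helper. Prints a query limited to the latest RefSeq assemblies
# in the "reference genome" category; pass "yes" as the second argument to
# also include "representative genome" assemblies.
build_assembly_query() {
    local taxon="$1"
    local include_representative="${2:-no}"
    local category='"reference genome"[filter]'
    if [ "$include_representative" = "yes" ]; then
        category='("reference genome"[filter] OR "representative genome"[filter])'
    fi
    # Quote the taxon so multi-word names are treated as a single term.
    printf '"%s"[organism] AND "latest refseq"[filter] AND %s\n' "$taxon" "$category"
}

# Example (needs EDirect and network access): check the hit count first,
# then fetch the RefSeq FTP paths only if the count looks reasonable.
#   query="$(build_assembly_query Enterobacterales)"
#   esearch -db assembly -query "$query" | xtract -pattern ENTREZ_DIRECT -element Count
#   esearch -db assembly -query "$query" | esummary \
#     | xtract -pattern DocumentSummary -element FtpPath_RefSeq
```

Checking the count before running `esummary` keeps a mistyped filter from silently pulling thousands of document summaries.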
I'd prefer to stay in Unix and use E-utilities if possible. Thanks for any advice or proposed modifications.
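Once a list of assembly directory paths is in hand, turning it into FASTA downloads can stay entirely in the shell. This is a minimal sketch under two assumptions: that the RefSeq directory comes from the `FtpPath_RefSeq` element of the Assembly document summary (check your `esummary` output for the exact element name), and that each directory contains a file named `<directory name>_genomic.fna.gz`, which is NCBI's usual layout. The function names `ftp_to_fasta_url`, `paths_to_fasta_urls` and `download_urls` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn RefSeq assembly directory paths (one per line on stdin) into
# genomic FASTA URLs and download them.
# ASSUMPTION: each directory holds <directory basename>_genomic.fna.gz.

# Print the genomic FASTA URL for one assembly directory path.
ftp_to_fasta_url() {
    local dir="${1%/}"          # drop any trailing slash
    local base="${dir##*/}"     # e.g. GCF_000005845.2_ASM584v2
    # Rewrite ftp:// to https:// (NCBI serves the same tree over HTTPS);
    # paths that are already https:// are left unchanged.
    case "$dir" in
        ftp://*) dir="https://${dir#ftp://}" ;;
    esac
    printf '%s/%s_genomic.fna.gz\n' "$dir" "$base"
}

# Read paths from stdin, skip blank lines (assemblies with no RefSeq path),
# and print one FASTA URL per path. Also handles a missing final newline.
paths_to_fasta_urls() {
    local path
    while IFS= read -r path || [ -n "$path" ]; do
        [ -n "$path" ] || continue
        ftp_to_fasta_url "$path"
    done
}

# Download every URL on stdin into the current directory, reporting failures.
download_urls() {
    local url
    while IFS= read -r url || [ -n "$url" ]; do
        curl -fsSLO "$url" || echo "failed: $url" >&2
    done
}

# Example (needs EDirect and network access):
#   esearch -db assembly -query "<your query>" | esummary \
#     | xtract -pattern DocumentSummary -element FtpPath_RefSeq \
#     | paths_to_fasta_urls | download_urls
```

Keeping the URL construction in its own function makes it easy to inspect the URL list (for example with `head` or `wc -l`) before committing to the download step.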