Hello everyone,
I’m looking to download all complete bacterial, viral, fungal, archaeal, and protozoal genomes from NCBI using ncbi-genome-download:
ncbi-genome-download --formats fasta,assembly-report --parallel 20 --progress-bar --section refseq --flat-output --assembly-levels complete bacteria,viral,fungi,archaea,protozoa
However, I need to restrict these genomes to a specific host (human, in my case). Since ncbi-genome-download does not offer a direct option to specify the host, I was wondering if there is a fast or efficient workaround.
Has anyone faced this issue before or found a practical solution?
Thank you in advance for your help!
This information is not available for all genomes, but it is there for some of the records. That is how the dataset answer above is able to find the "Biosample host". These genomes are from organisms where the source was listed as human.
Since OP is looking for genomes where the host is explicitly listed, using a generic name may pull out genomes that do not satisfy the human host limit. E.g., without the grep for human, we can find genome accessions that are from non-human/no-host/source-listed records.
Always good to learn new things.
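To make the host filtering step concrete, here is a minimal Python sketch. It assumes you have already exported a tab-separated table with one row per assembly and two columns named accession and host; those column names, the HUMAN_HOSTS set, and the example rows are all made up for illustration and are not the output of any particular tool. It keeps a record only when the host field is exactly a human label, which is stricter than a plain grep for "human", and it drops records with no host listed.

```python
import csv
import io

# Host labels accepted as "human". This set is an assumption; extend it
# to match whatever spellings appear in your own metadata table.
HUMAN_HOSTS = {"homo sapiens", "human"}


def human_host_accessions(tsv_text, accession_col="accession", host_col="host"):
    """Return accessions whose host field is exactly one of HUMAN_HOSTS."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    keep = []
    for row in reader:
        # Missing or empty host values become "" and are filtered out.
        host = (row.get(host_col) or "").strip().lower()
        if host in HUMAN_HOSTS:
            keep.append(row[accession_col].strip())
    return keep


# Made-up example rows, not real NCBI records.
example = (
    "accession\thost\n"
    "GCF_000000001.1\tHomo sapiens\n"
    "GCF_000000002.1\tBos taurus\n"
    "GCF_000000003.1\t\n"
    "GCF_000000004.1\thuman\n"
    "GCF_000000005.1\thuman gut metagenome\n"
)

accessions = human_host_accessions(example)
print("\n".join(accessions))
```

If you write the printed accessions to a file, one per line, my understanding is that ncbi-genome-download can take that file through its --assembly-accessions option, so only those assemblies are downloaded. Check ncbi-genome-download --help for your installed version before relying on that.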