Hello everyone,
I’m looking to download all complete bacterial, viral, fungal, archaeal, and protozoal genomes from NCBI using ncbi-genome-download. This is the command I’m running:
ncbi-genome-download --formats fasta,assembly-report --parallel 20 --progress-bar --section refseq --flat-output --assembly-levels complete bacteria,viral,fungi,archaea,protozoa
However, I need to restrict these genomes to a specific host (human, in my case). As far as I know, ncbi-genome-download has no option to filter by host, so I was wondering if there’s a fast or efficient workaround.
Has anyone faced this issue before or found a practical solution?
Thank you in advance for your help!
It is astonishing that there is no straightforward way to match accessions to their hosts. Anyhow, I will use datasets as suggested, though on a much larger list of accession numbers (https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=1) than in step 1; that list includes more than 3 million records. Thank you for your help.
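For anyone landing here with the same problem, here is a minimal Python sketch of the host-filtering step. It assumes you have already exported assembly metadata to a tab-separated file with one column of accessions and one column of host names. The column names `accession` and `host`, the output filename, and the example rows are all hypothetical placeholders, not the output format of any particular tool, so adjust them to whatever your metadata export actually contains.

```python
import csv
import io

# Host spellings treated as "human". This set is an assumption; extend it
# to match the spellings that actually occur in your metadata.
HUMAN_HOSTS = {"homo sapiens", "human"}


def accessions_for_host(tsv_handle, accession_col="accession",
                        host_col="host", hosts=HUMAN_HOSTS):
    """Return accessions whose host field matches one of `hosts`.

    Matching is case-insensitive and ignores surrounding whitespace.
    Rows with an empty or missing host are skipped.
    """
    wanted = {h.lower() for h in hosts}
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    keep = []
    for row in reader:
        host = (row.get(host_col) or "").strip().lower()
        if host in wanted:
            keep.append(row[accession_col].strip())
    return keep


# Made-up rows for illustration only; these are not real assemblies.
EXAMPLE_TSV = (
    "accession\thost\n"
    "GCF_000000001.1\tHomo sapiens\n"
    "GCF_000000002.1\tBos taurus\n"
    "GCF_000000003.1\t\n"
    "GCF_000000004.1\thuman\n"
)

if __name__ == "__main__":
    selected = accessions_for_host(io.StringIO(EXAMPLE_TSV))
    # Write one accession per line (hypothetical output filename).
    with open("human_host_accessions.txt", "w") as out:
        out.write("\n".join(selected) + "\n")
    print(selected)
```

The resulting file could then be handed to ncbi-genome-download through its --assembly-accessions option, which, as far as I know, accepts a path to a file with one accession per line. Keep in mind that many records have no host annotation at all, so an exact-match filter like this one will silently drop them.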