Programmatic download of complete genomes from NCBI for a specific taxonomy identifier
4.0 years ago
marc.bourqui ▴ 20

Hi all,

I am looking for a programmatic way to download complete genomes from RefSeq for a specific taxonomy identifier. In my case, I am interested in the order Lactobacillales.

  • From the Genome Download FTP it is not possible to filter by order, only by genus and species.
  • I tried to follow the steps described in this post. The first two steps (esearch and elink) work, but after that I do not know how to select the genomes of interest according to my criteria (complete and from RefSeq).
  • I also tried the Ebot pipeline generator, but again I am not sure which query qualifiers to apply.
  • My best approach so far is to use the Genome browser. From there, I can apply my filters and then download the selected records as a .csv or .txt. How can I get the same output file without using the web interface?

Thanks in advance for any hints and help!

genome ncbi • 1.3k views
4.0 years ago

Try this, it should work.

  1. Get the taxids of all species belonging to the order using taxonkit list,
  2. then download all RefSeq complete genomes matching those species_taxid values, following this post (section "Filter by species_taxid").
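
The two steps above can be sketched roughly as follows. This is only an illustration under some assumptions, not the exact method from the linked post: it assumes the taxid list was produced with something like `taxonkit list --ids 186826 --indent "" > taxids.txt` (186826 should be the NCBI taxid for Lactobacillales, but please verify it), and that the RefSeq bacteria `assembly_summary.txt` has been downloaded from https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/. The function names `read_taxids` and `select_assemblies` are made up for this example; the column names are read from the file's own header line.

```python
from typing import Dict, Iterable, Iterator, Set


def read_taxids(lines: Iterable[str]) -> Set[str]:
    """Collect taxids from a file with one taxid per line, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def select_assemblies(
    summary_lines: Iterable[str],
    taxids: Set[str],
    level: str = "Complete Genome",
) -> Iterator[Dict[str, str]]:
    """Yield assembly_summary rows whose species_taxid is in `taxids`.

    Only rows with the requested assembly_level and version_status "latest"
    are kept. Columns are looked up by name, using the commented header line
    that contains "assembly_accession".
    """
    header = None
    for line in summary_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # The header is a comment line; other comment lines are ignored.
            fields = line.lstrip("#").strip().split("\t")
            if "assembly_accession" in fields:
                header = fields
            continue
        if header is None or not line:
            continue
        row = dict(zip(header, line.split("\t")))
        if (
            row.get("species_taxid") in taxids
            and row.get("assembly_level") == level
            and row.get("version_status") == "latest"
        ):
            yield row
```

To use it, open both files and print the `ftp_path` column of each selected row, for example `for row in select_assemblies(open("assembly_summary.txt"), read_taxids(open("taxids.txt"))): print(row["ftp_path"])`. The resulting paths can then be passed to wget or rsync to fetch the genome files.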
