Hi all,
I was wondering if there's a streamlined way of getting the "representative" genome for a particular species. What I'm looking for is an automated way of retrieving a genome assembly and annotation for every bacterial species in, e.g., a Kraken2 output. The most popular databases for Kraken2 are built from RefSeq, so I'd imagine there should be a relatively easy way to match a taxonomy ID or species name to a RefSeq assembly accession.
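A minimal sketch of the first step, pulling species-level taxids out of a Kraken2 report. It assumes the standard tab-separated report written with Kraken2's --report option, where the rank code, taxid and indented name are always the last three columns (true for both the 6-column layout and the 8-column --report-minimizer-data layout). It keeps rows with rank code "S" that sit under the chosen domain row ("D"). The function name and the min_reads threshold are hypothetical choices for illustration, not part of Kraken2.

```python
from typing import List, Optional, Tuple


def species_taxids(report_text: str, domain: str = "Bacteria",
                   min_reads: int = 1) -> List[Tuple[int, str]]:
    """Return (taxid, name) for species-level rows of a Kraken2 report.

    Only species below the given domain row are kept, and only if their
    clade read count (column 2) is at least min_reads.
    """
    hits: List[Tuple[int, str]] = []
    current_domain: Optional[str] = None
    for line in report_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        clade_reads = int(fields[1])
        # Rank code, taxid and name are the last three columns in both layouts.
        rank = fields[-3].strip()
        taxid = fields[-2].strip()
        name = fields[-1].strip()
        # Rows are listed depth-first, so a "D" row sets the domain for
        # every row beneath it until the next "D" row appears.
        if rank == "D":
            current_domain = name
        elif rank == "S" and current_domain == domain and clade_reads >= min_reads:
            hits.append((int(taxid), name))
    return hits
```

Passing the report file's contents (e.g. open("sample.kreport").read(), a hypothetical filename) returns a list of taxids that can then be fed to NCBI datasets.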
Any advice would be appreciated, as always!
All the best
-- Alex
Hi Mirian,
Thank you for your comment! I'd like to avoid downloading all bacterial species and only download the assemblies and annotations of the species/strains of interest. What is the best way to do this?
Thank you!
Download using accession numbers: downloading genomes in fasta format from accession ids
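If you already have RefSeq assembly accessions, a minimal sketch of downloading them in one call is below. It assumes the NCBI datasets v2 CLI is installed and on PATH, and uses its "datasets download genome accession" subcommand with the --inputfile, --include and --filename flags; please check them against "datasets download genome accession --help" for your version. The helper names and default filenames are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def accession_command(accession_file: str, include: str = "genome,gff3",
                      out_zip: str = "genomes.zip") -> List[str]:
    """Build a datasets call that reads accessions (one per line) from a file."""
    return ["datasets", "download", "genome", "accession",
            "--inputfile", accession_file,
            "--include", include,
            "--filename", out_zip]


def download_accessions(accessions: Iterable[str],
                        accession_file: str = "accessions.txt",
                        out_zip: str = "genomes.zip") -> int:
    """Write the accessions to a file, run datasets once, return its exit code.

    Raises FileNotFoundError if the datasets executable is not on PATH.
    """
    Path(accession_file).write_text("\n".join(accessions) + "\n")
    cmd = accession_command(accession_file, out_zip=out_zip)
    return subprocess.run(cmd).returncode
```

For example, download_accessions(["GCF_000005845.2"]) would request the RefSeq assembly of E. coli K-12 MG1655 with its GFF3 annotation in a single zip.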
Hi Alex,
I understand. With datasets, you have two options:
datasets does not have an option to download by a list of taxa, but you can work around it by creating a loop. Assuming you have a list of taxids, you can loop over that list and download each representative genome as a separate data package.
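The loop over taxids can be sketched in Python as follows. This assumes the NCBI datasets v2 CLI is on PATH and that its "datasets download genome taxon" subcommand accepts --reference (limit to reference genomes), --include and --filename; verify with "datasets download genome taxon --help". Each taxid gets its own zip named after it. The function names and the dry_run switch are hypothetical conveniences.

```python
import subprocess
from typing import Iterable, List


def taxon_command(taxid: int, include: str = "genome,gff3") -> List[str]:
    """One datasets call per taxon, restricted to its reference genome(s)."""
    return ["datasets", "download", "genome", "taxon", str(taxid),
            "--reference",
            "--include", include,
            "--filename", f"{taxid}.zip"]


def download_references(taxids: Iterable[int], include: str = "genome,gff3",
                        dry_run: bool = False) -> List[int]:
    """Download one data package per taxid; return the taxids that failed.

    With dry_run=True the commands are only printed, not executed.
    """
    failed: List[int] = []
    for taxid in taxids:
        cmd = taxon_command(taxid, include)
        if dry_run:
            print(" ".join(cmd))
            continue
        # Record a non-zero exit status instead of aborting the whole loop.
        if subprocess.run(cmd).returncode != 0:
            failed.append(taxid)
    return failed
```

Running download_references([562, 1280], dry_run=True) first is a cheap way to inspect the exact commands before downloading anything.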
I hope this helps. Let me know if you have any other questions. :)