Question

How to download multiple genome files using command line (MacOS) using datasets

0

11 months ago

scholaraspect2008 • 0

Hi there seniors,

I am new to the command line, and at the moment I am using the command below to download bacterial genomes one by one. I have a list of GCF accession IDs whose genomes I need for analysis, so I wanted to ask whether there is a faster way. I'm using this command:

./datasets download genome accession 'GCF_*' --include gff3,gbff,rna,cds,protein,genome,seq-report

Thanks in advance

Cheers,
Aldre

ncbi-datasets Bacteria Genome • 1.6k views

ADD COMMENT • link updated 11 months ago by Ram 44k • written 11 months ago by scholaraspect2008 • 0

1

You can use

./datasets download genome accession --inputfile file_name_w_accession

file_name_w_accession should contain one accession per line. Use any additional options as needed.
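
As a minimal sketch of the above: the three accessions are only illustrative stand-ins for your own list, accessions.txt and genomes.zip are arbitrary file names, and the datasets binary is assumed to sit in the current directory, as in the question.

```shell
# Write one accession per line (illustrative accessions; replace with your own list).
printf '%s\n' GCF_000005845.2 GCF_000009045.1 GCF_000006765.1 > accessions.txt

# Request all listed genomes as a single package, if the binary is present here.
if [ -x ./datasets ]; then
    ./datasets download genome accession --inputfile accessions.txt \
        --include gff3,gbff,rna,cds,protein,genome,seq-report \
        --filename genomes.zip
else
    echo "./datasets not found in this directory; adjust the path to the tool" >&2
fi
```

Without --filename the package is saved as ncbi_dataset.zip; either way everything arrives in one zip archive that you then unpack.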

ADD REPLY • link 11 months ago by GenoMax 147k

0

You should use wildcards to download all of the GCF files given above. Moreover, you can use curl, wget, or slow5curl to download the data from the database.

ADD REPLY • link 11 months ago by rj.rezwan ▴ 10

0

In that case, how should I write my script?

Say I have

GCF000001
GCF000002
GCF000003

By the way, sorry if my replies weren't very clear; I have only just started learning the command line.

ADD REPLY • link updated 11 months ago by Ram 44k • written 11 months ago by scholaraspect2008 • 0

0

"you can use the commands of curl, wget and slow5curl to download the data from database."

Not in this situation; it is much easier to use the NCBI datasets tool.

ADD REPLY • link 11 months ago by Michael 55k

0

Hi there, thanks for the reply. Could you share an example of a ./datasets script for that? Thank you very much. I'm currently trying various methods as well.

ADD REPLY • link 11 months ago by scholaraspect2008 • 0

0

Here is an example of how to use read in a loop: https://stackoverflow.com/a/1521498. You just need to put in your command.
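
A rough sketch of that read-loop pattern, assuming an accessions.txt with one accession per line (the two accessions and the per-accession zip names are illustrative only) and the datasets binary in the current directory. If the binary is missing, the loop just prints what it would have done.

```shell
# Illustrative list; in practice accessions.txt already holds your own accessions.
printf '%s\n' GCF_000005845.2 GCF_000009045.1 > accessions.txt

n=0
while IFS= read -r acc; do
    # Skip blank lines so an empty accession is never requested.
    if [ -z "$acc" ]; then
        continue
    fi
    n=$((n + 1))
    if [ -x ./datasets ]; then
        # One package per accession, named after the accession.
        ./datasets download genome accession "$acc" \
            --include gff3,gbff,rna,cds,protein,genome,seq-report \
            --filename "${acc}.zip"
    else
        echo "would download ${acc} to ${acc}.zip"
    fi
done < accessions.txt
echo "processed ${n} accessions"
```

A loop like this sends one request per accession, so for long lists the --inputfile option is usually the simpler choice.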

ADD REPLY • link 11 months ago by Michael 55k

score 0 · Answer 1 · 2023-12-05

datasets download genome accession --inputfile accessions.txt --include gff3,gbff,rna,cds,protein,genome,seq-report

Or simply specify multiple accessions on the command line:

datasets download genome accession GCF_000001405.40 GCA_003774525.2 GCA_000001635

Edit: Sorry, I overlooked the --inputfile option.

Listing the accessions is necessary unless they all belong to a common taxon or BioProject. If they share a taxon, you can simply do:

datasets download genome taxon 'Actinobacteria' --include gff3,gbff,rna,cds,protein,genome,seq-report

Collecting 87,619 genome records [>-----------------------------------------------]   5% 4000/87619

As the record count shows, you will probably want to choose a taxon at a lower level.

Please refer to the documentation for further options: https://www.ncbi.nlm.nih.gov/datasets/docs/v2/how-tos/genomes/download-genome/