How to download multiple genome files using command line (MacOS) using datasets
11 months ago

Hi there seniors,

I am new to the command line and am currently downloading bacterial genomes one at a time with the command below. Is there a faster way? I have a list of GCF accession IDs whose genomes I need to download for analysis. This is the command I'm using:

./datasets download genome accession 'GCF_*' --include gff3,gbff,rna,cds,protein,genome,seq-report 

Thanks in advance

Cheers,
Aldre

ncbi-datasets Bacteria Genome • 1.6k views

You can use

./datasets download genome accession --inputfile file_name_w_accession

file_name_w_accession should contain one accession per line. Use any additional options as needed.
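As a minimal sketch of the --inputfile approach: the script below writes an accession file and prints the download command as a dry run. The two accessions are placeholders taken from another answer in this thread (they are not bacterial genomes), and the names accessions.txt and my_genomes.zip are made up for illustration. The --filename option is, to my knowledge, how datasets names the output zip; check `datasets download genome accession --help` on your version.

```shell
#!/bin/sh
# Write one accession per line. These two are placeholders from this thread;
# replace them with your own GCF_/GCA_ accessions.
cat > accessions.txt <<'EOF'
GCF_000001405.40
GCA_003774525.2
EOF

# Sanity check: count the non-empty lines in the file.
n=$(grep -c . accessions.txt)
echo "accessions listed: $n"

# Dry run: the leading "echo" only prints the command.
# Remove "echo" to actually download. Use ./datasets instead of datasets
# if the binary sits in the current directory.
echo datasets download genome accession --inputfile accessions.txt \
  --include gff3,gbff,rna,cds,protein,genome,seq-report \
  --filename my_genomes.zip
```

All listed genomes end up in a single zip archive, so one request covers the whole list.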


You should use wildcards to download all of the GCF files given above. Moreover, you can use curl, wget, or slow5curl to download the data from the database.


In that case, how should I write my script?

Say I have:

GCF000001
GCF000002
GCF000003

By the way, sorry if my replies weren't clear; I have only just started learning the command line.


"you can use the commands of curl, wget and slow5curl to download the data from database."

Not in this situation; it is much easier to use the NCBI datasets tool.


Hi there, thanks for the reply. Could you share an example of a ./datasets script for that? Thank you very much. I'm currently trying various methods as well.


Here is an example of how to use read in a loop: https://stackoverflow.com/a/1521498. You just need to put in your command.
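A rough sketch of that read-in-a-loop pattern applied to datasets, one download per accession. The file name accessions.txt and the per-accession zip names are hypothetical, the accessions are placeholders borrowed from elsewhere in this thread, and --filename is assumed to set the output zip name as it does in the datasets versions I know of.

```shell
#!/bin/sh
# Hypothetical input file: one accession per line (placeholders from this thread).
printf '%s\n' GCF_000001405.40 GCA_003774525.2 > accessions.txt

count=0
# Read the file line by line; IFS= and -r keep each line exactly as written.
while IFS= read -r acc; do
  # Skip blank lines.
  [ -z "$acc" ] && continue
  # Dry run: "echo" only prints the command; remove it to download.
  echo datasets download genome accession "$acc" \
    --include gff3,gbff --filename "${acc}.zip"
  count=$((count + 1))
done < accessions.txt

echo "processed $count accessions"
```

This produces one zip per accession, which can be handy for resuming after a failure, but for a plain bulk download the --inputfile option needs less scripting.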

11 months ago
Michael 55k
datasets download genome accession --inputfile accessions.txt --include gff3,gbff,rna,cds,protein,genome,seq-report

Or you can simply specify multiple accessions on the command line:

datasets download genome accession GCF_000001405.40 GCA_003774525.2 GCA_000001635

Edit: Sorry, I overlooked the --inputfile option.

An accession list is necessary unless all accessions belong to a common taxon or BioProject. In the former case you can simply do:

datasets download genome taxon 'Actinobacteria' --include gff3,gbff,rna,cds,protein,genome,seq-report

Collecting 87,619 genome records [>-----------------------------------------------]   5% 4000/87619

You will probably want to choose a taxon at a lower level.

Please refer to the documentation for further options: https://www.ncbi.nlm.nih.gov/datasets/docs/v2/how-tos/genomes/download-genome/
