Question

Download all bacterial genomes from RefSeq in FASTA format from NCBI

0

4.0 years ago

MSRS ▴ 580

How can I download all bacterial genomes from RefSeq in FASTA format from NCBI as a single file? I want to download them onto a server, so a Linux command would be most helpful.

Thanks in advance.

genome • 3.3k views

ADD COMMENT • link 4.0 years ago by MSRS ▴ 580

score 3 · Accepted Answer · 2020-05-08

3

4.0 years ago

Asaf 10k

Use the FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/) and download it recursively with wget. That is a better option than interacting with E-utilities. There is also ncbi-genome-download: https://github.com/kblin/ncbi-genome-download

ADD COMMENT • link 4.0 years ago by Asaf 10k
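As a possible alternative to a blind recursive wget, the list of files can be built first and then downloaded. The sketch below assumes the FTP directory also holds NCBI's assembly_summary.txt, a tab-separated index whose 20th column is each assembly's directory URL, and that each directory contains a file named <directory name>_genomic.fna.gz. The function name fna_urls is made up for illustration.

```shell
# Read an NCBI assembly_summary.txt on stdin and print one URL per assembly,
# pointing at its genomic FASTA file.
# Assumptions: tab-separated input, column 20 is the assembly directory URL,
# and the FASTA file is named <directory name>_genomic.fna.gz.
fna_urls() {
    awk -F '\t' '
        /^#/ { next }                      # skip header/comment lines
        $20 == "" || $20 == "na" { next }  # skip rows without a directory
        {
            n = split($20, parts, "/")     # last path component = directory name
            print $20 "/" parts[n] "_genomic.fna.gz"
        }'
}

# Possible usage (not run here; needs network access):
#   curl -s https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt \
#       | fna_urls > urls.txt
#   wget -i urls.txt
```

Building the list first makes it possible to inspect or filter urls.txt before starting a very large download.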

0

Thank you, dear Asaf. But I want to prepare a database for metagenomic analysis, so I would like all the bacterial genomes in a single file in FASTA format. I have also tried ncbi-genome-download -F fasta bacteria, but it produces a separate file for each genome.

ADD REPLY • link 4.0 years ago by MSRS ▴ 580

1

I believe there is no good way to do this currently. You will have to post-process your files to merge them all into a single file. If your plan is to eventually prepare a database, you can probably merge the files on-the-fly during the database prep step or somehow tell your database to read in data from multiple files.

ADD REPLY • link 4.0 years ago by vkkodali_ncbi ★ 3.7k

0

Thank you very much, dear Vkkodali. So you are saying that we need to download the files and merge them into a single file. I have downloaded all bacterial genomes (ncbi-genome-download -F fasta bacteria) and tried to merge them with the cat command (cat *.fna > all.fasta). But the error is "No such file or directory". Any other way to do that?

ADD REPLY • link 4.0 years ago by MSRS ▴ 580

1

But the error is "No such file or directory". Any other way to do that?

If all the .fna files are in the current directory, that command will work. If they are not, you will need to provide correct full or relative paths. There is no other way to do this. If you are not familiar with the Unix command line, you should spend some time learning the basics. A good tutorial can be found here.

ADD REPLY • link 4.0 years ago by GenoMax 141k
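One quick way to see why cat *.fna matches nothing is to count what is actually on disk below the current directory. This is only a minimal sketch: the function name count_fasta_files is hypothetical, and it assumes file names contain no newlines.

```shell
# Count plain (*.fna) and gzip-compressed (*.fna.gz) FASTA files anywhere
# below a directory (default: the current directory).
count_fasta_files() {
    root=${1:-.}
    # tr strips the padding that some wc implementations add to the count
    plain=$(find "$root" -type f -name '*.fna' | wc -l | tr -d ' ')
    gz=$(find "$root" -type f -name '*.fna.gz' | wc -l | tr -d ' ')
    printf 'fna=%s fna.gz=%s\n' "$plain" "$gz"
}

# Possible usage:
#   count_fasta_files .
```

If the first count is zero while the second is not, the files are still compressed or sit in subdirectories, so a plain *.fna glob in the top-level directory cannot match them.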

0

Thank you, dear. You are right. ncbi-genome-download -F fasta bacteria downloaded each genome into its own separate folder, and inside each folder it created a .fna.gz file. I was working on a server, which is why I did not notice it.

So now my point is that each genome sits in its own folder as a compressed .fna.gz file. Is there any way to extract all of them into a single folder as .fna files? Then cat *.fna > all.fasta will work; I have tested it.

Thank you

ADD REPLY • link 4.0 years ago by MSRS ▴ 580
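For the layout described above (one folder per genome, each holding a .fna.gz file), the extract-then-cat steps can be combined into one pass. The following is a sketch rather than a definitive recipe: the function name merge_fna_gz and the directory name in the usage comment are assumptions.

```shell
# Decompress every *.fna.gz found anywhere below ROOT and write the
# concatenated sequences to OUT as one uncompressed FASTA file.
merge_fna_gz() {
    root=$1
    out=$2
    # gzip -dc writes the decompressed data to stdout and leaves the
    # original .fna.gz files untouched.
    find "$root" -type f -name '*.fna.gz' -exec gzip -dc {} + > "$out"
}

# Possible usage (the directory name is an assumption about where the
# downloads ended up):
#   merge_fna_gz refseq/bacteria all.fasta
```

Because find walks the whole tree, there is no need to move the files into a single folder first, and no intermediate .fna copies are written to disk.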

1

The big question is: why do you want to do that? If you're interested in running kraken2/centrifuge, I would highly recommend taking a look at the GTDB database (https://gtdb.ecogenomic.org/). Also take a look at this great resource, and read their paper too: https://github.com/rrwick/Metagenomics-Index-Correction

ADD REPLY • link 4.0 years ago by Asaf 10k

0

Thank you, dear Asaf. This is the kind of resource I am looking for. Really sorry for wasting your valuable time.

ADD REPLY • link 4.0 years ago by MSRS ▴ 580

1

You could also use Kaiju. They provide more complete and recent pre-formatted databases on the Kaiju server (look in the left column), where you can also submit jobs.

ADD REPLY • link 4.0 years ago by GenoMax 141k

1

Don't be.

ADD REPLY • link 4.0 years ago by Asaf 10k

1

As far as I have learned, kraken2 prepares its database directly from NCBI. GTDB is a very good resource for bacterial and archaeal databases. These two databases will fulfill all my requirements.

I am very grateful to you all.

ADD REPLY • link 4.0 years ago by MSRS ▴ 580