Download all bacterial genomes from RefSeq in FASTA format from NCBI
4.0 years ago
MSRS ▴ 580

How can I download all bacterial genomes from RefSeq in FASTA format from NCBI as a single file? I want to download them onto a server, so a Linux command would be most helpful.

Thanks in advance.

genome • 3.3k views
4.0 years ago
Asaf 10k

FTP site: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/ (download it recursively with wget). That is a better option than interacting with E-utilities. There is also ncbi-genome-download: https://github.com/kblin/ncbi-genome-download


Thank you, dear Asaf. But I want to prepare a database for metagenomic analysis, so I would like all the bacterial genomes in a single file in FASTA format. I have also tried ncbi-genome-download -F fasta bacteria, but it produces a separate file for each genome.


I believe there is no good way to do this currently. You will have to post-process your files to merge them all into a single file. If your plan is to eventually prepare a database, you can probably merge the files on-the-fly during the database prep step or somehow tell your database to read in data from multiple files.


Thank you very much, dear Vkkodali. So you are saying that we need to download those files and merge them into a single file. I have downloaded all bacterial genomes accordingly (ncbi-genome-download -F fasta bacteria) and tried to merge them with the cat command (cat *.fna > all.fasta). But the error is "No such file or directory". Any other way to do that?


But error is no such file or directory. Any other way to do that?

If all .fna files are in the current directory, then that command will work. If they are not, you will need to provide correct full or relative paths. There is no other way to do this. If you are not familiar with the Unix command line, then you should spend some time learning those basics. A good tutorial can be found here.
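To illustrate the point about paths, here is a minimal sketch that collects .fna files from any depth below a starting directory instead of relying on cat *.fna in the current directory. The function name merge_fna and the directory name genomes_dir are made up for this example; adjust them to your own layout.

```shell
# Sketch: concatenate every .fna file found anywhere below a directory.
# merge_fna, genomes_dir and all.fasta are illustrative names only.
merge_fna() {
    genome_dir=$1   # top-level folder that contains the downloaded genomes
    out_file=$2     # merged output; its name should not end in .fna

    # find walks all subfolders; "-exec cat {} +" passes the files to cat
    # in batches, which also avoids "Argument list too long" errors when
    # there are many thousands of files.
    find "$genome_dir" -type f -name '*.fna' -exec cat {} + > "$out_file"
}

# Example call (assumed folder name):
#   merge_fna genomes_dir all.fasta
```

The order in which find returns files is not sorted, so the sequences in the merged file will not necessarily appear in alphabetical order of file name.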


Thank you, dear. You are right. But ncbi-genome-download -F fasta bacteria downloaded each genome into its own folder, and inside each folder it created a .fna.gz file. I was working on a server, which is why I did not notice it.

So now my point is that each genome sits in its own folder as a compressed .fna.gz file. Is there any way to extract all of them into a single folder as .fna files? Then cat *.fna > all.fasta will work; I have tested it.

Thank you
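One possible way to handle the nested .fna.gz files, shown as a sketch rather than a definitive recipe: either stream all archives straight into one FASTA file, or unpack each archive into a single flat folder first. The function names and the folder names (genomes_dir, flat_dir) are hypothetical, and the loop assumes file names contain no newlines and are unique across folders.

```shell
# Sketch 1: decompress every *.fna.gz below genome_dir into ONE FASTA file.
# "gzip -dc" writes the decompressed data to stdout and leaves the .gz intact.
merge_fna_gz() {
    genome_dir=$1
    out_file=$2
    find "$genome_dir" -type f -name '*.fna.gz' -exec gzip -dc {} + > "$out_file"
}

# Sketch 2: unpack each archive into a single flat folder as .fna files,
# so that "cat *.fna > all.fasta" can then be run inside that folder.
extract_fna_gz() {
    genome_dir=$1
    flat_dir=$2
    mkdir -p "$flat_dir"
    find "$genome_dir" -type f -name '*.fna.gz' | while IFS= read -r gz; do
        # basename strips the folder part and the trailing .gz suffix
        gzip -dc "$gz" > "$flat_dir/$(basename "$gz" .gz)"
    done
}

# Example calls (assumed folder names):
#   merge_fna_gz genomes_dir all.fasta
#   extract_fna_gz genomes_dir flat_dir
```

The first variant avoids storing an uncompressed copy of every genome in addition to the merged file, which matters when disk space on the server is limited.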


The big question is: why do you want to do that? If you are interested in running kraken2/centrifuge, I would highly recommend taking a look at the GTDB database: https://gtdb.ecogenomic.org/ Also take a look at this great resource: https://github.com/rrwick/Metagenomics-Index-Correction and read their paper too.


Thank you, dear Asaf. I was looking for exactly this kind of resource. Really sorry for wasting your valuable time.


You could also use Kaiju. They provide more complete and recent pre-formatted databases on the Kaiju server (look in the left column), where you can also submit jobs.


Don't be.


From what I have learned so far, kraken2 builds its database directly from NCBI. GTDB is a very good resource for bacterial and archaeal databases. These two databases will fulfill all my requirements.

I am very grateful to you all.
