4.5 years ago
MSRS
How can I download all bacterial genomes from RefSeq (NCBI) in FASTA format as a single file? I want to download them onto a server, so a Linux command would be most helpful.
Thanks in advance.
Thank you, dear Asaf. But I want to prepare a database for metagenomic analysis, so I would like all the bacterial genomes in a single FASTA file. I have also tried
ncbi-genome-download -F fasta bacteria
but it produces a separate file for each genome.
I believe there is no good way to do this currently. You will have to post-process your files to merge them all into a single file. If your plan is to eventually prepare a database, you can probably merge the files on-the-fly during the database prep step or somehow tell your database to read in data from multiple files.
Thank you very much, dear Vkkodali. So you are saying that we need to download those files and then merge them into a single file. I have downloaded all the bacterial genomes accordingly
(ncbi-genome-download -F fasta bacteria)
and tried to merge them with the cat command (cat *.fna > all.fasta)
. But I get the error "No such file or directory". Is there any other way to do that?
If all fna files are in the current directory then that command will work. If they are not, you will need to provide the correct full or relative paths. There is no other way to do this. If you are not familiar with the Unix command line then you should spend some time learning those basics. A good tutorial can be found here.
Thank you, dear, you are right. But the
ncbi-genome-download -F fasta bacteria
downloaded each genome into its own separate folder, and inside each folder it created a .fna.gz file. I was working on a server, which is why I did not notice this. So now my point is that each genome sits in its own folder as a compressed .fna.gz file. Is there any way to extract them all into a single folder as .fna files? Then
cat *.fna > all.fasta
will work; I have tested it. Thank you.
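For reference, here is a minimal sketch of one way to handle the nested .fna.gz files without extracting them first: decompress each one to standard output and concatenate the result into a single FASTA file. The function name merge_genomes and the directory refseq/bacteria in the usage comment are assumptions for illustration; point it at whatever folder your download actually created.

```shell
# merge_genomes SRC_DIR OUT_FILE
# Hypothetical helper: finds every *.fna.gz file below SRC_DIR (at any depth),
# decompresses each one to standard output, and writes the combined stream
# to OUT_FILE. The compressed originals are left untouched.
merge_genomes() {
    src_dir="$1"
    out_file="$2"
    # "-exec gzip -dc {} +" passes the files to gzip in batches, which avoids
    # the "argument list too long" problem a plain glob can hit with many genomes.
    find "$src_dir" -type f -name '*.fna.gz' -exec gzip -dc {} + > "$out_file"
}

# Example usage (assumed paths, adjust to your own layout):
#   merge_genomes refseq/bacteria all.fasta
```

Keep the output file outside the folder being searched, or at least give it a name that does not end in .fna.gz, so it is not picked up as input. Counting header lines afterwards with grep -c '^>' all.fasta is a quick sanity check on how many sequences were merged.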
The big question is: why do you want to do that? If you're interested in running kraken2/centrifuge, I would highly recommend taking a look at the GTDB database: https://gtdb.ecogenomic.org/ Also take a look at this great resource: https://github.com/rrwick/Metagenomics-Index-Correction and read their paper too.
Thank you, dear Asaf, I was looking for this kind of resource. Really sorry for wasting your valuable time.
You could also use Kaiju. They provide more complete and more recent pre-formatted databases on the Kaiju server (look in the left column), where you can also submit jobs.
Don't be.
From what I have learned so far, kraken2 prepares its database directly from NCBI, and GTDB is a very good resource for bacterial and archaeal databases. These two databases will fulfill all my requirements.
I am very grateful to you all.