Question: How to download all the bacteria genomes from NCBI?
4 weeks ago, anran04100 wrote:
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt

I downloaded assembly_summary.txt.

awk -F '\t' '{if($12=="Complete Genome") print $20}' assembly_summary.txt > assembly_summary_complete_genomes_path.txt

This selects the rows whose assembly level is "Complete Genome" and saves their FTP paths to a new file.

But the path file turns out to contain 21272 rows. I expected about 3000+ rows, since there are 3000+ bacteria in NCBI. What is wrong? How can I download all the bacterial genomes from NCBI?

Thanks!
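One way to see where the extra rows come from is to compare the number of complete-genome rows with the number of distinct species behind them. The sketch below assumes the usual assembly_summary.txt layout, in which column 7 is species_taxid and column 12 is assembly_level; the function name count_complete is made up for illustration.

```shell
# Count complete-genome rows and the distinct species they belong to.
# Assumed columns of assembly_summary.txt: $7 = species_taxid, $12 = assembly_level
count_complete() {
    awk -F '\t' '
        /^#/ { next }    # skip the header lines, which start with "#"
        $12 == "Complete Genome" {
            rows++
            if (!($7 in seen)) { seen[$7] = 1; species++ }
        }
        END { printf "%d complete genomes from %d species\n", rows, species }
    ' "$1"
}

# Run on the downloaded file if it is present in the current directory.
if [ -f assembly_summary.txt ]; then
    count_complete assembly_summary.txt
fi
```

If the first number is much larger than the second, the file lists many assemblies (strains) per species rather than one row per species.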

4 weeks ago, genomax (United States) wrote:

Use the ncbi-genome-download tool from Kai Blin. You should probably filter on assembly level (complete) and perhaps RefSeq category to download only complete, high-quality genomes. There is a lot of redundancy within species because many strains are sequenced per species.
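A minimal sketch of such a call follows. It assumes the tool is already installed (for example with pip install ncbi-genome-download) and uses the option names from recent releases; older releases may spell some of them differently, so check ncbi-genome-download --help first. The variable name opts is only for illustration.

```shell
# Options: RefSeq section, complete assemblies only, FASTA files only.
opts="--section refseq --assembly-levels complete --formats fasta"

if command -v ncbi-genome-download >/dev/null 2>&1; then
    # --dry-run only lists the matching assemblies; remove it to start downloading.
    # $opts is left unquoted on purpose so that it splits into separate arguments.
    ncbi-genome-download $opts --dry-run bacteria
else
    echo "ncbi-genome-download not found; would run: ncbi-genome-download $opts bacteria"
fi
```

Recent releases also appear to offer a --refseq-categories option for restricting the download to reference or representative genomes, which would cut the per-species redundancy further.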


Right, the tool is great!

written 4 weeks ago by shenwei356