Where can I get environmental bacterial genomes in FASTA format (as many as possible)?
5.2 years ago


I am trying to obtain genome data for environmental bacteria in FASTA format, for as many genomes as possible.

The FASTA file will be used as the input file for biobloomtools' biobloommaker.

Can you tell me where to get it?

I have already visited JGI (http://jgi.doe.gov/data-and-tools/) and tried the Genome Portal and IMG, but I have no clue how to get all the FASTA files at once.

Thank you.

5.2 years ago
natasha.sernova ★ 3.8k

Dear janghj,

You can try the new NCBI location; the old NCBI site has been moved. The new one is here: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/ Be careful: some of the genomes there are not finished.

In the file below you will find a specific URL for each bacterium you are looking for:

ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/assembly_summary_genbank.txt Under each URL you will find the genomic sequence as an fna.gz file (gzipped FASTA), along with some other useful files.
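To avoid clicking through the URLs one by one, the summary file can be turned into a download list. The following is a minimal sketch, not an official NCBI tool. It assumes the file is tab-separated with a `#`-prefixed header row naming the columns `assembly_level` and `ftp_path` (as described in the README), and that each directory holds a file named after the directory with the suffix `_genomic.fna.gz`. The function name `genomic_fasta_urls` is made up for illustration.

```python
def genomic_fasta_urls(lines, levels=("Complete Genome",)):
    """Yield genomic FASTA URLs from assembly-summary lines.

    `lines` is any iterable of text lines (for example an open file).
    Rows are kept only if their assembly_level is in `levels`;
    pass levels=None to keep every row, finished or not.
    """
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # The column names live in a comment line; other comment
            # lines (such as the README pointer) are skipped.
            fields = line.lstrip("#").strip().split("\t")
            if "ftp_path" in fields:
                header = fields
            continue
        if header is None or not line:
            continue
        row = dict(zip(header, line.split("\t")))
        path = row.get("ftp_path", "").rstrip("/")
        if path in ("", "na"):
            continue  # no download location recorded for this assembly
        if levels and row.get("assembly_level") not in levels:
            continue
        # Assumption: the FASTA file is named after its directory.
        name = path.rsplit("/", 1)[-1]
        yield f"{path}/{name}_genomic.fna.gz"
```

Opening the downloaded summary file and passing the file object to this function gives one URL per kept assembly. Written to a text file, that list could then be handed to a downloader such as `wget -i`.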

For example, this is the page for Acidithiobacillales bacterium SM1_46: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_001304215.1_ASM130421v1/

The old NCBI site has been moved here: ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/

See the README files for more information. A lot has changed there. Good luck!

Thank you for your response.

I visited the new NCBI location you provided.

It seems to show all kinds of bacteria, but I am only looking for environmental ones.

Is there any way I can grep for the environmental ones?

Or should I check and download the files one by one?

Thank you

Hello! The tags you used initially imply that you are looking only for soil bacteria. Is your question how to search NCBI just for soil bacteria? http://www.dpi.nsw.gov.au/__data/assets/pdf_file/0017/41642/Soil_bacteria.pdf

I would go to the general NCBI site, www.ncbi.nlm.nih.gov, and type in your question: http://www.ncbi.nlm.nih.gov/pmc/?term=soil+bacteria+from+NCBI++AND+their+genomes You will get several thousand papers, so you will probably have to specify which particular species you are looking for.

"Environmental bacteria" is too general a question, in my opinion. The question has to be as narrow and specific as possible. I am not sure the sequence authors stated where, and only where, a particular bacterium may be found, so you should probably make your definition of the environment as specific as possible.

I am also not sure NCBI has a special tag for this kind of search, but you can easily check: open the genome record of a known "environmental" bacterium and see what information it contains. I didn't find a general method. I would start by going through them one at a time. Good luck! Natasha
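As a rough starting point for "grepping" the summary file, one could match keywords against the organism names. This is only a crude sketch: an organism name says little about habitat, so it will miss most environmental genomes and may return irrelevant ones. It assumes the assembly summary has a `#`-prefixed, tab-separated header row with the columns `assembly_accession` and `organism_name`. The function name `match_organisms` and any keyword list are hypothetical choices, not an NCBI feature.

```python
def match_organisms(lines, keywords):
    """Return (accession, organism_name) pairs whose name contains a keyword.

    Matching is a case-insensitive substring test, so the keyword list
    should be chosen and checked by hand.
    """
    wanted = [k.lower() for k in keywords]
    header, hits = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Pick up the column names from the commented header row.
            fields = line.lstrip("#").strip().split("\t")
            if "organism_name" in fields:
                header = fields
            continue
        if header is None or not line:
            continue
        row = dict(zip(header, line.split("\t")))
        name = row.get("organism_name", "")
        if any(k in name.lower() for k in wanted):
            hits.append((row.get("assembly_accession", ""), name))
    return hits
```

The returned accessions would still need to be checked by hand against the individual genome records, since no habitat information is used here.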


