Retrieving entire genomic sequence contents of a database
3 months ago

Hi all, I'm trying to download all bacterial genomes from Ensembl so I can mine them for bacteriocin gene clusters. However, I've been struggling and was hoping someone could advise. Any time I run wget on the index URL, all I get is a file like "index.html". I've also tried other wget invocations, but no luck.

Can someone please advise on the steps I should take to pull all genomic sequences from a database from the command line via FTP, preferably in GBK, GFF, or FASTA format?

Any help is greatly appreciated!

genomes mining database ftp ensembl • 245 views
3 months ago
Mensur Dlakic ★ 14k

There is a script here that downloads genome data in bulk, though from NCBI rather than Ensembl. For example, this command will download all complete RefSeq bacterial genomes: -g "bacteria" -d "refseq" -l "Complete Genome" -f "genomic.fna.gz" -o "bac_refseq" -t 20

A small command-line change will let you download all GenBank genomes if you wish, and include even those (meta)genomes that may not be complete.
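
For anyone who would rather see the selection step spelled out, here is a minimal Python sketch of the same idea. It is not the linked script's implementation. It assumes NCBI's per-group assembly_summary.txt table is tab-separated, has its column names on a "#" comment line, and includes assembly_level and ftp_path columns, and it assumes each sequence file is named after the last component of ftp_path. The function name select_genome_urls is made up for illustration.

```python
import csv
import io
import posixpath


def select_genome_urls(summary_text, level="Complete Genome",
                       suffix="genomic.fna.gz"):
    """Return download URLs for assemblies at the requested assembly level.

    Assumes the layout of NCBI's assembly_summary.txt (see lead-in).
    """
    lines = summary_text.splitlines()
    # The column names sit on a comment line that mentions assembly_accession.
    header_idx = next(
        i for i, line in enumerate(lines)
        if line.startswith("#") and "assembly_accession" in line
    )
    columns = lines[header_idx].lstrip("# ").split("\t")
    rows = csv.DictReader(
        io.StringIO("\n".join(lines[header_idx + 1:])),
        fieldnames=columns,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    urls = []
    for row in rows:
        path = (row.get("ftp_path") or "").rstrip("/")
        # Skip other assembly levels and rows without a usable path.
        if row.get("assembly_level") != level or path in ("", "na"):
            continue
        base = posixpath.basename(path)
        urls.append(f"{path}/{base}_{suffix}")
    return urls
```

The returned list can be written to a text file and handed to any downloader that accepts a list of URLs. Passing a different level or suffix changes which assemblies and which file type are selected.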

12 weeks ago
Ben_Ensembl ★ 1.8k

Hi sandrewsaunderson,

If you are keen to use Ensembl for this task, it's important to remember that the bacterial files are stored in collections on the FTP site. E.g.:

This may be where you have encountered problems with your download.

Best wishes

Ben, Ensembl Helpdesk
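
Because the files sit several directories deep (collection, then species, then data type), fetching only the top-level index will not reach them. The following is a rough Python sketch of walking such a nested FTP tree with the standard ftplib module and collecting every file with a given suffix. The helper names (find_files, ftp_lister, mirror) are hypothetical, and the sketch assumes the server allows anonymous login and supports the MLSD listing command.

```python
import os
import posixpath
from ftplib import FTP


def find_files(list_dir, root, suffix):
    """Walk the tree under root, yielding paths whose names end in suffix.

    list_dir(path) must yield (name, is_dir) pairs for one directory.
    """
    stack = [root]
    while stack:
        path = stack.pop()
        for name, is_dir in list_dir(path):
            child = posixpath.join(path, name)
            if is_dir:
                stack.append(child)
            elif name.endswith(suffix):
                yield child


def ftp_lister(ftp):
    """Adapt an ftplib.FTP connection to the list_dir interface above."""
    def list_dir(path):
        for name, facts in ftp.mlsd(path):
            kind = facts.get("type")
            if kind in ("cdir", "pdir"):  # skip "." and ".." entries
                continue
            yield name, kind == "dir"
    return list_dir


def mirror(host, root, suffix, out_dir):
    """Download every matching file under root into out_dir (flat layout)."""
    os.makedirs(out_dir, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login()  # anonymous login (assumed to be permitted)
        remote_files = list(find_files(ftp_lister(ftp), root, suffix))
        for remote in remote_files:
            local = os.path.join(out_dir, posixpath.basename(remote))
            with open(local, "wb") as fh:
                ftp.retrbinary("RETR " + remote, fh.write)
```

A call might look like mirror("ftp.ensemblgenomes.org", "/pub/bacteria/current/fasta", ".dna.toplevel.fa.gz", "ensembl_bacteria"), but the host, path, and suffix here are assumptions that should be checked against the current FTP site before use. Walking every collection can take a long time, so it is worth trying a single collection directory as the root first.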

