I want to run HMMER (specifically JACKHMMER) on my local workstation, as the EBI website is becoming increasingly unstable, particularly for JACKHMMER searches. I have it installed and tested, and it is working fine.

What I want to do is run a search against the whole Ensembl Genomes Bacteria database, as you can from the website. I have downloaded the FASTA-formatted protein sequences for a test batch of bacterial genomes from the Ensembl FTP server, but what I don't understand is how to build a database that lets me search through all of them.

In the HMMER tutorial, it seems I just specify my query protein sequence and then the database in FASTA format. So I have two questions:

  1. How do I build a database to provide as the argument for the jackhmmer command? Do I simply concatenate every FASTA file I have from each genome assembly into a single file?
  2. What is the best way to do this so that I have easier access to metadata like taxid and organism name? Or do I just have to parse the output from HMMER and re-search the raw data from Ensembl myself to get all this?
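
To make the two questions concrete, here is a minimal Python sketch of what I have in mind. It concatenates the per-genome FASTA files into one file and, alongside it, writes a tab-separated index of sequence ID to source file so that hits could later be traced back to a genome. The function names (`open_text`, `merge_fasta`) and the output file names are placeholders of my own, and I am assuming the sequence ID is the first whitespace-delimited token of each header line. Whether this is actually the right approach is what I am asking.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def merge_fasta(fasta_paths, combined_path, index_path):
    """Concatenate FASTA files and record which file each sequence came from.

    Writes all records to combined_path and one "seq_id<TAB>source_file"
    line per record to index_path. Returns the number of records written.
    """
    count = 0
    with open(combined_path, "w") as combined, open(index_path, "w") as index:
        for fasta_path in fasta_paths:
            source = Path(fasta_path).name
            last = "\n"
            with open_text(fasta_path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # Assumption: the ID is the first token after ">"
                        fields = line[1:].split()
                        seq_id = fields[0] if fields else ""
                        index.write(f"{seq_id}\t{source}\n")
                        count += 1
                    combined.write(line)
                    last = line
            # Guard against an input file that lacks a trailing newline,
            # which would otherwise glue two records together
            if not last.endswith("\n"):
                combined.write("\n")
    return count
```

I would then pass the combined file as the database argument, e.g. `jackhmmer query.fasta combined.fasta` (both file names hypothetical), and join the hit names in the output against the index to recover which genome each hit came from.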

Many thanks!

