Building Ensembl database for HMMER
7 weeks ago
J • 0

Hello,

I want to run HMMER (specifically jackhmmer) on my local workstation, because the EBI website (https://www.ebi.ac.uk/Tools/hmmer/) is becoming increasingly unstable, particularly for jackhmmer searches. I have installed and tested it locally, and it is working fine.

I want to search the whole Ensembl Genomes Bacteria database, as you can from the website. I have downloaded the FASTA-formatted protein sequences for a test batch of bacterial genomes from the Ensembl FTP server (https://ftp.ebi.ac.uk/ensemblgenomes/pub/bacteria/release-58/), but I don't understand how to build a database that lets me search through all of them.

In the HMMER tutorial, it seems I just specify my query protein sequence and then the database in FASTA format. So I have two questions:

  1. How do I build a database to pass as the argument to the jackhmmer command? Do I simply concatenate the FASTA files from every genome assembly into a single file?
  2. What is the best way to do this so that I keep easy access to metadata such as taxid and organism name? Or do I have to parse the HMMER output and look each hit up in the raw Ensembl data myself?
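
To make the two questions concrete, here is a minimal Python sketch of the approach I have in mind, not a confirmed recipe. It concatenates the per-genome FASTA files into one database file and, in the same pass, writes a lookup table from each sequence ID to the file it came from. The function names (`open_fasta`, `build_database`) and the output layout are my own hypothetical choices, and it assumes the first whitespace-delimited token of each header is the sequence ID and that the file name is enough to identify the genome.

```python
import gzip
from pathlib import Path


def open_fasta(path):
    """Open a FASTA file as text, transparently handling .gz files."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def build_database(fasta_paths, out_fasta, out_tsv):
    """Concatenate FASTA files and record the source file of every sequence.

    Hypothetical helper: writes one combined FASTA file plus a TSV with the
    columns seq_id, source_file, header. Returns the number of sequences.
    """
    n_seqs = 0
    with open(out_fasta, "w") as db, open(out_tsv, "w") as meta:
        meta.write("seq_id\tsource_file\theader\n")
        for path in fasta_paths:
            source = Path(path).name
            with open_fasta(path) as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue  # skip blank lines between records
                    if line.startswith(">"):
                        # Assumption: the ID is the first token of the header.
                        header = line[1:].replace("\t", " ")
                        parts = header.split()
                        seq_id = parts[0] if parts else ""
                        meta.write(f"{seq_id}\t{source}\t{header}\n")
                        n_seqs += 1
                    db.write(line + "\n")
    return n_seqs
```

If this is the right idea, I would then run something like `jackhmmer --tblout hits.tbl query.fasta all_bacteria.fasta` and join the target name column of `hits.tbl` against `seq_id` in the TSV to recover the genome for each hit. That join only works if sequence IDs are unique across genomes, which I have not verified for the Ensembl files.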

Many thanks!

hmmer • 157 views