Question: How to search CD-HIT clusters against HMM profiles
14 months ago by
Hi there,

I have a metagenome assembly with ORFs predicted with Prodigal and clustered with CD-HIT. This gave me a file listing the clusters by FASTA header and a regular unsorted FASTA file.

I would like to search these against the pVOGs database. I've downloaded the pVOGs HMM profiles and tried to work out how to search against them, but I can't make sense of the HMMER manual. I am doing this so I can filter out any contigs that don't have hits (or whose hits are really sparse).

Thanks in advance for your help.

hmmsearch <your.hmm> <your.translated.orfs.fasta>
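
For the filtering step, here is a rough Python sketch, assuming the search was run with hmmsearch's `--tblout` option so that there is a tabular hits file to parse. It counts how many ORFs per contig have at least one hit and keeps contigs above a threshold. The function names, the E-value cutoff (1e-5) and the minimum of 2 ORFs with hits are placeholders I made up for illustration, not recommended values. It also assumes the ORF IDs follow Prodigal's `<contig>_<index>` naming.

```python
from collections import Counter


def read_tblout(path, max_evalue=1e-5):
    """Yield (orf_id, hmm_name, evalue) for hits passing the E-value cutoff."""
    with open(path) as handle:
        for line in handle:
            # Comment/header lines in --tblout output start with '#'
            if line.startswith("#") or not line.strip():
                continue
            fields = line.split()
            # hmmsearch --tblout columns: 0 = target (sequence) name,
            # 2 = query (profile) name, 4 = full-sequence E-value
            orf_id, hmm_name, evalue = fields[0], fields[2], float(fields[4])
            if evalue <= max_evalue:
                yield orf_id, hmm_name, evalue


def hits_per_contig(path, max_evalue=1e-5):
    """Count distinct ORFs with at least one hit, grouped by contig."""
    orfs = {orf_id for orf_id, _, _ in read_tblout(path, max_evalue)}
    # Assumption: Prodigal-style IDs, so dropping the final _<index>
    # recovers the contig name
    return Counter(orf_id.rsplit("_", 1)[0] for orf_id in orfs)


def contigs_to_keep(path, min_orfs_with_hits=2, max_evalue=1e-5):
    """Return contigs with at least min_orfs_with_hits ORFs that have a hit."""
    counts = hits_per_contig(path, max_evalue)
    return sorted(c for c, n in counts.items() if n >= min_orfs_with_hits)


# Example (hypothetical file name):
# keep = contigs_to_keep("hits.tbl", min_orfs_with_hits=2)
```

Contigs with no hits at all never appear in the tabular output, so anything missing from the returned list is a candidate for removal. A fraction of ORFs per contig might be a fairer measure than a raw count for contigs of very different lengths, but that needs the total ORF count per contig from the FASTA file.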
Thanks for your reply. How do I do this if the pVOGs database is made up of hundreds of HMM profiles, my clustered genes are in an unsorted FASTA file, and the cluster headers are in another file?

  1. I think you can just cat the .hmm files together into one big file; hmmsearch accepts an HMM file that contains many profiles. Alternatively, run a loop over all the .hmm files (but I don't think that's necessary).
  2. This unsorted FASTA file is one of the outputs of CD-HIT, isn't it? In that case, it should already contain your cluster representatives.
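
The two points above could be sketched in Python roughly like this. It is only an illustration: the function names and file paths are made up, and the `.clstr` parsing assumes the usual CD-HIT layout, where each member line contains `>ID...` and the representative line ends with `*`. Note that CD-HIT truncates IDs in the `.clstr` file by default, so they may not match the FASTA headers exactly unless clustering was run with `-d 0`.

```python
import re
from pathlib import Path


def concatenate_hmms(hmm_dir, combined_path):
    """Write every *.hmm file in hmm_dir into one multi-profile file."""
    combined = Path(combined_path).resolve()
    # Skip the output file itself in case it lives in the same directory
    hmm_files = sorted(
        p for p in Path(hmm_dir).glob("*.hmm") if p.resolve() != combined
    )
    with open(combined, "w") as out:
        for hmm_file in hmm_files:
            text = hmm_file.read_text()
            # Make sure each profile ends with a newline before the next starts
            out.write(text if text.endswith("\n") else text + "\n")
    return len(hmm_files)


def hmmsearch_command(combined_hmm, proteins_fasta, tblout):
    """Build the hmmsearch call; pass it to subprocess.run(cmd, check=True)."""
    return ["hmmsearch", "--tblout", str(tblout),
            str(combined_hmm), str(proteins_fasta)]


# Member lines look like: "0\t245aa, >contig_1_1... *"
MEMBER = re.compile(r">(\S+?)\.\.\.")


def read_clusters(clstr_path):
    """Map each representative ID to the IDs of all members of its cluster."""
    clusters = {}
    members, rep = [], None
    with open(clstr_path) as handle:
        for line in handle:
            if line.startswith(">Cluster"):
                if rep is not None:
                    clusters[rep] = members
                members, rep = [], None
                continue
            match = MEMBER.search(line)
            if match is None:
                continue
            members.append(match.group(1))
            if line.rstrip().endswith("*"):  # '*' marks the representative
                rep = match.group(1)
    if rep is not None:
        clusters[rep] = members
    return clusters
```

Since only the representatives are searched, a hit on a representative can be passed on to the other members of its cluster with the mapping from `read_clusters` before counting hits per contig. Otherwise contigs whose ORFs are all non-representative members would look like they have no hits.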
