How to keep the top hits only in the output file of hmmscan?
2.6 years ago
A_heath ▴ 120

Hi all,

I recently downloaded HMMER to run hmmscan locally on Linux against a Pfam database. It works great; however, the output files are quite difficult to read quickly, in my opinion...

I tried output options such as --tblout, --domtblout, --pfamtblout, etc., but the output files are still voluminous.

I would like to keep only the top hits in my output files.

I've seen that this is possible with hmmsearch, so I was wondering whether there is something similar for hmmscan... Ideally, I would like an output like the one I found online:

If you have any suggestions, I'll gladly take them. Thank you in advance; your help is very much appreciated!

2.6 years ago
A_heath ▴ 120

For anyone interested, I figured it out using this amazing resource: http://slhogle.github.io/2015/remove-duplicate-lines/ and hmmscan's --tblout option.

I did:

hmmscan --tblout output_file.pfam Pfam-A.hmm seq_file.fasta


and then:

awk '!x[$3]++' output_file.pfam > MYBESTHITS.pfam


The MYBESTHITS.pfam file is essentially the output I found online: the top hit for each protein sequence.
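The awk one-liner keeps the first line it sees for each query name (column 3 of the --tblout file). That line is the top hit because hmmscan lists each query's hits in order of significance. Note that it can also let a few of the `#` header/footer lines through. The following is a slightly more defensive sketch, assuming the standard --tblout layout (query name in column 3, full-sequence E-value in column 5). It skips comment lines and explicitly picks the lowest E-value per query, so it does not depend on row order (for example, if several tblout files were concatenated). The function name `best_hits` is hypothetical, chosen only for illustration.

```sh
# Hypothetical helper: best_hits. Assumes the hmmscan --tblout layout:
# column 3 = query name, column 5 = full-sequence E-value.
# Reads the files given as arguments, or stdin if none are given.
best_hits() {
  awk '
    /^#/ { next }                          # skip header/footer comment lines
    !($3 in best) { order[++n] = $3 }      # remember the first-seen query order
    !($3 in best) || $5 + 0 < best[$3] {   # new query, or a smaller E-value than stored
      best[$3] = $5 + 0                    # store the E-value as a number
      line[$3] = $0                        # keep the whole row unchanged
    }
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  ' "$@"
}
```

Usage would mirror the original command: `best_hits output_file.pfam > MYBESTHITS.pfam`. On ties in E-value, the earlier row is kept.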

Hey! You mentioned you know how to find the top hits from hmmscan. Could you share how you do this?