Question

How to keep the top hits only in the output file of hmmscan?

1

3.7 years ago

A_heath ▴ 160

Hi all,

I recently downloaded HMMER to run hmmscan locally on Linux against a Pfam database. It works great, but the output files are quite difficult to read quickly, in my opinion.

I tried output options such as --tblout, --domtblout, and --pfamtblout, but the output files are still voluminous.

I would like to keep only the top hits in my output files.

I've seen that this is possible with hmmsearch, so I was wondering whether there is something similar for hmmscan. Ideally, I would like output like what I could find online: cf. here

If you have any suggestions, I'll gladly take them. Thank you in advance for your much appreciated help!

hmmer hmmscan • 3.3k views

updated 7 months ago by Ram 43k • written 3.7 years ago by A_heath ▴ 160

score 3 · Accepted Answer · 2020-08-06

3

3.7 years ago

A_heath ▴ 160

For anyone interested, I figured it out using this amazing resource: http://slhogle.github.io/2015/remove-duplicate-lines/ and the option --tblout of hmmscan.

I did:

hmmscan --tblout output_file.pfam Pfam-A.hmm seq_file.fasta

and then:

awk '!x[$3]++' output_file.pfam > MYBESTHITS.pfam

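The awk command keeps only the first line seen for each value of column 3, which is the query name in --tblout output. The MYBESTHITS.pfam file is basically what I got online, with the top hit for each protein sequence.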
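One caveat: the one-liner above keeps the first line per query, so it only gives the top hit if hmmscan lists each query's hits best first (which it appears to do), and it also passes some of the `#` comment lines through. Here is a rough sketch of a variant that skips comment lines and explicitly picks the highest full-sequence bit score per query. The function name `best_hits` and the toy file are made up for illustration; the column positions ($3 query name, $6 full-sequence score) follow the HMMER3 --tblout layout.

```shell
# best_hits is a hypothetical helper name, not part of HMMER.
# It expects an hmmscan --tblout file: $1 target name, $3 query name,
# $5 full-sequence E-value, $6 full-sequence bit score.
best_hits() {
    # Skip comment lines, remember the highest-scoring row per query,
    # then print one row per query, sorted by query name.
    awk '!/^#/ {
             if (!($3 in best) || $6 + 0 > best[$3] + 0) {
                 best[$3] = $6
                 line[$3] = $0
             }
         }
         END { for (q in line) print line[q] }' "$1" | sort -k3,3
}

# Toy input with made-up values, only to show the behaviour
# (this is NOT real hmmscan output; names and numbers are invented).
printf '%s\n' \
    '# target name  accession  query name  accession  E-value  score  bias' \
    'famA  -  protA  -  1e-10  40.0  0.1' \
    'famB  -  protA  -  1e-30  95.5  0.0' \
    'famC  -  protB  -  2e-05  22.3  0.2' > toy.tbl

best_hits toy.tbl > best_hits.tbl
```

On real data you would call it as `best_hits output_file.pfam > MYBESTHITS.pfam`. Ranking by score rather than by line order is a design choice on my part; swap in $5 with a "smaller is better" comparison if you prefer to rank by E-value.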

0

Hey! You mentioned you know how to find the top hits from hmmscan. Could you share how you do this?

3.0 years ago by niamhlacyroberts • 0