Question: How to keep the top hits only in the output file of hmmscan?
Audrey20 (France) wrote, 6 weeks ago:

Hi all,

I recently downloaded HMMER to run hmmscan locally on Linux against a Pfam database. It works great; however, in my opinion the output files are quite difficult to read quickly...

I tried output options such as --tblout, --domtblout, --pfamtblout, etc., but the output files are still voluminous.
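Part of the bulk is hmmscan's main human-readable report, which is written alongside any --tblout file unless it is redirected. The sketch below is a minimal illustration, assuming HMMER 3 is on the PATH and the database is the flat Pfam-A.hmm file. run_pfam_scan is a hypothetical helper name; -o, --cut_ga and --tblout are standard hmmscan options. hmmscan also needs the database indexed with hmmpress beforehand. --cut_ga limits hits to Pfam's curated gathering thresholds, which usually trims weak hits from the table, but on its own it does not reduce the table to one hit per sequence.

```shell
#!/bin/sh
# Hypothetical helper: run_pfam_scan DB FASTA TBLOUT
# Assumes HMMER 3 (hmmpress, hmmscan) is on PATH and DB is a Pfam-A.hmm flat file.
run_pfam_scan() {
  db=$1
  fasta=$2
  tbl=$3
  # hmmscan searches a pressed database; hmmpress will not overwrite
  # existing index files, so only press when the index is missing.
  [ -e "$db.h3m" ] || hmmpress "$db" || return 1
  # -o /dev/null discards the long human-readable report,
  # --cut_ga keeps only hits above the Pfam gathering thresholds,
  # --tblout writes one table row per (Pfam model, query) hit.
  hmmscan -o /dev/null --cut_ga --tblout "$tbl" "$db" "$fasta"
}
```

Usage: run_pfam_scan Pfam-A.hmm seq_file.fasta output_file.pfam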

I would like to keep only the top hits in my output files.

I've seen that this is possible with hmmsearch, so I was wondering if there is something similar for hmmscan... Ideally, I would like output like what I found online: cf. here

If you have any suggestions, I'll gladly take them. Thank you in advance, your help is very much appreciated!

Tags: hmmer, output, hmmscan, top hits
Answer, Audrey20 wrote:

For anyone interested, I figured it out using the --tblout option of hmmscan together with this amazing resource: http://slhogle.github.io/2015/remove-duplicate-lines/

I did:

hmmscan --tblout output_file.pfam Pfam-A.hmm seq_file.fasta

and then:

awk '!x[$3]++' output_file.pfam > MYBESTHITS.pfam

The MYBESTHITS.pfam file is essentially what I got online: the top hit for each protein sequence.
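The awk one-liner keeps the first row it sees for each distinct value in column 3 (the query name in --tblout output), so it relies on each query's hits appearing best-first, which is how hmmscan ranks them. It can also let some of the # header and footer lines through. A slightly more defensive sketch, assuming the standard --tblout column layout (column 3 = query name, column 5 = full-sequence E-value), skips comment lines and explicitly keeps the row with the lowest E-value per query, printed in the order the queries first appear. best_hits is a hypothetical function name, not part of HMMER.

```shell
#!/bin/sh
# Hypothetical helper: best_hits TBLOUT
# Prints, for each query sequence, the --tblout row with the lowest
# full-sequence E-value (column 5). Comment and short lines are skipped.
best_hits() {
  awk '
    /^#/ || NF < 5 { next }            # skip header/footer comments and blank lines
    {
      q = $3                           # query (protein) name
      e = $5 + 0                       # full-sequence E-value as a number
      if (!(q in best)) {              # first row seen for this query
        names[++n] = q                 # remember first-seen order
        best[q] = e
        row[q] = $0
      } else if (e < best[q]) {        # strictly better E-value: replace
        best[q] = e
        row[q] = $0
      }
    }
    END { for (i = 1; i <= n; i++) print row[names[i]] }
  ' "$1"
}
```

Usage: best_hits output_file.pfam > MYBESTHITS.pfam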
