Pipelines/packages for protein domain annotation?
6 weeks ago
John • 0

Hi, I'm currently generating new bacterial genomes and I want to identify all ankyrin-domain-containing proteins (and the number of repeats each one has). What are some approaches I can use for this?

Because of the scale, I don't want to use web tools. I've been looking at the HMMER documentation, which looks quite complicated to parse, so I'm hoping there is a simpler way.

Ideally, I'd like to process a gbff file, but I can always convert between file types.

Thank you!

Note: I can't just extract this from the annotation information, as bakta, the package I'm using, doesn't provide it in all cases.

annotation protein domains • 209 views
6 weeks ago
Mensur Dlakic ★ 27k

What exactly is so difficult in the HMMER documentation? Individual preferences vary, but I always thought that HMMER had one of the cleanest manuals out there.

There is no simpler way to annotate the presence of a single domain in a protein database than to use hmmsearch. As you found out, using automatic annotation tools like prokka or bakta has its own difficulties. What could be simpler than:

hmmsearch -E 0.01 -o output_file.txt ank.hmm protein_database.faa

In order, I am setting an E-value threshold, an output file for the search results, the HMM file to search with, and the protein database to search. There are other options, such as using more CPUs (--cpu) to speed up the search, but the above command is all it takes. What's left for you is to find an ankyrin domain HMM and off you go.
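
For the "number of repeats" part of the question, one possible approach is to add hmmsearch's --domtblout option to the command above (for example, --domtblout domains.tbl, where the file name is just a placeholder) and count the per-domain hits for each protein. The following is a minimal sketch of that, not a tested pipeline. The function name count_domain_hits and the 0.01 cutoff on the independent E-value are my own assumptions, and the column positions follow the domtblout layout described in the HMMER user guide. Counting HMM hits can undercount divergent repeats, so treat the numbers as an estimate.

```python
from collections import defaultdict


def count_domain_hits(domtbl_lines, max_i_evalue=0.01):
    """Collect per-protein domain hits from hmmsearch --domtblout lines.

    Returns a dict mapping each target protein name to a sorted list of
    (ali_from, ali_to) coordinates, one tuple per reported domain hit.
    The max_i_evalue cutoff is an illustrative assumption, not a
    recommended value.
    """
    hits = defaultdict(list)
    for line in domtbl_lines:
        # Header and footer lines in domtblout start with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # A data line has 22 fixed columns plus a free-text description.
        if len(fields) < 22:
            continue
        target = fields[0]            # target (protein) name
        i_evalue = float(fields[12])  # independent E-value of this domain
        ali_from = int(fields[17])    # alignment start on the protein
        ali_to = int(fields[18])      # alignment end on the protein
        if i_evalue <= max_i_evalue:
            hits[target].append((ali_from, ali_to))
    return {target: sorted(coords) for target, coords in hits.items()}
```

Passing an open file handle for domains.tbl to count_domain_hits gives one entry per protein, and the length of each list is the number of domain hits that passed the cutoff.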

Brilliant, thank you!
