Question: How to parse hmmscan output to compare the domain architectures of several proteins
12 months ago by peter pfand70
Dear community,

I have a FASTA file of several (co-)orthologous proteins from two different species. I want to (1) determine the domain architecture of each protein, (2) obtain for each protein the ordered list of its likely true domains, and (3) compare these domain lists across proteins in terms of presence/absence and order of appearance.

Steps 1 and 2: I can do this manually for a few proteins in one FASTA file, but it becomes far too tedious for 1000 FASTA files. Does anyone know of a parser or tool that extracts the significant domains of every protein from hmmscan output?
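A minimal sketch for steps 1 and 2, assuming hmmscan was run with its `--domtblout` option (for example `hmmscan --domtblout results.domtblout Pfam-A.hmm proteins.fasta`; file names are hypothetical), which writes one whitespace-separated line per domain hit. The code keeps hits whose independent E-value (i-Evalue, column 13) is at or below a cutoff, resolves overlapping hits with a simple greedy rule (keep the hit with the lowest i-Evalue, drop any hit overlapping an already kept one), and orders the survivors by envelope start (columns 20 and 21). The 1e-3 cutoff and the greedy overlap rule are assumptions, not Pfam's own clan-based overlap resolution; with Pfam-A, running hmmscan with `--cut_ga` to apply the curated gathering thresholds is another option. The sample records are invented for illustration.

```python
import io
from typing import Dict, List, NamedTuple, TextIO


class DomainHit(NamedTuple):
    name: str        # profile name, e.g. a Pfam family ("target name", column 1)
    i_evalue: float  # independent E-value of this domain hit (column 13)
    env_from: int    # envelope start on the query protein (column 20)
    env_to: int      # envelope end on the query protein (column 21)


def parse_domtblout(handle: TextIO, max_i_evalue: float = 1e-3) -> Dict[str, List[DomainHit]]:
    """Group domain hits passing the i-Evalue cutoff by query protein."""
    hits: Dict[str, List[DomainHit]] = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.split(maxsplit=22)  # 22 fixed columns + free-text description
        hit = DomainHit(fields[0], float(fields[12]), int(fields[19]), int(fields[20]))
        if hit.i_evalue <= max_i_evalue:
            hits.setdefault(fields[3], []).append(hit)  # fields[3] = query name
    return hits


def resolve_overlaps(hits: List[DomainHit]) -> List[DomainHit]:
    """Greedy heuristic (assumption): best hits first, drop any hit overlapping a kept one."""
    kept: List[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.i_evalue):
        if all(hit.env_to < k.env_from or hit.env_from > k.env_to for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.env_from)  # N- to C-terminal order


def architectures(handle: TextIO, max_i_evalue: float = 1e-3) -> Dict[str, List[str]]:
    """Map each query protein to its ordered list of domain names."""
    return {query: [h.name for h in resolve_overlaps(hs)]
            for query, hs in parse_domtblout(handle, max_i_evalue).items()}


# Invented example records in --domtblout column layout
SAMPLE = """\
# target name  accession  tlen  query name  ...
Pkinase        - 264 protA - 500 1e-50 170.0 0.1 1 1 1e-52 2e-48 160.0 0.1 1 264 10 260 8 262 0.95 Protein kinase domain
PK_Tyr_Ser-Thr - 259 protA - 500 1e-12 45.0 0.1 1 1 1e-14 1e-10 40.0 0.1 1 250 12 253 10 255 0.90 Overlapping weaker hit
SH2            - 77 protA - 500 1e-22 75.0 0.1 1 1 1e-24 1e-20 70.0 0.1 1 77 302 378 300 380 0.93 SH2 domain
DUF_example    - 50 protA - 500 0.5 5.0 0.1 1 1 0.1 0.5 3.0 0.1 1 50 402 448 400 450 0.60 Insignificant hit
SH2            - 77 protB - 450 1e-17 60.0 0.1 1 1 1e-19 1e-15 58.0 0.1 1 77 22 98 20 100 0.92 SH2 domain
Pkinase        - 264 protB - 450 1e-42 145.0 0.1 1 1 1e-44 1e-40 140.0 0.1 1 264 152 398 150 400 0.94 Protein kinase domain
"""

print(architectures(io.StringIO(SAMPLE)))
# {'protA': ['Pkinase', 'SH2'], 'protB': ['SH2', 'Pkinase']}
```

For many FASTA files, one option is to run hmmscan once per file (or once on their concatenation) and call `architectures` on each resulting `--domtblout` file, e.g. `with open(path) as fh: archs = architectures(fh)`.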

Step 3: I have found metrics such as WDAC (Weighted Domain Architecture Comparison), ADASS (Alignment-free Domain Architecture Similarity Search) and the DA-score (Domain Architecture similarity score), but I could not find any benchmark or comparison of these three or of other methods. Does anyone know which of the three is the most accurate, or whether there are better alternatives?
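For step 3, a naive baseline can serve as a reference point while evaluating the published scores. The sketch below is not WDAC, ADASS or the DA-score. It scores presence/absence with the Jaccard index on the sets of domain names, and order with one minus the Levenshtein distance between the two domain lists (each domain treated as a single symbol), normalized by the length of the longer list. Both choices are assumptions made for illustration: the Jaccard part ignores how many times a domain is repeated, and neither score weights some domains more than others.

```python
from typing import Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Presence/absence similarity: shared distinct domains / all distinct domains."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two domain-less proteins are treated as identical
    return len(sa & sb) / len(sa | sb)


def order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """1 - normalized Levenshtein distance over domain lists (one domain = one symbol)."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, da in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, db in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete da
                         cur[j - 1] + 1,            # insert db
                         prev[j - 1] + (da != db))  # match or substitute
        prev = cur
    return 1.0 - prev[len(b)] / max(len(a), len(b))


x, y = ["Pkinase", "SH2"], ["SH2", "Pkinase"]
print(jaccard(x, y), order_similarity(x, y))  # same domains, opposite order
# 1.0 0.0
```

Reporting the two numbers separately keeps "which domains" and "in what order" distinguishable, which matches the two criteria in step 3.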

I am quite new to this and feel a bit lost.

Thanks a lot in advance

