I have a FASTA file with several (co-)orthologous proteins from two different species, and I want to determine their domain architecture (1). Next, I would like to get, for each protein, the sequence of likely true domains (2). Finally, I would like to compare these domain sequences (3) in terms of presence/absence and order of appearance.
Steps 1 & 2: I can do this manually for a small set of proteins in a single FASTA file, but it becomes too tedious with 1000 FASTA files. Does anyone know of a parser or tool that retrieves the significant domains for every protein from hmmscan output?
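To make steps 1 & 2 concrete, here is a minimal sketch of the kind of parser I have in mind. It assumes hmmscan was run with `--domtblout`, so that each data line has 22 whitespace-separated columns followed by a free-text description, with the domain name in column 1, the protein name in column 4, the independent E-value in column 13 and the alignment coordinates in columns 18 and 19. The function name `parse_domtblout` and the cutoff `max_ievalue=1e-3` are my own placeholders, not recommended values, and the sketch does not resolve overlapping hits.

```python
from collections import defaultdict


def parse_domtblout(lines, max_ievalue=1e-3):
    """Collect per-protein domain hits from hmmscan --domtblout lines.

    Returns {protein: [(domain, ali_from, ali_to), ...]}, with the hits of
    each protein sorted by their start position on the protein.
    max_ievalue is an assumed cutoff on the independent E-value.
    """
    hits = defaultdict(list)
    for line in lines:
        # Skip blank lines and the '#' header/footer lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 22:
            continue  # not a complete domtblout record
        domain = fields[0]           # target name (the HMM, for hmmscan)
        protein = fields[3]          # query name (the protein sequence)
        i_evalue = float(fields[12])  # independent E-value of this domain
        ali_from = int(fields[17])   # alignment start on the protein
        ali_to = int(fields[18])     # alignment end on the protein
        if i_evalue <= max_ievalue:
            hits[protein].append((domain, ali_from, ali_to))
    # Order each protein's domains from N- to C-terminus.
    return {p: sorted(d, key=lambda h: h[1]) for p, d in hits.items()}
```

With a table written by something like `hmmscan --domtblout out.domtblout Pfam-A.hmm proteins.fasta` (file names are placeholders), I would call it as `parse_domtblout(open("out.domtblout"))` and loop over the 1000 output files.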
Step 3: I have found metrics such as WDAC (Weighted Domain Architecture Comparison), ADASS (alignment-free domain architecture similarity search) and DA-score (Domain Architecture similarity score), but I could not find any benchmark or comparison of these three or of other methods. Does anyone know which of the three is the most accurate, or whether there are better alternatives?
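For context, the naive baseline I could write myself for step 3 looks roughly like the sketch below. It is not WDAC, ADASS or DA-score: it simply scores presence/absence with a Jaccard index over the domain sets and scores order with `difflib.SequenceMatcher` from the Python standard library. The function names `jaccard` and `order_similarity` are made up for illustration, and each input is assumed to be a list of domain names ordered from N- to C-terminus.

```python
from difflib import SequenceMatcher


def jaccard(domains_a, domains_b):
    """Presence/absence similarity: shared domains / all domains (0 to 1)."""
    set_a, set_b = set(domains_a), set(domains_b)
    if not set_a and not set_b:
        return 1.0  # two proteins without domains are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def order_similarity(domains_a, domains_b):
    """Order-aware similarity of two domain lists (0 to 1).

    Uses SequenceMatcher.ratio(), which rewards domains that appear in the
    same relative order in both architectures.
    """
    return SequenceMatcher(None, domains_a, domains_b, autojunk=False).ratio()
```

Two architectures with the same domains in a different order get a Jaccard index of 1 but an order similarity below 1, which is the distinction I would like a proper metric to capture.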
I am quite new to this and feel a bit lost.
Thanks a lot in advance.