Hi folks,
What is the current state of the art for parsing Pfam domain matches (reported by hmmscan, hmmsearch, or pfamscan.pl) into the linear order of protein domains, aka protein domain architecture (PDA)?
pfamscan.pl returns non-overlapping domains, so converting those results to a PDA is relatively simple.
But hmmscan or hmmsearch results in domtblout format can still contain overlapping matches.
These overlaps may be for domains in the same Pfam clan, or in different Pfam clans.
The matches to different domains are going to have different E-values and/or scores.
Operating under the principle that any given amino acid residue can belong to only one domain assignment (is that correct?), is there a ready-to-use parser that will take domtblout and resolve the overlaps into something like the following (made-up example):
ProteinID#1<tab>Pfam12234<space>PF06322<space>PF10010
ProteinID#2<tab>Pfam98734<space>PF00912<space>PF10010<space>PF10010<space>PF10010
ProteinID#3<tab>
ProteinID#4<tab>Pfam00184<space>PF06222
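To make the question concrete, here is a minimal sketch of the kind of parser I have in mind. It is not an established tool, and it rests on several assumptions of mine: the input comes from hmmscan --domtblout (target = Pfam model, query = protein; for hmmscan output produced by hmmsearch the name/accession columns would need to be swapped), overlaps are judged on envelope coordinates, and conflicts are resolved greedily by keeping the match with the lowest independent E-value. The function names and the 1e-3 cutoff are made up for illustration, and clan membership is ignored.

```python
from collections import defaultdict

def parse_domtblout(lines):
    """Yield (protein, pfam_acc, env_start, env_end, i_evalue) per domain hit."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        f = line.split()
        # Assumed hmmscan orientation: f[1] = Pfam accession, f[3] = protein ID.
        acc = f[1].split(".")[0]  # drop version suffix, e.g. PF00069.28 -> PF00069
        # f[12] = independent E-value, f[19]/f[20] = envelope start/end (1-based, inclusive)
        yield f[3], acc, int(f[19]), int(f[20]), float(f[12])

def resolve_overlaps(hits):
    """Greedy: take hits by increasing E-value, skip any that overlaps a kept hit."""
    kept = []
    for hit in sorted(hits, key=lambda h: h[3]):
        start, end = hit[1], hit[2]
        if all(end < k[1] or start > k[2] for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h[1])  # N- to C-terminal order

def architectures(lines, max_evalue=1e-3):
    """Return {protein_id: [pfam_acc, ...]} in linear order along the protein."""
    by_protein = defaultdict(list)
    for prot, acc, start, end, ev in parse_domtblout(lines):
        if ev <= max_evalue:
            by_protein[prot].append((acc, start, end, ev))
    return {p: [h[0] for h in resolve_overlaps(hs)] for p, hs in by_protein.items()}
```

Each entry could then be written out as protein ID, a tab, and the accessions joined by spaces. Note that proteins without any reported match do not appear in domtblout at all, so empty rows like ProteinID#3 above would need a separate list of all protein IDs.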
I've contacted the Pfam help desk about this, but there has been no response after more than 2 weeks and a follow-up reminder, so I am turning to forum members here for help. Thank you in advance.