Closed: How to parse hmmsearch domtblout, resolve overlaps and return PDA?
5.0 years ago
Anand Rao ▴ 630

Hi folks,

What is the current state of the art for parsing Pfam domain matches (reported by hmmscan, hmmsearch, or pfamscan.pl) into the linear order of protein domains, a.k.a. the protein domain architecture (PDA)?

pfamscan.pl returns non-overlapping domains, so converting its results to a PDA is relatively simple.

hmmscan and hmmsearch results in domtblout format, however, can still contain overlapping matches.

These overlaps may be for domains in the same Pfam clan, or in different Pfam clans.

The matches to different domains are going to have different E-values and/or scores.

Operating under the principle that any given amino acid residue can belong to only one domain assignment (is that correct?), is there a ready-to-use parser that takes domtblout and resolves overlaps into something like the made-up example below?

ProteinID#1<tab>Pfam12234<space>PF06322<space>PF10010
ProteinID#2<tab>Pfam98734<space>PF00912<space>PF10010<space>PF10010<space>PF10010
ProteinID#3<tab>
ProteinID#4<tab>Pfam00184<space>PF06222
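
To make the desired behaviour concrete, here is a minimal Python sketch of one possible approach, not an established tool. It assumes the standard whitespace-delimited domtblout column layout (22 fixed columns plus a free-text description), uses envelope coordinates to define overlap, and keeps the hit with the lowest independent E-value whenever two hits share at least one residue. The function names `parse_domtblout`, `resolve_overlaps` and `format_pda` are hypothetical, and the greedy rule ignores Pfam clan membership entirely, which is itself an assumption.

```python
from collections import defaultdict


def parse_domtblout(lines, program="hmmsearch"):
    """Yield (protein, domain, env_start, env_end, i_evalue) per domain hit.

    Assumes the standard domtblout layout. With hmmsearch the target is the
    protein and the query is the HMM; with hmmscan the roles are swapped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        f = line.split(maxsplit=22)
        if len(f) < 22:
            continue  # not a complete data row
        if program == "hmmsearch":
            protein = f[0]
            domain = f[4] if f[4] != "-" else f[3]  # accession, else name
        else:  # hmmscan
            protein = f[3]
            domain = f[1] if f[1] != "-" else f[0]
        # f[12] = independent E-value, f[19]/f[20] = envelope from/to
        yield protein, domain, int(f[19]), int(f[20]), float(f[12])


def resolve_overlaps(hits):
    """Greedy resolution: best E-value first, drop anything overlapping a kept hit."""
    by_protein = defaultdict(list)
    for protein, domain, start, end, evalue in hits:
        by_protein[protein].append((evalue, start, end, domain))

    architectures = {}
    for protein, doms in by_protein.items():
        kept = []
        for evalue, start, end, domain in sorted(doms):
            # keep the hit only if it shares no residue with any kept hit
            if all(end < ks or start > ke for ks, ke, _ in kept):
                kept.append((start, end, domain))
        # order the surviving domains from N- to C-terminus
        architectures[protein] = [d for _, _, d in sorted(kept)]
    return architectures


def format_pda(architectures):
    """Return lines of the form: protein<tab>dom1<space>dom2..."""
    return [f"{p}\t{' '.join(doms)}" for p, doms in architectures.items()]
```

With a real file this could be driven as `format_pda(resolve_overlaps(parse_domtblout(open(path))))`, where `path` is whatever was given to `--domtblout`. Proteins with no reported hits never appear in domtblout, so empty rows like ProteinID#3 above would have to be added from the original sequence list.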

I contacted the Pfam help desk about this, but there has been no response after more than 2 weeks and a follow-up reminder, so I am turning to forum members here for help. Thank you in advance.

PDA overlap resolve domain Pfam • 144 views