Categorizing Protein Families?
2
3
13.3 years ago
ilovepython ▴ 150

This follows on from my previous question about discovering protein homology. After finding sequences of interest by searching against a family profile, I want to determine whether these sequences can be assigned to that family or not. How can one score proteins against each other so that they can be grouped this way?

Originally, this "family" was determined via simple statistics (pairwise scoring via z-scores, computed from alignments of shuffled versions of these sequences), although I'm not convinced this is sophisticated enough to determine membership. I'm therefore looking for a more rigorous scoring method. There are important secondary structures that I am adding to my scoring function, but beyond this, I can't find much on Google about this type of scoring.
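
For readers unfamiliar with the shuffle-based z-score mentioned above, here is a minimal sketch of the general idea. It is not the exact procedure used for this family: the simple match/mismatch scoring stands in for a real substitution matrix such as BLOSUM62, and the function names, parameters, and defaults are all assumptions chosen for illustration.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
        best = max(best, max(curr))
        prev = curr
    return best


def shuffle_z_score(a, b, n_shuffles=200, seed=0):
    """Z-score of the observed score against scores of shuffled copies of b.

    Shuffling keeps the composition of b but destroys its residue order,
    so the shuffled scores approximate a null distribution.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        null_scores.append(local_alignment_score(a, "".join(residues)))
    spread = statistics.pstdev(null_scores)
    if spread == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - statistics.mean(null_scores)) / spread
```

A large z-score only says that the pair aligns better than expected by chance for that composition; on its own it says nothing about whether both sequences belong to the same family.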

protein homology scoring • 2.8k views
5
13.3 years ago

If you are interested in including secondary and tertiary structure in the categorisation, I strongly suggest you look at the methods used by Superfamily, which is SCOP-based, and Gene3D, which is CATH-based.

1

Well, if you look at the Superfamily database, you'll find that it is in fact a collection of HMMs just like Pfam, SMART, and InterPro. The difference lies in how they made the multiple sequence alignments.

1

Knowing how the MSAs are created is indeed critical; many people overlook these two databases because they are derived from structure rather than function. Note that although Gene3D and Superfamily are InterPro member databases, you need to check InterPro's release notes to see how many of their HMMs have been integrated (only about half of Gene3D has been so far). Hence I recommend going to each site directly to get the latest data.

4
13.3 years ago

Looking at the accepted answer to your previous question, I wonder why simply running hmmsearch with your custom HMM would not do the job? Building an HMM based on a manually checked multiple sequence alignment and then using it for searching would be the standard way to identify members of a protein family.
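
As a rough sketch of that workflow: build the profile with `hmmbuild family.hmm family_alignment.sto`, search with `hmmsearch --tblout hits.tbl family.hmm candidates.fasta`, then filter the per-sequence table by E-value. The file names, the `1e-5` cutoff, the helper names, and the example rows below are all made up for illustration; choose a cutoff that suits your family.

```python
from typing import Dict, Iterable, List


def parse_tblout(lines: Iterable[str]) -> Dict[str, dict]:
    """Parse per-sequence hits from `hmmsearch --tblout` output."""
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # Columns: 0 target name, 1 target accession, 2 query name,
        # 3 query accession, 4 full-sequence E-value, 5 full-sequence score
        hits[fields[0]] = {
            "evalue": float(fields[4]),
            "score": float(fields[5]),
        }
    return hits


def family_members(hits: Dict[str, dict], max_evalue: float = 1e-5) -> List[str]:
    """Return names of targets whose full-sequence E-value passes the cutoff."""
    return sorted(name for name, hit in hits.items() if hit["evalue"] <= max_evalue)


# Invented rows in tblout layout, for illustration only.
EXAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
seqA  -  myfamily  -  1.2e-30  105.3  0.1  2.0e-30  104.6  0.1  1.0  1  0  0  1  1  1  1  hypothetical protein
seqB  -  myfamily  -  0.8  5.1  0.0  1.1  4.7  0.0  1.2  1  0  0  1  1  0  0  hypothetical protein
""".splitlines()

members = family_members(parse_tblout(EXAMPLE_TBLOUT))
```

In a real run you would pass the lines of `hits.tbl` instead of `EXAMPLE_TBLOUT`.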

Perhaps more important: if you make your own scoring scheme, how will you check whether it works better than just using hmmsearch? Surely, if you choose not to use the well-tested approach, someone will (and should!) ask you to present evidence that your solution is an improvement.
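
One way to gather such evidence, sketched here under assumptions: take sequences with known membership that were not used to build the HMM, score them with both methods, and compare how well each ranks true members above non-members, for example by ROC AUC. The labels and scores below are invented toy values, and `roc_auc` is a hypothetical helper, not part of HMMER.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC: probability that a random member outscores a random non-member.

    Ties count as half a win. Higher scores must mean "more likely a member".
    """
    members = [s for s, y in zip(scores, labels) if y]
    others = [s for s, y in zip(scores, labels) if not y]
    if not members or not others:
        raise ValueError("need at least one member and one non-member")
    wins = sum((m > o) + 0.5 * (m == o) for m in members for o in others)
    return wins / (len(members) * len(others))


# Toy benchmark: 1 = known family member, 0 = known non-member.
labels = [1, 1, 1, 0, 0, 0]
hmmsearch_bits = [105.3, 88.0, 40.2, 12.5, 5.1, 45.0]  # invented bit scores
custom_scores = [0.9, 0.8, 0.3, 0.4, 0.35, 0.1]        # invented custom scores

auc_hmm = roc_auc(hmmsearch_bits, labels)
auc_custom = roc_auc(custom_scores, labels)
print(f"hmmsearch AUC: {auc_hmm:.3f}  custom AUC: {auc_custom:.3f}")
```

With a realistic benchmark you would want many more sequences, and ideally some estimate of the uncertainty in each AUC, before claiming that one method beats the other.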

0

Makes sense! I thought more rigorous tests were needed to determine membership, but it seems that HMMER already incorporates sophisticated AI techniques for this.
