Question: Best way to identify proteins within a family?
Asked 6 months ago by Solowars (Brazil/Porto Alegre/UFRGS), who wrote:

Dear community,

I'm interested in retrieving deep homologs for a number of genes that belong to a protein superfamily (let's say, GPCRs). One of my strategies was to run HMMER searches, using either an alignment or an HMM built from an alignment. From what I have read, many people use specific protein domains to decide which of the proteins found are true matches. In my case, my proteins don't have a specific domain characterizing them; they only share a common domain with the rest of the family (e.g. the 7TM domain). Therefore, although my search returns a good number of good matches (proteins already identified in the database as homologs of my query genes), a number of other proteins from the family appear too, which makes it hard to determine whether the uncharacterized proteins in my results are true matches or not. I tried to improve this approach by using different domain architectures, but I'm still retrieving false matches. I also tried adjusting the E-value and bit-score thresholds and using a different kind of search (e.g. iterative search), but I haven't found a fully satisfactory way to tackle the issue.

Any thoughts?

Thank you!

Modified 6 months ago • written 6 months ago by Solowars

The sentence "In my case, my proteins don't have a specific domain characterizing them, and share a common domain with the rest of the family (e.g. the 7TM domain)." is confusing. Are you trying to find a "subfamily"? By the way, I think you will have to build phylogenetic trees in that case.

Reply written 6 months ago by fishgolden

Well, let's say that we have a big receptor family, like GPCRs, which contains receptors for a broad array of neurotransmitters. I'm interested in a specific subgroup (the receptors of one particular neurotransmitter), and these receptors don't have a specific domain characterizing them other than the 7TM (7-transmembrane) domain, which is common to all GPCRs. I thought about building phylogenetic trees, but the number of GPCRs and species matching a given HMM query is far too large to build a tree, so I'm trying to improve my filtering (either by adjusting the search thresholds or by improving my query) in order to reduce the number of putative proteins to a more manageable size. I know that there are several strategies for identifying proteins below the domain level, such as fingerprints or InterPro protein family predictions, but I think those can still introduce errors (e.g. misassigned proteins). That's why I'm asking whether there's a different strategy that works better and that I'm not aware of yet.

Reply written 6 months ago by Solowars
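
The threshold-based filtering mentioned in the reply above could be sketched roughly as follows. This is only a minimal illustration, assuming the hits come from the per-sequence table that `hmmsearch --tblout` writes (target name in column 1, full-sequence E-value in column 5, full-sequence bit score in column 6). The function names and the cutoff values are hypothetical placeholders, not recommendations for GPCRs; suitable cutoffs would have to be calibrated on known members and non-members of the subgroup.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    target: str      # target sequence name
    evalue: float    # full-sequence E-value
    bitscore: float  # full-sequence bit score


def parse_tblout(lines: Iterable[str]) -> List[Hit]:
    """Parse per-sequence hits from `hmmsearch --tblout` lines."""
    hits = []
    for line in lines:
        # Skip blank lines and the '#' header/footer comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        hits.append(Hit(fields[0], float(fields[4]), float(fields[5])))
    return hits


def filter_hits(hits: Iterable[Hit],
                max_evalue: float = 1e-10,    # placeholder cutoff (assumption)
                min_bitscore: float = 100.0   # placeholder cutoff (assumption)
                ) -> List[Hit]:
    """Keep only hits that pass both the E-value and the bit-score cutoff."""
    return [h for h in hits
            if h.evalue <= max_evalue and h.bitscore >= min_bitscore]


if __name__ == "__main__":
    # Made-up example rows in tblout layout, for illustration only.
    example = [
        "# target name  accession  query name ...",
        "protA - myquery - 1e-50 170.2 0.1 1e-50 170.0 0.1 1.0 1 0 0 1 1 1 1 -",
        "protB - myquery - 1e-05 30.5 0.0 1e-05 30.1 0.0 1.0 1 0 0 1 1 1 1 -",
    ]
    for hit in filter_hits(parse_tblout(example)):
        print(hit.target, hit.evalue, hit.bitscore)
```

Requiring both cutoffs at once is a deliberate choice in this sketch: the bit score does not depend on database size, whereas the E-value does, so the two can disagree when the same query is run against databases of different sizes.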

I think checking the bootstrap values of phylogenetic trees and checking the conservation of important residues in alignments are essential for such a detailed analysis. (There are some automatic approaches such as OrthoMCL (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC403725/), but as I haven't tried this software, I can't promise it will give the expected result. Just FYI.)

Reply written 6 months ago by fishgolden
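
The residue-conservation check suggested in the reply above might look something like the sketch below. It assumes the candidates have already been aligned together with a well-characterized reference sequence, and that the alignment is available as a plain dictionary of equal-length gapped strings. The function names, the toy alignment, and the positions and allowed residues are all made up for illustration; the real diagnostic positions would have to come from the literature on the receptor subgroup of interest.

```python
from typing import Dict, List


def ungapped_to_column(aligned_ref: str, position: int) -> int:
    """Map a 1-based residue position in the ungapped reference
    to the corresponding 0-based alignment column."""
    seen = 0
    for col, ch in enumerate(aligned_ref):
        if ch not in "-.":
            seen += 1
            if seen == position:
                return col
    raise ValueError(f"reference has fewer than {position} residues")


def check_residues(alignment: Dict[str, str],
                   ref_id: str,
                   expected: Dict[int, str]) -> Dict[str, List[int]]:
    """For each sequence, list the reference positions whose residue
    is not among the allowed residues for that position."""
    ref = alignment[ref_id]
    columns = {pos: ungapped_to_column(ref, pos) for pos in expected}
    mismatches = {}
    for name, seq in alignment.items():
        mismatches[name] = [
            pos for pos, col in columns.items()
            if seq[col].upper() not in expected[pos]
        ]
    return mismatches


if __name__ == "__main__":
    # Toy alignment and hypothetical diagnostic positions (assumptions).
    toy_alignment = {
        "ref":   "MD-RYK",
        "cand1": "MDARYK",
        "cand2": "ME-RAK",
    }
    allowed = {2: "DE", 3: "R", 4: "Y"}  # reference position -> allowed residues
    for name, bad in check_residues(toy_alignment, "ref", allowed).items():
        print(name, "mismatching positions:", bad)
```

Candidates with an empty mismatch list keep every checked residue; the others would be the first ones to inspect by eye or to test on a tree, rather than being discarded automatically.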