Question: Pfam Based Functional Annotation
Suk211 (State College) wrote, 7.7 years ago:

I think in one of the earlier threads, Istvan already asked about the reliability of GO annotation. I was wondering if any of you have experience with functional annotation based on the Pfam database. I am looking to functionally annotate a large peptide library, and the easiest way I can think of is to run a batch search of those peptides against the Pfam database. If you know a better approach, kindly share it.

cheers
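A batch search like this is usually run with HMMER3 against the Pfam-A HMM library, e.g. `hmmpress Pfam-A.hmm` once, then `hmmscan --cut_ga --domtblout hits.domtbl Pfam-A.hmm peptides.fasta` (the output and FASTA file names are placeholders). A minimal sketch for collecting the per-domain hits from the `--domtblout` table, assuming the standard HMMER3 column layout (target name, accession, model length, query name, query accession, query length, ..., independent E-value in column 13, HMM and alignment coordinates in columns 16 to 19). The example row in the test is made up, including the accession PF99999.1:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class DomainHit(NamedTuple):
    pfam_name: str   # target name (the Pfam model)
    pfam_acc: str    # target accession, e.g. PF02009.<version>
    model_len: int   # tlen: length of the Pfam HMM
    query: str       # query name (your peptide ID)
    query_len: int   # qlen
    i_evalue: float  # independent E-value of this domain
    hmm_from: int    # alignment start on the model
    hmm_to: int      # alignment end on the model
    ali_from: int    # alignment start on the peptide
    ali_to: int      # alignment end on the peptide


def parse_domtblout(lines: Iterable[str]) -> Dict[str, List[DomainHit]]:
    """Group hmmscan --domtblout rows by query (peptide) name."""
    hits: Dict[str, List[DomainHit]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        # 22 whitespace-separated fields, then a free-text description
        # that may itself contain spaces.
        f = line.split(maxsplit=22)
        hit = DomainHit(
            pfam_name=f[0], pfam_acc=f[1], model_len=int(f[2]),
            query=f[3], query_len=int(f[5]), i_evalue=float(f[12]),
            hmm_from=int(f[15]), hmm_to=int(f[16]),
            ali_from=int(f[17]), ali_to=int(f[18]),
        )
        hits[hit.query].append(hit)
    return dict(hits)
```

Filtering on `i_evalue` (or relying on the curated gathering thresholds via `--cut_ga`) is where you decide how much to trust a hit on a very short query.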

Tags: annotation, protein

Istvan Albert replied, 7.7 years ago:

Minor correction: it was Giovanni who asked that question.
Melanie (San Diego) wrote, 7.7 years ago:

I think the Pfam approach may return something useful, but you need to be careful about how you interpret your results. Pfam is primarily a tool to assign sequences to protein families. It also does a good job of recognizing functional domains. It provides information about the usual function of the domains/family members, but I do not think it should be viewed as a tool to assign function directly, and I think the Pfam curators would agree with me. It makes an assignment based on sequence similarity and infers structural and functional similarity from that. These inferences may or may not be correct. There are several risks you need to keep in mind. Two big ones that jump out at me are:

  1. Your sequences are all shorter than most protein domains, so you may get false negatives: with the full sequence you might have hit a domain, but because you only have a fragment, the similarity is too weak to produce a hit.

  2. You might get false positives: your sequence matches a domain but has a few key residues mutated, so the protein from which your sequence was derived does not actually perform the function assigned to that domain in Pfam.
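Both risks can be screened for, at least roughly, once you have hits with model coordinates. A minimal sketch, assuming each hit gives `hmm_from`, `hmm_to` and the model length (as in a `--domtblout` table), plus the model and query lines of the alignment as shown in hmmscan's standard output. The `key_sites` mapping (model position to acceptable residues) is a hypothetical input you would have to derive from the literature or the family alignment, and the 50% coverage cut-off is an illustrative choice, not a Pfam recommendation:

```python
from typing import Dict, List, Set


def model_coverage(hmm_from: int, hmm_to: int, model_len: int) -> float:
    """Fraction of the Pfam model spanned by the alignment."""
    return (hmm_to - hmm_from + 1) / model_len


def residues_at_model_positions(hmm_from: int, model_aln: str,
                                target_aln: str) -> Dict[int, str]:
    """Map each model position in the alignment to the aligned query residue.

    Columns with a gap ('.' or '-') in the model line are insertions in the
    query and do not advance the model position.
    """
    mapping: Dict[int, str] = {}
    pos = hmm_from
    for m, t in zip(model_aln, target_aln):
        if m in ".-":
            continue
        mapping[pos] = t.upper()
        pos += 1
    return mapping


def check_hit(hmm_from: int, hmm_to: int, model_len: int,
              model_aln: str, target_aln: str,
              key_sites: Dict[int, Set[str]],
              min_coverage: float = 0.5) -> List[str]:
    """Return warnings for one hit; an empty list means no red flags found."""
    warnings: List[str] = []
    cov = model_coverage(hmm_from, hmm_to, model_len)
    if cov < min_coverage:  # risk 1: fragment covers little of the domain
        warnings.append(f"covers only {cov:.0%} of the model")
    residues = residues_at_model_positions(hmm_from, model_aln, target_aln)
    for site, allowed in sorted(key_sites.items()):  # risk 2: key residues
        if site not in residues:
            warnings.append(f"key site {site} not covered")
        elif residues[site] not in allowed:
            warnings.append(f"key site {site} is {residues[site]}, "
                            f"expected one of {sorted(allowed)}")
    return warnings
```

An empty warning list is not evidence of function; it only means these two particular problems were not detected.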

You asked about direct experience. Mine is roughly 5 years old now, but I found that Pfam was one of the best tools for identifying functional domains, and a good way to annotate sequences as long as I kept its limitations in mind. However, I was working with full-length sequences, not fragments. My gut instinct is that it will not perform as well on small fragments, but I have no direct experience to back this up, just my knowledge that your fragments are shorter than most domains.

Back when I did function assignment for a living, I considered it very risky to rely on one tool to make an assignment. And I never considered any assignment anything more than a hypothesis that could then be tested in the lab.
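The "don't rely on one tool" rule can be made explicit by keeping only a label that several independent methods agree on, and still treating the result as a hypothesis. A minimal sketch; the tool names in the usage and the two-vote threshold are hypothetical:

```python
from collections import Counter
from typing import Dict, Optional


def consensus_label(predictions: Dict[str, Optional[str]],
                    min_votes: int = 2) -> Optional[str]:
    """Return the label predicted by at least `min_votes` tools, else None.

    `predictions` maps tool name -> predicted function label (None if the
    tool made no call). A tie for the top label counts as no consensus.
    """
    votes = Counter(label for label in predictions.values() if label)
    if not votes:
        return None
    ranked = votes.most_common()
    best_label, best_count = ranked[0]
    if best_count < min_votes:
        return None
    if len(ranked) > 1 and ranked[1][1] == best_count:
        return None  # two labels equally supported
    return best_label
```

For example, `consensus_label({"pfam": "kinase", "blast": "kinase", "other": None})` gives `"kinase"`. A `None` result means "no agreed hypothesis", not "no function".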

Nicojo (Kyoto, Japan) wrote, 7.7 years ago:

My experience with Pfam is limited, but I think relevant to your question.

I work on a human pathogen whose genome has been entirely sequenced, so we know quite a bit about what's in it. In particular, I'm interested in one Pfam family (PF02009) that groups similar proteins from this pathogen.

The problem I have with this Pfam family is that it lumps together several distinct groups of proteins. These proteins are related, I agree; however, at the level at which I'm comparing them (in detail), I would not jump to the conclusion that they share the same function.

That brings me to the following comment on your question: "functional annotation" is very vague. What level of functional detail are you looking for?

  • Do you want to know if these peptides belong to groups called "enzymes" or "receptors" or some kind of basic "building blocks", without any more detail?
  • Do you want to know if these peptides belong to a specific class of enzymes?
  • Do you want to know if these peptides belong to a specific sub-class of enzymes, going all the way down to the substrate specificity?
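One way to make these levels concrete, assuming the annotations you end up with are EC numbers (an assumption; GO terms have a comparable but graph-shaped hierarchy), is to compare them only down to the level of detail you actually need. Level 1 is the broad enzyme class; level 4 is the most specific and often separates substrates:

```python
def ec_prefix(ec: str, level: int) -> str:
    """Truncate an EC number such as '3.4.21.4' to its first `level` parts."""
    return ".".join(ec.split(".")[:level])


def same_function(ec_a: str, ec_b: str, level: int) -> bool:
    """True if two EC numbers agree down to the given level (1 to 4).

    Unassigned parts written as '-' (e.g. '3.4.-.-') never count as a match.
    """
    a = ec_a.split(".")[:level]
    b = ec_b.split(".")[:level]
    if "-" in a or "-" in b:
        return False
    return a == b
```

Two peptides can then be "the same" at level 2 and different at level 4, which is exactly the ambiguity in the question.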

Another question I have concerns the length of your peptides. I recall one of my collaborators complaining that Pfam would not detect fragments that were too short. That was with Pfam2; I don't know how this is with Pfam3, so you'll have to test it.

Depending on the answers to these questions (and many more), you may or may not want to rely on Pfam alone. In any case, Pfam could be a good start, provided your peptides are not too short.

Another way that might be more relevant to short sequences would be to look at BLAST approaches (PSI- or PHI-BLAST in particular) to find what your peptides match to, and then look at the functional annotation of those hits (including whatever Pfam domains they may contain). I think this method would be more sensitive than the Pfam approach.
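If the BLAST searches are run with tabular output (`-outfmt 6` in BLAST+, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore; for very short queries BLAST+ also offers `-task blastp-short`), the "find what your peptides match to, then look at the annotation of those hits" step can be sketched as below. The E-value cut-off, `top_n`, and the `annotations` dictionary are placeholders for whatever annotation source you use:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def best_hits(lines: Iterable[str], max_evalue: float = 1e-3,
              top_n: int = 3) -> Dict[str, List[Tuple[str, float]]]:
    """Best `top_n` subjects per query, ranked by bit score.

    Expects BLAST+ -outfmt 6 rows with the default 12 columns; a subject
    hit by several HSPs keeps only its best bit score.
    """
    per_query: Dict[str, Dict[str, float]] = defaultdict(dict)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        evalue, bitscore = float(f[10]), float(f[11])
        if evalue <= max_evalue:
            prev = per_query[qseqid].get(sseqid, float("-inf"))
            per_query[qseqid][sseqid] = max(prev, bitscore)
    return {q: sorted(subj.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
            for q, subj in per_query.items()}


def annotate(hits: Dict[str, List[Tuple[str, float]]],
             annotations: Dict[str, str]) -> Dict[str, List[str]]:
    """Swap subject IDs for their annotation (hypothetical lookup table)."""
    return {q: [annotations.get(s, "unannotated") for s, _ in h]
            for q, h in hits.items()}
```

Keeping a few top hits rather than only the best one lets you see whether the hits agree on a function before trusting any of them.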


Suk211 replied, 7.7 years ago:

Hey Nicojo,

Thanks for your reply. I am trying to find out what percentage of the peptides in my library share the same biological process or molecular function. These peptides are 17 to 22 residues long, and Pfam did return annotations for a test run I carried out a few weeks ago.
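Once each peptide has a set of labels (Pfam accessions, or GO terms mapped from them), the percentage question is a simple count. A minimal sketch, assuming the GO mapping comes from a pfam2go-style file with lines of the form `Pfam:PF00001 7tm_1 > GO:<term name> ; GO:0004930` and `!` comment lines; the peptide IDs and labels in the test are made up. A peptide with several labels is counted once per label, so percentages can sum to more than 100%:

```python
from collections import Counter
from typing import Dict, Iterable, Set


def parse_pfam2go(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map Pfam accession -> set of GO IDs from pfam2go-style lines."""
    mapping: Dict[str, Set[str]] = {}
    for line in lines:
        if line.startswith("!") or " > " not in line:
            continue  # comment/header lines
        left, right = line.split(" > ", 1)
        if ";" not in right:
            continue
        pfam_acc = left.split()[0].replace("Pfam:", "")
        go_id = right.rsplit(";", 1)[1].strip()
        mapping.setdefault(pfam_acc, set()).add(go_id)
    return mapping


def percent_sharing(labels: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of all peptides (including unlabelled ones) per label."""
    n = len(labels)
    counts = Counter(lab for labs in labels.values() for lab in labs)
    return {lab: 100.0 * c / n for lab, c in counts.most_common()}
```

Counting unlabelled peptides in the denominator keeps the percentages honest about how much of the library got no annotation at all.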

Nicojo replied, 7.7 years ago:

I'm still a bit confused: have these peptides been experimentally shown to be present in the sample as such short peptides? Or are they fragments of larger proteins that have been digested and sequenced? In any case, you cannot predict the function of a peptide just because its sequence is present in a full-blown protein that has a function... I'd even say that it is not biologically relevant :(

But if you have a set of peptides sequenced from a sample and you manage to map them back to a protein, then you can find out which Pfam domains that protein has and get an idea of its function.
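The map-back step can be sketched as an exact substring search of each peptide in candidate full-length proteins, followed by a lookup of which domains overlap the matched region. The protein sequences and domain coordinates (1-based, inclusive, e.g. envelope coordinates from a Pfam scan of the full proteins) are hypothetical inputs; real proteomics pipelines usually do this mapping with dedicated tools and must handle peptides shared by several proteins:

```python
from typing import Dict, List, Tuple

Domain = Tuple[str, int, int]  # (Pfam accession, start, end), 1-based inclusive


def map_peptide(peptide: str, proteins: Dict[str, str]) -> List[Tuple[str, int, int]]:
    """All exact occurrences of `peptide` as (protein_id, start, end), 1-based."""
    hits = []
    for pid, seq in proteins.items():
        i = seq.find(peptide)
        while i != -1:
            hits.append((pid, i + 1, i + len(peptide)))
            i = seq.find(peptide, i + 1)  # allow overlapping occurrences
    return hits


def overlapping_domains(peptide: str, proteins: Dict[str, str],
                        domains: Dict[str, List[Domain]]) -> Dict[str, List[str]]:
    """Domains overlapped by the peptide, per protein that contains it.

    Proteins where the peptide overlaps no domain are omitted.
    """
    result: Dict[str, List[str]] = {}
    for pid, start, end in map_peptide(peptide, proteins):
        for acc, d_start, d_end in domains.get(pid, []):
            if start <= d_end and d_start <= end:  # intervals overlap
                accs = result.setdefault(pid, [])
                if acc not in accs:
                    accs.append(acc)
    return result
```

As the reply says, this gives the domains of the parent protein; it does not say that the peptide itself carries that function.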
Chris (Munich) wrote, 7.7 years ago:

You might also consider running BLAST against Swiss-Prot and transferring residue-level annotations.
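Residue-level transfer means mapping annotated positions of the Swiss-Prot entry (for example active or binding sites from its feature table) onto the query through the pairwise alignment. A minimal sketch, assuming you have the aligned query and subject strings (BLAST+ tabular output can include them via the `qseq` and `sseq` format specifiers, together with `qstart` and `sstart`) and a hypothetical dictionary of annotated subject positions:

```python
from typing import Dict, Tuple


def transfer_sites(qseq: str, sseq: str, qstart: int, sstart: int,
                   subject_sites: Dict[int, str]) -> Dict[int, Tuple[str, str, str]]:
    """Map annotated subject positions onto query positions.

    qseq/sseq are the aligned strings (gaps as '-'); qstart/sstart are the
    1-based positions of their first residues. Returns query position ->
    (label, subject residue, query residue). Sites aligned to a gap in the
    query are dropped.
    """
    transferred: Dict[int, Tuple[str, str, str]] = {}
    qpos, spos = qstart, sstart
    for q, s in zip(qseq, sseq):
        if q != "-" and s != "-" and spos in subject_sites:
            transferred[qpos] = (subject_sites[spos], s, q)
        if q != "-":
            qpos += 1
        if s != "-":
            spos += 1
    return transferred
```

Keeping both residues in the result makes it easy to spot sites where the query carries a different amino acid, which is the false-positive risk raised earlier in the thread.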

Eric T. (San Francisco, CA) wrote, 7.7 years ago:

This review article may be helpful or at least interesting to you:

"Automated protein function prediction -- the genomic challenge" (Friedberg 2006)

Here's a relevant excerpt:

Pfam is arguably the database of choice for those seeking order within the protein sequence universe. [...] As we shall see, Pfam annotation is used by function prediction programs, either by directly querying Pfam or by using umbrella databases that include Pfam information such as InterPro. SMART, CDD, and PRODOM are other databases consisting of multiple alignments of protein domains. All these databases have proteins arranged in homologous clusters, which, when possible, are annotated. These databases are often deferred to when producing homology-based annotation transfers. It should be emphasized that the use of these databases for homology transfer should be done with caution, as they annotate proteins on a domain level. A multi-domain query aligned to Pfam, for example, should be carefully checked for mis-annotations due to domain shuffling, as mentioned earlier. Also, the 'granularity' of these databases varies. For example, a single Pfam family may contain several proteins which perform the same enzymatic reaction on different substrates.
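The domain-shuffling caveat can be turned into a simple check before transferring an annotation: compare the ordered list of Pfam domains (the domain architecture) of the query with that of the annotated protein, and only transfer when they match. A minimal sketch; requiring an exact architecture match is one conservative choice among several, and the accessions in the test are placeholders:

```python
from typing import List, Tuple

Hit = Tuple[str, int, int]  # (Pfam accession, start, end) on the sequence


def architecture(hits: List[Hit]) -> Tuple[str, ...]:
    """Domain architecture: Pfam accessions ordered by start position."""
    return tuple(acc for acc, _, _ in sorted(hits, key=lambda h: h[1]))


def safe_to_transfer(query_hits: List[Hit], template_hits: List[Hit]) -> bool:
    """Transfer only if query and template share the same non-empty architecture."""
    qa = architecture(query_hits)
    return bool(qa) and qa == architecture(template_hits)
```

This does not address the granularity issue in the excerpt: two proteins with identical architectures can still act on different substrates.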

Noyk wrote, 7.6 years ago:

Has anybody used Blast2GO, which maps InterPro and BLAST hits to GO terms?


Istvan Albert replied, 7.6 years ago:

It would probably be best to ask this as a new question rather than adding it to the existing answers.

Eric Normandeau replied, 7.6 years ago:

@noyk You'll get a detailed response from me if you post your question, as suggested by @Istvan Albert :)

Noyk replied, 7.6 years ago:

OK, will do that.
Michael Dondrup (Bergen, Norway) wrote, 7.6 years ago:

Regarding the validity of Pfam predictions: some studies (e.g. GISMO for gene prediction, and CARMA for phylogenetic classification of environmental metagenomic samples) have used Pfam domain hits as input to generate training sets for classification. The underlying assumption is that sequences with hits to known protein domains have a high probability of being real protein-coding regions. In my opinion this is very much justified, and it is also borne out by the high precision of the resulting methods.
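The labelling idea behind such training sets can be sketched as follows: candidate sequences with a significant Pfam hit become positive examples, and the rest are left unlabelled rather than treated as negatives. This only illustrates the assumption described above; it is not how GISMO or CARMA are actually implemented, and the E-value cut-off and candidate IDs are placeholders:

```python
from typing import Dict, Iterable, List, Tuple


def label_training_set(candidates: Iterable[str],
                       best_pfam_evalue: Dict[str, float],
                       max_evalue: float = 1e-5) -> Tuple[List[str], List[str]]:
    """Split candidate IDs into (positives, unlabelled) by Pfam hit E-value.

    `best_pfam_evalue` maps candidate ID -> best domain E-value; candidates
    with no entry had no Pfam hit at all.
    """
    positives: List[str] = []
    unlabelled: List[str] = []
    for cid in candidates:
        e = best_pfam_evalue.get(cid)
        if e is not None and e <= max_evalue:
            positives.append(cid)
        else:
            unlabelled.append(cid)
    return positives, unlabelled
```

Leaving non-hits unlabelled reflects that lacking a Pfam hit does not show a sequence is non-coding.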

Powered by Biostar version 2.3.0