Using protein domains for annotation validation
19 months ago
liorglic ★ 1.2k

I am trying to develop a procedure for assessing the reliability of proteins derived from a genome annotation. One thing I'd like to do is search the annotated proteins for known protein domains, on the premise that proteins containing known domains are more likely to be "reliable". I was thinking of using the InterPro database for this, running the search with InterProScan. My questions are:

  1. Does this idea make sense to you?
  2. Should I limit my search in some way? For example, maybe only search for "functional" domains (e.g. "Ribonuclease H-like superfamily", and not "Retrotransposon gag domain"), or specific member DBs. What would you recommend for this purpose?
  3. Are there specific terms I should be wary of, e.g. "Domain of unknown function"?
  4. Anything else you would add or do differently in this analysis?
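Points 2 and 3 amount to filtering the InterProScan hits before counting a protein as "domain-supported". The sketch below reads InterProScan's tab-separated output, whose standard columns are protein accession, MD5, length, analysis (member database), signature accession, signature description, start, stop, score, status, date, and then, when available, InterPro accession and InterPro description. The function name, the excluded keyword list and the example rows are all hypothetical illustrations, not a recommended filter set; the accessions in the rows are placeholders.

```python
import csv
from typing import Iterable, Optional, Set


def domain_supported_proteins(
    tsv_lines: Iterable[str],
    allowed_dbs: Optional[Set[str]] = None,
    # Hypothetical keyword list; "unknown function" catches both
    # "Domain of unknown function" and "Protein of unknown function" (DUFs).
    excluded_terms: Iterable[str] = ("unknown function", "retrotransposon", "transposase"),
    max_evalue: Optional[float] = None,
) -> Set[str]:
    """Return IDs of proteins with at least one hit that passes the filters."""
    excluded = [t.lower() for t in excluded_terms]
    supported: Set[str] = set()
    for row in csv.reader(tsv_lines, delimiter="\t"):
        if len(row) < 11:
            continue  # skip blank or malformed lines
        protein_id, analysis, sig_desc, score = row[0], row[3], row[5], row[8]
        ipr_desc = row[12] if len(row) > 12 else "-"
        # Optionally keep only selected member databases (column 4).
        if allowed_dbs is not None and analysis not in allowed_dbs:
            continue
        # Drop hits whose signature or InterPro description mentions an excluded term.
        text = f"{sig_desc} {ipr_desc}".lower()
        if any(term in text for term in excluded):
            continue
        # Column 9 is an e-value for some analyses (e.g. Pfam) but not all,
        # and "-" when absent, so only apply the cutoff to numeric values.
        if max_evalue is not None and score != "-" and float(score) > max_evalue:
            continue
        supported.add(protein_id)
    return supported


# Made-up rows in InterProScan TSV layout, for illustration only.
EXAMPLE_ROWS = [
    "\t".join(["g1.t1", "md5a", "300", "Pfam", "PF_A", "RNase H-like domain",
               "10", "150", "1.2e-30", "T", "01-01-2024", "IPR_A", "Ribonuclease H-like domain"]),
    "\t".join(["g2.t1", "md5b", "200", "Pfam", "PF_B", "Retrotransposon gag protein",
               "5", "100", "1e-10", "T", "01-01-2024", "IPR_B", "Retrotransposon gag domain"]),
    "\t".join(["g3.t1", "md5c", "150", "Pfam", "PF_C", "Domain of unknown function (DUF_C)",
               "1", "80", "1e-5", "T", "01-01-2024", "-", "-"]),
    "\t".join(["g4.t1", "md5d", "400", "MobiDBLite", "mobidb-lite", "consensus disorder prediction",
               "1", "50", "-", "T", "01-01-2024", "-", "-"]),
]

if __name__ == "__main__":
    print(domain_supported_proteins(EXAMPLE_ROWS, allowed_dbs={"Pfam"}))
```

With `allowed_dbs={"Pfam"}` only `g1.t1` survives: `g2.t1` and `g3.t1` are dropped by the keyword filter and `g4.t1` by the database filter. Without the database restriction, `g4.t1` is kept because of a disorder prediction, which is one reason to think about restricting member databases at all.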

Thank you!

Interpro annotation domain InterproScan

Simply detecting a well-known protein domain is probably not a good indicator of annotation quality. If you extend this to comparing the domain composition of your annotated proteins with that of known proteins, it becomes a form of sequence similarity measurement. If known proteins already exist for this genome or for related species, you could test more directly for sequence similarity between your annotations and those previously annotated proteins. There are plenty of genome annotation papers out there; look at how they assess the quality of their annotations.
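As a rough illustration of the domain-composition comparison, here is a minimal sketch assuming you already have domain hits (e.g. from InterProScan) for an annotated protein and for a presumed homolog from a well-annotated related species; pairing the two proteins (for instance by a similarity search) is not shown. The function names and the placeholder accessions are hypothetical, and Jaccard similarity of domain sets is just one simple choice of score.

```python
from typing import Dict, List, Tuple

Hit = Tuple[int, str]  # (start position, signature accession)


def architecture(hits: List[Hit]) -> List[str]:
    """Domain accessions ordered from N- to C-terminus by start position."""
    return [acc for _, acc in sorted(hits)]


def domain_jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard similarity of the two domain sets (ignores order and copy number)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0  # no domains on either side: treat as "no evidence", not a match
    return len(sa & sb) / len(sa | sb)


def compare_to_reference(query_hits: List[Hit], ref_hits: List[Hit]) -> Dict[str, object]:
    """Compare an annotated protein's domain content with a reference protein's."""
    qa, ra = architecture(query_hits), architecture(ref_hits)
    return {"jaccard": domain_jaccard(qa, ra), "same_architecture": qa == ra}


if __name__ == "__main__":
    query = [(200, "PF_B"), (10, "PF_A")]
    reference = [(5, "PF_A"), (180, "PF_B"), (400, "PF_C")]
    print(compare_to_reference(query, reference))
```

Here the query shares two of the reference's three domains (Jaccard 2/3) and its architecture differs, which might point to a truncated gene model missing a C-terminal domain, though other explanations (real biological differences, a missed hit) are equally possible.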


