I am trying to develop a procedure for assessing the reliability of predicted proteins derived from a genome annotation. One thing I'd like to do is search the annotated proteins for protein domains, the idea being that proteins containing known domains are more likely to be "reliable". I was thinking of using the InterPro database for this, specifically InterProScan to run the search. My questions are:
- Does this idea make sense to you?
- Should I limit my search in some way? For example, should I only count "functional" domains (e.g. "Ribonuclease H-like superfamily" but not "Retrotransposon gag domain"), or only use specific member databases? What would you recommend for this purpose?
- Are there any specific terms that I should be wary of, e.g. "Domain of unknown function"?
- Anything else you would add or do differently in this analysis?
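
To make the idea concrete, here is a minimal sketch of the filtering step I have in mind. It assumes InterProScan's tab-separated output, with the protein ID in column 1 and the InterPro accession and description in columns 12 and 13 (so the run must have included the InterPro entry lookup). The function name, the exclusion terms, and the sample rows are all made up for illustration and are not a recommendation.

```python
import csv
import io

# Hypothetical exclusion list: placeholder terms only, to be replaced
# by whatever the answers to the questions above suggest.
EXCLUDED_TERMS = ("domain of unknown function", "retrotransposon gag")


def proteins_with_informative_hits(tsv_handle, excluded_terms=EXCLUDED_TERMS):
    """Return the IDs of proteins with at least one InterPro-integrated hit
    whose description contains none of the excluded terms."""
    supported = set()
    # QUOTE_NONE: treat quote characters in descriptions as plain text.
    reader = csv.reader(tsv_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 13:
            continue  # row has no InterPro accession/description columns
        protein_id, ipr_acc, ipr_desc = row[0], row[11], row[12]
        if ipr_acc == "-":
            continue  # signature not integrated into an InterPro entry
        desc = ipr_desc.lower()
        if any(term in desc for term in excluded_terms):
            continue
        supported.add(protein_id)
    return supported


def _row(protein_id, ipr_acc, ipr_desc):
    # Build one made-up TSV row; columns 2-11 are placeholder values.
    fields = [protein_id, "md5", "300", "Pfam", "PF00000", "placeholder",
              "10", "150", "1e-20", "T", "01-01-2020", ipr_acc, ipr_desc]
    return "\t".join(fields)


SAMPLE_TSV = "\n".join([
    _row("prot1", "IPR000001", "Ribonuclease H-like superfamily"),
    _row("prot2", "IPR000002", "Retrotransposon gag domain"),
    _row("prot3", "IPR000003", "Domain of unknown function DUF0000"),
    _row("prot4", "-", "-"),
]) + "\n"

supported = proteins_with_informative_hits(io.StringIO(SAMPLE_TSV))
print(sorted(supported))
```

With the made-up rows above, only `prot1` would be counted as supported; the other three are dropped because of an excluded term or a missing InterPro entry. On real data the handle would come from `open(...)` on the InterProScan output file instead of `io.StringIO`.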