Is it acceptable to pool the annotations from the various sources InterProScan offers, and annotate a sequence with a subset of these?
For example, if I have something like so:
id    annot  src   start  stop
seq1  dom1   Pfam  100    120
seq1  dom1a  CDD   101    128
seq1  dom2   Pfam  60     80
Is it acceptable to take dom1a from CDD and dom2 from Pfam, and leave out dom1 from Pfam (since it's redundant with dom1a)?
The purpose is to have everything recognizable on the sequence annotated while not having any redundant annotations.
And yes, the annotations I have in mind are positionally and functionally redundant: for example, the same PAS domain annotated by both Pfam and CDD.
This is what I think I'll go with.
The main reason any of this is even coming up is that Pfam is missing a few crucial annotations that are covered by other databases, but using everything in one go makes the annotations of domains elsewhere on the sequence redundant. E.g., for a sequence that looks like this:
---dom1---dom2---dom3---
Pfam annotates dom1 and dom3. CDD, meanwhile, annotates dom2 but also dom1 and dom3, making dom1 and dom3 redundantly annotated in the process. Because Pfam is useful elsewhere with other sequences in the analysis, I am loath to drop Pfam, and just want to "paint in" the missing domains from other databases. Hence my question.
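To make the "paint in" idea concrete, here is a minimal sketch of what I mean, using the toy table from above. The names `Hit`, `overlap_fraction`, and `paint_in`, and the 0.5 overlap cutoff, are my own placeholders and not part of InterProScan. It assumes inclusive coordinates and treats two hits as redundant purely by position (overlap relative to the shorter hit), so whether two overlapping hits are really the same domain functionally still has to be checked by hand.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    seq_id: str
    annot: str
    src: str
    start: int
    stop: int


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Shared length divided by the length of the shorter hit (inclusive coordinates)."""
    shared = min(a.stop, b.stop) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.stop - a.start, b.stop - b.start) + 1
    return shared / shorter


def paint_in(hits, priority, min_overlap=0.5):
    """Keep hits from higher-priority sources first; add a lower-priority hit
    only if it does not overlap an already kept hit on the same sequence.

    min_overlap is an arbitrary cutoff chosen for illustration.
    """
    rank = {src: i for i, src in enumerate(priority)}
    # Sources not listed in `priority` are considered last.
    ordered = sorted(hits, key=lambda h: rank.get(h.src, len(rank)))
    kept = []
    for hit in ordered:
        redundant = any(
            k.seq_id == hit.seq_id and overlap_fraction(k, hit) >= min_overlap
            for k in kept
        )
        if not redundant:
            kept.append(hit)
    return sorted(kept, key=lambda h: (h.seq_id, h.start))


# The toy table from the question.
hits = [
    Hit("seq1", "dom1", "Pfam", 100, 120),
    Hit("seq1", "dom1a", "CDD", 101, 128),
    Hit("seq1", "dom2", "Pfam", 60, 80),
]

pfam_first = paint_in(hits, priority=["Pfam", "CDD"])
cdd_first = paint_in(hits, priority=["CDD", "Pfam"])

for hit in pfam_first:
    print(*hit)
```

With Pfam listed first, the Pfam hits are kept and CDD only fills regions Pfam left empty; swapping the priority order gives the selection described earlier (dom1a from CDD plus dom2 from Pfam).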