Hello Biostar community,
I have a sequence analysis question. I am using a database of protein fingerprints (a fingerprint is a set of several distinct sequence motifs excised from multiple sequence alignments). I am investigating functionally important areas (ligand binding, protein-protein and protein-ion interactions) and how they relate to these motifs, which are function- and structure-agnostic sequence descriptors: specifically, whether functional residues lie within motifs, and how frequently. Although some interesting correlations are appearing, e.g. 60% of protein-ion interaction residues fall within motifs, I need a statistical significance test to ascertain whether the overlap between motifs and functional residues is meaningful or due to chance.
In practice, this would go as follows: if motifs cover 30% of a sequence, then any functional residue (like any residue picked at random from the primary sequence of the polypeptide) has a 30% probability of lying within a motif by chance alone. Is there a probabilistic model (a probability distribution with an associated test, e.g. a binomial test) that can be used for statistical significance testing here?
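To make the idea concrete, here is a minimal sketch of the one-sided binomial test I have in mind for a single protein. The counts (50 functional residues, 30 of them inside motifs) are made up for illustration, and the function name `binomial_upper_tail` is my own. It assumes each functional residue falls inside a motif independently with probability equal to the motif coverage, which may not hold if functional residues cluster along the sequence.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers for one protein (not real data).
motif_fraction = 0.30   # fraction of the sequence covered by motifs
n_functional = 50       # functional residues annotated on the sequence
n_in_motif = 30         # functional residues that lie inside a motif (60%)

# Null hypothesis: a functional residue lands in a motif with
# probability motif_fraction, independently of the other residues.
p_value = binomial_upper_tail(n_in_motif, n_functional, motif_fraction)
print(f"one-sided p-value: {p_value:.3g}")
```

A small p-value here would mean that seeing this many functional residues inside motifs is unlikely under the 30% chance model. I am unsure how best to extend this when pooling many proteins with different motif coverage.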
Many thanks and apologies for the lengthy text!