I have been posed the following problem by a collaborator and I could use some advice on how to approach it:
The collaborator works with a non-model organism. He is interested in how a particular protein binds to other proteins. He has done a pulldown with his protein followed by mass spec, and has given me the gene IDs of the mass spec hits.
In model organisms, there is a well-characterised domain for binding to his protein of interest, and it has an entry in Pfam. I have not been able to find this domain in any of his hits, anywhere else in the genome of his organism, or indeed anywhere else in the wider clade. So the question is: is there some other domain that facilitates interactions with his protein of interest in this organism?
What I have are full-length amino acid sequences for around 100 proteins from the same organism. There is no indication of where a potential domain might be in each protein. It is probable that some (even many) of the proteins don't bind the protein of interest directly, as they may be part of a larger complex or just sticky.
I am used to looking for divergent members of a gene family across organisms or starting with a domain that I have some structure for, but none of the usual approaches are working here - I don't feel like I have enough information to start with.
I feel like I need some way to narrow down the search. I have tried several variations on all vs all local alignments (blastp, psiblast, jackhmmer) in an attempt to find recurrent hits but nothing useful has come out of this. Out of desperation I tried a very naive MSA approach too, which unsurprisingly didn't yield anything.
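To make the "recurrent hits" idea concrete, here is a minimal Python sketch of a much cruder screen than the alignments above: it counts how many distinct proteins share each exact k-mer. The function name `recurrent_kmers`, the defaults `k=6` and `min_proteins=3`, and the toy sequences are all made up for illustration; exact matching will miss anything divergent, so treat it as a sanity check rather than a domain finder.

```python
from collections import defaultdict


def recurrent_kmers(seqs, k=6, min_proteins=3):
    """Return k-mers found in at least `min_proteins` distinct proteins.

    seqs: dict mapping protein id -> amino acid sequence (hypothetical input).
    Result: dict mapping k-mer -> sorted list of protein ids, ordered by
    how many proteins contain the k-mer (most widespread first).
    """
    owners = defaultdict(set)
    for pid, seq in seqs.items():
        seq = seq.upper()
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(pid)

    hits = {
        kmer: sorted(ids)
        for kmer, ids in owners.items()
        if len(ids) >= min_proteins
    }
    # Most widely shared k-mers first, ties broken alphabetically.
    return dict(sorted(hits.items(), key=lambda kv: (-len(kv[1]), kv[0])))


# Toy sequences, invented purely to exercise the function.
toy = {
    "p1": "MKTAYIAKQRGG",
    "p2": "GGAYIAKQLL",
    "p3": "TTAYIAKQW",
    "p4": "MMMMMMMMM",
}

if __name__ == "__main__":
    for kmer, ids in recurrent_kmers(toy).items():
        print(kmer, ids)
```

With real data the input dict would come from the FASTA file of pulldown hits, and low-complexity stretches would probably need masking first, since they tend to dominate this kind of count.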
Does anyone have any bright ideas?
Thanks! Yes I will have a play with MEME.
What I meant by the naivety of MSA is that at the moment I don't have any idea what a potential domain might look like or where it could be. I think I really need to try to refine domain boundaries first, even if not precisely, and then start looking at MSAs. In any event, the output I got with what I had was a mess!