I'm learning to wrangle protein-protein interaction (PPI) data so I can look for putative interactions among a set of my proteins of interest. I'm looking for current tools/tutorials/references to make this a bit easier. Even getting a handle on MITAB or PSI-MI XML files is new territory.
In a nutshell, I wish to:
- collect a wide range of possible PPIs (e.g. from primary databases, aggregators, or web tools such as DIP, STRING, IMEx, iRefWeb, APID, BIPS etc.) and the sequences of those proteins
- bring in a different list of protein sequences (combined from our non-model animal and non-model fungus study organisms, basically two proteomes) and find orthologs present in that PPI dataset
- build a network/list of possible interologs present in my dataset, both within and between the two organisms
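To make the last step concrete, here is a rough Python sketch of how I picture the interolog transfer, assuming I already have known reference pairs and an ortholog table. The function names (`infer_interologs`, `scope`), the dictionary layout, and all IDs are made up for illustration; nothing here comes from an existing tool.

```python
from itertools import product

def infer_interologs(known_pairs, orthologs):
    """Map known reference interactions onto my proteins.

    known_pairs: iterable of (ref_a, ref_b) reference protein IDs.
    orthologs:   dict of reference ID -> set of my protein IDs (hypothetical layout).
    Returns a dict of sorted (mine_a, mine_b) -> set of supporting reference pairs.
    """
    predicted = {}
    for ref_a, ref_b in known_pairs:
        # Every ortholog of A is paired with every ortholog of B.
        for mine_a, mine_b in product(orthologs.get(ref_a, ()), orthologs.get(ref_b, ())):
            key = tuple(sorted((mine_a, mine_b)))
            predicted.setdefault(key, set()).add((ref_a, ref_b))
    return predicted

def scope(pair, species):
    """Label a predicted pair as 'within' one organism or 'between' the two."""
    return "within" if species[pair[0]] == species[pair[1]] else "between"

# Toy data: R* are reference proteins, A*/F* are my animal/fungus proteins.
known = [("R1", "R2"), ("R2", "R3")]
orthologs = {"R1": {"A1"}, "R2": {"A2", "F1"}}  # R3 has no ortholog in my data
species = {"A1": "animal", "A2": "animal", "F1": "fungus"}

predicted = infer_interologs(known, orthologs)
labels = {pair: scope(pair, species) for pair in predicted}
```

With the toy data, the R1-R2 interaction yields one within-animal candidate (A1-A2) and one animal-fungus candidate (A1-F1), while R2-R3 yields nothing because R3 has no ortholog.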
I see folks doing this in the literature, but the nitty gritty of how to make it happen hasn't been well fleshed out.
Does anyone have tips on current pipelines and approaches to get this done? Best database combinations to use? The part I'm grappling with most right now is how to bring together and search these PPI datasets. It seems like the steps are: download MITAB (or PSI-MI XML) files, combine them, pull out the unique UniProt IDs, pull down the sequences for those IDs, use something like OrthoMCL to query my large protein (proteome) list against those PPI protein sequences, then go back to the combined MITAB and search for shared interologs. I'm also starting to look into using iPfam to help make putative PPI calls, but that is probably a separate workflow?
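For the combine-and-extract part, this is roughly what I have in mind, as a minimal Python sketch. It assumes tab-separated MITAB where the first two columns hold the interactor A and B identifiers as `db:id` entries separated by `|`, and that header lines start with `#`. The helper names and the toy accessions are invented for illustration.

```python
import csv
import io

def uniprot_ids(field):
    """Pull UniProt accessions out of a MITAB ID field like 'uniprotkb:P11111|intact:EBI-0'."""
    ids = []
    for token in field.split("|"):
        db, _, acc = token.partition(":")
        if db.lower() == "uniprotkb" and acc:
            ids.append(acc)
    return ids

def read_pairs(handle):
    """Yield unordered UniProt pairs from columns 1 and 2 of a MITAB stream."""
    # QUOTE_NONE because MITAB fields can contain literal double quotes.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip header and malformed lines
        a_ids, b_ids = uniprot_ids(row[0]), uniprot_ids(row[1])
        if a_ids and b_ids:  # drop rows lacking a UniProt ID on either side
            yield frozenset((a_ids[0], b_ids[0]))

def combine(handles):
    """Merge several MITAB streams into a unique pair set and a sorted ID list."""
    pairs = set()
    for handle in handles:
        pairs.update(read_pairs(handle))
    proteins = sorted({acc for pair in pairs for acc in pair})
    return pairs, proteins

# Two toy "files" with made-up accessions; the same pair appears in both orientations.
demo_a = "#ID(s) interactor A\tID(s) interactor B\nuniprotkb:P11111\tuniprotkb:Q22222\n"
demo_b = "uniprotkb:Q22222|intact:EBI-0\tuniprotkb:P11111\nintact:EBI-1\tuniprotkb:P11111\n"
pairs, proteins = combine([io.StringIO(demo_a), io.StringIO(demo_b)])
```

Storing each pair as a `frozenset` means A-B and B-A collapse into one entry when files from different databases are merged. The `proteins` list is what I would then use to fetch sequences. Real files would be opened with `open(path, newline="")` instead of `io.StringIO`.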
I can bash my way through running things on the command line (e.g. mainstream genomic and RNA-seq pipelines) and can handle myself in R, but I don't yet have any experience with Perl or Java, which seem to keep popping up as I look around in this realm. Follow-up analyses (e.g. enrichment) I think I can handle.
Any pointers would be greatly appreciated, thanks!
EDIT: Found the R package RpsiXML, which may put things in a space where I know how to work with the data.