I have been doing evolutionary genomics with a few newly sequenced vertebrate species. They have RefSeq annotations supported by RNA-seq data, but they are neither model species nor commonly studied, and some are evolutionarily very distant (such as the elephant shark and the lamprey). I would like to get the InterPro domains of their proteins and, if possible, find their orthologous genes.
The papers describing these genomes have all done this kind of analysis, using OrthoMCL, Blast2GO, InterPro and other tools as part of their annotation process. The problem is that very few of the authors published their complete data, such as which particular orthologous genes they retained for substitution rate calculation. All I have are the methods and results (which is nice, but not enough).
Is there a database that tracks all the proteins from a given genome and gives their gene families, GO terms and orthologous genes in other species?
I could also rerun some InterProScan/OrthoMCL jobs (the lengthy option), or use the reference organisms to which these genomes were aligned as a common denominator (the quicker option, but I would lose some possible orthologous genes). I can also contact the authors, but I would still like to know whether such a resource exists.
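To make the quicker option concrete, here is a minimal Python sketch of what I have in mind. It assumes I already have, for each species, a table mapping its genes to their best match in one shared reference organism; the function name, the input layout and all gene IDs below are made up for illustration and do not come from any of the papers. It keeps only the reference genes matched by exactly one gene in every species, which is where the loss of possible orthologs comes from.

```python
from collections import defaultdict


def common_denominator_orthologs(best_hits):
    """Group genes across species by their shared reference gene.

    best_hits: {species: {species_gene: reference_gene}} (hypothetical layout).
    Returns {reference_gene: {species: species_gene}}, keeping only reference
    genes matched by exactly one gene in every species.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for species, hits in best_hits.items():
        for gene, ref_gene in hits.items():
            groups[ref_gene][species].append(gene)

    n_species = len(best_hits)
    result = {}
    for ref_gene, per_species in groups.items():
        in_all_species = len(per_species) == n_species
        single_copy = all(len(genes) == 1 for genes in per_species.values())
        if in_all_species and single_copy:
            result[ref_gene] = {sp: genes[0] for sp, genes in per_species.items()}
    return result


# Made-up IDs, for illustration only.
best_hits = {
    "elephant_shark": {"cm_g1": "ref_A", "cm_g2": "ref_B", "cm_g3": "ref_B"},
    "lamprey": {"pm_g1": "ref_A", "pm_g2": "ref_B", "pm_g3": "ref_C"},
}
shared = common_denominator_orthologs(best_hits)
print(shared)
```

In this toy input, ref_B is dropped because two elephant shark genes map to it, and ref_C is dropped because it has no elephant shark match. Those are exactly the cases a proper OrthoMCL run might still resolve.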