Hello,
I'm writing a small program that takes a list of UniProt gene names (and organisms of interest) and returns a visual representation of the GO terms associated with those genes in the specified organisms. I've run into an issue, though: what if UniProt doesn't have GO terms for a gene in the specified organism? It might have GO terms for that gene in other organisms, just not the one specified. So I wanted to ask: could I take the GO terms from whichever organism in the search results is the closest relative of the one of interest? If so, how do I determine which organism is closest? I'm not very familiar with phylogenetic resources where I give a list of organisms and get back a phylogeny of those organisms. Or are there easier ways to get the GO terms?
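One rough way to pick "the closest relative" without building a tree is to compare taxonomic lineages (root first, like the ones NCBI Taxonomy reports) and choose the candidate that shares the longest lineage prefix with the target. This is a minimal sketch under that assumption. The lineages below are hypothetical, abbreviated examples, and a taxonomy is only a coarse proxy for a real phylogeny, so species that diverge at the same listed rank come out as ties.

```python
# Hypothetical, abbreviated lineages (root first) in the style of NCBI Taxonomy.
# Real lineages are longer; these are illustrative only.
LINEAGES: dict[str, list[str]] = {
    "Homo sapiens": ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Primates", "Hominidae", "Homo"],
    "Bos taurus": ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Artiodactyla", "Bovidae", "Bos"],
    "Saccharomyces cerevisiae": ["Eukaryota", "Fungi", "Ascomycota", "Saccharomycetes", "Saccharomyces"],
    "Escherichia coli": ["Bacteria", "Pseudomonadota", "Gammaproteobacteria", "Enterobacterales", "Escherichia"],
}


def shared_depth(a: list[str], b: list[str]) -> int:
    """Number of leading taxa that two root-first lineages have in common."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


def closest_relative(target: str, candidates: list[str], lineages: dict[str, list[str]]) -> str:
    """Candidate whose lineage shares the deepest prefix with the target's lineage.

    Ties are broken by the order of `candidates` (max keeps the first maximum).
    """
    target_lineage = lineages[target]
    return max(candidates, key=lambda c: shared_depth(target_lineage, lineages[c]))


if __name__ == "__main__":
    found_in = ["Escherichia coli", "Saccharomyces cerevisiae", "Bos taurus"]
    print(closest_relative("Homo sapiens", found_in, LINEAGES))
```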
I appreciate the help. Also, if I do end up releasing this software, how should I cite this website? Or should I credit the specific user whose answer I used?
Transferring annotations between orthologs is common practice. You don't have to limit yourself to a closely related species; you can transfer annotations from any ortholog. For example, yeast and Drosophila annotations are routinely used to annotate human genes. Depending on which organisms you're dealing with, you can use Ensembl Compara as a source of ortholog relationships.
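Once you have ortholog relationships (from Ensembl Compara or elsewhere), the transfer step itself is simple. Here is a minimal sketch, assuming you have already loaded a hypothetical `annotations` dict (gene to set of GO IDs) and an `orthologs` dict (gene to list of orthologs). It keeps a gene's own annotations when there are any, and otherwise transfers its orthologs' terms while recording which ortholog each term came from. Gene names and GO IDs here are placeholders, not real annotations.

```python
from collections import defaultdict

# A gene is identified by a (species, gene_name) tuple; all example data is hypothetical.
Gene = tuple[str, str]


def annotate(
    target: Gene,
    annotations: dict[Gene, set[str]],
    orthologs: dict[Gene, list[Gene]],
) -> dict[str, set[Gene]]:
    """Return {GO ID: set of source genes} for `target`.

    Uses the target's own annotations when it has any; otherwise transfers
    the terms of each listed ortholog and records where each term came from.
    """
    own = annotations.get(target, set())
    if own:
        return {term: {target} for term in own}
    transferred: defaultdict[str, set[Gene]] = defaultdict(set)
    for ortholog in orthologs.get(target, []):
        for term in annotations.get(ortholog, set()):
            transferred[term].add(ortholog)
    return dict(transferred)


if __name__ == "__main__":
    example_annotations = {
        ("yeast", "geneY"): {"GO:x1", "GO:x2"},  # placeholder GO IDs
        ("fly", "geneF"): {"GO:x2"},
    }
    example_orthologs = {("human", "geneH"): [("yeast", "geneY"), ("fly", "geneF")]}
    print(annotate(("human", "geneH"), example_annotations, example_orthologs))
```

Keeping the source of each transferred term makes it easy to show in the visualization which terms are the gene's own and which are inferred from orthologs.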
@Jean-Karim Heriche Thanks for the reply. So, for example, the mgsA gene in E. coli would have similar annotations to the mgsA gene in humans?
I wouldn't know without seeing a phylogenetic tree, but I certainly wouldn't make the inference based on the name. Also, I don't think one can infer orthology between groups as distant as mammals and bacteria. If you're interested in bacterial genes, look at other bacterial species. But generally, this is the idea: given a list of orthologous genes in multiple species, one can pool the annotations so that the whole group shares the same annotations.
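The pooling idea, combined with the advice to stay within bacteria for a bacterial gene, might look like this. It is a minimal sketch assuming a hypothetical orthologous group, a lineage list per species, and placeholder GO IDs. Every kept member receives the union of the group's terms, and the optional `within` filter (a name introduced here for illustration) restricts the group to one taxon, such as "Bacteria".

```python
# Hypothetical example data: genes are (species, gene_name) tuples,
# GO IDs are placeholders, lineages are abbreviated.
EXAMPLE_MEMBERS = [
    ("Escherichia coli", "geneA"),
    ("Bacillus subtilis", "geneB"),
    ("Homo sapiens", "geneC"),
]
EXAMPLE_ANNOTATIONS = {
    ("Escherichia coli", "geneA"): {"GO:x1"},
    ("Bacillus subtilis", "geneB"): {"GO:x2"},
    ("Homo sapiens", "geneC"): {"GO:x3"},
}
EXAMPLE_LINEAGES = {
    "Escherichia coli": ["Bacteria", "Pseudomonadota"],
    "Bacillus subtilis": ["Bacteria", "Bacillota"],
    "Homo sapiens": ["Eukaryota", "Metazoa"],
}


def pool_group(members, annotations, lineage_of, within=None):
    """Give every kept member of an orthologous group the union of the group's GO terms.

    If `within` is given (e.g. "Bacteria"), only members whose species lineage
    contains that taxon are kept, both as contributors and as recipients.
    """
    kept = [m for m in members if within is None or within in lineage_of[m[0]]]
    pooled = set()
    for member in kept:
        pooled |= annotations.get(member, set())
    # One independent copy per member so later edits don't leak between genes.
    return {member: set(pooled) for member in kept}


if __name__ == "__main__":
    print(pool_group(EXAMPLE_MEMBERS, EXAMPLE_ANNOTATIONS, EXAMPLE_LINEAGES, within="Bacteria"))
```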
Thank you! The way I was thinking of doing it is to pool all the GO terms from all results and take the top 3 or 4 as a representative annotation for the gene. For example, the mgsA gene has results in cows, humans, Mycobacterium, Saccharomyces, etc. So I'd take all the GO annotations from those results and keep the 3 most common ones. Another way I was thinking of was to take the smallest umbrella term that encompasses most of the results and use that as the representative for a given gene.
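Both ideas can be sketched over a toy GO-like graph. This is a minimal sketch with placeholder term IDs and a hypothetical `PARENTS` map (child to parents, `is_a` only); a real graph would come from the GO ontology file. `top_terms` counts each term at most once per search result. `umbrella_term` walks each result's terms up to their ancestors and returns the most specific term covering more than half of the results, where "most specific" is approximated by ancestor count (an assumption, not the only reasonable definition).

```python
from collections import Counter

# Toy GO-like DAG with placeholder IDs: child -> set of parents (is_a only).
PARENTS = {
    "GO:leaf1": {"GO:mid"},
    "GO:leaf2": {"GO:mid"},
    "GO:mid": {"GO:root"},
    "GO:other": {"GO:root"},
}

# One set of GO IDs per search result (e.g. per organism); hypothetical.
RESULTS = [{"GO:leaf1"}, {"GO:leaf2"}, {"GO:leaf1", "GO:other"}, {"GO:other"}]


def top_terms(results, n=3):
    """Most frequent terms, counting each term at most once per result.

    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(term for result in results for term in set(result))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def ancestors(term, parents):
    """The term itself plus every term reachable through `parents`."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def umbrella_term(results, parents, min_fraction=0.5):
    """Most specific term implied by more than `min_fraction` of the non-empty results.

    A result "implies" its own terms and all of their ancestors. Specificity is
    approximated by ancestor count; ties go to the larger ID. Returns None if
    no term qualifies.
    """
    implied = [set().union(*(ancestors(t, parents) for t in r)) for r in results if r]
    coverage = Counter(term for terms in implied for term in terms)
    candidates = [t for t, c in coverage.items() if c > min_fraction * len(implied)]
    if not candidates:
        return None
    return max(candidates, key=lambda t: (len(ancestors(t, parents)), t))


if __name__ == "__main__":
    print(top_terms(RESULTS, 2))           # → [('GO:leaf1', 2), ('GO:other', 2)]
    print(umbrella_term(RESULTS, PARENTS))  # → 'GO:mid'
```

One thing to keep in mind with plain frequency counts is that broad, heavily used terms tend to win, while the umbrella approach trades specificity for coverage through `min_fraction`.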