Hello all. I'm trying to find the GO terms for a list of viral gene names originating from different genomes. I can't think of an easy, straightforward way to do so. Is the only way to do this search to BLAST the gene sequences and then reassemble the list? Thanks for your attention, H.
As Pierre suggested, if you have the accession numbers, you can retrieve the GO annotations directly. If no GO annotations are available for your genes, you can use sequence-based tools to infer GO associations indirectly. You could try the annotation tools provided here.
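If you go the accession-number route, the lookup itself is just a join between your list and a GO annotation file. Below is a minimal sketch, assuming you have downloaded a GAF 2.x file (tab-separated, "!" comment lines, column 2 = accession, column 4 = qualifier, column 5 = GO ID, column 9 = aspect). The accessions and rows in the example are invented placeholders, and `make_gaf_row` is a hypothetical helper for building test rows only. Accessions that come back empty are the ones you would still need to handle with sequence-based tools.

```python
"""Minimal sketch: look up GO terms for a list of accessions in a GAF 2.x file.

Assumed layout (tab-separated, '!' lines are comments):
  column 2 = DB Object ID (e.g. a UniProtKB accession)
  column 4 = Qualifier, column 5 = GO ID, column 9 = Aspect (P, F or C)
"""
from collections import defaultdict


def go_terms_for_accessions(gaf_lines, accessions):
    """Return {accession: set of (go_id, aspect)}; empty set if unannotated."""
    wanted = set(accessions)
    found = defaultdict(set)
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # malformed row
        accession, qualifier, go_id, aspect = cols[1], cols[3], cols[4], cols[8]
        if "NOT" in qualifier.split("|"):
            continue  # a NOT qualifier asserts the product lacks this term
        if accession in wanted:
            found[accession].add((go_id, aspect))
    return {acc: found.get(acc, set()) for acc in accessions}


def make_gaf_row(accession, qualifier, go_id, aspect):
    """Hypothetical helper: build a 17-column GAF 2.2 row with dummy filler."""
    cols = ["UniProtKB", accession, "sym", qualifier, go_id, "REF", "IEA", "",
            aspect, "", "", "protein", "taxon:0", "20240101", "example", "", ""]
    return "\t".join(cols) + "\n"


if __name__ == "__main__":
    # Invented example rows; replace with the lines of a real GAF file.
    rows = ["!gaf-version: 2.2\n",
            make_gaf_row("HYPO001", "enables", "GO:0005515", "F"),
            make_gaf_row("HYPO001", "located_in", "GO:0016020", "C"),
            make_gaf_row("HYPO002", "NOT|enables", "GO:0005515", "F")]
    result = go_terms_for_accessions(rows, ["HYPO001", "HYPO002", "HYPO003"])
    for acc, terms in result.items():
        print(acc, sorted(terms))
```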
A related question is here.
As far as I know, there are no GO term annotations for viral genomes. Some viruses, such as the Orthopoxviridae, have many host gene homologs, so you can readily use BLAST to assign GO terms; genes that lack any annotated homolog will be left unannotated. A word of warning: do not confuse highly homologous domains with homologs. You'll want to ensure a sufficiently high level of identity over the full length of the gene. A nearly identical CDS is one thing, but inferring function from a small but conserved domain is asking for trouble. Also use tblastn, since it compares at the amino acid level (a protein query against a translated nucleotide database). Codon usage bias (CUB) typically differs greatly between the virus and the reference species, enough that a nucleotide-level comparison can easily miss true homologs.
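The "identity over the full length of the gene" check can be applied mechanically to BLAST tabular output. Here is a minimal sketch, assuming BLAST+ was run with a custom tabular format such as `-outfmt "6 qseqid sseqid pident length qstart qend qlen evalue bitscore"`. The cut-offs (40% identity, 80% query coverage, e-value 1e-10) are arbitrary placeholders, not recommendations, and coverage is computed per HSP, which is a simplification when a hit is split into several HSPs.

```python
"""Minimal sketch: keep only BLAST hits that span most of the viral gene.

Assumes BLAST+ tabular output produced with
  -outfmt "6 qseqid sseqid pident length qstart qend qlen evalue bitscore"
Thresholds are illustrative placeholders, not recommended values.
"""
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    qcov: float    # fraction of the query covered by this HSP
    evalue: float


def parse_hits(lines: Iterable[str]) -> list:
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        (q, s, pident, _length, qstart, qend,
         qlen, evalue, _bits) = line.rstrip("\n").split("\t")
        span = abs(int(qend) - int(qstart)) + 1
        hits.append(Hit(q, s, float(pident), span / int(qlen), float(evalue)))
    return hits


def full_length_homologs(hits, min_pident=40.0, min_qcov=0.8, max_evalue=1e-10):
    """Best-e-value hit per query among hits passing all three cut-offs.

    A short, highly conserved domain can give a tiny e-value but low query
    coverage; the coverage filter is what rejects such domain-only matches.
    """
    best = {}
    for h in hits:
        if h.pident < min_pident or h.qcov < min_qcov or h.evalue > max_evalue:
            continue
        if h.query not in best or h.evalue < best[h.query].evalue:
            best[h.query] = h
    return best


if __name__ == "__main__":
    # Invented rows: vp2 matches only a 60-residue domain of a 400-residue gene.
    rows = ["vp1\thostA\t62.5\t300\t1\t300\t310\t1e-80\t250",
            "vp1\thostB\t70.0\t300\t1\t300\t310\t1e-90\t280",
            "vp2\thostC\t85.0\t60\t120\t179\t400\t1e-20\t90"]
    for query, hit in full_length_homologs(parse_hits(rows)).items():
        print(query, hit.subject, round(hit.qcov, 3))
```

In the invented rows, vp2 has high identity and a small e-value but is discarded because the alignment covers only 15% of the gene, which is exactly the domain-versus-homolog trap described above.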
Even then, remember that when viral genes are highly homologous to host genes, it is very hard to say that they have the same function. They might interact with the same partners, but the temporal and spatial features of those interactions differ completely between the viral and host homologs. The viral gene could have a different localization and could therefore interact with things the host gene would never encounter.
I'm curious to know what your intended goal is here. If you are simply interested in the functions of the viral genes in terms of viral natural history, I would just grind through the literature and make your own annotations. Within the virus (i.e., not interacting with host genes), each gene will have a very limited, well-defined function. If you're looking to include what host genes do, I would suggest compiling a list of the host genes each viral gene interacts with. From there you could place a sign on each interaction (activates/upregulates/etc. vs. disables/deactivates/etc.). For example, cowpox virus (CPXV) prevents trafficking of MHC-I through CPXV203, so that interaction gets a negative sign (disables). MHC-I presents antigens, so CPXV203 is involved in disabling MHC-I antigen presentation.
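The signed-interaction idea can be kept as a small table of records and turned into readable functional notes. Below is a minimal sketch; the `Sign`, `Interaction` and `describe` names, the record fields and the output wording are all hypothetical choices for illustration, not a GO or ontology standard. The CPXV203/MHC-I entry restates the example from the paragraph above.

```python
"""Minimal sketch: record viral-host interactions with a sign, then combine
the sign with the host partner's role to get a functional statement.

The data model and wording are illustrative, not an ontology standard.
"""
from dataclasses import dataclass
from enum import Enum


class Sign(Enum):
    POSITIVE = "activates/upregulates"
    NEGATIVE = "disables/deactivates"


@dataclass(frozen=True)
class Interaction:
    viral_gene: str
    host_target: str
    sign: Sign
    mechanism: str  # free-text note taken from the literature


def describe(interaction, host_roles):
    """Turn one signed interaction into a plain-language functional note."""
    role = host_roles.get(interaction.host_target, "an unknown host process")
    verb = "promotes" if interaction.sign is Sign.POSITIVE else "inhibits"
    return (f"{interaction.viral_gene} {verb} {role} "
            f"(via {interaction.host_target}: {interaction.mechanism})")


if __name__ == "__main__":
    host_roles = {"MHC-I": "antigen presentation"}
    cpxv203 = Interaction("CPXV203", "MHC-I", Sign.NEGATIVE, "blocks trafficking")
    print(describe(cpxv203, host_roles))
    # prints: CPXV203 inhibits antigen presentation (via MHC-I: blocks trafficking)
```

Keeping the host role separate from the interaction means the same host-gene annotations (which do have GO terms) can be reused across every viral gene that targets them.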
There really isn't a simple way to do this. Aside from the few obvious roles that genes play in the virus life cycle, everything will be defined in terms of which host components a gene interacts with and how it interacts with them. I would argue that GO via BLAST isn't the best way to provide functional annotations for viral genes.