Hello everyone,
I have a question about eggNOG and Diamond. I understand that eggNOG is a phylogeny-based ortholog finder for proteins, while Diamond is a tool used to search for homology between sequences. As you know, eggNOG also provides taxonomic information for proteins, albeit not at very low taxonomic ranks. My goal is to achieve consistency between Diamond and eggNOG results.
For instance, if Diamond identifies 100 eukaryotic proteins, eggNOG might identify more, say 200. However, some proteins assigned as eukaryotic by Diamond may not be classified as eukaryotic orthologs in eggNOG. This discrepancy also occurs in cases involving bacteria. For instance, Diamond might classify a protein as bacterial, while eggNOG identifies it as a eukaryotic ortholog.
While I understand that orthology can be established between species across different genera, families, and phyla, I'm questioning the reliability of classification at the kingdom level. Horizontal gene transfer (HGT) is a possibility, but I prefer not to make this assumption based solely on my data.
My question is: which source of taxonomic information might be more reliable for the protein taxonomy? In other words, if eggNOG identifies a protein as eukaryotic, should it belong to certain eukaryotic lineages?
Thank you.
eggNOG is a database, eggnog-mapper is the tool that is used to construct the eggNOG database (if I recall correctly). eggnog-mapper calls upon fast aligners such as Diamond internally to (first) establish homology between sequences and (then) identify/classify orthologs through additional analyses. For this reason, you are unlikely to get parity between Diamond and eggNOG outputs.
It would, but this need not mean that functional counterparts to it cannot be found elsewhere in the tree of life, if that's what you're getting at.
Sorry for the confusion earlier. As you mentioned, eggnog-mapper is a tool used with the eggNOG database, and you can specify the search mode as either hmmer or diamond. In my case, I used the diamond search option in eggnog-mapper against the eggNOG database. Separately, I ran diamond blastp on its own against the nr database, and I am now comparing the diamond blastp and eggNOG results. Given this scenario, which taxonomic classification of the protein would be more reliable? Or, if I simply use the eggNOG taxonomy to describe the protein's taxonomy, would that be correct?
Regarding your statement,
Yes, that's generally correct. However, the protein I have belongs to eukaryotes according to the eggNOG taxonomy, doesn't it? I want to be sure that I can use the ortholog taxonomy information to assign the protein's taxonomy.
The eggNOG results are bound to be more reliable in the comparison you mentioned because eggnog-mapper conducts additional analyses to infer the type of homology established between the sequences by the aligner, as indicated in Fig. 1 (C) here ( https://doi.org/10.1093/molbev/msab293 ).
The eggNOG taxonomy here is "protein" (or at least amino acid sequence based) taxonomy, so yes.
You are, I am guessing, essentially operating in a situation wherein you are attempting to establish affiliation and annotation for a sequence on the basis of the "best" knowledge and tooling available to you. If eggNOG indicates that your sequence falls within a family found (hitherto) only in eukaryotes and the host species of this sequence is also a eukaryote, I do not really see any reason to suspect mis-affiliation and/or mis-annotation here.
If you have no other information available to you (which I guess is the case here), what else can you do here?
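One thing that may help before deciding which source to trust is to see exactly where the two disagree. Below is a minimal Python sketch of that comparison. It assumes you have already reduced each tool's output to a mapping from protein ID to a single top-level label (for example "Eukaryota" or "Bacteria"); the function name, the protein IDs, and the labels are all made up for illustration, and how you derive the labels from your own Diamond and eggnog-mapper output files is left out.

```python
from collections import Counter


def cross_tabulate(diamond_labels, eggnog_labels):
    """Count (Diamond label, eggNOG label) pairs for proteins present in both inputs.

    Both arguments are hypothetical dicts of protein ID -> top-level taxon label.
    Returns the pair counts and a sorted list of protein IDs whose labels differ.
    """
    shared = diamond_labels.keys() & eggnog_labels.keys()
    table = Counter((diamond_labels[p], eggnog_labels[p]) for p in shared)
    discordant = sorted(p for p in shared if diamond_labels[p] != eggnog_labels[p])
    return table, discordant


# Toy labels, invented purely for illustration (not real results)
diamond_labels = {
    "prot1": "Eukaryota",
    "prot2": "Bacteria",
    "prot3": "Eukaryota",
    "prot4": "Bacteria",   # no eggNOG label, so it is ignored
}
eggnog_labels = {
    "prot1": "Eukaryota",
    "prot2": "Eukaryota",  # disagrees with Diamond
    "prot3": "Eukaryota",
    "prot5": "Bacteria",   # no Diamond label, so it is ignored
}

table, discordant = cross_tabulate(diamond_labels, eggnog_labels)
for (diamond_label, eggnog_label), count in sorted(table.items()):
    print(f"Diamond={diamond_label}\teggNOG={eggnog_label}\t{count}")
print("Discordant proteins:", discordant)
```

The discordant list is the set worth inspecting by hand, for example by looking at how close the best Diamond hits are to each other, rather than assuming HGT or picking one tool's label wholesale.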