Mismatch in orthology results from local alignment and phylogenetic analysis.
5 weeks ago

Hello! I used Diamond to run a local alignment of the Isatis tinctoria proteome against the Arabidopsis thaliana proteome. I sorted the hits by bit score and took the top hit for each Isatis gene as its best hit, which I treated as its Arabidopsis ortholog. I want to analyze a few of these sequences in detail, so I took those Isatis sequences and their best hits in Arabidopsis and built a phylogenetic tree. My issue is this: in some cases, the top hits from the local alignment agree with the tree. However, some Arabidopsis sequences that never came up as a top hit in my analysis still group with Isatis genes and form distinct clades in the tree. My Newick file is as follows:
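The top-hit selection described above can be sketched as follows. This assumes Diamond was run with `--outfmt 6` and no extra columns, so each line carries the 12 default BLAST-style fields with the bit score last. `best_hits` is a hypothetical helper, not part of Diamond, and breaking bit-score ties by the smaller e-value is just one possible choice.

```python
import csv

# Default columns of Diamond/BLAST tabular output (--outfmt 6 with no extra fields).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines):
    """Return {query: (subject, bitscore, evalue)}, keeping the top-scoring hit per query.

    Ties on bit score are broken by the smaller e-value; a remaining tie
    keeps whichever line was seen first.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(FIELDS, row))
        query, subject = rec["qseqid"], rec["sseqid"]
        score, evalue = float(rec["bitscore"]), float(rec["evalue"])
        current = best.get(query)
        # Higher bit score wins; on equal scores, lower e-value wins.
        if current is None or (score, -evalue) > (current[1], -current[2]):
            best[query] = (subject, score, evalue)
    return best
```

Keeping only the single top hit hides near-ties, which can matter when several closely related Arabidopsis genes are in play, as in this tree. Also reporting the second-best bit score for each query can show how close each call was.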

(((AT5G63590:0.08693,(Isati.0715s0020.v1.1:0.00055,Isati.6782s0008.v1.1:0.02950)0.967:0.07033)0.963:0.06333,(Isati.6782s0006.v1.1:0.10572,(Isati.3013s0018.v1.1:0.00797,(Isati.0715s0021.v1.1:0.01039,Isati.6782s0007.v1.1:0.01396)0.244:0.01325)1.000:0.13392)1.000:0.13123)1.000:0.15223,((AT5G63580:0.45733,((AT5G08640:0.04952,Isati.4752s0005.v1.1:0.03349)0.881:0.06364,(AT4G22880:0.76381,AT3G51240:1.20999)0.991:0.46263)0.857:0.05249)0.796:0.02081,((Isati.1778s0010.v1.1:0.04721,(Isati.0382s0006.v1.1:0.16573,(Isati.1317s0015.v1.1:0.00939,Isati.1778s0011.v1.1:0.03979)0.655:0.01919)0.965:0.09734)1.000:0.29822,(AT5G63595:0.40421,(Isati.6782s0005.v1.1:0.43539,(Isati.5352s0003.v1.1:0.08806,(Isati.6569s0008.v1.1:0.09523,(Isati.0644s0012.v1.1:0.01209,Isati.7517s0001.v1.1:0.03264)0.837:0.01128)0.064:0.00054)1.000:0.22698)0.406:0.00411)0.734:0.02980)0.858:0.04745)0.853:0.02420,(AT5G63600:0.46052,(AT5G43935:0.18660,(Isati.6782s0004.v1.1:0.04278,(Isati.3712s0003.v1.1:0.00053,(Isati.0715s0022.v1.1:0.00055,Isati.3013s0017.v1.1:0.01651)0.797:0.00347)0.970:0.04268)1.000:0.17695)0.985:0.12234)0.895:0.05119);

Let me explain with an example. Consider the clade containing AT5G63595. This Arabidopsis gene did not come up as the top hit for any Isatis gene in my forward local alignment analysis. In particular, Isati.6782s0005.v1.1, which groups with AT5G63595 in the tree, has AT5G08640 as its best Arabidopsis hit in my analysis.
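To check where a given gene sits in the tree and how well supported each enclosing clade is, the Newick string can be walked programmatically. Below is a minimal hand-written parser, assuming the plain `(a:len,b:len)support:len;` form used here, with internal node labels read as support values. `parse_newick` and `enclosing_clades` are hypothetical helper names; libraries such as Biopython's `Bio.Phylo` handle Newick more robustly.

```python
import re

DELIMITERS = {"(", ")", ",", ":", ";"}


class Node:
    """One tree node: leaves carry a name; internal nodes may carry a support value."""

    def __init__(self):
        self.name = ""
        self.support = None
        self.length = None
        self.children = []


def parse_newick(text):
    """Parse a simple Newick string whose internal-node labels are support values."""
    tokens = [t.strip() for t in re.findall(r"[(),:;]|[^(),:;]+", text) if t.strip()]
    pos = 0

    def subtree():
        nonlocal pos
        node = Node()
        if tokens[pos] == "(":
            pos += 1
            node.children.append(subtree())
            while tokens[pos] == ",":
                pos += 1
                node.children.append(subtree())
            if tokens[pos] != ")":
                raise ValueError(f"expected ')' at token {pos}")
            pos += 1
            # A label right after ')' is read as the clade's support value.
            if pos < len(tokens) and tokens[pos] not in DELIMITERS:
                node.support = float(tokens[pos])
                pos += 1
        else:
            node.name = tokens[pos]
            pos += 1
        if pos < len(tokens) and tokens[pos] == ":":
            node.length = float(tokens[pos + 1])  # branch length
            pos += 2
        return node

    return subtree()


def leaves(node):
    """All leaf names under `node`."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def enclosing_clades(root, leaf):
    """[(support, sorted leaf names)] for every clade containing `leaf`, innermost first."""
    path = []

    def walk(node):
        if not node.children:
            return node.name == leaf
        for child in node.children:
            if walk(child):
                path.append(node)  # appended on the way back up, so innermost first
                return True
        return False

    walk(root)
    return [(n.support, sorted(leaves(n))) for n in path]


toy = parse_newick("((A:0.1,B:0.2)0.95:0.05,C:0.3);")
print(enclosing_clades(toy, "A"))
# [(0.95, ['A', 'B']), (None, ['A', 'B', 'C'])]
```

Listing the support values of the clades around a gene, innermost first, shows whether its placement rests on well-supported nodes or on weakly supported ones.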

I also aligned this Isatis protein globally against each of the two Arabidopsis proteins using EMBOSS needle to compare percent identity. Here too, the Isatis protein shows higher global identity to AT5G08640 than to AT5G63595, the gene it groups with in the tree.
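As a rough illustration of how a global identity figure is computed, the sketch below runs a plain Needleman-Wunsch alignment and reports identical positions divided by alignment length (gap columns included), which is how EMBOSS needle reports its Identity line. It is a simplification. The match/mismatch/gap scores are arbitrary assumptions with a linear gap penalty, whereas needle uses a substitution matrix (EBLOSUM62 by default for proteins) with affine gap penalties, so its numbers will differ. `global_identity` is a hypothetical helper.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Identity fraction from a simple Needleman-Wunsch global alignment.

    Assumed scoring: fixed match/mismatch scores and a linear gap penalty.
    Returns identical positions / alignment length (gap columns included).
    """
    n, m = len(a), len(b)

    def pair(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back one optimal path; co-optimal paths may give slightly different identities.
    i, j = n, m
    identical = length = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return identical / length if length else 0.0
```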

Next, I calculated pairwise evolutionary distances between the sequences using the Poisson correction model. Here too, the Isatis protein has a smaller distance to AT5G08640 than to AT5G63595.
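For reference, the Poisson-corrected distance is d = -ln(1 - p), where p is the proportion of differing amino acid sites between two aligned sequences. The sketch below assumes gap-containing columns are skipped for each pair (pairwise deletion), which may differ from the settings actually used. `poisson_distance` is a hypothetical helper name.

```python
import math


def poisson_distance(seq1, seq2, gap="-"):
    """Poisson-corrected distance d = -ln(1 - p) between two aligned protein sequences.

    Columns where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(seq1, seq2):
        if x == gap or y == gap:
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    p = differences / compared  # uncorrected p-distance
    if p >= 1:
        return math.inf  # correction is undefined once every site differs
    return -math.log(1 - p)
```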

Can someone help me understand what is happening here, and why the tree groups this gene with AT5G63595 even though it has higher global alignment identity and lower evolutionary distance to AT5G08640?

Is there a way to confidently identify best hits or orthologs from my local alignment results?
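One commonly used check on forward best hits is to also search in the reverse direction (Arabidopsis proteome against Isatis) and keep only reciprocal best hits (RBH). A minimal sketch, assuming both directions have already been reduced to one best hit per query, for example by keeping the highest bit score per query as in the forward analysis. `reciprocal_best_hits` is a hypothetical helper, and RBH is only a heuristic: it can still miss or mis-pair members of gene families with many close paralogs.

```python
def reciprocal_best_hits(forward, reverse):
    """Split forward best hits into reciprocal pairs and one-directional hits.

    forward: {isatis_id: best Arabidopsis hit}
    reverse: {arabidopsis_id: best Isatis hit}
    """
    reciprocal, one_way = {}, {}
    for query, subject in forward.items():
        # Reciprocal only if the subject's own best hit points back to the query.
        if reverse.get(subject) == query:
            reciprocal[query] = subject
        else:
            one_way[query] = subject
    return reciprocal, one_way
```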

Thank you!
