Why does the BLAST result for my sample sequence show top hits with 97-100% similarity, while the bootstrap value at my sample's branch point in the phylogenetic tree is only 30/100? Only my sample's branch point has such a low bootstrap value; all the other branch points have high bootstrap values. How should I interpret this result?
Bootstrap values show support for a particular branch configuration. Generally speaking, branch support is not correlated with percent identity of the entries. For example, it is possible to have 3-4 entries with >95% identity to each other, all on the same branch and with poor bootstrap support. It is also possible to have 3-4 entries with <80% identity to each other, all on the same branch and with high bootstrap support. Until you provide more details and the tree itself, it is impossible to answer this question except in general terms.
Hi Sir, I actually used the EzBioCloud platform for BLAST. These are all 16S rRNA genes from the genus Bacillus. My sample, which I had sequenced here, is labeled "MT7". If you could check these links, it would be really helpful for me.
1) Tree: https://docs.google.com/presentation/d/1UnQCzhJtKOjBz0gQwgV8wQnSB0_7MHL8R853w710pGo/edit?usp=sharing
2) Sequences: https://drive.google.com/file/d/1OzEr9mBjMpDayEV9G54SyGBzRxTxsU7E/view?usp=sharing
Thank you so much for the response.
It is tough to get a clear-cut tree with sequences as similar as yours. Specifically, you have three sequences at the top of the tree: two of them are identical, and the third has 1 substitution in 1432 residues (about 99.93% identity). The two identical sequences definitely need to be next to each other, since nothing is more similar to either of them than they are to each other. Next, during the reconstruction the program has to "decide" which of the two identical sequences (MT7 and ASJC01000029) is closer to AMSH01000114. That, of course, is a trick question: both are identical and therefore equally close to AMSH01000114. A possible solution is to slot AMSH01000114 in between the other two sequences, but make the AMSH01000114 branch long so that the other two are still closer to each other than to AMSH01000114. Those are the three topological possibilities, and they are roughly equally likely given the minuscule differences among the three top sequences. Because each of them is recovered in roughly one third of the bootstrap replicates, you get a bootstrap value of about 30.
A simple solution that will fix that branch is to remove ASJC01000029 from the tree, because it is identical to your query and therefore unnecessary. In that case the relationship between MT7 and AMSH01000114 will be unambiguous and the bootstrap support will be high.
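A minimal sketch of the tie described above, under loudly stated assumptions: the four sequences are a made-up 20-site toy alignment (not the real 16S data), "OUT" is a hypothetical outgroup, and Fitch parsimony is swapped in for the maximum likelihood analysis because it is short to write. It illustrates that a substitution found in only one sequence (here AMSH) gives every way of pairing up the three top sequences the same score, so the bootstrap replicates have nothing to choose between them.

```python
import random
from collections import Counter

# HYPOTHETICAL toy alignment (not the real 16S data): MT7 and ASJC are
# identical, AMSH differs from them at one site, and a made-up outgroup
# "OUT" differs from all three at several other sites.
ALIGNMENT = {
    "MT7":  "ACGTACGTACGTACGTACGT",
    "ASJC": "ACGTACGTACGTACGTACGT",
    "AMSH": "ACGTACGTACGTACGTACGA",  # the single substitution (last site)
    "OUT":  "TCGAACGTTCGTACCTACGT",
}

# The three ways to resolve the three top sequences, rooted on the outgroup.
# Trees are nested 2-tuples; leaves are sequence names.
TOPOLOGIES = {
    "(MT7,ASJC)":  ((("MT7", "ASJC"), "AMSH"), "OUT"),
    "(MT7,AMSH)":  ((("MT7", "AMSH"), "ASJC"), "OUT"),
    "(ASJC,AMSH)": ((("ASJC", "AMSH"), "MT7"), "OUT"),
}


def fitch_site(tree, column):
    """Return (state set, number of changes) for one site (Fitch parsimony)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], column)
    right_set, right_cost = fitch_site(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment, sites):
    """Total Fitch parsimony score of a tree over the given site indices."""
    total = 0
    for i in sites:
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


def bootstrap_support(alignment, topologies, replicates=1000, seed=1):
    """Fraction of bootstrap replicates in which each topology is chosen.

    Alignment columns are resampled with replacement. Ties between equally
    good topologies are broken at random here (real programs may break
    ties differently, e.g. by input order).
    """
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    wins = Counter()
    for _ in range(replicates):
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        scores = {name: parsimony_score(t, alignment, sites)
                  for name, t in topologies.items()}
        best = min(scores.values())
        tied = [name for name, s in scores.items() if s == best]
        wins[rng.choice(tied)] += 1
    return {name: wins[name] / replicates for name in topologies}


all_sites = range(len(ALIGNMENT["MT7"]))
for name, tree in TOPOLOGIES.items():
    print(name, parsimony_score(tree, ALIGNMENT, all_sites))
print(bootstrap_support(ALIGNMENT, TOPOLOGIES))
```

Running it should print a score of 5 for each of the three topologies, followed by support values near 0.33 each, the same ballpark as the 30 in the tree. If ASJC is removed from the toy alignment, only one way of grouping MT7 and AMSH against the outgroup remains, so there is nothing left to tie with; that mirrors the suggestion to drop the redundant identical sequence.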
Oh thank you so much, I understood. I'll try that. You really helped me a lot. Also, one more question: what does 100% similarity mean for 16S rRNA? Does it mean they're exactly the same?
I don't know exactly what 100% similarity means on the BLAST platform you are using, because the words "similar" and "identical" are only similar but not identical. Sorry, couldn't resist :-))
A safe bet is that they mean "identical", because that is what is typically reported, so it likely means the sequences are exactly the same. You can see below how BLASTn reports its matches, and there are plenty of sequences identical to ASJC01000029, so presumably to your sequence as well.
Oh I understood now, thank you so much once again. You really helped me a lot.
Did the branch support go to 100 once you removed the identical sequence?
Yes, thank you. I also learned that the 16S rRNA gene can be identical in some species. Thanks again!!!
https://drive.google.com/file/d/1_H92Gj8l2Jim7dib6ZgVjQqY0KwAqw6U/view?usp=sharing
Also, I used the maximum likelihood method with the K80 evolutionary model to construct the tree.