I previously asked a similar question in a different context on this site, which can be found here:
As stated in that post, I am very new to phylogenetic analysis and would really appreciate some advice. I was told that I should provide more information on what I'm trying to accomplish and how I am trying to do it. I am working on constructing phylogenetic trees for specific paralog families in the Mycobacterium tuberculosis (MTB) genome. It is not known how deeply most of the paralog families I am interested in are conserved. However, I have been using the protein-coding nucleotide sequences of my paralog family members as BLAST (blastn) queries to find similar sequences for use as outgroups, and have found many homologous sequences both within and outside the mycobacteria.
I have been advised to use the maximum likelihood (ML) method to produce my tree topology and have been using MEGA6 to do just that. A guiding principle for outgroup choice, as I understand it, is that the outgroup sequence should be related to the members of the gene family, but no more closely than the family members are to each other. With that in mind, I am unsure how to proceed. How long do two bacterial genomes need to have diverged from each other to be considered 'distant enough' for one to be used as an outgroup? Would this be related to how old the splits in the paralog families are, and if so, how do I go about determining that?
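To make my reading of that guiding principle concrete, here is a minimal Python sketch of the check I have in mind: every outgroup-to-paralog distance should exceed the largest distance between any two paralogs. The sequences are made-up toy data, the function names are my own, and the uncorrected p-distance is only a crude stand-in for the model-based distances an ML program would estimate, so please treat this as an illustration of the idea rather than a validated test.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def outgroup_is_more_distant(ingroup, outgroup_seq):
    """Check my reading of the outgroup rule (an assumption, not a standard test).

    Returns (ok, max_within, min_to_outgroup), where ok is True when the
    outgroup is farther from every paralog than any two paralogs are
    from each other.
    """
    max_within = max(
        p_distance(a, b) for a, b in combinations(ingroup.values(), 2)
    )
    min_to_outgroup = min(
        p_distance(seq, outgroup_seq) for seq in ingroup.values()
    )
    return min_to_outgroup > max_within, max_within, min_to_outgroup


# Toy aligned sequences (hypothetical, not real MTB genes).
paralogs = {
    "paralogA": "ATGGCCAAGT",
    "paralogB": "ATGGCTAAGT",
    "paralogC": "ATGACCAAGT",
}
candidate_outgroup = "ATGTCGAAAT"

ok, max_within, min_to_outgroup = outgroup_is_more_distant(
    paralogs, candidate_outgroup
)
print(ok, max_within, min_to_outgroup)
```

If a candidate fails this kind of check, that would at least suggest it sits inside the family's diversity rather than outside it, although a real answer would presumably need model-corrected distances from the actual alignment.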
For instance, I have access to a number of Mycobacterium smegmatis and Mycobacterium leprae orthologs of some of my genes of interest. I began by trying to use some of these as outgroups, but they produced inconsistent results. Could this be because they are too closely related to MTB?
Lastly, and I believe this may be a problem of understanding on my part, I have observed some clustering in which outgroups group together with paralogs on terminal branches when I root on the midpoint, with apparently short distances between the outgroup and one of the family members. Is this not really a concern, since ML trees are inherently unrooted, or is it an indication that the paralog sequence and the outgroup are "too closely related"?
Thank you for your patience.