My problem is very simple: I need software that can produce a phylogenetic distance matrix for a set of species for which I have collected 16S rRNA sequences. To begin with, I cannot assume that the species appear in any particular database, so I cannot use precomputed alignments.
Initially I wanted to use ClustalW for the alignment, but then I read about rRNA secondary structure and found a paper from the SILVA team which clearly says "don't use software like Clustal, MUSCLE or similar", arguing that those aligners rely only on the sequence data and not on the secondary structure of the rRNA (and therefore don't perform very well). I found a very nice tool called ssu-align, which has been discussed on this page and which does use the secondary structure. It is made by the people at the Eddy lab (those who made HMMER), and it is pretty fast. After running ssu-align (and masking the alignment), three alignment files are produced: one for (possibly) bacterial sequences, one for archaeal sequences and one for eukaryotic sequences (my dataset has sequences from all three domains). My problem is how to combine the three alignments so that I can compute the phylogenetic distances among all the species:
- Does it make sense to combine them into a single alignment? If it does, what would be the proper way to do this? From the ssu-align documentation, it seems that the program ssu-merge was designed to do this, but I get an error.
- If the tool I've chosen is not suitable for this task, could you please recommend another option (preferably a free software tool rather than a web server)?
- I assumed that a single multiple alignment file is needed to infer phylogenetic distances, but maybe this is not true. Is it possible to get the distance matrix from three alignment files instead of just one?
- Obviously, if someone can propose another methodology, or comment on my pipeline (which starts from just the rRNA sequences), I would really appreciate it.
Note that I don't need a perfect distance estimate; I just need a rough estimate that reasonably resembles reality (but that could be accepted by a reviewer :) ).
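
To make concrete what I mean by a distance matrix, here is a minimal sketch in Python. It assumes that a single combined alignment already exists and has been read into a dict of equal-length aligned sequences. The function names and the toy sequences are made up for illustration, and the Jukes-Cantor correction is just one simple choice of distance, not something prescribed by ssu-align or SILVA.

```python
import math

VALID_BASES = "ACGU"

def p_distance(seq_a, seq_b):
    """Fraction of differing sites, counting only columns where both
    sequences have an unambiguous base (gaps and N are skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Treat DNA-style T as U so both alphabets compare equal.
    a = seq_a.upper().replace("T", "U")
    b = seq_b.upper().replace("T", "U")
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x in VALID_BASES and y in VALID_BASES:
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        return float("nan")  # no shared columns to compare
    return diffs / compared

def jukes_cantor(p):
    """Jukes-Cantor (JC69) correction: d = -3/4 * ln(1 - 4p/3)."""
    if math.isnan(p):
        return float("nan")
    if p == 0:
        return 0.0
    if p >= 0.75:
        return float("inf")  # saturated; the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(aligned):
    """Return (names, symmetric matrix) of pairwise JC69 distances."""
    names = list(aligned)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = jukes_cantor(p_distance(aligned[names[i]], aligned[names[j]]))
            matrix[i][j] = matrix[j][i] = d
    return names, matrix

# Toy alignment (hypothetical sequences, not real 16S data).
aligned = {
    "sp_A": "ACGUACGU-A",
    "sp_B": "ACGUACGUAA",
    "sp_C": "ACGAACGU-A",
}
names, matrix = distance_matrix(aligned)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{d:.4f}" for d in row))
```

Something of this shape (one row and one column per species) is all I need as output; the open question is how to get from the three per-domain alignments to an input like `aligned`.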