Mafft Output Distance Matrix
11.4 years ago

The MAFFT documentation includes a "tips" section which indicates that it is possible to get a distance matrix as output, instead of or in addition to the alignments. However, the entry for distance matrices is just a stub. I will email the site admin to find out more, but does someone here already know how to do this? MAFFT can also output trees, but I am very interested in seeing the full distance matrix.

Here is a link to the tips where I found the tease about the matrix: http://mafft.cbrc.jp/alignment/software/tips0.html

phylogenetics alignment distance • 7.5k views
11.4 years ago

Spelunking a bit in the MAFFT source code, I found the --distout option. It will create a ".hat2" file with the distance matrix.
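
For anyone driving MAFFT from a script, here is a minimal sketch of passing the flag through Python's subprocess module. The helper names and the `mafft_exe` parameter are made up for illustration, and it assumes a `mafft` executable is on your PATH. The alignment itself is read from standard output; the ".hat2" file is written by MAFFT separately, so check your working directory for it.

```python
import subprocess


def build_mafft_distout_command(fasta_path, mafft_exe="mafft"):
    """Return the argument list for a MAFFT run with --distout.

    Hypothetical helper: only the --distout flag comes from the thread;
    the executable name is an assumption about the local install.
    """
    return [mafft_exe, "--distout", fasta_path]


def run_mafft_distout(fasta_path, mafft_exe="mafft"):
    """Run MAFFT and return the alignment text printed on stdout.

    Raises subprocess.CalledProcessError if MAFFT exits with an error.
    """
    result = subprocess.run(
        build_mafft_distout_command(fasta_path, mafft_exe),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_mafft_distout("seqs.fasta")` (a placeholder file name) would return the aligned FASTA as a string, with the distance file left on disk as a side effect.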


Thanks. Worked for me. Now I just have to patch the Biopython wrapper to let me use this option.


+1 for taking the time to read the source!


Hi, I ran a MAFFT job to align a reference sequence to an existing alignment and received this error: Loading 'hat2n' (aligned sequences - new sequences) ... 115628 != 11562 hat2 is wrong. Have you ever encountered a similar error? I am not sure what it means. Thanks.

4.5 years ago
Ghoti ▴ 80

Apologies for updating an old topic, but this page is one of the first results when searching Google for "MAFFT distout". I'd like to share some information on the distout feature that comes directly from Dr. Katoh, author of MAFFT:

The --distout flag is just for my personal use and there is no document. I think it's better and safer for you to use your code to assess the similarity or distance of sequences from a given alignment. However, I think I can explain this option. Please check if it'll be useful for your purpose or not:

The --distout option outputs distances that are used for building a guide tree. Note that the distances are computed before building an MSA, not computed from the MSA. The distances are converted from pairwise scores with this equation,

Distance(i,j) = 1 - Score(i,j) / min( Score(i,i), Score(j,j) )

as explained in our old paper, Katoh et al 2002. Score() can be computed by various ways (without MSA). In Katoh2002, Score(i,j) was the number of 6mers that are shared by sequences i and j. If the --localpair flag is set, then Score(i,j) is the pairwise alignment score between sequences i and j. The pairwise alignment is not always the same as the resulting MSA even if the number of the input sequences is two.
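
To make the equation above concrete, here is a rough Python sketch of the score-to-distance conversion using shared 6-mers as Score(). It is an illustration only, not MAFFT's code: the function names are made up, and counting a shared k-mer as the minimum of its occurrences in the two sequences is my assumption. MAFFT's own 6-mer scoring may differ in detail, so do not expect these numbers to match a ".hat2" file.

```python
from collections import Counter


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmer_score(a, b, k=6):
    """Score(a, b): number of k-mers shared by a and b.

    Assumption: a k-mer occurring m times in a and n times in b
    contributes min(m, n) to the score.
    """
    return sum((kmer_counts(a, k) & kmer_counts(b, k)).values())


def kmer_distance(a, b, k=6):
    """Distance(i,j) = 1 - Score(i,j) / min(Score(i,i), Score(j,j))."""
    denom = min(shared_kmer_score(a, a, k), shared_kmer_score(b, b, k))
    if denom == 0:
        raise ValueError("both sequences must be at least k residues long")
    return 1.0 - shared_kmer_score(a, b, k) / denom


def distance_matrix(seqs, k=6):
    """Full symmetric matrix of pairwise distances for a list of sequences."""
    n = len(seqs)
    return [[kmer_distance(seqs[i], seqs[j], k) for j in range(n)]
            for i in range(n)]
```

Identical sequences get distance 0 and sequences with no 6-mer in common get distance 1. As in the quoted explanation, nothing here depends on an alignment: the distances come straight from the unaligned sequences.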