Adding a new protein to similarity matrix based on sequence alignment
4 weeks ago
27b99607 • 0

I have an n x n similarity matrix based on a protein sequence alignment. (I did this using protr::parSeqSimDisk (documentation here:

If I want to add a single new protein to this alignment/similarity matrix do I need to rerun everything? E.g., it's currently a 10000 x 10000 matrix. Does everything need to be rerun to obtain a 10001 x 10001 matrix?

I assume adding an additional sequence may affect the MSA, but I see that MUSCLE has profile-profile alignment ( But since MUSCLE doesn't have a similarity measure, a similarity matrix would need to be made via another program using that MSA output?

Is there a simple/convenient way that I'm missing to add another protein?
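For reference, a rough sketch of what extending the matrix by one row and column could look like if each entry comes from an independent pairwise alignment. That is an assumption; it does not hold if the scores are derived from a shared MSA. It is written in Python rather than R, and `nw_score`, `similarity`, and `add_sequence` are hypothetical helpers with a toy scoring scheme, not protr functions:

```python
from math import sqrt


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch) with a toy scoring scheme.

    The match/mismatch/gap values are illustrative assumptions, not the
    substitution matrix or gap penalties of any particular package.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Pairwise score normalised by the two self-alignment scores.

    Assumes non-empty sequences, so the self scores are positive.
    """
    return nw_score(a, b) / sqrt(nw_score(a, a) * nw_score(b, b))


def add_sequence(matrix, seqs, new_seq):
    """Return an (n+1) x (n+1) matrix from an existing n x n matrix.

    Only n new pairwise scores are computed; existing entries are reused.
    """
    new_row = [similarity(new_seq, s) for s in seqs]
    extended = [row + [sim] for row, sim in zip(matrix, new_row)]
    extended.append(new_row + [1.0])  # self-similarity is 1 by construction
    return extended


# Tiny made-up sequences, just to show the call pattern.
seqs = ["MKTAYIAK", "MKTAYLAK", "GGGSSS"]
matrix = [[similarity(a, b) for b in seqs] for a in seqs]
extended = add_sequence(matrix, seqs, "MKTAYIAR")
```

Under that assumption the cost is n new alignments (10000 here) instead of the full set of pairs. If the entries depend on the MSA, the existing values can change when a sequence is added, and this shortcut no longer applies.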

protein similarity sequence R msa • 245 views
4 weeks ago
Mensur Dlakic ★ 19k

In short: you can't go wrong by re-aligning, and it is possible to go wrong by concocting some kind of similarity matrix without re-aligning.

From a practical standpoint, what kind of time-saving are we talking about? Unless your proteins are unusually long (2000+ residues), aligning 10K+1 of them should take at most a couple of hours on a modern, multithreading computer. It will likely take you longer to wait for someone to offer a faster solution. And that's assuming that a faster solution would be correct, which I don't think is a given.


Thanks, Mensur. I'll see how well it goes re-aligning. My issue is that it may need to be done repeatedly for work (maybe 100 times?), so the total would be some large multiple of a few hours.

And I also see caveats for the few faster solutions I've found in the past few hours, like mafft --add (

So maybe I'll try out the faster solutions and compare them with the re-aligning results, and then, if they look very different, give preference to the re-alignment? I don't want to find out later that all my results are flawed.
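One way that comparison could be set up, as a minimal sketch: take the two similarity matrices (one from the faster route, one from the full re-alignment, with rows and columns in the same protein order) and summarise how far apart they are. The function names and the small matrices below are made up for illustration, and any threshold for "very different" would still need to be chosen for the actual data:

```python
from math import sqrt


def upper_triangle(m):
    """Off-diagonal upper-triangle entries of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def compare_matrices(fast, full):
    """Return (largest absolute difference, Pearson correlation).

    Both matrices must be the same size and in the same protein order.
    The correlation is undefined if either matrix has constant
    off-diagonal entries (zero variance).
    """
    x, y = upper_triangle(fast), upper_triangle(full)
    max_abs_diff = max(abs(a - b) for a, b in zip(x, y))
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return max_abs_diff, cov / sqrt(var_x * var_y)


# Made-up 3 x 3 example values, not real results.
full = [[1.0, 0.80, 0.30], [0.80, 1.0, 0.50], [0.30, 0.50, 1.0]]
fast = [[1.0, 0.82, 0.30], [0.82, 1.0, 0.49], [0.30, 0.49, 1.0]]
max_diff, pearson = compare_matrices(fast, full)
print(f"max abs diff = {max_diff:.3f}, Pearson r = {pearson:.4f}")
```

The largest single difference is worth looking at alongside the correlation, since a high overall correlation can hide a few badly shifted pairs.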

