I have an n x n similarity matrix based on a protein sequence alignment.
I built it with protr::parSeqSimDisk (documentation: https://nanx.me/protr/reference/parSeqSimDisk.html).
If I want to add a single new protein to this alignment/similarity matrix, do I need to rerun everything? For example, it's currently a 10000 x 10000 matrix. Does the whole computation need to be rerun to obtain a 10001 x 10001 matrix?
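If I read the parSeqSimDisk documentation correctly, each cell comes from an independent pairwise alignment of two sequences rather than from an MSA. In that case only the n new pairs involving the added protein would need computing. Below is a minimal Python sketch of that idea, not protr's implementation. It assumes a toy Needleman-Wunsch scorer (match +1, mismatch -1, linear gap -2) instead of a substitution matrix such as BLOSUM62, and a normalization of score(a, b) / sqrt(score(a, a) * score(b, b)). The names `nw_score`, `similarity`, `full_matrix` and `add_sequence` are hypothetical.

```python
import math


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    Toy scoring scheme (assumption); a real run would use e.g. BLOSUM62.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: score(a, b) / sqrt(score(a, a) * score(b, b))."""
    return nw_score(a, b) / math.sqrt(nw_score(a, a) * nw_score(b, b))


def full_matrix(seqs):
    """Recompute every pair from scratch: O(n^2) alignments."""
    return [[similarity(a, b) for b in seqs] for a in seqs]


def add_sequence(sim, seqs, new_seq):
    """Grow an n x n matrix to (n+1) x (n+1) with only n + 1 new alignments."""
    new_row = [similarity(s, new_seq) for s in seqs]
    for row, value in zip(sim, new_row):
        row.append(value)  # new column entry for each existing sequence
    sim.append(new_row + [similarity(new_seq, new_seq)])  # new row plus diagonal
    seqs.append(new_seq)
    return sim


seqs = ["MKVL", "MKAL", "GKVL"]
sim = full_matrix(seqs)
add_sequence(sim, seqs, "MKVLA")
print(len(sim), len(sim[0]))  # 4 4
```

Because the scores are symmetric and each pair is aligned independently, the appended matrix should match a full recomputation exactly; existing entries are never touched.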
I assume adding an additional sequence may affect the MSA, but I see that MUSCLE has profile-profile alignment (https://drive5.com/muscle/muscle.html#_Toc81224833). But since MUSCLE doesn't output a similarity measure, would I need another program to build a similarity matrix from that MSA output?
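If I go the MUSCLE route, one common way to turn an MSA into a similarity matrix is pairwise percent identity over the aligned columns. This is a sketch under stated assumptions. It assumes the aligned sequences are already parsed into equal-length strings with "-" for gaps. It counts only columns where neither sequence has a gap, which is one convention among several; others divide by the full alignment length or by the shorter ungapped sequence. `percent_identity` and `identity_matrix` are hypothetical names.

```python
def percent_identity(x, y, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(x, y):
        if a == gap or b == gap:
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(aligned):
    """Symmetric matrix of pairwise identities computed from MSA rows."""
    n = len(aligned)
    m = [[100.0] * n for _ in range(n)]  # a sequence is identical to itself
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = percent_identity(aligned[i], aligned[j])
    return m


msa = ["MKV-LA", "MKAQLA", "GKV-L-"]
for row in identity_matrix(msa):
    print([round(v, 1) for v in row])
```

Unlike the pairwise case, re-aligning with a new sequence can shift columns, so in principle every entry of an MSA-derived matrix may change.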
Is there a simple, convenient way to add another protein that I'm missing?
Thanks, Mensur. I'll see how re-aligning goes. My concern is that this may need to be done repeatedly for work (maybe 100 times?), so the total time would be some large multiple of a few hours.
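Here is rough arithmetic on why repetition hurts. It assumes each matrix entry costs one pairwise alignment and that a symmetric matrix only needs its off-diagonal upper triangle. Under those assumptions, a full rebuild for n sequences costs n(n-1)/2 alignments, while appending one sequence to a pairwise matrix costs n. This does not apply to an MSA-based matrix, where the alignment itself may need redoing. `full_rebuild` and `incremental` are hypothetical helper names.

```python
def full_rebuild(n):
    """Alignments for the off-diagonal upper triangle of an n x n matrix."""
    return n * (n - 1) // 2


def incremental(n):
    """Alignments needed to append one sequence to an existing n x n matrix."""
    return n


start, additions = 10_000, 100
# Rebuild the whole matrix after each of the 100 additions...
rebuild_total = sum(full_rebuild(start + k + 1) for k in range(additions))
# ...versus appending one row/column each time.
append_total = sum(incremental(start + k) for k in range(additions))
print(full_rebuild(start), rebuild_total, append_total)
```

With these numbers, rebuilding after every addition comes to about 5.05 billion alignments, versus about 1.0 million for appending one row and column each time.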
I also see caveats for the few faster solutions I've found in the past few hours, such as mafft --add (https://mafft.cbrc.jp/alignment/server/add_sequences.html).
So maybe I'll try the faster solutions, compare them with the re-alignment results, and prefer the re-alignment if they differ substantially? I don't want to find out later that all my results are flawed.
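To judge whether a fast path "looks very different", one option is to compare the two matrices on their off-diagonal upper triangle. The two summaries here are the maximum absolute difference and the Pearson correlation. This sketch assumes both matrices are square lists of lists covering the same sequences in the same order, and that neither is constant off the diagonal. What counts as "too different" is left to judgment. `upper_triangle`, `pearson` and `compare` are hypothetical names, and the example matrices are made up.

```python
import math


def upper_triangle(m):
    """Off-diagonal upper-triangle entries, row by row."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def compare(reference, fast):
    """Summarize how far a fast-path matrix strays from the full re-alignment."""
    if len(reference) != len(fast):
        raise ValueError("matrices must cover the same sequences")
    x, y = upper_triangle(reference), upper_triangle(fast)
    max_abs = max(abs(a - b) for a, b in zip(x, y))
    return {"max_abs_diff": max_abs, "pearson_r": pearson(x, y)}


ref = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.5], [0.3, 0.5, 1.0]]
fast = [[1.0, 0.7, 0.3], [0.7, 1.0, 0.6], [0.3, 0.6, 1.0]]
print(compare(ref, fast))
```

A high correlation with a small maximum difference would suggest the fast path is usable. A few large outliers can still matter if downstream steps depend on specific pairs.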