Adding a new protein to a similarity matrix based on sequence alignment
23 months ago
27b99607 • 0

I have an n x n similarity matrix based on a protein sequence alignment, which I computed with protr::parSeqSimDisk (documentation here: https://nanx.me/protr/reference/parSeqSimDisk.html).

If I want to add a single new protein to this alignment/similarity matrix, do I need to rerun everything? For example, it's currently a 10000 x 10000 matrix. Does everything need to be rerun to obtain a 10001 x 10001 matrix?

I assume adding an additional sequence may affect the MSA, but I see that MUSCLE has profile-profile alignment (https://drive5.com/muscle/muscle.html#_Toc81224833). But since MUSCLE doesn't provide a similarity measure, would a similarity matrix need to be made via another program using that MSA output?

Is there a simple/convenient way that I'm missing to add another protein?
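
To make the question concrete: if every entry of the matrix depends only on the two sequences involved (as I understand it, this is how pairwise-alignment scores such as those from protr's parSeqSim family behave, though that is worth checking against the documentation), then only one new row and column would be needed. It would not hold for scores read off a shared MSA, where a new sequence can shift the existing columns. A minimal Python sketch of the idea, where `toy_similarity` and `extend_similarity_matrix` are hypothetical names and the score is a toy stand-in, not a real alignment score:

```python
from typing import Callable, List


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in score: matching positions over the longer length.

    A real workflow would call a pairwise aligner here instead.
    """
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def extend_similarity_matrix(
    matrix: List[List[float]],
    sequences: List[str],
    new_seq: str,
    score: Callable[[str, str], float] = toy_similarity,
) -> List[List[float]]:
    """Return an (n+1) x (n+1) matrix that reuses all existing n x n entries.

    Assumes the score is symmetric and depends only on the two sequences.
    """
    if len(matrix) != len(sequences):
        raise ValueError("matrix size must match the number of sequences")
    # Only n new comparisons are computed, one per existing sequence.
    new_col = [score(s, new_seq) for s in sequences]
    extended = [row[:] + [value] for row, value in zip(matrix, new_col)]
    extended.append(new_col + [score(new_seq, new_seq)])
    return extended


seqs = ["ACDE", "ACDF"]
old = [[1.0, 0.75], [0.75, 1.0]]
new = extend_similarity_matrix(old, seqs, "ACGG")
for row in new:
    print(row)
```

Under that assumption, going from 10000 to 10001 proteins costs 10000 new comparisons rather than repeating all of the roughly 50 million existing pairs.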

protein similarity sequence R msa • 658 views
23 months ago
Mensur Dlakic ★ 27k

In short: you can't go wrong by re-aligning, and it is possible to go wrong by concocting some kind of similarity matrix without re-aligning.

From a practical standpoint, what kind of time savings are we talking about? Unless your proteins are unusually long (2000+ residues), aligning 10K+1 of them should take at most a couple of hours on a modern, multithreaded computer. It will likely take you longer to wait for someone to offer a faster solution. And that's assuming that a faster solution would be correct, which I don't think is a given.


Thanks, Mensur. I'll see how well it goes re-aligning. My issue is that it may need to be done repeatedly for work (maybe 100 times?), so the total would be some large multiple of a few hours.

And I also see caveats for the few faster solutions I've found in the past few hours, like mafft --add (https://mafft.cbrc.jp/alignment/server/add_sequences.html).

So maybe I'll try out the faster solutions and compare them with the re-aligning results, and then, if they look very different, give preference to the re-alignment? I don't want to find out later that all my results are flawed.
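
For the comparison itself, something like the following rough Python sketch is what I have in mind. It assumes both matrices list the proteins in the same order; `compare_matrices` is a hypothetical helper, and the 0.05 tolerance is an arbitrary placeholder rather than a recommended threshold.

```python
from typing import Dict, List


def compare_matrices(
    a: List[List[float]],
    b: List[List[float]],
    tol: float = 0.05,  # arbitrary placeholder threshold
) -> Dict[str, float]:
    """Summarize entry-wise differences between two same-shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    diffs = [
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    ]
    if not diffs:
        raise ValueError("matrices must not be empty")
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "n_over_tol": sum(d > tol for d in diffs),
    }


# Toy values only: "fast" stands for the incremental result,
# "full" for the matrix from a complete re-alignment.
fast = [[1.0, 0.5], [0.5, 1.0]]
full = [[1.0, 0.6], [0.6, 1.0]]
summary = compare_matrices(fast, full)
print(summary)
```

If the maximum difference stays small over a few trial additions, that would be some evidence the faster route is safe for my data; if not, I'd fall back to the full re-alignment.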
