Protein Sequence Alignment
9.2 years ago
pwg46 ▴ 540

Say I am given a protein U1 from the UniProt database. According to UniProt's mapping data file, U1 maps to R1 in RefSeq's protein database. U1's and R1's sequences are very similar, but len(R1) > len(U1), I am guessing because R1 contains some extra region. What is an efficient way to align these two proteins? That is, I want to pad U1 so that len(U1) == len(R1), with the chunk that U1 is missing filled in with a gap symbol, e.g. "-". Would I have to use some recursive segmentation algorithm?

uniprot refseq protein sequence alignment • 1.8k views
9.2 years ago

For global alignment of two sequences, you're looking for the Needleman-Wunsch algorithm.
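To make the idea concrete, here is a minimal pure-Python sketch of Needleman-Wunsch. The scoring values (match=1, mismatch=-1, linear gap=-2) and the two toy sequences are assumptions chosen for illustration only. A real protein alignment would normally use a substitution matrix such as BLOSUM62 with affine gap penalties, via an established tool or library.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (aligned_a, aligned_b).

    Scoring parameters are illustrative defaults, not tuned for proteins.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences (made up): r1 is u1 with an extra "GGS" inserted.
u1 = "MKTAYIAKQR"
r1 = "MKTAYGGSIAKQR"
aligned_u1, aligned_r1 = needleman_wunsch(u1, r1)
print(aligned_u1)
print(aligned_r1)
```

No recursive segmentation is needed: the dynamic programming table is filled once, in time and memory proportional to len(U1) * len(R1), which is cheap for a single pair of proteins. Both returned strings have the same length, with "-" marking the positions missing from either sequence.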

9.2 years ago
dago ★ 2.8k

I think you have to perform a global alignment, as @Jean-Karim Heriche said. There are many tools that can create an end-to-end alignment, for example here. With this pairwise alignment you should be able to see precisely the region that you want to remove from R1.
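Once you have the two aligned strings from whatever tool you use, locating the extra region is just a matter of finding the gap runs in the aligned U1. A small sketch, where the function name `gap_regions` and the example strings are hypothetical:

```python
import re


def gap_regions(aligned_seq, gap="-"):
    """Return (start, end) column ranges of gap runs, 0-based, end exclusive."""
    pattern = re.escape(gap) + "+"
    return [(m.start(), m.end()) for m in re.finditer(pattern, aligned_seq)]


# Made-up aligned pair: U1 has a gap where R1 carries extra residues.
aligned_u1 = "MKTAY---IAKQR"
aligned_r1 = "MKTAYGGSIAKQR"

for start, end in gap_regions(aligned_u1):
    # Columns are alignment coordinates; they equal R1 positions only
    # when the aligned R1 itself contains no gaps.
    print(start, end, aligned_r1[start:end])
```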
