Finding the similarity between two arrays of sequences
21 months ago
ririri • 0

Hello,

I'm doing a small bioinformatics project for my class. My idea requires comparing two lists of protein sequences of the same length against each other and expressing how similar they are as a percentage.

Say I have two arrays, A and B, each containing 20 aligned sequences of the same protein, of roughly the same length, taken from different organisms. Let's assume array A contains the sequences from mammals and array B the sequences from birds. My goal is to estimate the similarity, or genetic distance, between these two groups of species using the given sequences.

Any ideas on how to approach this? One idea I had was to align the sequences within each array first, then build an "average" (consensus) sequence for each array from the most common amino acid at each position, and finally compare the two consensus sequences and calculate a percent similarity. But I'm not sure this approach would be accurate. Wouldn't it result in a skewed percentage?
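
A minimal sketch of the consensus idea described above, assuming the sequences are already aligned to equal length and that gaps are written as "-". The function names and the toy sequences are made up for illustration, not real protein data. Ties for the most common residue are broken by first occurrence, which is one source of the skew in question.

```python
from collections import Counter


def consensus(aligned):
    """Return the most common residue in each column of an alignment."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def percent_identity(a, b, gap="-"):
    """Percent of identical positions, skipping columns where both are gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Toy, hypothetical alignments standing in for arrays A and B.
mammals = ["MKTAY", "MKTAF", "MKSAY"]
birds = ["MRTAY", "MRTAY", "MKTAY"]

cons_a = consensus(mammals)
cons_b = consensus(birds)
print(cons_a, cons_b, percent_identity(cons_a, cons_b))
```

Note that the consensus discards all within-group variation: a column where 11 of 20 sequences agree counts the same as one where all 20 agree.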

Thanks in advance.

sequence alignment • 560 views

A few tools could be useful for building a similarity matrix from your sequences:

  • ClustalW
  • MAFFT
  • blastall
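
One possible way to use an alignment produced by any of these tools is to build the between-group similarity matrix directly and average it, rather than collapsing each group to a single sequence. The sketch below is only an illustration: it assumes all sequences come from one shared multiple alignment, uses plain percent identity with gapped positions ignored, and the function names and toy sequences are hypothetical.

```python
def identity(a, b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def between_group_matrix(group_a, group_b):
    """Matrix of identities: rows are group_a sequences, columns are group_b."""
    return [[identity(a, b) for b in group_b] for a in group_a]


def mean_between_group_identity(group_a, group_b):
    """Average of every A-versus-B pairwise identity."""
    values = [v for row in between_group_matrix(group_a, group_b) for v in row]
    return sum(values) / len(values)


# Toy, hypothetical sequences taken from a single shared alignment.
mammals = ["MKTAY", "MKTAF"]
birds = ["MRTAY", "MKTAY"]

for row in between_group_matrix(mammals, birds):
    print(row)
print(mean_between_group_identity(mammals, birds))
```

Averaging over all pairs keeps the within-group variation that a consensus sequence throws away, at the cost of 20 x 20 comparisons instead of one.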
