Finding the similarity between two arrays of sequences
14 months ago
ririri • 0


I'm doing a small bioinformatics project for my class. My idea requires comparing two lists of protein sequences of the same length against each other and expressing how similar they are as a percentage.

Say I have two arrays, A and B, each containing 20 aligned sequences of the same protein, all of roughly the same length but from different organisms. Let's assume array A contains the protein sequences from mammals and array B contains the protein sequences from birds. My goal is to estimate the similarity, or genetic distance, between these two groups of species using the given sequences.

Any ideas on how to approach this? One idea I had was to align the sequences within each array first, then build an "average" (consensus) sequence for each array by taking the most common amino acid at each position, and finally compare the two consensus sequences to each other to get a similarity percentage. But I'm not sure this approach would be accurate. Wouldn't it result in a skewed percentage?
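To make the idea concrete, here is a minimal Python sketch of it. It assumes all 40 sequences have already been aligned together in one multiple alignment, so that column i means the same position in both arrays, and that gaps are written as "-". The function names `consensus` and `percent_identity` and the toy sequences are made up for illustration. Ties between equally common residues are broken by whichever appears first in the column, and any variation within a group is discarded, which is one way the final percentage could end up skewed.

```python
from collections import Counter


def consensus(aligned):
    """Return the most common character in each column of an alignment."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    residues = []
    for column in zip(*aligned):
        # most_common(1) returns [(character, count)] for the top character;
        # ties go to the character seen first in the column.
        residues.append(Counter(column).most_common(1)[0][0])
    return "".join(residues)


def percent_identity(a, b):
    """Percentage of compared columns where a and b carry the same character.

    Columns where both sequences have a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy data only, not real proteins.
mammals = ["MKVLA", "MKVLA", "MRVLA"]
birds = ["MKILA", "MKILS", "MKILA"]

similarity = percent_identity(consensus(mammals), consensus(birds))
```

With real data, `mammals` and `birds` would be the rows of the combined alignment that belong to each group.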

Thanks in advance.

sequence alignment • 451 views

There are a few different approaches that could be useful for determining a similarity matrix:

  • ClustalW
  • blastall
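Once the sequences are aligned (for example with one of the tools above), a group-versus-group similarity matrix can be summarized with a few lines of Python. The sketch below is not what ClustalW or blastall compute internally. It is a plain percent-identity calculation, assuming every sequence comes from the same multiple alignment and uses "-" for gaps. The names `pairwise_identity`, `identity_matrix` and `mean_between_groups` and the toy sequences are hypothetical.

```python
from itertools import product


def pairwise_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_matrix(group_a, group_b):
    """Rows follow group_a, columns follow group_b."""
    return [[pairwise_identity(a, b) for b in group_b] for a in group_a]


def mean_between_groups(group_a, group_b):
    """Average percent identity over every (a, b) pair across the two groups."""
    scores = [pairwise_identity(a, b) for a, b in product(group_a, group_b)]
    return sum(scores) / len(scores)


# Toy data only, not real proteins.
group_a = ["MKV", "MKL"]
group_b = ["MKL", "MRL"]

matrix = identity_matrix(group_a, group_b)
average = mean_between_groups(group_a, group_b)
```

Averaging over all cross-group pairs keeps the variation inside each group in the calculation, instead of collapsing each group to a single representative sequence first.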
