
BioPython can do all of this, but it’s pretty easy to implement yourself (and is good practice).

See for example: https://github.com/jrjhealey/bioinfo-tools/blob/master/Shannon.py

And Hamming distance is super simple: https://github.com/jrjhealey/bioinfo-tools/blob/master/StringComparisons.py#L84
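To give a rough idea of how little code the entropy part needs, here is a minimal sketch of per-column Shannon entropy (in bits). It is not the code from the linked script; the function names `column_entropy` and `alignment_entropies` and the toy sequences are made up for illustration, and it assumes the alignment is already a list of equal-length strings. Gap characters are counted as just another symbol here, which may or may not be what you want.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column given as a string."""
    counts = Counter(column)
    total = len(column)
    # p * log2(1/p) summed over the symbols observed in this column
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def alignment_entropies(seqs):
    """Entropy of every column; seqs are equal-length aligned strings."""
    # zip(*seqs) transposes rows (sequences) into columns (positions)
    return [column_entropy("".join(col)) for col in zip(*seqs)]

# Toy alignment, invented purely for illustration
toy = ["ACGT", "ACGA", "ACTA", "AGTA"]
for position, entropy in enumerate(alignment_entropies(toy), start=1):
    print(position, round(entropy, 3))
```

A fully conserved column gives 0, and a column split evenly between two symbols gives 1 bit.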

Hi Joe, I have a nice function for HD, but my question was how to get HD and the entropy checked across all sequences. HD only compares two sequences at a time, and an MSA contains a lot of them. Maybe a loop over every pair of sequences would do it. I will check it out. Thanks.

I got this Joe:

```
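import itertools

def hamming_dist(s1, s2):
    """Count the positions at which two equal-length sequences differ."""
    assert len(s1) == len(s2)
    hd = 0
    for b1, b2 in zip(s1, s2):
        if b1 != b2:
            hd += 1
    return hd

def imap(function, *iterables):
    # Each item passed in is one (s1, s2) pair; unpack it as the arguments.
    iterables = map(iter, iterables)
    for it in iterables:
        args = tuple(it)
        if function is None:
            yield tuple(args)
        else:
            yield function(*args)

# ls is the list of aligned, equal-length sequences (defined elsewhere).
distances = imap(hamming_dist, *itertools.combinations(ls, 2))
for dist in distances:
    print(dist)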
```

Do you think there is some way to improve it?

PS: I have only tested it on a toy example.

I think you can do this more simply if you want to use BioPython.

I forget the exact syntax now but it would be something like:

```
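from Bio import AlignIO
import itertools

# hamming_distance is assumed to be defined elsewhere (e.g. your own HD function).
# AlignIO.read takes the alignment file and its format; fill these in.
aln = AlignIO.read(...)
for r1, r2 in itertools.combinations(aln, 2):
    print("\n".join([r1.id, str(r1.seq), r2.id, str(r2.seq),
                     str(hamming_distance(str(r1.seq), str(r2.seq)))]))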
```

(Not the prettiest output, but you can tweak).

Your solution looks reasonable too though, so whatever works.
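If you would rather stay in plain Python and still keep the IDs, something along these lines might do. This is only a sketch: it assumes the sequences are already held in a dict mapping ID to aligned sequence, and the names `pairwise_distances` and `toy`, along with the IDs and sequences, are invented for illustration. Note that a position where both sequences have a gap counts as a match.

```python
import itertools

def hamming_dist(s1, s2):
    """Number of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be the same length")
    return sum(b1 != b2 for b1, b2 in zip(s1, s2))

def pairwise_distances(records):
    """Yield (id1, id2, distance) for every unordered pair of records."""
    for (id1, s1), (id2, s2) in itertools.combinations(records.items(), 2):
        yield id1, id2, hamming_dist(s1, s2)

# Toy data: made-up IDs and sequences, purely for illustration
toy = {"seqA": "ACGT-A", "seqB": "ACGA-A", "seqC": "TCGA-T"}
for id1, id2, dist in pairwise_distances(toy):
    print(id1, id2, dist, sep="\t")
```

`itertools.combinations` already yields each pair exactly once, so no extra mapping helper is needed.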

Hey Joe, your solution was nicer because it included the ID. I will try to do that in mine. 8)

```
DQ676872
GCAGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGCCAGG---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATATGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTCTAGAAAGATACCTAAAGGATCAACAGCTC
AB253421
GCAGGAAGCACTATGGGCGCGGCGTCAATGACGCTGACGGTACAGGCCAGA---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATCTGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGCTACCTAAGGGATCAACAGCTC
7
```

Thank you for your support! Paulo

It would appear that there are plenty of them.

I don't vouch for the contents of this repository, but its description at least matches your task: `Hamming Distance Comparison of Amino Acid Sequences of 10 Organisms`