Question: Is there a Python module to calculate the Hamming distance from a multiple sequence alignment?
0
11 months ago by
schlogl70
Brazil-Florianopolis
schlogl70 wrote:

Does anyone have suggestions on how to calculate the Hamming distance (HD) and the entropy of a multiple sequence alignment?

Thanks

alignment • 699 views
1

It would appear that there are plenty of them.

I don't vouch for the contents of this repository, but its description at least matches your task: `Hamming Distance Comparison of Amino Acid Sequences of 10 Organisms`

3
11 months ago by
Joe18k
United Kingdom
Joe18k wrote:

BioPython can do all of this, but it’s pretty easy to implement yourself (and is good practice).

And Hamming distance is super simple: https://github.com/jrjhealey/bioinfo-tools/blob/master/StringComparisons.py#L84
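For the entropy part, here is a minimal sketch of per-column Shannon entropy, assuming the alignment is already loaded as a list of equal-length strings. The names `column_entropy` and `alignment_entropy` are made up for illustration, and gaps (`-`) are counted as an ordinary symbol, which may not be what you want for your data.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    # Sum p * log2(1/p) over the symbols seen in this column.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def alignment_entropy(seqs):
    """Per-column entropy for a list of equal-length aligned sequences."""
    # zip(*seqs) walks the alignment column by column.
    return [column_entropy(col) for col in zip(*seqs)]

# Toy alignment (illustrative data only); gaps are treated as a symbol.
toy = ["ACGT", "ACGA", "TCGA", "TCG-"]
print(alignment_entropy(toy))  # [1.0, 0.0, 0.0, 1.5]
```

A fully conserved column gives 0 bits, and the value rises as the column gets more mixed.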

Hi Joe, I have a nice function for HD, but my question was about how to get the HD and the entropy across all the sequences. HD only compares two sequences at a time, and an MSA contains many sequences to compare. Maybe a loop that checks each pair of sequences? I will check it out. Thanks

1

You can do all pairwise comparisons between sequences and store the numbers.

Check out the `itertools` module.
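As a rough sketch of what that could look like, `itertools.combinations` yields every unordered pair exactly once, so the distances can be stored in a dictionary keyed by the pair of names. The function name `pairwise_hamming` and the toy sequences are made up for illustration.

```python
import itertools

def hamming_dist(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    return sum(b1 != b2 for b1, b2 in zip(s1, s2))

def pairwise_hamming(seqs):
    """Map each pair of sequence names to its Hamming distance."""
    return {
        (name1, name2): hamming_dist(seq1, seq2)
        for (name1, seq1), (name2, seq2) in itertools.combinations(seqs.items(), 2)
    }

# Toy data (illustrative only): name -> aligned sequence.
toy = {"seqA": "ACGT", "seqB": "ACGA", "seqC": "TCGA"}
for pair, dist in pairwise_hamming(toy).items():
    print(pair, dist)
```

For n sequences this makes n*(n-1)/2 comparisons, so it is fine for small alignments but gets slow for very large ones.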

Maybe `itertools.imap(function, *iterables)`?

I got this Joe:

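```
import itertools

def hamming_dist(s1, s2):
    # Count the positions at which two equal-length sequences differ.
    assert len(s1) == len(s2)
    hd = 0
    for b1, b2 in zip(s1, s2):
        if b1 != b2:
            hd += 1
    return hd

def imap(function, *iterables):
    # Each argument is one pair of sequences, unpacked into the function call.
    # (This behaves like itertools.starmap rather than Python 2's itertools.imap.)
    iterables = map(iter, iterables)
    for it in iterables:
        args = tuple(it)
        if function is None:
            yield tuple(args)
        else:
            yield function(*args)

# ls is a list of aligned, equal-length sequences (placeholder toy data).
ls = ["ACGT", "ACGA", "TCGA"]
distances = imap(hamming_dist, *itertools.combinations(ls, 2))
for dist in distances:
    print(dist)
```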

Do you think there is some way to improve it?

PS: I have only tested it on a toy example.

1

I think you can do this more simply if you want to use BioPython.

I forget the exact syntax now but it would be something like:

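```
from Bio import AlignIO
import itertools

# "alignment.fasta" is a placeholder; point this at your own alignment file.
aln = AlignIO.read("alignment.fasta", "fasta")
# hamming_distance is any two-string Hamming function, e.g. your hamming_dist.
for r1, r2 in itertools.combinations(aln, 2):
    print("\n".join([r1.id, str(r1.seq), r2.id, str(r2.seq), str(hamming_distance(str(r1.seq), str(r2.seq)))]))
```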

(Not the prettiest output, but you can tweak it.)

Your solution looks reasonable too though, so whatever works.
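One possible simplification, offered as a sketch rather than the definitive way: the hand-written `imap` helper unpacks each pair into the function, which is what the standard library's `itertools.starmap` already does. The list `ls` below is toy data, not your alignment.

```python
import itertools

def hamming_dist(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    assert len(s1) == len(s2)
    # True counts as 1, so summing the comparisons counts the mismatches.
    return sum(b1 != b2 for b1, b2 in zip(s1, s2))

# Toy data (illustrative only): aligned, equal-length sequences.
ls = ["ACGT", "ACGA", "TCGA"]

# starmap calls hamming_dist(*pair) for every pair from combinations.
distances = list(itertools.starmap(hamming_dist, itertools.combinations(ls, 2)))
print(distances)  # [1, 2, 1]
```

The distances come out in the same order as the pairs from `itertools.combinations`, so the two can be zipped together if you need to know which pair each number belongs to.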

1

Hey Joe, your solution is nicer because it includes the IDs. I will try to do that in mine. 8)

```
DQ676872
GCAGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGCCAGG---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATATGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTCTAGAAAGATACCTAAAGGATCAACAGCTC
AB253421
GCAGGAAGCACTATGGGCGCGGCGTCAATGACGCTGACGGTACAGGCCAGA---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATCTGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGCTACCTAAGGGATCAACAGCTC
7
```

Thank you for your support! Paulo