Question: Is there a Python module to calculate the Hamming distance from a multiple sequence alignment?
0
11 months ago by
schlogl70
Brazil-Florianopolis
schlogl70 wrote:

Do any of you have suggestions on how to calculate the Hamming distance (HD) and the entropy of a multiple sequence alignment?

Thanks

alignment • 699 views
written 11 months ago by schlogl70
1

It would appear that there are plenty of them.

written 11 months ago by Mensur Dlakic6.6k

I don't vouch for the contents of this repository, but its description at least matches your task: Hamming Distance Comparison of Amino Acid Sequences of 10 Organisms

written 11 months ago by Mensur Dlakic6.6k
3
11 months ago by
Joe18k
United Kingdom
Joe18k wrote:

BioPython can do all of this, but it’s pretty easy to implement yourself (and is good practice).

See for example: https://github.com/jrjhealey/bioinfo-tools/blob/master/Shannon.py

And Hamming distance is super simple: https://github.com/jrjhealey/bioinfo-tools/blob/master/StringComparisons.py#L84
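
For the entropy part, here is a minimal sketch of per-column Shannon entropy. It assumes the alignment is already loaded as a list of equal-length strings and that gap characters are counted like any other symbol. The name `column_entropies` is only illustrative and is not taken from the linked script.

```python
import math
from collections import Counter

def column_entropies(seqs):
    """Return the Shannon entropy (in bits) of every alignment column."""
    # An alignment only makes sense if all rows have the same length.
    if len(set(map(len, seqs))) != 1:
        raise ValueError("all sequences must have the same length")
    entropies = []
    # zip(*seqs) walks the alignment column by column.
    for column in zip(*seqs):
        total = len(column)
        counts = Counter(column)
        # H = sum over symbols of p * log2(1/p)
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies

print(column_entropies(["AC", "AG"]))
```

A fully conserved column gives 0 bits, and a column split evenly between two symbols gives 1 bit. Whether gaps should be counted or skipped depends on your question, so adjust that before using it on real data.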

written 11 months ago by Joe18k

Hi Joe, I have a nice function for HD, but my question was about how to check all the sequences for HD and entropy, since HD compares only two sequences at a time and an MSA has many sequences to compare. Maybe a loop that checks every pair of sequences. I will check it out. Thanks

modified 11 months ago • written 11 months ago by schlogl70
1

You can do all pairwise comparisons between sequences and store the numbers.

Check out the itertools module.
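
Roughly, the idea could look like the sketch below. It assumes the alignment is held in a plain dict of ID to aligned sequence; the names `pairwise_hamming` and `toy` and the sequences themselves are made up for illustration.

```python
import itertools

def hamming_dist(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s1, s2))

def pairwise_hamming(records):
    """records: dict mapping sequence ID -> aligned sequence string."""
    # combinations() yields each unordered pair of IDs exactly once.
    return {
        (id1, id2): hamming_dist(records[id1], records[id2])
        for id1, id2 in itertools.combinations(records, 2)
    }

# Hypothetical toy alignment, not real data.
toy = {"seq1": "ACGT-A", "seq2": "ACGA-A", "seq3": "TCGA-T"}
distances = pairwise_hamming(toy)
for (id1, id2), d in distances.items():
    print(id1, id2, d)
```

Keying the results by ID pair keeps the numbers around for later, for example to fill a distance matrix.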

written 11 months ago by Joe18k

maybe

itertools.imap(function, *iterables) ?

written 11 months ago by schlogl70

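I got this, Joe:

import itertools

def hamming_dist(s1, s2):
    # Count the positions at which two equal-length sequences differ.
    assert len(s1) == len(s2)
    hd = 0
    for b1, b2 in zip(s1, s2):
        if b1 != b2:
            hd += 1
    return hd

def imap(function, *iterables):
    # Each iterable here is one pair of sequences; it is unpacked
    # into the arguments of the function.
    iterables = map(iter, iterables)
    for it in iterables:
        args = tuple(it)
        if function is None:
            yield args
        else:
            yield function(*args)

# ls is the list of aligned sequences (placeholder toy values shown here)
ls = ["ACGT", "ACGA", "TCGA"]
distances = imap(hamming_dist, *itertools.combinations(ls, 2))
for dist in distances:
    print(dist)

Do you think there is some way to improve it?

PS: I have only tested it on a toy example.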

written 11 months ago by schlogl70
1

I think you can do this more simply if you want to use BioPython.

I forget the exact syntax now but it would be something like:

from Bio import AlignIO
import itertools

aln = AlignIO.read(...)
# hamming_dist is the function from your post above
for r1, r2 in itertools.combinations(aln, 2):
    print("\n".join([r1.id, str(r1.seq), r2.id, str(r2.seq), str(hamming_dist(str(r1.seq), str(r2.seq)))]))

(Not the prettiest output, but you can tweak).

Your solution looks reasonable too though, so whatever works.
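
If you would rather keep the mapping style, the hand-written imap could probably be swapped for itertools.starmap from the standard library, which unpacks each pair into the function's arguments. A small sketch, with made-up toy sequences in `seqs`:

```python
from itertools import combinations, starmap

def hamming_dist(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    assert len(s1) == len(s2)
    return sum(b1 != b2 for b1, b2 in zip(s1, s2))

# Hypothetical toy sequences standing in for the aligned rows.
seqs = ["ACGT", "ACGA", "TCGA"]

# starmap calls hamming_dist(*pair) for every pair from combinations().
pair_distances = list(starmap(hamming_dist, combinations(seqs, 2)))
print(pair_distances)
```

The distances come out in the same order that combinations() produces the pairs, so you can zip them back against the pairs if you need the IDs.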

modified 11 months ago • written 11 months ago by Joe18k
1

Hey Joe, your solution was nicer because it included the IDs. I will try to do that in mine. 8)

DQ676872
GCAGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGCCAGG---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATATGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTCTAGAAAGATACCTAAAGGATCAACAGCTC
AB253421
GCAGGAAGCACTATGGGCGCGGCGTCAATGACGCTGACGGTACAGGCCAGA---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATCTGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGCTACCTAAGGGATCAACAGCTC
7

Thank you for your support! Paulo

modified 11 months ago • written 11 months ago by schlogl70

I will check yours too, Joe. Thanks 8)

written 11 months ago by schlogl70