Question: Is there a Python module to calculate the Hamming distance from a multiple sequence alignment?
schlogl20 wrote:

Do any of you have suggestions for calculating the Hamming distance (HD) and the entropy of a multiple sequence alignment?

Thanks

Written 4 weeks ago by schlogl20.

It would appear that there are plenty of them.

Written 4 weeks ago by Mensur Dlakic.

I don't vouch for the contents of this repository, but its description at least matches your task: Hamming Distance Comparison of Amino Acid Sequences of 10 Organisms

Written 4 weeks ago by Mensur Dlakic.

Joe wrote:

Biopython can do all of this, but it's pretty easy to implement yourself (and is good practice).

See for example: https://github.com/jrjhealey/bioinfo-tools/blob/master/Shannon.py

And Hamming distance is super simple: https://github.com/jrjhealey/bioinfo-tools/blob/master/StringComparisons.py#L84

Written 4 weeks ago by Joe.
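The linked Shannon.py is not reproduced here. As a rough illustration only (not that script's code), this is a minimal sketch of per-column Shannon entropy in plain Python. It assumes the alignment is already a list of equal-length strings and, as a labeled assumption, drops gap characters ('-') before counting; some tools score gaps as an extra symbol instead, which gives different numbers.

```python
import math
from collections import Counter


def column_entropies(seqs, ignore_gaps=True):
    """Shannon entropy (in bits) of each column of an alignment.

    seqs: list of equal-length aligned sequences (strings).
    ignore_gaps: assumption; if True, '-' is dropped before counting.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    entropies = []
    # zip(*seqs) walks the alignment column by column
    for column in zip(*seqs):
        # case-insensitive: lowercase and uppercase residues are pooled
        symbols = [c.upper() for c in column if not (ignore_gaps and c == "-")]
        if not symbols:
            entropies.append(0.0)  # all-gap column: defined as 0 here
            continue
        n = len(symbols)
        counts = Counter(symbols)
        # H = sum over symbols of p * log2(1/p), with p = count / n
        entropies.append(sum((k / n) * math.log2(n / k) for k in counts.values()))
    return entropies


aln = ["ACGT-", "ACGA-", "ACTTA", "GCGTA"]  # hypothetical toy alignment
print(column_entropies(aln))
```

Entropy is a per-column property, so unlike the Hamming distance it needs no pairwise loop over sequences.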

Hi Joe, I already have a working function for HD; my doubt was about how to check all the sequences for HD and entropy. HD only compares two sequences at a time, and an MSA contains many of them. Maybe a loop over every pair of sequences? I will check it out. Thanks.

Written 4 weeks ago by schlogl20 (modified 4 weeks ago).

You can do all pairwise comparisons between sequences and store the numbers.

Check out the itertools module.

Written 4 weeks ago by Joe.
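One way to "store the numbers" is a symmetric distance matrix. A minimal sketch with a hypothetical toy alignment; note that a gap aligned against a base counts as a mismatch here, while gap against gap counts as a match.

```python
from itertools import combinations


def hamming_dist(s1, s2):
    """Number of positions at which two aligned sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(b1 != b2 for b1, b2 in zip(s1, s2))


def distance_matrix(seqs):
    """Symmetric n x n matrix of pairwise Hamming distances."""
    n = len(seqs)
    dm = [[0] * n for _ in range(n)]
    # combinations(range(n), 2) yields each unordered pair (i, j), i < j, once
    for i, j in combinations(range(n), 2):
        d = hamming_dist(seqs[i], seqs[j])
        dm[i][j] = dm[j][i] = d  # fill both halves
    return dm


seqs = ["ACGT-A", "ACGTTA", "TCGT-A"]  # hypothetical toy alignment
for row in distance_matrix(seqs):
    print(row)
```

Each pair is computed once and mirrored, so n sequences cost n*(n-1)/2 comparisons.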

Maybe

itertools.imap(function, *iterables)?

Written 4 weeks ago by schlogl20.
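itertools.imap exists only in Python 2; Python 3 removed it because the built-in map is already lazy. Since itertools.combinations yields (s1, s2) tuples, itertools.starmap is the closer fit: it unpacks each tuple into the function's arguments. A small sketch with hypothetical toy sequences:

```python
from itertools import combinations, starmap


def hamming_dist(s1, s2):
    """Count mismatching positions between two equal-length sequences."""
    assert len(s1) == len(s2)
    return sum(b1 != b2 for b1, b2 in zip(s1, s2))


seqs = ["ACGT", "ACGA", "TCGA"]  # hypothetical toy sequences

# starmap(f, [(a, b), ...]) lazily calls f(a, b) for each tuple
distances = list(starmap(hamming_dist, combinations(seqs, 2)))
print(distances)

# Same result with the built-in map, which takes one iterable per argument
pairs = list(combinations(seqs, 2))
same = list(map(hamming_dist, (a for a, _ in pairs), (b for _, b in pairs)))
assert same == distances
```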

I got this, Joe:

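import itertools

def hamming_dist(s1, s2):
    # count the positions where two aligned sequences differ
    assert len(s1) == len(s2)
    hd = 0
    for b1, b2 in zip(s1, s2):
        if b1 != b2:
            hd += 1
    return hd

def imap(function, *iterables):
    # Unlike Python 2's itertools.imap, this calls function(*args) once per
    # iterable (here: once per pair), so it behaves like itertools.starmap
    iterables = map(iter, iterables)
    for it in iterables:
        args = tuple(it)
        if function is None:
            yield tuple(args)
        else:
            yield function(*args)

ls = ["ACGT-A", "ACGTTA", "TCGT-A"]  # toy aligned sequences (placeholder)
# the * unpacks every pair up front, so all pairs are held in memory
distances = imap(hamming_dist, *itertools.combinations(ls, 2))
for dist in distances:
    print(dist)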

Do you think there is some way to improve it?

PS: I just tested it on a toy example.

Written 4 weeks ago by schlogl20.

I think you can do this more simply if you want to use Biopython.

I forget the exact syntax now, but it would be something like:

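from Bio import AlignIO
import itertools

def hamming_distance(s1, s2):
    # number of mismatched columns between two aligned sequences
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

aln = AlignIO.read(...)  # AlignIO.read(handle, format): your alignment file and its format
for r1, r2 in itertools.combinations(aln, 2):
    print("\n".join([r1.id, str(r1.seq), r2.id, str(r2.seq), str(hamming_distance(str(r1.seq), str(r2.seq)))]))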

(Not the prettiest output, but you can tweak it.)

Your solution looks reasonable too though, so whatever works.

Written 29 days ago by Joe (modified 29 days ago).
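To tweak the output into something tidier, one option is a tab-separated table with one line per pair. This sketch does not need Biopython: the records are plain (id, sequence) tuples with hypothetical IDs, standing in for what you would build from an AlignIO alignment as [(r.id, str(r.seq)) for r in aln].

```python
from itertools import combinations


def hamming_dist(s1, s2):
    """Mismatch count between two aligned sequences of equal length."""
    if len(s1) != len(s2):
        raise ValueError("sequences are not the same length")
    return sum(a != b for a, b in zip(s1, s2))


def pairwise_table(records):
    """Yield (id1, id2, distance) for each unordered pair of (id, seq) records."""
    for (id1, s1), (id2, s2) in combinations(records, 2):
        yield id1, id2, hamming_dist(s1, s2)


# Hypothetical toy records; from Biopython: [(r.id, str(r.seq)) for r in aln]
records = [("seq1", "ACGT-A"), ("seq2", "ACGTTA"), ("seq3", "TCGT-A")]

print("id1\tid2\thamming")
for id1, id2, d in pairwise_table(records):
    print(f"{id1}\t{id2}\t{d}")
```

One line per pair is easy to sort, filter, or load into a spreadsheet, which is harder with the sequence-dump layout.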

Hey Joe, your solution is nicer because it includes the IDs. I will try to do that in mine. 8)

DQ676872
GCAGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGCCAGG---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATATGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTCTAGAAAGATACCTAAAGGATCAACAGCTC
AB253421
GCAGGAAGCACTATGGGCGCGGCGTCAATGACGCTGACGGTACAGGCCAGA---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATCTGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGCTACCTAAGGGATCAACAGCTC
7

Thank you for your support! Paulo

Written 29 days ago by schlogl20 (modified 29 days ago).

I will check yours too, Joe. Thanks 8)

Written 29 days ago by schlogl20.