Hi,
I need a way to compute pairwise distances between multiple aligned sequences from FASTA/PHYLIP files (either nucleotide or protein) and build a distance matrix from them.
My code so far is in Python (using Biopython for file handling), and I've been surprised to find no obvious Python library that offers such a tool. I've looked at dist.dna from the APE R library, but I'd rather avoid that kind of dependency: the people using my program may not have R installed, it's slower than I'd like, and it doesn't cover protein sequences.
Right now, I'm using a terrible workaround: building PhyML trees with the substitution models I want and extracting the distances between the sequences using DendroPy. It's inefficient and slow, and I'm not 100% sure it's valid, to be honest...
I did find some libraries, such as Pyvolve, that let me get substitution matrices for all the models, but I have no idea if or how they can be used to calculate distances without too much hassle.
Ideally, the substitution models I'm looking for are: JC69, GTR, F80, K81, HKY, WAG, etc.
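To make the request concrete, here is a minimal sketch of the kind of calculation I mean, for JC69 only, in pure Python. The function names (p_distance, jc69_distance, distance_matrix) are my own placeholders and not from any library. It assumes the sequences are already aligned nucleotide sequences of equal length, and it simply skips any site where either sequence has a gap or an ambiguous character. The parameter-rich models (GTR, HKY, WAG, ...) are not covered by this.

```python
import math
from itertools import combinations

VALID = set("ACGT")  # assumption: anything else (gaps, N, ...) is skipped


def p_distance(a, b):
    """Proportion of differing sites among sites where both bases are A/C/G/T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jc69_distance(a, b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined at saturation; report infinity.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(seqs):
    """seqs: dict of name -> aligned sequence. Returns (names, nested dict)."""
    names = list(seqs)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        matrix[n][m] = matrix[m][n] = jc69_distance(seqs[n], seqs[m])
    return names, matrix


if __name__ == "__main__":
    # Toy alignment, made up for illustration only.
    toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TACTA"}
    names, dm = distance_matrix(toy)
    for n in names:
        print(n, [round(dm[n][m], 4) for m in names])
```

In my real code the dict would be filled from a Biopython alignment (for example via Bio.AlignIO.read), with each record's id as the key and str(record.seq) as the value. What I'm missing is the equivalent of this for the other models.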
Does anyone know of any library that would do that? What would be my best way forward?
Thanks and have a nice day