Nucleotide/protein distance library based on common substitution models
3.6 years ago
Klios3th • 0

Hi,

I need a way to compute the pairwise distances between multiple sequences from FASTA/PHYLIP files (either nucleotide or protein) and build a distance matrix.
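To make the goal concrete, here is a minimal sketch of the kind of output I mean, using the uncorrected p-distance (proportion of differing sites) rather than a real substitution model. It assumes the sequences are already aligned and held in a plain dict; the names `p_distance`, `distance_matrix` and the toy alignment are made up for illustration and are not from any library.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap/missing character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment, distance=p_distance):
    """Build a symmetric matrix (dict of dicts) from {name: aligned sequence}."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = distance(alignment[a], alignment[b])
        matrix[a][b] = matrix[b][a] = d
    return names, matrix


# Toy alignment (hypothetical data, only to show the shape of the result).
example = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGAACGA"}
names, matrix = distance_matrix(example)
for row in names:
    print(row, [round(matrix[row][col], 3) for col in names])
```

The `distance` argument is there so a model-corrected function could be swapped in for `p_distance` without touching the matrix code.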

My code so far is in Python (using Biopython for file handling), and I've been surprised that there is no obvious Python library offering such a tool. I've looked at dist.dna from the R package ape, but I'd rather avoid that kind of dependency: the people using my program may not have R installed, it's slower than I'd like, and it doesn't cover protein sequences.

Right now, I'm using a terrible workaround: building PhyML trees with the substitution models I want and extracting the distances between sequences with DendroPy. It's inefficient and slow, and I'm not 100% sure of its validity, to be honest.

I did find some libraries, such as Pyvolve, that let me get substitution matrices for all the models, but I have no idea if or how they can be applied to calculating distances without too much hassle.

Ideally, the substitution models I'm looking for are: JC69, GTR, F80, K81, HKY, WAG, etc.
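For the simplest nucleotide models the distance has a closed form, so those at least could be computed directly. JC69 uses d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites, and K80 (Kimura two-parameter, not in the list above) uses d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. Below is a rough sketch of both, assuming aligned sequences and ignoring any site that is not an unambiguous A/C/G/T; the function names are made up for illustration. As far as I understand, GTR and protein models such as WAG have no such general closed form and need a numerical maximum-likelihood estimate, which this sketch does not attempt.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a, seq_b):
    """Return (sites, transitions, transversions) over unambiguous A/C/G/T sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor 1969 distance; infinite when the sequences are saturated."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    p = (ts + tv) / sites
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k80_distance(seq_a, seq_b):
    """Kimura 1980 two-parameter distance; infinite when the log is undefined."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    P, Q = ts / sites, tv / sites
    term_a = 1.0 - 2.0 * P - Q
    term_b = 1.0 - 2.0 * Q
    if term_a <= 0.0 or term_b <= 0.0:
        return math.inf
    return -0.5 * math.log(term_a * math.sqrt(term_b))


# Toy pair (hypothetical data): one transition (A/G) and one transversion (T/A).
print(jc69_distance("ACGTACGT", "GCGTACGA"))
print(k80_distance("ACGTACGT", "GCGTACGA"))
```

Either function takes two aligned sequences, so it could be applied to every pair in an alignment to fill a matrix.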

Does anyone know of any library that would do that? What would be my best way forward?

Thanks and have a nice day

sequence gene genome distance substitution model • 645 views