I have a distance matrix of pairwise distances, estimated with the Jukes-Cantor model from Clustal output. Because the matrix is stored as a list of lists, I'm having trouble identifying the index and value of the minimum entry, which I need in order to start the UPGMA algorithm. I would like to do this in a more "pythonic" way, and without numpy.
The matrix looks like this:
          410488935 410488927 410488931 410488939 410488937 410488923 410488933
410488935 0.0000
410488927 0.0065    0.0000
410488931 0.0098    0.0098    0.0000
410488939 0.0850    0.0850    0.0784    0.0000
410488937 0.0817    0.0817    0.0752    0.0033    0.0000
410488923 0.0817    0.0817    0.0752    0.0033    0.0065    0.0000
410488933 0.1340    0.1340    0.1275    0.1340    0.1307    0.1307    0.0000
I pulled the sequence identifiers out of the rows and columns into two separate lists. I then replaced the 0.0000 diagonal entries with 'X' to make it easier to find the minimum value. Here are the new lists:
[['410488935', '410488927', '410488931', '410488939', '410488937', '410488923', '410488933']]
[[['X'], ['0.0065', 'X'], ['0.0098', '0.0098', 'X'], ['0.0850', '0.0850', '0.0784', 'X'], ['0.0817', '0.0817', '0.0752', '0.0033', 'X'], ['0.0817', '0.0817', '0.0752', '0.0033', '0.0065', 'X'], ['0.1340', '0.1340', '0.1275', '0.1340', '0.1307', '0.1307', 'X']]]
This is the small snippet I have so far to find the position of the minimum value and the value itself:
def identify_min(e):
    return min(
        (n, i, j)
        for i, L2 in enumerate(e)
        for j, n in enumerate(L2)
    )[1:]
    minval = float(e([lowrow][lowcol]))
    return minval, lowrow, lowcol

print identify_min(matrix)
However, the output of this function is (0, 1), whereas I believe the output should be (0.0033, (4, 3)).
A Python variation:
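The following is a minimal sketch, not necessarily the approach originally posted here. It assumes the two things that appear to cause the (0, 1) result: the printed matrix has one extra level of nesting, so the generator iterates over whole rows rather than cells, and the entries are strings, so they are compared lexicographically instead of numerically. The parameter name `rows` and the returned tuple shape are illustrative choices. The `print(...)` call with a single argument works in both Python 2 and Python 3.

```python
def identify_min(rows):
    """Return (minval, (lowrow, lowcol)) for a lower-triangular matrix.

    Assumes `rows` is the list of rows itself and that each diagonal
    entry is the marker string 'X'; all other entries are numeric strings.
    """
    minval, lowrow, lowcol = min(
        (float(cell), i, j)            # compare as numbers, not as strings
        for i, row in enumerate(rows)
        for j, cell in enumerate(row)
        if cell != 'X'                 # skip the diagonal markers
    )
    return minval, (lowrow, lowcol)


# The matrix exactly as printed in the question (note the extra outer list).
matrix = [[
    ['X'],
    ['0.0065', 'X'],
    ['0.0098', '0.0098', 'X'],
    ['0.0850', '0.0850', '0.0784', 'X'],
    ['0.0817', '0.0817', '0.0752', '0.0033', 'X'],
    ['0.0817', '0.0817', '0.0752', '0.0033', '0.0065', 'X'],
    ['0.1340', '0.1340', '0.1275', '0.1340', '0.1307', '0.1307', 'X'],
]]

# Pass matrix[0] to strip the extra nesting level.
print(identify_min(matrix[0]))  # (0.0033, (4, 3))
```

Because the tuples are ordered by value first and then by row and column, a tie such as the two 0.0033 entries resolves to the earliest position, here (4, 3) rather than (5, 3).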
And in Perl: