Finding Min Value In List Of Lists -- Python -- Without Numpy
10.6 years ago
st.ph.n ★ 2.7k

I have a distance matrix, produced from Jukes-Cantor estimation of pairwise distances made from Clustal. Because the array is a list of lists, I'm having trouble identifying the index and value of the minimum, which I need to start a UPGMA algorithm. I would like to do this in a more "pythonic" way, and without numpy.

The matrix looks like this:

            410488935     410488927     410488931     410488939     410488937     410488923     410488933     
410488935     0.0000
410488927     0.0065 0.0000
410488931     0.0098 0.0098 0.0000
410488939     0.0850 0.0850 0.0784 0.0000
410488937     0.0817 0.0817 0.0752 0.0033 0.0000
410488923     0.0817 0.0817 0.0752 0.0033 0.0065 0.0000
410488933     0.1340 0.1340 0.1275 0.1340 0.1307 0.1307 0.0000

I pulled the sequence identifiers from the rows and columns, into two separate lists. I then replaced the "0" diagonal with X's, to aid in finding the minimum value. Here are the new lists:

[['410488935', '410488927', '410488931', '410488939', '410488937', '410488923', '410488933']] 

[[['X'], ['0.0065', 'X'], ['0.0098', '0.0098', 'X'], ['0.0850', '0.0850', '0.0784', 'X'], ['0.0817', '0.0817', '0.0752', '0.0033', 'X'], ['0.0817', '0.0817', '0.0752', '0.0033', '0.0065', 'X'], ['0.1340', '0.1340', '0.1275', '0.1340', '0.1307', '0.1307', 'X']]]

This is the small snippet I have so far to find the position of the minimum value and the value itself:

def identify_min(e):
    return min(
    (n, i, j)
    for i, L2 in enumerate(e)
    for j, n in enumerate(L2)
    )[1:]
    minval = float(e([lowrow][lowcol]))
    return minval, lowrow, lowcol
print identify_min(matrix)

However, the output of this function is (0, 1), whereas I believe it should be (0.0033, (4, 3)).
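For reference, the (0, 1) result can be reproduced from the lists as printed: the matrix is wrapped in an extra outer list, so the generator compares whole rows (as lists of strings) rather than individual cells, and the slice `[1:]` then drops the "value". The sketch below is one possible correction, not the only one. It assumes the caller unwraps the outer list, skips the 'X' placeholders, and converts cells to floats so the comparison is numeric; the parameter and loop variable names are illustrative.

```python
def identify_min(rows):
    """Return (minval, row, col) for the smallest numeric cell in rows."""
    return min(
        (float(cell), i, j)
        for i, row in enumerate(rows)
        for j, cell in enumerate(row)
        if cell != 'X'  # skip the diagonal placeholders
    )

# The matrix as printed in the question, including the extra outer list.
matrix = [[['X'], ['0.0065', 'X'], ['0.0098', '0.0098', 'X'],
           ['0.0850', '0.0850', '0.0784', 'X'],
           ['0.0817', '0.0817', '0.0752', '0.0033', 'X'],
           ['0.0817', '0.0817', '0.0752', '0.0033', '0.0065', 'X'],
           ['0.1340', '0.1340', '0.1275', '0.1340', '0.1307', '0.1307', 'X']]]

minval, lowrow, lowcol = identify_min(matrix[0])  # unwrap the outer list
print(minval, (lowrow, lowcol))  # prints: 0.0033 (4, 3)
```

Because tuples compare element by element, ties on the value are broken by the lowest row index and then the lowest column index, which is why (4, 3) is returned rather than (5, 3).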

python • 14k views
10.6 years ago

Note that you have two cells with the value of 0.0033. Which one do you pick?

Given that, consider building a Python dictionary whose keys are cell values and whose values are lists of sequence identifier pairs. As you encounter each non-zero cell, append its pair of sequence identifiers to the list stored under that cell's value. At the end, print the minimum (non-zero) key and the list associated with it.
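A minimal in-memory sketch of that dictionary idea, assuming the identifiers and the lower-triangular rows are already available as Python lists (the helper name `pairs_by_distance` and the variable names are hypothetical, and this is not the script referred to in this answer):

```python
from collections import defaultdict

def pairs_by_distance(ids, rows):
    """Map each non-zero distance to the list of id pairs that share it."""
    vd = defaultdict(list)
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            value = float(cell)  # numeric keys, so min() compares numbers
            if value == 0.0:
                continue  # skip zero cells, such as the diagonal
            vd[value].append([ids[j], ids[i]])
    return vd

ids = ['410488935', '410488927', '410488931', '410488939',
       '410488937', '410488923', '410488933']
rows = [['0.0000'],
        ['0.0065', '0.0000'],
        ['0.0098', '0.0098', '0.0000'],
        ['0.0850', '0.0850', '0.0784', '0.0000'],
        ['0.0817', '0.0817', '0.0752', '0.0033', '0.0000'],
        ['0.0817', '0.0817', '0.0752', '0.0033', '0.0065', '0.0000'],
        ['0.1340', '0.1340', '0.1275', '0.1340', '0.1307', '0.1307', '0.0000']]

vd = pairs_by_distance(ids, rows)
minval = min(vd)
print(minval, vd[minval])
```

Converting the keys to floats is a design choice made here so that the minimum is found numerically rather than by string order.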

Consider the following tab-delimited input matrix file:

Here is a script that takes this input and outputs the minimum key and a list of sequence id pairs associated with that minimum value:

You'd run it something like this:

$ minSeqIdLister.py < seqIdTest.mtx
0.0033 [['410488939', '410488937'], ['410488939', '410488923']]

As you can see, you can find two sequence id pairs for the value 0.0033. You can keep them all, or pick the first, or pick one at random - what you do next is up to you.
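The three options for handling ties could look like the following sketch, which assumes the tied pairs are held in a plain list (the variable names are illustrative):

```python
import random

# Tied pairs for the minimum distance, copied from the output above.
tied = [['410488939', '410488937'], ['410488939', '410488923']]

keep_all = list(tied)         # keep every tied pair
first = tied[0]               # pick the first pair encountered
chosen = random.choice(tied)  # pick one pair at random

print(first)
```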

Since your distance matrix is presumably symmetric, the order in which you store the ids in a pair doesn't matter. If you prefer, you could store the row and column indices instead of the sequence identifiers by changing what is appended to the dictionary vd.


A Python variation:

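#!/usr/bin/env python3

import sys

ids = []  # sequence identifiers, in row order
vd = {}   # distance -> list of [column id, row id] pairs

with open(sys.argv[1]) as fn:
    next(fn)  # skip the header row of column identifiers
    for line in fn:
        vals = line.split()
        ids.append(vals.pop(0))  # first field is the row identifier
        # The last value in each row is the diagonal, so leave it out.
        for colIdx, val in enumerate(vals[:-1]):
            # Float keys make min() compare numerically, not as strings.
            vd.setdefault(float(val), []).append([ids[colIdx], ids[-1]])

minval = min(vd)
print(minval, vd[minval])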

And in Perl:

use strict;
use warnings;
use List::Util qw/min/;
use Data::Dump;

my ( %vd, @ids );
<>;

while (<>) {
    my @vals = split;
    push @ids, shift @vals;
    push @{ $vd{ $vals[$_] } }, [ $ids[$_], $ids[-1] ] for 0 .. $#vals - 1;
}

# dd prints its own output, so print the minimum key first and then dump the pairs.
my $minval = min keys %vd;
print "$minval ";
dd $vd{$minval};
