I have DNA motifs represented as position weight matrices (PWMs), a.k.a. position-specific scoring matrices (PSSMs), in TRANSFAC format. Motif names appear on the rows starting with "DE". Each numbered row gives the observed frequencies of the letters at one position of the sequence, with row 0 being the first position. The last column shows the most representative letter for that position; it can be an ambiguity code (something other than A, G, C, T) if no particular letter is representative:
DE SRF
0 0.0435 0.0217 0.8478 0.0870 G
1 0.1957 0.7174 0.0435 0.0435 C
2 0.0000 0.9782 0.0217 0.0000 C
3 0.0217 0.9782 0.0000 0.0000 C
4 0.6956 0.0217 0.0000 0.2826 A
5 0.0652 0.0217 0.0000 0.9130 T
6 1.0000 0.0000 0.0000 0.0000 A
7 0.0217 0.0000 0.0000 0.9782 T
8 0.9348 0.0000 0.0000 0.0652 A
9 0.3261 0.0217 0.0000 0.6522 T
10 0.0435 0.0000 0.9565 0.0000 G
11 0.0435 0.0217 0.9348 0.0000 G
XX
DE HMG-1
0 0.0000 0.3846 0.6154 0.0000 G
1 0.0000 0.0000 0.2308 0.7692 T
2 0.0000 0.3077 0.0000 0.6923 T
3 0.0000 0.1539 0.7692 0.0769 G
4 0.0000 0.0769 0.0000 0.9230 T
5 0.4615 0.0769 0.2308 0.2308 N
6 0.2308 0.3846 0.0000 0.3846 N
7 0.0000 0.0769 0.1539 0.7692 T
8 0.0000 0.6154 0.0769 0.3077 C
XX
I would like to calculate the Shannon entropy for each motif. Could anybody advise me on how to do this? I am most comfortable with Python, so a few lines of code, a pointer to a package that can do this, or even just a formula would be much appreciated.
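For reference, the usual definition of the Shannon entropy at a single position is H = -sum(p * log2(p)) over the four letter frequencies p, taking 0 * log2(0) as 0. That gives 0 bits for a fully conserved position and 2 bits for a uniform one. Below is a minimal sketch of how this could be computed for the format above. It rests on a few assumptions that are not stated in the question: the four numeric columns are the letter frequencies, each row is renormalised because the rounded values do not sum to exactly 1, and the per-motif value is taken as the sum of the per-position entropies, which treats positions as independent. The function names `parse_motifs`, `position_entropy` and `motif_entropy` are made up for illustration.

```python
import math


def parse_motifs(text):
    """Parse 'DE <name>' / numbered rows / 'XX' blocks.

    Returns {motif_name: [[f1, f2, f3, f4], ...]} with one list per position.
    Assumes the layout shown in the question, not full TRANSFAC files.
    """
    motifs = {}
    name = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "DE":
            name = fields[1]
            motifs[name] = []
        elif fields[0] == "XX":
            name = None
        elif name is not None and fields[0].isdigit():
            # Columns 1-4 are the frequencies; the trailing consensus letter is ignored.
            motifs[name].append([float(x) for x in fields[1:5]])
    return motifs


def position_entropy(freqs):
    """Shannon entropy (in bits) of one position."""
    total = sum(freqs)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    h = 0.0
    for f in freqs:
        p = f / total  # renormalise to absorb rounding in the file
        if p > 0:      # 0 * log2(0) is taken as 0
            h -= p * math.log2(p)
    return h


def motif_entropy(rows):
    """Sum of per-position entropies (assumes independent positions)."""
    return sum(position_entropy(row) for row in rows)


# Usage with two rows copied from the SRF motif above.
example = """DE SRF
0 0.0435 0.0217 0.8478 0.0870 G
6 1.0000 0.0000 0.0000 0.0000 A
XX
"""
for motif_name, rows in parse_motifs(example).items():
    per_position = [position_entropy(row) for row in rows]
    print(motif_name, per_position, motif_entropy(rows))
```

If motifs of different lengths need to be compared, dividing the sum by the number of positions gives an average entropy per position; which of the two is more appropriate depends on what the value will be used for.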