Question: Mutual Information from Nucleotide Distribution in Python

This question has been removed from this site -- please see stackoverflow if interested.


Previous content restored by Ram from Google Cache


Hi there,

I'm writing a program to calculate the mutation rate from text files of nucleotide distributions. I'd like to port my Excel workflow for calculating mutual information to Python, but I'm stuck at this step of the calculation.

An example of an input file is as follows

A,T,G,C
84 , 59 , 35 , 125032 
74 , 40 , 6 , 125082 
125107 , 44 , 24 , 36 
3 , 44 , 4 , 125161 
125122 , 23 , 28 , 37 
5 , 23 , 4 , 125180 
125149 , 8 , 18 , 37 
125124 , 32 , 14 , 38 
9 , 25 , 8 , 125170

The program:

import pandas as pd
import sys

filename = sys.argv[1]
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)  # None means no limit; -1 is rejected by recent pandas versions
col = ['A', 'T', 'G', 'C']
df = pd.read_csv(filename, skipinitialspace=True, usecols=col)
df.head(287)
df['max'] = df[['A', 'T', 'G', 'C']].max(axis=1)
df['sum'] = df[['A', 'T', 'G', 'C']].sum(axis=1)
df.loc[:,"A":"C"] = df.loc[:,"A":"C"].div(df["sum"], axis=0)
df['mutation_rate'] = (1-df['max']/df['sum'])
df['max2'] = df[['A', 'T', 'G', 'C']].max(axis=1)
df['sum2'] = df[['A', 'T', 'G', 'C']].sum(axis=1)
df['marginal_distribution']=(1-df['max2']/df['sum2'])
df.head()

df.head()
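numberOfBins = int(sys.argv[2])  # command-line arguments are strings; the original divided by a hard-coded 8
df['A/numberOfBins'] = df['A'].div(numberOfBins)
df['T/numberOfBins'] = df['T'].div(numberOfBins)
df['G/numberOfBins'] = df['G'].div(numberOfBins)
df['C/numberOfBins'] = df['C'].div(numberOfBins)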
df.head()

With the output

   A         T         G         C
0  0.000671  0.000471  0.00028   0.998578
1  0.000591  0.000319  0.000048  0.999042
2  0.999169  0.000351  0.000192  0.000288
3  0.000024  0.000351  0.000032  0.999593
4  0.999297  0.000184  0.000224  0.000296
5  0.00004   0.000184  0.000032  0.999744
6  0.999497  0.000064  0.000144  0.000295
7  0.999329  0.000256  0.000112  0.000303
8  0.000072  0.0002    0.000064  0.999665



max     sum     mutation_rate
125032  125210  0.001422
125082  125202  0.000958
125107  125211  0.000831
125161  125212  0.000407
125122  125210  0.000703
125180  125212  0.000256
125149  125212  0.000503
125124  125208  0.000671
125170  125212  0.000335

max2    sum2
0.998578    1
0.999042    1
0.999169    1
0.999593    1
0.999297    1
0.999744    1
0.999497    1
0.999329    1
0.999665    1

marginal_distribution
0.001422
0.000958
0.000831
0.000407
0.000703
0.000256
0.000503
0.000671
0.000335

A/numberOfBins  T/numberOfBins  G/numberOfBins  C/numberOfBins
0.000084        0.000059        0.000035        0.124822
0.000074        0.00004         0.000006        0.12488
0.124896        0.000044        0.000024        0.000036
0.000003        0.000044        0.000004        0.124949
0.124912        0.000023        0.000028        0.000037
0.000005        0.000023        0.000004        0.124968
0.124937        0.000008        0.000018        0.000037
0.124916        0.000032        0.000014        0.000038
0.000009        0.000025        0.000008        0.124958

I am trying to compute the Shannon entropy / mutual information from these values. Thank you SO much.
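For reference, here is a minimal sketch of the per-position Shannon entropy, H = -sum(p * log2(p)) over the four nucleotides, computed from the raw counts. The function name `shannon_entropy` and the column name `entropy_bits` are made up for illustration, and the three rows are copied from the example input above. Mutual information between two positions needs their joint distribution (paired counts), which a per-position count file like this one does not contain, so the sketch stops at entropy.

```python
import numpy as np
import pandas as pd


def shannon_entropy(counts):
    """Return the Shannon entropy (in bits) of each row of a count table."""
    counts = np.asarray(counts, dtype=float)
    # Convert each row of counts to probabilities.
    p = counts / counts.sum(axis=1, keepdims=True)
    # By convention 0 * log2(0) contributes 0; silence the log2(0) warning
    # and replace those terms with 0 explicitly.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log2(p), 0.0)
    return -terms.sum(axis=1)


# First three rows of the example input (columns A, T, G, C).
df = pd.DataFrame(
    {
        "A": [84, 74, 125107],
        "T": [59, 40, 44],
        "G": [35, 6, 24],
        "C": [125032, 125082, 36],
    }
)
df["entropy_bits"] = shannon_entropy(df[["A", "T", "G", "C"]])
print(df)
```

With four symbols the entropy ranges from 0 bits (a single nucleotide observed) to 2 bits (all four equally frequent), so strongly conserved positions like these should come out close to 0.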


entropy • 220 views
modified 8 weeks ago by _r_am32k • written 12 weeks ago by nameuser30

In your loop:

row = list(map(int, row)) 
print(1 - max(row) / sum(row))

Edit: the text (especially the code) of the question appears to have changed since it was first posted, so this comment no longer seems to make sense.
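Since the original loop is no longer visible, the following is only a guess at the kind of code those two lines would fit into: a sketch that reads the count file with the standard `csv` module and applies them to each row. The helper name `mutation_rates` and the in-memory sample are assumptions for illustration.

```python
import csv
import io


def mutation_rates(lines):
    """Return 1 - max/sum for every data row of an A,T,G,C count file."""
    reader = csv.reader(lines)
    next(reader)  # skip the "A,T,G,C" header line
    rates = []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        row = list(map(int, row))  # int() tolerates the surrounding spaces
        rates.append(1 - max(row) / sum(row))
    return rates


# Two rows from the example input, kept in memory instead of a file.
sample = io.StringIO("A,T,G,C\n84 , 59 , 35 , 125032 \n74 , 40 , 6 , 125082 \n")
for rate in mutation_rates(sample):
    print(round(rate, 6))
```

To run it on a real file, pass an open file handle (for example `open(filename, newline="")`) instead of the `io.StringIO` sample.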

modified 8 weeks ago • written 12 weeks ago by cschu1812.5k

Hello nameuser,

Do not redact content after you've received feedback on a post. This is inconsiderate and such behavior can lead to suspension of your user account.

Please point to the StackOverflow post that you are referring to. In the meantime, I'll be restoring the content of this post from Google Cache.

modified 8 weeks ago • written 8 weeks ago by _r_am32k