This is a continuation of my previous post, where I was looking for a faster and more efficient alternative to a standard Python loop that performs some summing and multiplication on the elements of each row.

Basically, I have two input files. The first is a list of all genotype combinations for a group of SNPs, for example, for 3 SNPs:

```
AA CC TT
AT CC TT
TT CC TT
AA CG TT
AT CG TT
TT CG TT
AA GG TT
AT GG TT
TT GG TT
AA CC TA
AT CC TA
TT CC TA
AA CG TA
AT CG TA
TT CG TA
AA GG TA
AT GG TA
TT GG TA
AA CC AA
AT CC AA
TT CC AA
AA CG AA
AT CG AA
TT CG AA
AA GG AA
AT GG AA
TT GG AA
```
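For reference, a listing like this could be generated with `itertools.product`. This is only a sketch: the `genotypes_per_snp` list is an assumption read off the example above, not something taken from the original input files.

```python
from itertools import product

# Assumed per-SNP genotypes, read off the example listing above.
genotypes_per_snp = [
    ["AA", "AT", "TT"],  # SNP1
    ["CC", "CG", "GG"],  # SNP2
    ["TT", "TA", "AA"],  # SNP3
]

# product() varies its last argument fastest, while the listing varies the
# first SNP fastest, so the SNP order is reversed going in and coming out.
profiles = [
    " ".join(reversed(combo))
    for combo in product(*reversed(genotypes_per_snp))
]

for profile in profiles[:4]:
    print(profile)
```

With three genotypes per SNP this yields 3^3 = 27 profiles, in the same order as the listing.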

The second is a table containing some information for each SNP, notably the log(OR) for a disease and the frequency of the risk allele:

```
SNP1 A T 1.25 0.223143551314 0.97273
SNP2 C G 1.07 0.0676586484738 0.3
SNP3 T A 1.08 0.0769610411361 0.1136
```

Below is my main code, in which I want to calculate a 'score' and a 'frequency' for each 'profile'. The score is the sum of the log(OR) values for each risk allele present in the profile, while the frequency is the product of the genotype frequencies, assuming Hardy-Weinberg equilibrium:

```
import pandas as pd

# table1 and table2 are the paths to the two input files described above
numbers = pd.read_csv(table2, sep="\t", header=None)
combinations = pd.read_csv(table1, sep=" ", header=None)

def score_freq(line):
    score = 0
    freq = 1
    for j in range(len(line)):
        # Column indices follow the six-column table shown above
        risk = numbers.values[j][1]           # risk allele
        log_or = float(numbers.values[j][4])  # log(OR)
        p = float(numbers.values[j][5])       # risk allele frequency
        n_risk = line[j].count(risk)          # risk alleles in this genotype
        if n_risk == 0:  # homozygous for ref
            freq *= (1 - p) * (1 - p)
        elif n_risk == 1:  # heterozygous
            score += log_or
            freq *= 2 * (1 - p) * p
        else:  # homozygous for risk
            score += 2 * log_or
            freq *= p * p
        if freq < 1e-05:  # threshold to stop loop in interest of efficiency
            break
    return pd.Series([score, freq])

combinations[['score', 'freq']] = combinations.apply(score_freq, axis=1)
#combinations[['score', 'freq']] = score_freq(combinations.values) # vectorization?
print(combinations)
```
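Written out, the rule described above is roughly the following. The symbols are introduced here for illustration only: $n_i \in \{0, 1, 2\}$ is the number of risk alleles at SNP $i$ in the profile, $\beta_i$ is its log(OR), and $p_i$ is the risk allele frequency.

```latex
\text{score} = \sum_{i} n_i \,\beta_i ,
\qquad
\text{freq} = \prod_{i} \binom{2}{n_i}\, p_i^{\,n_i}\,(1 - p_i)^{\,2 - n_i}
```

The binomial factor gives the three Hardy-Weinberg genotype frequencies: $(1-p_i)^2$ for $n_i = 0$, $2p_i(1-p_i)$ for $n_i = 1$, and $p_i^2$ for $n_i = 2$.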

I was referring to this site, where they go over the fastest ways to loop over a Pandas dataframe. I have been able to use the Pandas apply method, but I am not sure how to apply the vectorization method to the Pandas series. Beyond that, please suggest any way I can improve my script to make it more efficient. Thanks!

Hello Volka!

We believe that this post does not fit the main topic of this site.

This is a pure Python question. Please ask at StackExchange.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

In fact, it is most likely an ill-posed question or an XY problem. Until the OP accepts that, repeated posts without further details will not help.
