Question

Putting atomic coordinates from PDB file into Pandas dataframe?

5.4 years ago

westin.kosater ▴ 80

Greetings all.

I have a list of atomic coordinates from a PDB file saved to the variable x. This is a short sample of what I get in the interpreter when I write print(x) in my code:

[22.732 33.537 34.278]
[20.362 36.096 32.786]
[20.421 34.188 29.509]
[18.039 31.768 31.227]
[16.639 33.68  34.216]
[14.774 36.97  34.169]
[15.869 37.132 37.823]
[18.284 34.705 39.471]
[16.077 34.65  42.582]
[13.807 32.393 40.54 ]
[16.256 29.54  41.111]
[18.689 30.829 43.723]
[16.129 30.09  46.454]
[14.536 27.024 48.066]
[17.114 24.788 46.348]
[16.391 21.581 48.303]
[13.315 20.955 46.163]
[15.592 20.428 43.156]
[17.535 17.539 44.664]
[16.719 14.029 43.436]
[15.347 12.195 46.47 ]
[16.07   8.681 45.172]
[19.803  9.399 45.021]

What I would like to do is put these values in a pandas dataframe. To do this, here is the code I have written:

import pandas as pd
for chains in structure:
    for chain in chains:
        for residue in chain:                             
            for atom in residue:
                x = atom.get_coord()

sample = pd.DataFrame({'X': [x[0]],'Y':[x[1]],'Z':[x[2]]})
print(sample)

When this code runs, it outputs the following:

           X      Y       Z
0  19.802999  9.399  45.021

For some reason, it only puts the final item in x into the dataframe. I am not sure how to put every element in x into the dataframe. Does anyone know how to go about doing this?

pandas python PDB • 2.0k views

updated 5.4 years ago by Ram 43k • written 5.4 years ago by westin.kosater ▴ 80

score 3 · Accepted Answer · 2018-11-28

This is how loops work: the body runs once per item, and in your case each pass does only one thing, which is assign atom.get_coord() to x. Because every pass overwrites x and you don't use x until the loops have finished, you only ever see the last value of x.
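The same effect can be seen without any PDB data. This is a minimal sketch using a made-up list of numbers; the names values, last and collected are purely illustrative and are not part of your code:

```python
values = [1.0, 2.0, 3.0]

# Assigning inside the loop: each pass overwrites the previous value
for v in values:
    last = v

# Appending inside the loop: each pass adds to the list
collected = []
for v in values:
    collected.append(v)

print(last)       # 3.0
print(collected)  # [1.0, 2.0, 3.0]
```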

Try:

import pandas as pd

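arr_x = []  # one dict per atom

# structure is the object already loaded in the question; assuming it is a
# Bio.PDB structure, its top-level items are models, which contain chains
for model in structure:
    for chain in model:
        for residue in chain:
            for atom in residue:
                x = atom.get_coord()
                # store plain numbers, not one-element lists, so each cell holds a value
                arr_x.append({'X': x[0], 'Y': x[1], 'Z': x[2]})

sample = pd.DataFrame(arr_x)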
print(sample)
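If the coordinates are all you need, another option is to collect the arrays and stack them into a single 2-D array before building the dataframe. The sketch below is self-contained, so it uses a hand-written coords list (the first three rows from the question) as a stand-in for the arrays you would collect from atom.get_coord() in the loop. The name coords and the float32 dtype are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the per-atom arrays gathered inside the loop
coords = [
    np.array([22.732, 33.537, 34.278], dtype=np.float32),
    np.array([20.362, 36.096, 32.786], dtype=np.float32),
    np.array([20.421, 34.188, 29.509], dtype=np.float32),
]

# Stack into an (n_atoms, 3) array: one row per atom, one column per axis
sample = pd.DataFrame(np.vstack(coords), columns=["X", "Y", "Z"])
print(sample)
```

In the real loop you would replace the hand-written list with coords.append(atom.get_coord()) for each atom.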