Putting atomic coordinates from PDB file into Pandas dataframe?
5.4 years ago

Greetings all.

I have a list of atomic coordinates from a PDB file saved to the variable x. This is a short sample of what the interpreter shows when I call print(x) in my code:

[22.732 33.537 34.278]
[20.362 36.096 32.786]
[20.421 34.188 29.509]
[18.039 31.768 31.227]
[16.639 33.68  34.216]
[14.774 36.97  34.169]
[15.869 37.132 37.823]
[18.284 34.705 39.471]
[16.077 34.65  42.582]
[13.807 32.393 40.54 ]
[16.256 29.54  41.111]
[18.689 30.829 43.723]
[16.129 30.09  46.454]
[14.536 27.024 48.066]
[17.114 24.788 46.348]
[16.391 21.581 48.303]
[13.315 20.955 46.163]
[15.592 20.428 43.156]
[17.535 17.539 44.664]
[16.719 14.029 43.436]
[15.347 12.195 46.47 ]
[16.07   8.681 45.172]
[19.803  9.399 45.021]

What I would like to do is put these values into a pandas DataFrame. Here is the code I have written:

import pandas as pd
for chains in structure:
    for chain in chains:
        for residue in chain:                             
            for atom in residue:
                x = atom.get_coord()

sample = pd.DataFrame({'X': [x[0]],'Y':[x[1]],'Z':[x[2]]})
print(sample)

When this code runs, it outputs the following

           X      Y       Z
0  19.802999  9.399  45.021

For some reason, it only puts the final item in x into the dataframe. I am not sure how to put every element in x into the dataframe. Does anyone know how to go about doing this?

pandas python PDB • 2.0k views
5.4 years ago
Ram 43k

This is how loops work: every pass runs the same statements in the loop body again. In your case the only statement in the body is the assignment x = atom.get_coord(), so each pass overwrites the previous value of x. Since you don't use x until all the loops have finished, the DataFrame is built from the last value of x only.
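The overwriting can be reproduced without any PDB file. The snippet below is a minimal sketch: the `coords` list is a hypothetical stand-in for the values that `atom.get_coord()` returns in the question, and the names `last_only` and `collected` are made up for illustration.

```python
# Hypothetical stand-in values; in the question these come from atom.get_coord().
coords = [
    [22.732, 33.537, 34.278],
    [20.362, 36.096, 32.786],
    [19.803, 9.399, 45.021],
]

# Assigning inside the loop rebinds x on every pass,
# so the previous value is dropped each time.
for c in coords:
    x = c

last_only = x  # after the loop, x refers only to the final item

# Appending inside the loop keeps every item instead.
collected = []
for c in coords:
    collected.append(c)

print(last_only)
print(len(collected))
```

The second loop is the pattern to use: accumulate inside the loop, then use the accumulated list after the loop ends.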

Try:

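import pandas as pd

# One dict per atom; each dict becomes one row of the DataFrame.
arr_x = []

for chains in structure:
    for chain in chains:
        for residue in chain:
            for atom in residue:
                x = atom.get_coord()
                # Store plain scalars, not one-element lists,
                # so each cell holds a number rather than a list.
                arr_x.append({'X': x[0], 'Y': x[1], 'Z': x[2]})

# Build the DataFrame once, after every atom has been collected.
sample = pd.DataFrame(arr_x)
print(sample)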
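As an alternative, the coordinate arrays can be stacked into one NumPy array and passed to pandas in a single call. This is only a sketch: the `coords` list below is a hypothetical stand-in built from three of the values in the question, and it assumes each coordinate is a 3-element float32 NumPy array, which is consistent with 19.803 printing as 19.802999 in the question's output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the per-atom arrays returned by atom.get_coord().
coords = [
    np.array([22.732, 33.537, 34.278], dtype=np.float32),
    np.array([20.362, 36.096, 32.786], dtype=np.float32),
    np.array([19.803, 9.399, 45.021], dtype=np.float32),
]

# Stack the 3-element arrays into one (n_atoms, 3) array.
coord_array = np.vstack(coords)

# Name the three columns when constructing the DataFrame.
sample = pd.DataFrame(coord_array, columns=["X", "Y", "Z"])
print(sample)
```

If the structure was parsed with Biopython's Bio.PDB (an assumption, since the question does not show how `structure` was created), the stand-in list could be replaced by `[atom.get_coord() for atom in structure.get_atoms()]`, which also removes the need for the four nested loops.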