Question

PCA on 2 crystal structures, why are my eigenvalues so high?

0

Entering edit mode

15 months ago

Sam • 0

I'm attempting to observe local conformational changes by using principal component analysis between different crystal structures (similar to what is done for MD simulations, but this is between solved structures instead of simulated ones). I have the setup, but I don't quite understand why my eigenvalues are both so high and so low.

I am using 2 PDBs: 1mmi and 1jql

I remove rotation and translation by aligning (in pymol) 1jql chain A with 1mmi chain A. I'm only looking at the CA atoms for now, but I plan to incorporate multiple atoms later. I then generate a list of the x,y,z co-ordinates and generate a matrix using them. I calculate the covariance matrix, diagonalize, and derive eigenvalues and eigenvectures from this. Then plot the eigenvalues.

Here is the code:

import re
import numpy as np
from numpy import linalg as LA
import matplotlib.pyplot as plt

pdb_1='1mmi.pdb'
pdb_2='aligned_1jqlpdb'

coordinate_list_closed=[]
coordinate_list_open=[]

with open(pdb_1) as closed_conformation:
    for lines in closed_conformation:
        if re.search('ATOM\s+\d+\s+CA\s+A*[A-Z]{3}\s+A',lines.strip()) is not None:
            coordinate_list_closed.append(float(lines.strip().split()[6]))
            coordinate_list_closed.append(float(lines.strip().split()[7]))
            coordinate_list_closed.append(float(lines.strip().split()[8]))


with open(pdb_2) as open_conformation:
    for line in open_conformation:
        if re.search('ATOM\s+\d+\s+CA\s+[A-Z]+\s+A',line.strip()) is not None:
            coordinate_list_open.append(float(line.strip().split()[6]))
            coordinate_list_open.append(float(line.strip().split()[7]))
            coordinate_list_open.append(float(line.strip().split()[8]))         


atom_array=np.stack((np.array(coordinate_list_closed),np.array(coordinate_list_open)),axis=1)
covariance_matrix=np.cov(atom_array)
eig_val,eig_vect=LA.eig(covariance_matrix)
plt.scatter(np.arange(len(eig_val)),eig_val)
plt.show()

The very first eigenvalue, regardless of atom, which array I start with in the data, co-ordinates, is always in the triple digit range. Which makes no sense. I am looking for conformational rearrangments in Angstroms...you can't have a change of 800A. So I don't know if I have the wrong setup, or am going about this the wrong way. Any help would greatly be appreciated!

python PCA • 795 views

ADD COMMENT • link 15 months ago by Sam • 0

score 0 · Answer 1 · 2023-01-10

It seems to me that your code computes the eigen decomposition of the merged set of atoms. This is probably not what you want. The eigen decompositon of a point set will give you the directions along which the points are spread, basically the axes of an ellipsoid fitted to the points with the eigenvalues giving the size of each axis so if you want to compare the two structures from that point of view, you need to compute one eigen decomposition for each point set.
There are plenty of methods to deal with point sets from other fields. It just so happens that I have released an R package (LOMAR, also available on CRAN) that implements several point set registration methods and point set similarity measures based on persistent homology. This is very generic and may or may not be useful in your context. You may want to also have a look at more domain-specific approaches (e.g. the paper Comparing crystal structures with symmetry and geometry.)