I'm storing residues from Biopython in a Python set. What's odd is that when I read the same PDB file into two different structure objects, the corresponding residues do not compare equal. I also checked the output of the residue class's `__hash__` method, and it differs between the two objects as well, which causes problems when I try to store residues in sets or dictionaries.
```python
from Bio.PDB import PDBParser, Selection

parser = PDBParser(QUIET=True)

# First parse of the file
structA = parser.get_structure('1AOPA', 'mypath/1AOPA.pdb')
modelA = structA[0]
resListA = Selection.unfold_entities(modelA, 'R')  # 'R' = residue level
resA1 = resListA[0]

# Second, independent parse of the same file
structB = parser.get_structure('1AOPA', 'mypath/1AOPA.pdb')
modelB = structB[0]
resListB = Selection.unfold_entities(modelB, 'R')
resB1 = resListB[0]

print(resA1)
print(resB1)           # looks exactly the same as resA1
print(resA1 == resB1)  # False
```
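The `False` above is what Python's default object semantics would produce: a class that does not override `__eq__` and `__hash__` inherits them from `object`, where equality means "is the same object" and the hash is derived from the object's identity. If your Biopython version does not override these for residues (which the output suggests), two separate parses always give non-equal residues. A minimal sketch with a hypothetical stand-in class (`PlainResidue`, not Biopython's `Residue`) shows the same effect:

```python
class PlainResidue:
    """Hypothetical stand-in: stores data but defines no __eq__/__hash__."""

    def __init__(self, chain_id, res_id):
        self.chain_id = chain_id
        self.res_id = res_id


a = PlainResidue('A', (' ', 1, ' '))
b = PlainResidue('A', (' ', 1, ' '))

# Inherited from object: == falls back to identity (a is b)
same = (a == b)         # False: identical data, different objects
# Set membership uses hash plus ==, and == is identity here,
# so the set keeps both objects
distinct = len({a, b})  # 2
```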
I'd like to store certain residues in a set and then, in various functions, read the PDB files (from a list passed to the function) into structures one by one, searching for the residues in the set each time a structure is created. I'd like to avoid keeping lists of structures around because there will be thousands of them. Does anyone know of a workaround for storing residues in sets or dictionaries?
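One common workaround is to store hashable keys instead of the residue objects themselves. Bio.PDB entities provide `get_full_id()`, which for a residue returns a tuple `(structure_id, model_id, chain_id, residue_id)`, where `residue_id` is `(hetfield, resseq, icode)`. The sketch below assumes that layout; `residue_key`, `find_wanted_residues`, the `"active site"` label, and the `FakeResidue` stand-in used for the demo are hypothetical, and with real data you would pass the residues from `Selection.unfold_entities(model, 'R')` instead.

```python
def residue_key(residue):
    """Hashable key for a residue.

    Assumes residue.get_full_id() returns
    (structure_id, model_id, chain_id, residue_id), as Bio.PDB residues do.
    """
    return tuple(residue.get_full_id())


def find_wanted_residues(residues, wanted):
    """Return the residues whose key is in the precomputed `wanted` set."""
    return [res for res in residues if residue_key(res) in wanted]


class FakeResidue:
    """Hypothetical stand-in for Bio.PDB.Residue, used only for this demo."""

    def __init__(self, full_id):
        self._full_id = full_id

    def get_full_id(self):
        return self._full_id


# Two "parses" of the same file give distinct but equivalent objects
parse_a = [FakeResidue(('1AOPA', 0, 'A', (' ', n, ' '))) for n in (1, 2, 3)]
parse_b = [FakeResidue(('1AOPA', 0, 'A', (' ', n, ' '))) for n in (1, 2, 3)]

# Set: built once from the first parse, keeping only residue 2
wanted = {residue_key(parse_a[1])}
hits = find_wanted_residues(parse_b, wanted)  # finds parse_b[1]

# Dict: annotations keyed the same way also survive re-parsing
notes = {residue_key(parse_a[0]): "active site"}
label = notes.get(residue_key(parse_b[0]))  # "active site"
```

Because only small tuples are kept, each structure can be discarded after its file is processed. Note that the key includes the structure id, so this relies on passing the same id (e.g. `'1AOPA'`) every time a given file is parsed; dropping the first element would instead match residues with the same model, chain, and number across different files.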
Edit:
I found this related question on Stack Overflow: http://stackoverflow.com/questions/10802123/implementing-equivalence-in-biopythons-pdb-module
I'd still like to hear if anyone has different thoughts or suggestions on this.
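Another option, if you want the residue object itself to live in the set, is a thin wrapper that defines equivalence explicitly. This is a minimal sketch, not necessarily what the linked answer proposes: `ResidueRef` and the `FakeResidue` stand-in are hypothetical names, and it again assumes `get_full_id()` returns `(structure_id, model_id, chain_id, residue_id)`. `__eq__` and `__hash__` use the same key, because objects that compare equal must also hash equal for sets and dictionaries to work.

```python
class ResidueRef:
    """Hypothetical wrapper giving a residue value-based equality.

    Two wrappers are equal when their residues have the same full id,
    even if they come from separately parsed structure objects.
    """

    __slots__ = ("residue", "_key")

    def __init__(self, residue):
        self.residue = residue
        # Computed once: assumes the residue is not renumbered or moved later
        self._key = tuple(residue.get_full_id())

    def __eq__(self, other):
        if not isinstance(other, ResidueRef):
            return NotImplemented
        return self._key == other._key

    def __hash__(self):
        # Must agree with __eq__: equal wrappers need equal hashes
        return hash(self._key)

    def __repr__(self):
        return f"ResidueRef({self._key!r})"


class FakeResidue:
    """Hypothetical stand-in for Bio.PDB.Residue, used only for this demo."""

    def __init__(self, full_id):
        self._full_id = full_id

    def get_full_id(self):
        return self._full_id


first = FakeResidue(('1AOPA', 0, 'A', (' ', 10, ' ')))
second = FakeResidue(('1AOPA', 0, 'A', (' ', 10, ' ')))

refs = {ResidueRef(first)}
found = ResidueRef(second) in refs  # True: same key, different objects
```

The wrapper keeps a reference to the residue (and therefore to its whole structure), so for thousands of files the plain tuple keys are lighter on memory; the wrapper is handier when you need the residue object back from the set.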