I'm currently working with a data set of PDB files and I'm interested in the sizes of the residues (number of atoms per residue). I noticed that the number of atoms, len(residue.child_list), differs between instances of the same residue type in different proteins. For example, residue 'LEU' has 8 atoms in one protein but 19 in another!
My guess is that there is an error in the PDB file or in PDBParser(); either way, the differences are huge!
For example, in the case of the structure 3OQ2:
r = model['B'][88] # residue at chain B position 88
r1 = model['B'][15] # residue at chain B position 15
In [287]: r.resname
Out[287]: 'VAL'
In [288]: r1.resname
Out[288]: 'VAL'
But
In [274]: len(r.child_list)
Out[274]: 16
In [276]: len(r1.child_list)
Out[276]: 7
So even within a single molecule there's a difference in the number of atoms for the same residue type. I'd like to know whether this is normal biologically or whether something is wrong. Thank you.
The parser reads what you give it. Have a look at the PDB file (or see below): one residue has hydrogen atoms assigned while the other doesn't. This is a side effect of very high-resolution crystal structures. If you had looked at the contents of child_list, you would have spotted the problem immediately.
ATOM 1843 N VAL B 15 43.162 21.984 17.104 1.00 16.22 N
ATOM 1844 CA VAL B 15 44.075 21.333 16.172 1.00 18.72 C
ATOM 1845 C VAL B 15 45.427 21.082 16.807 1.00 22.32 C
ATOM 1846 O VAL B 15 45.950 21.920 17.538 1.00 28.12 O
ATOM 1847 CB VAL B 15 44.229 22.144 14.857 1.00 19.22 C
ATOM 1848 CG1 VAL B 15 44.649 23.580 15.150 1.00 21.69 C
ATOM 1849 CG2 VAL B 15 45.222 21.474 13.921 1.00 20.65 C
.....
ATOM 2962 N VAL B 88 33.193 42.159 23.916 1.00 11.01 N
ATOM 2963 CA VAL B 88 33.755 43.168 24.800 1.00 12.28 C
ATOM 2964 C VAL B 88 35.255 43.284 24.530 1.00 12.91 C
ATOM 2965 O VAL B 88 35.961 42.283 24.451 1.00 14.78 O
ATOM 2966 CB VAL B 88 33.524 42.841 26.286 1.00 12.81 C
ATOM 2967 CG1 VAL B 88 34.166 43.892 27.160 1.00 16.03 C
ATOM 2968 CG2 VAL B 88 32.020 42.727 26.586 1.00 17.67 C
ATOM 2969 H VAL B 88 33.642 41.425 23.899 1.00 13.21 H
ATOM 2970 HA VAL B 88 33.340 44.035 24.608 1.00 14.73 H
ATOM 2971 HB VAL B 88 33.941 41.979 26.492 1.00 15.37 H
ATOM 2972 HG11 VAL B 88 34.011 43.670 28.081 1.00 19.23 H
ATOM 2973 HG12 VAL B 88 35.110 43.912 26.983 1.00 19.23 H
ATOM 2974 HG13 VAL B 88 33.777 44.746 26.959 1.00 19.23 H
ATOM 2975 HG21 VAL B 88 31.902 42.523 27.516 1.00 21.20 H
ATOM 2976 HG22 VAL B 88 31.596 43.562 26.377 1.00 21.20 H
ATOM 2977 HG23 VAL B 88 31.647 42.026 26.047 1.00 21.20 H
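If residue size is what you are after, one option is to count heavy atoms and hydrogens separately, so that residues with and without modelled hydrogens become comparable. The following is only a minimal sketch: the helper name split_by_hydrogen is made up for illustration, treating deuterium ('D') as hydrogen is my own assumption, and the element lists are simply copied from the last column of the two VAL records above.

```python
def split_by_hydrogen(elements):
    """Return (heavy, hydrogen) counts for an iterable of element symbols.

    Hypothetical helper, not part of Biopython. Deuterium ("D") is counted
    as hydrogen here, which is an assumption of this sketch.
    """
    heavy = hydrogen = 0
    for symbol in elements:
        if symbol.strip().upper() in {"H", "D"}:
            hydrogen += 1
        else:
            heavy += 1
    return heavy, hydrogen


# Element symbols copied from the VAL B 15 and VAL B 88 records above.
val_15 = ["N", "C", "C", "O", "C", "C", "C"]
val_88 = val_15 + ["H"] * 9

print(split_by_hydrogen(val_15))  # (7, 0)
print(split_by_hydrogen(val_88))  # (7, 9)
```

With Biopython the same function can be fed directly from a residue, e.g. split_by_hydrogen(atom.element for atom in r), since each Bio.PDB Atom carries an element attribute. If the element column is missing from a file, the parser may have to infer it from the atom name, so treat the counts with some care in that case.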
So are high-resolution crystal structures actually more accurate about the number of atoms, or are these hydrogen atoms an erratic side effect? Thank you for your answer.
You can consider them 'more accurate' in the sense that they provide clearer electron density, which allows crystallographers to place even small hydrogen atoms unambiguously. The problem here is that apparently not all hydrogen atoms could be determined reliably, hence the discrepancy. Depending on your goal, you may or may not need to care about them. You can, for example, add all hydrogen atoms with a generic force field and energy-minimize the structure (e.g. with GROMACS), or use the MolProbity and WHAT IF servers for a similar purpose.
p.s. If the answer is satisfactory please do mark it as 'correct'.
Thank you again.