Question

The reason of multiple biounit files for one PDB

9.5 years ago

ajingnk ▴ 130

I have seen cases where one PDB entry has multiple biounit files, and I am not quite sure why multiple biounits are made. Some biounits contain only the binding peptide. Could I just use the first biounit file, or do I need to be careful about which one to choose?

Because I want to do this at large scale, a rule like "choose the one with the largest number of residues" would be best.

pdb protein • 2.7k views

updated 2.3 years ago by Ram 44k • written 9.5 years ago by ajingnk ▴ 130

Ram · Accepted Answer · 2015-02-14

There are 2 main reasons for having multiple biounits in the PDB:

When there are multiple equivalent copies of the biounit in the crystal, they are annotated as separate biounits. A typical example is 4re6: a dimeric protein with 4 chains in the asymmetric unit. In that example biounit 1 is the dimer formed by chains A and B, and biounit 2 is the dimer formed by chains C and D. Both are equivalent, but because the dimer appears twice in the asymmetric unit it is given as 2 biounits.
The other reason is that the PDB tries to accommodate different opinions about the biounit. The annotations can come from the authors or from software predictions (mainly PISA), so you will often encounter both: for example, biounit 1 from the authors and biounit 2 from PISA. A biounit can also be annotated by both PISA and the authors when the two agree. The 4re6 example above shows this as well: the first 2 biounits come from both authors and PISA, whilst biounit 3 comes only from PISA (which predicts a tetramer). As you can see, even PISA can make multiple predictions, and in some cases like 4re6 they are both added to the PDB.
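
If you want to check which annotation each biounit carries, the legacy PDB-format file records it in REMARK 350: each BIOMOLECULE block says whether the assembly is author determined, software determined, or both. Below is a minimal sketch in Python, assuming the legacy PDB text format. The function name `parse_remark_350` and the sample text are made up for illustration; the sample is modelled on the description above and is not copied from the real 4re6 header.

```python
import re

def parse_remark_350(pdb_text):
    """Return one dict per BIOMOLECULE block found in REMARK 350 (hypothetical helper)."""
    biounits = []
    current = None
    for line in pdb_text.splitlines():
        if not line.startswith("REMARK 350"):
            continue
        body = line[10:].strip()
        match = re.match(r"BIOMOLECULE:\s*(\d+)", body)
        if match:
            # A new biounit block starts here.
            current = {"id": int(match.group(1)), "author": None,
                       "software": None, "chains": []}
            biounits.append(current)
        elif current is None:
            continue  # free-text preamble before the first block
        elif body.startswith("AUTHOR DETERMINED BIOLOGICAL UNIT:"):
            current["author"] = body.split(":", 1)[1].strip()
        elif body.startswith("SOFTWARE DETERMINED QUATERNARY STRUCTURE:"):
            current["software"] = body.split(":", 1)[1].strip()
        elif body.startswith(("APPLY THE FOLLOWING TO CHAINS:", "AND CHAINS:")):
            for chain in body.split(":", 1)[1].split(","):
                chain = chain.strip()
                if chain and chain not in current["chains"]:
                    current["chains"].append(chain)
    return biounits

# Invented sample text, for illustration only (not the real 4re6 header).
SAMPLE = """\
REMARK 350 BIOMOLECULE: 1
REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: DIMERIC
REMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: DIMERIC
REMARK 350 SOFTWARE USED: PISA
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350 BIOMOLECULE: 2
REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: DIMERIC
REMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: DIMERIC
REMARK 350 SOFTWARE USED: PISA
REMARK 350 APPLY THE FOLLOWING TO CHAINS: C, D
REMARK 350 BIOMOLECULE: 3
REMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: TETRAMERIC
REMARK 350 SOFTWARE USED: PISA
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C, D
"""

for unit in parse_remark_350(SAMPLE):
    sources = [name for name in ("author", "software") if unit[name]]
    print(unit["id"], "+".join(sources), ",".join(unit["chains"]))
```

The mmCIF files carry the same information in the `pdbx_struct_assembly` category, so for large-scale work it may be more robust to read it from there.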

An important point is that determining the correct biounit experimentally is not a simple task. From crystallographic data alone you can't really do much: you need other experimental methods such as gel filtration, analytical ultracentrifugation, light scattering, mutagenesis, etc. to be sure of the oligomeric state of the protein in solution. Even those methods are sometimes not conclusive, which complicates things further.

As to which biounit to use, there is no simple, straightforward answer. One widely accepted method is to simply use biounit 1 and ignore the rest. This relies on the PDB annotators usually setting biounit 1 to their preferred assignment (because of good author data or confident software predictions). Another method is to use the first biounit annotated by the authors, trusting that the authors have done their job correctly.
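
Since you want a rule you can apply at scale, the two strategies above can be sketched as follows in Python. This is only an illustration: the `Biounit` record and the function names are hypothetical, and it assumes you have already extracted, for each biounit, its number and whether it carries an author and/or software annotation.

```python
from typing import NamedTuple, Optional, Sequence

class Biounit(NamedTuple):
    """Hypothetical record for one biounit annotation of a PDB entry."""
    number: int
    author: Optional[str]    # e.g. "DIMERIC", or None if not author annotated
    software: Optional[str]  # e.g. "TETRAMERIC", or None if not software annotated

def pick_first(biounits: Sequence[Biounit]) -> Optional[Biounit]:
    """Strategy 1: take the lowest-numbered biounit and ignore the rest."""
    return min(biounits, key=lambda b: b.number, default=None)

def pick_first_author(biounits: Sequence[Biounit]) -> Optional[Biounit]:
    """Strategy 2: take the first author-annotated biounit.

    Falls back to strategy 1 when no biounit has an author annotation.
    """
    authored = [b for b in biounits if b.author]
    return pick_first(authored or biounits)

# Made-up entry: biounit 1 is software-only, biounit 2 is from the authors.
entry = [
    Biounit(1, None, "DIMERIC"),
    Biounit(2, "TETRAMERIC", None),
]

print(pick_first(entry).number)
print(pick_first_author(entry).number)
```

Whichever rule you pick, it is worth logging the entries where the two strategies disagree, since those are the cases most likely to need a manual look.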

A last word of caution: there are quite a few errors in biounit annotations in the PDB. See for instance our paper, where we analyse the problem in some detail.