Question

Splitting several PDB files into chains and saving them as separate files

1

2.2 years ago

Jonathan Lefebre ▴ 70

I am trying to split a large number of PDB files into chains using Biopython and then save each chain as a separate file called pdbid_chain.pdb. So far I have not succeeded. Additionally, I am quite new to Python.

Any help is highly appreciated!

Here is my code:

#pdb_list contains a list of 208 pdb structures 
#PDB_RAW_DIR is the directory where structures are stored
io = PDBIO()

#parse structures
for f in pdb_list:
    pdb_parsed = PDBParser().get_structure(pdb_ids, str(PDB_RAW_DIR) + '/' + f)

#save chains
for structure in pdb_parsed:
        pdb_chains = structure.get_chains()
        for chain in pdb_chains:
            io.set_structure(chain)
            io.save(pdb_parsed.get_id() + "_" + chain.get_id() + ".pdb")

Cheers!

biology python PDB structural biopython • 3.4k views

ADD COMMENT • link updated 2.2 years ago by Wayne ★ 2.0k • written 2.2 years ago by Jonathan Lefebre ▴ 70

score 1 · Answer 1 · 2022-03-05

1

2.2 years ago

Mensur Dlakic ★ 27k

It is not that difficult to figure out Biopython's I/O manipulations with PDB files - see here.

I will get you started with some basic code, and you should be able to work out how to apply it to a list of files. Let's download an example file:

wget -q -o /dev/null ftp://ftp.ebi.ac.uk/pub/databases/msd/pdb_uncompressed/pdb2f2f.ent
mv pdb2f2f.ent 2f2f.pdb

The code below is hard-coded for the file above, but small changes will make it more general.

from Bio.PDB import PDBParser
from Bio.PDB.PDBIO import PDBIO

parser = PDBParser()
io = PDBIO()

# Parse the file; the first argument becomes the structure ID used in the output names.
structure = parser.get_structure("2f2f", "2f2f.pdb")

# Write each chain to its own file, e.g. 2f2f_A.pdb.
for chain in structure.get_chains():
    io.set_structure(chain)
    io.save(structure.get_id() + "_" + chain.get_id() + ".pdb")

This will create six separate files, one for each of chains A-F.
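To go from one file to a whole directory, the single-file code above could be wrapped in a function and called in a loop. The following is only a minimal sketch under some assumptions: the helper names (ChainSelect, split_into_chains, split_all) and the directory layout are made up for illustration, and the PDB ID is taken from the file name. Instead of passing each chain to set_structure, it keeps the whole structure in PDBIO and filters with a Select subclass at save time; that is a design choice for this sketch, not something the answer prescribes.

```python
from pathlib import Path

from Bio.PDB import PDBIO, PDBParser, Select


class ChainSelect(Select):
    """Accept only the chain whose ID matches chain_id (hypothetical helper)."""

    def __init__(self, chain_id):
        self.chain_id = chain_id

    def accept_chain(self, chain):
        return chain.get_id() == self.chain_id


def split_into_chains(pdb_path, out_dir):
    """Write one <pdbid>_<chain>.pdb file per chain and return the paths written."""
    pdb_path = Path(pdb_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: the file name without its extension is the PDB ID.
    pdb_id = pdb_path.stem
    structure = PDBParser(QUIET=True).get_structure(pdb_id, str(pdb_path))

    io = PDBIO()
    io.set_structure(structure)  # always the whole structure, never a single chain
    written = []
    # A set removes duplicates when several models repeat the same chain IDs.
    for chain_id in sorted({chain.get_id() for chain in structure.get_chains()}):
        out_path = out_dir / f"{pdb_id}_{chain_id}.pdb"
        io.save(str(out_path), ChainSelect(chain_id))
        written.append(out_path)
    return written


def split_all(pdb_dir, out_dir):
    """Split every *.pdb file found in pdb_dir."""
    written = []
    for pdb_path in sorted(Path(pdb_dir).glob("*.pdb")):
        written.extend(split_into_chains(pdb_path, out_dir))
    return written
```

Calling split_all("pdb_raw", "pdb_chains") would then process every .pdb file in a (hypothetical) pdb_raw directory. Writing to a separate output directory keeps the per-chain files from being picked up as input on a later run.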

ADD COMMENT • link 2.2 years ago by Mensur Dlakic ★ 27k

1

Alternatives:

You can view a static version of a notebook here that includes a very short bash/sed approach to splitting out the chains.
At the top of that notebook you can find a link to use and run an R-based version, pdbsplit, in the Bio3D package.
You can run that notebook interactively by going here, clicking the launch binder badge, and then, in the session that comes up, choosing 'Split PDB files into chains using command line' from the available notebooks.
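For readers who prefer to stay in Python, the same text-based idea can be sketched without any parsing library. This is not the notebook's bash/sed code; it is a rough standard-library analogue that relies on the chain identifier sitting in column 22 of ATOM and HETATM records in the PDB format. The function name split_chains_text is hypothetical, and the sketch keeps only the first model and drops every other record type.

```python
from collections import defaultdict
from pathlib import Path


def split_chains_text(pdb_path, out_dir):
    """Group ATOM/HETATM lines by chain ID and write <pdbid>_<chain>.pdb files."""
    pdb_path = Path(pdb_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    records = defaultdict(list)
    with pdb_path.open() as handle:
        for raw_line in handle:
            line = raw_line.rstrip("\n")
            if line.startswith("ENDMDL"):
                break  # keep only the first model of multi-model files
            if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
                # Column 22 (index 21) holds the chain identifier.
                records[line[21]].append(line)

    written = []
    for chain_id, lines in sorted(records.items()):
        out_path = out_dir / f"{pdb_path.stem}_{chain_id}.pdb"
        out_path.write_text("\n".join(lines) + "\nEND\n")
        written.append(out_path)
    return written
```

Because it never builds a structure object, this variant is quick for large batches, but it performs no validation of the input.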

ADD REPLY • link 2.2 years ago by Wayne ★ 2.0k

0

There is also a stand-alone program called pdbsplitchains in this library.

ADD REPLY • link 2.2 years ago by Mensur Dlakic ★ 27k

0

Thanks a lot, Mensur! But this does not really help, as it is a solution for a single file and is basically what I had already done. Meanwhile, I figured out that it is a fundamental problem with Biopython (I am using v. 1.77), as described here: https://github.com/biopython/biopython/pull/3223 The fix proposed there works.

Thanks anyway!

ADD REPLY • link 2.2 years ago by Jonathan Lefebre ▴ 70