Question: Parsing Result File
gravatar for Samimirza
8.3 years ago by
Samimirza60 wrote:


I am trying to match PDB records with results from dimplot.

I have managed to create a dictionary : pdb[chain][residue][residue_no]

Now I have some data in folders for each PDB, I am interested in parsing a particular file whose contents are shown below: for eg

pdb1aut: chains L and C

                                             Hydrogen bonds  Non-bonded
               Interacting residues          M-M   S-S   M-S  contacts
          ------------------------------     ---   ---   ---    -----  
Residues: [HIS  107  L] -> [PHE  208  C]      0     0     0        2

Residues: [PRO  144  L] -> [TRP  207  C]      0     0     0        1

Residues: [GLY  142  L] -> [TRP  207  C]      1     0     0        3

Now I want to match the residue details from dic created above and make: pdb[chain][residue][residue_no][M-M][S-S][M-S][non-bonded contacts]

i also need to count the number of such interactions for eg: TRP 207 C comes in line 2 and 3 both, so I need to add number of M-M interactions etc.

I still havenot written a code and I cant do anything complicated since I am a newbie in python.

pdb python parsing • 1.7k views
ADD COMMENTlink modified 8.3 years ago by Lee Katz2.9k • written 8.3 years ago by Samimirza60

One thing that would be nice is to show us your code so far. Making a dictionary that is nested that far is also a bad idea. It may be a better idea to build a class to hold the information for each residue.

ADD REPLYlink written 8.3 years ago by Gww2.6k
gravatar for Brad Chapman
8.3 years ago by
Brad Chapman9.4k
Boston, MA
Brad Chapman9.4k wrote:

GWW makes an excellent point about your code design. For learning purposes, it's best to post what code you have written so answers can build off of that. To get started, this implementation parses the file and organizes by top level chain and residues. The counts are collapsed with addition for identical residues, so the output for your example looks like:

% python2.6 dimplot-out.txt
{('C', '207', 'TRP'): {'M-M': 1, 'M-S': 0, 'Non-bonded contacts': 4, 'S-S': 0},
 ('C', '208', 'PHE'): {'M-M': 0, 'M-S': 0, 'Non-bonded contacts': 2, 'S-S': 0},
 ('L', '107', 'HIS'): {'M-M': 0, 'M-S': 0, 'Non-bonded contacts': 2, 'S-S': 0},
 ('L', '142', 'GLY'): {'M-M': 1, 'M-S': 0, 'Non-bonded contacts': 3, 'S-S': 0},
 ('L', '144', 'PRO'): {'M-M': 0, 'M-S': 0, 'Non-bonded contacts': 1, 'S-S': 0}}

And the code:

import sys
import pprint

def _parse_chain_parts(c):
    return c.replace("[", "").replace(" -> ", "").split()

def _parse_stats(s):
    m_m, s_s, m_s, non_bond = s.rstrip().split()
    return {"M-M": int(m_m), "S-S": int(s_s), "M-S": int(m_s),
            "Non-bonded contacts": int(non_bond)}

def parse_dimplot_line(line):
    line = line.replace("Residues: ", "")
    chain1, chain2, stats_str = line.split("]")
    stats = _parse_stats(stats_str)
    for chain_str in [chain1, chain2]:
        residue, r_num, chain = _parse_chain_parts(chain_str)
        yield (chain, r_num, residue, stats)

def parse_dimplot(in_file):
    with open(in_file) as in_handle:
        for line in in_handle:
            if line.startswith("Residues:"):
                for chain in parse_dimplot_line(line):
                    yield chain

def combine_stats(one, two):
    combined = dict()
    for k, val in one.items():
        combined[k] = val + two[k]
    return combined

in_file = sys.argv[1]
organized = dict()
for chain, r_num, residue, stats in parse_dimplot(in_file):
    r_key = (chain, r_num, residue)
    if organized.has_key(r_key):
        organized[r_key] = combine_stats(stats, organized[r_key])
        organized[r_key] = stats
ADD COMMENTlink modified 8.3 years ago • written 8.3 years ago by Brad Chapman9.4k

Thank you for the code, I did a simpler version of the code and created an intermediate file, which I am trying to annotate.

ADD REPLYlink written 8.2 years ago by Samimirza60
gravatar for Lee Katz
8.3 years ago by
Lee Katz2.9k
Atlanta, GA
Lee Katz2.9k wrote:

You might decide to just use a standard library like BioPerl. If using Perl, you can create a mapping hash however you want and then proceed with your analysis using the code samples below.

my %map=(atom1=>[0,0,0,2], ...);

  use Bio::Structure::IO;
  use strict;
  my $structio = Bio::Structure::IO->new(-file => $file);
  my $struc = $structio->next_structure;
  for my $chain ($struc->get_chains) {
     my $chainid = $chain->id;
     # one-letter chaincode if present, 'default' otherwise
     for my $res ($struc->get_residues($chain)) {
        my $resid = $res->id;
        # format is 3-lettercode - dash - residue number, e.g. PHE-20
        my $atoms = $struc->get_atoms($res);
        # actually a list of atom objects, used here to get a count
        print join "\t", $chainid,$resid,$atoms,"\n";

$stream = Bio::Structure::IO->new(-file => $filename,
                                  -format => 'PDB');

while (my $structure = $stream->next_structure) {
        # do something with $structure
ADD COMMENTlink written 8.3 years ago by Lee Katz2.9k

I am a newbie, so currently not well versed with complicated programming and biopython.

ADD REPLYlink written 8.2 years ago by Samimirza60
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 777 users visited in the last hour