Question: Mapping amino acid indices in a PDB file to positions in a gapped sequence
JJP wrote (5 weeks ago):

Hi All,

I am a beginner in Biopython. What I am trying to do is the following:

I have a sequence of amino acids (including gaps) and a corresponding PDB file. The numbering of amino acids in the PDB file does not match the numbering of the amino acids in the sequence. For each amino acid entry in the PDB file, I want to find the corresponding index in the sequence. For example, if the first entry in the PDB file is alanine, I want to find the index of that alanine in the sequence. For gaps (-), I want to set the index to zero.

Here is the sequence list I have:

-LLPYFDF----DVPRNLTVTVGQT-GFLHCRVERLGDK-----DVSWIRKR----------DLHILTAGGTTYTSDQRFQVLRP---------------------------------------DGSANWTLQIKYPQPRDSGVYECQINTEP-KMSLSYTFNVVE-IVDPKFSSPIVNMTAPVGRDAFLTCVVQDLGPYKVAWLRVDTQTILTIQNHVITKNQRIGIANSEH---KTWTMRIKDIKESDKGWYMCQINTDPMKSQMGYLDVV----
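A minimal sketch of the gap convention described above, assuming the non-gap residues are numbered from 1 and every gap column gets index 0. The function name `gapped_indices` and the short example string are made up for illustration and are not part of the original script.

```python
def gapped_indices(aligned_seq, gap_char="-"):
    """Return one index per column: 0 for a gap, otherwise the
    1-based position of the residue among the non-gap characters."""
    indices = []
    position = 0
    for char in aligned_seq:
        if char == gap_char:
            indices.append(0)
        else:
            position += 1
            indices.append(position)
    return indices


# Short hypothetical example, not the full sequence from the post
print(gapped_indices("-LLP--YF"))  # prints [0, 1, 2, 3, 0, 0, 4, 5]
```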

Here is what I have tried so far:

import argparse
import sys

import numpy as np


def parseArgs():
    """Parse command line arguments."""
    # argparse prints its own error message and exits on bad options
    parser = argparse.ArgumentParser(
        description='Read and extract items from input PDB file')
    parser.add_argument('-i',
                        '--input',
                        action='store',
                        required=True,
                        help='input PDB file in standard format')
    return parser.parse_args()


def get_coordinates_PDB(File_In):
    """Read a PDB file and return the coordinates and the residue
    name of each C-alpha atom. File_In is the PDB file name."""
    try:
        fl = open(File_In, 'r')
    except OSError:
        print('Could not open input file {0}'.format(File_In))
        sys.exit(1)

    Res = []
    Points = []

    with fl:
        for line in fl:
            # Keep only ATOM records that describe a C-alpha atom
            if not line.startswith('ATOM'):
                continue
            if line[12:16].strip() != 'CA':
                continue
            resname = line[17:20]
            # x, y, z sit in fixed columns 31-54 of an ATOM record
            xyz = np.array([float(line[30:38]),
                            float(line[38:46]),
                            float(line[46:54])])
            Res.append(resname)
            Points.append(xyz)
    return Points, Res


def main():
    """Read and parse a provided PDB file."""
    args = parseArgs()
    File_In = args.input
    print(get_coordinates_PDB(File_In))


if __name__ == '__main__':
    main()

This outputs the x, y, z coordinates and the residue names of the C-alpha atoms in the PDB file. However, I am stuck at this point.

I would much appreciate it if someone could help me with implementing the rest. Thank you in advance for your time and help!
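One possible way to approach the remaining step is sketched below. It assumes the residues read from the PDB file form a single contiguous, exactly matching stretch of the ungapped sequence; if the structure has missing residues or mutations, a pairwise alignment would be needed instead. The names `THREE_TO_ONE` and `map_pdb_to_alignment` and the short example inputs are hypothetical.

```python
# Three-letter to one-letter codes for the 20 standard amino acids
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def map_pdb_to_alignment(res_names, aligned_seq, gap_char="-"):
    """Return, for each PDB residue in order, its 1-based column in
    the gapped sequence, or None if no exact match is found."""
    # Unknown residue names become "X" and will usually prevent a match
    pdb_seq = "".join(THREE_TO_ONE.get(name, "X") for name in res_names)
    # 1-based columns of the non-gap characters in the gapped sequence
    columns = [i + 1 for i, c in enumerate(aligned_seq) if c != gap_char]
    ungapped = aligned_seq.replace(gap_char, "")
    start = ungapped.find(pdb_seq)
    if start == -1:
        return None
    return columns[start:start + len(pdb_seq)]


# Hypothetical residue names, in the form of the `Res` list from the script
res = ["LEU", "PRO", "TYR", "PHE"]
print(map_pdb_to_alignment(res, "-LLP--YF"))  # prints [3, 4, 7, 8]
```

Here the four residues match the second L, then P, Y and F, which sit in columns 3, 4, 7 and 8 of the short gapped string. Only the first exact match is reported, so a repeated stretch in the sequence would need extra care.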


There was a related post several weeks ago that may be useful to you:

Using STDIN with BioPython's PDB methods
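For orientation, here is a minimal sketch of reading residue numbers and names with Biopython's `Bio.PDB` module. It is not taken from the linked post. It assumes Biopython is installed and that the chain of interest is chain "A" of the first model; the function name `residues_from_pdb` and the default chain ID are illustrative choices.

```python
from Bio.PDB import PDBParser
from Bio.SeqUtils import seq1


def residues_from_pdb(pdb_path, chain_id="A"):
    """Return (residue number, one-letter code) pairs for one chain
    of the first model, skipping waters and other hetero residues."""
    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("query", pdb_path)
    chain = structure[0][chain_id]
    pairs = []
    for residue in chain:
        # residue.id is (hetero flag, residue number, insertion code)
        hetero_flag, resseq, _icode = residue.id
        if hetero_flag.strip():
            continue  # water or other hetero residue
        pairs.append((resseq, seq1(residue.get_resname())))
    return pairs
```

The residue numbers returned here are the ones written in the PDB file, so they can be paired with positions in the gapped sequence once the two sequences have been matched up.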

written 5 weeks ago by natasha.sernova