Extracting a FASTA file from a PDB entry to obtain only the protein sequence
18 months ago

Is there any Python code to extract a FASTA file from PDB for a given protein ID (e.g. 1mkp)?

alignment sequence assembly • 2.6k views
1
18 months ago
Mensur Dlakic ★ 14k
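import sys
from Bio import SeqIO

# Path to the PDB file, given as the first command-line argument
PDBFile = sys.argv[1]
with open(PDBFile, 'r') as pdb_file:
    # 'pdb-atom' yields one record per chain, built from the ATOM coordinates
    for record in SeqIO.parse(pdb_file, 'pdb-atom'):
        print('>' + record.id)
        print(record.seq)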


Save as pdb-seq.py. Download PDB coordinates for 1mkp and type:

python pdb-seq.py 1mkp.pdb

>1MKP:A
ASFPVEILPFLYLGCAKDSTNLDVLEEFGIKYILNVTPNLPNLFENAGEFKYKQIPISDHWSQNLSQFFPEAISFIDEARGKN
CGVLVHSLAGISRSVTVTVAYLMQKLNLSMNDAYDIVKMKKSNISPNFNFMGQLLDFERTL
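
For the "download PDB coordinates" step, a small helper can fetch the file by its ID. This is a minimal sketch using only the standard library. The function names are hypothetical, and the URL pattern (https://files.rcsb.org/download/<ID>.pdb) is an assumption about the RCSB file server layout, not something stated in this thread.

```python
import urllib.request

# Assumed RCSB download location; not taken from the thread.
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id):
    """Build the download URL for a four-character PDB ID such as '1mkp'."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character PDB ID, got %r" % pdb_id)
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id)


def download_pdb(pdb_id, destination=None):
    """Download the PDB file and return the local path it was saved to."""
    if destination is None:
        destination = pdb_id.strip().lower() + ".pdb"
    urllib.request.urlretrieve(pdb_download_url(pdb_id), destination)
    return destination
```

With network access, calling download_pdb("1mkp") should save 1mkp.pdb in the current directory, which can then be passed to pdb-seq.py as shown above.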

0

code should be for python 3

for record in SeqIO.parse(pdb_file, 'pdb-atom'):
                                                ^
SyntaxError: unexpected EOF while parsing

Please resolve this problem for me as well.

1

code should be for python 3

This code works fine with Python 3.6 on my computer. Also, I think you may be under the wrong impression that I should keep troubleshooting this even after providing the full code for you.
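
A plausible cause of the SyntaxError reported above, though the thread does not confirm it, is that the indentation was lost when the script was copied, so the for statement had no body after it. The sketch below uses the built-in compile() to show that a loop header without an indented body is rejected, while the indented version parses. has_syntax_error is a hypothetical helper name, and the exact wording of the error message varies between Python versions.

```python
def has_syntax_error(source):
    """Return True if the source text fails to compile."""
    try:
        compile(source, "<pasted script>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return True
    return False


# Loop header with nothing after it: the parser hits the end of the input.
broken = "for record in records:\n"

# Same loop with an indented body: parses without error.
fixed = "for record in records:\n    print(record)\n"

print(has_syntax_error(broken))
print(has_syntax_error(fixed))
```

If this is the problem, re-indenting the lines under the with and for statements in the saved script should be enough.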

0

Good day!

Thanks for the script.

It works, but I have a question. How can I get the FASTA sequence of a specific selection of the PDB file? For example, if I have a chain with 100 residues but only want the FASTA sequence of the first 10 residues, how can I do that?

Thank you so much.

Regards, Brandon U.

0

You can modify Mensur Dlakic's code as follows. This will get you the first 10 amino acids.

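import sys
from Bio import SeqIO

# Path to the PDB file, given as the first command-line argument
PDBFile = sys.argv[1]
with open(PDBFile, 'r') as pdb_file:
    for record in SeqIO.parse(pdb_file, 'pdb-atom'):
        print('>' + record.id)
        # Slice the sequence to keep only the first 10 residues
        print(record.seq[:10])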


The [:10] slice added to record.seq is what makes this possible. You can use any other interval, e.g. [4:24], to get a different section of the sequence.
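
Because slice indices are zero-based and exclude the end position, it is easy to be off by one when thinking in residue positions. The sketch below wraps the slice in a hypothetical helper, subsequence, that takes 1-based inclusive positions. A plain string stands in for record.seq here, on the assumption that a Biopython Seq slices the same way as a str. These are positions within the extracted sequence, not the residue numbers stored in the PDB file.

```python
def subsequence(seq, first, last):
    """Return residues first..last (1-based, inclusive) of seq."""
    if first < 1 or last < first:
        raise ValueError("need 1 <= first <= last")
    return seq[first - 1:last]


# Start of the 1MKP chain A sequence shown earlier in the thread.
seq = "ASFPVEILPFLYLGCAKDSTNLDVLEEFGIKYILNVTPNLPNLF"

first_ten = subsequence(seq, 1, 10)  # same as seq[:10]
middle = subsequence(seq, 5, 24)     # same as seq[4:24]
print(first_ten)
print(middle)
```

So [4:24] corresponds to positions 5 through 24 of the extracted sequence, 20 residues in total.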