Different length but same sequence (PDB)
7.1 years ago
Expe ▴ 10

Hi,

I started working with PDB files and Biopython, and I can't figure out why the sequence length differs between the FASTA file and the PDB file. An example is the protein 5dj7: the FASTA sequence has length 230, whereas from the PDB file I get 593. To find the length from the PDB file I used the following code, and I don't know whether I am interpreting the result correctly.

from Bio.PDB import PDBParser

parser = PDBParser()
pdb_f = "5dj7.pdb"
structure = parser.get_structure('5dj7', pdb_f)

# Print the number of entries Biopython reports for chain A of each model
for model in structure:
    chain = model['A']
    print(len(chain))

Thanks in advance!

pdb fasta sequence length biopython • 1.8k views

The PDB file likely has multiple chains or models, depending on how the structure was resolved.

The lengths may also not be exact multiples of the expected length, because post-translational processing, for example, may have removed residues. Open the PDB file in a viewer such as PyMOL or Chimera to examine it.
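
To check for multiple chains or models without opening a viewer, something like the following sketch could list every chain with its length. The function names `chain_lengths` and `chain_lengths_from_file` are made up for this example, and the second one assumes Biopython is installed and that the file has been downloaded locally.

```python
def chain_lengths(structure):
    """Map (model id, chain id) to the number of entries stored in that chain."""
    return {
        (model.id, chain.id): len(chain)
        for model in structure
        for chain in model
    }


def chain_lengths_from_file(pdb_path):
    """Parse a PDB file with Biopython and report every chain of every model."""
    # Imported here so chain_lengths() itself has no Biopython dependency.
    from Bio.PDB import PDBParser

    structure = PDBParser(QUIET=True).get_structure("s", pdb_path)
    return chain_lengths(structure)
```

Calling `chain_lengths_from_file("5dj7.pdb")` would then show whether the file holds more than one chain or model. Note that `len(chain)` counts every entry Biopython stores in the chain, not only amino acids.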


I think the chain also includes water molecules, in which case 593 - 230 = 363 of its entries would be waters.
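
One way to check this is to tally the chain's entries by the hetero flag that Biopython stores in `residue.id[0]`: a blank for standard residues, `"W"` for water, and `"H_..."` for other hetero groups. This is only a sketch. The names `count_residue_types` and `summarize_chain` are invented for the example, and `summarize_chain` assumes Biopython is installed and the PDB file is available locally.

```python
from collections import Counter


def count_residue_types(chain):
    """Tally the entries of a Bio.PDB chain by the hetero flag in residue.id[0]."""
    counts = Counter()
    for residue in chain:
        hetflag = residue.id[0]
        if hetflag == " ":
            counts["standard"] += 1       # standard amino acids / nucleotides
        elif hetflag == "W":
            counts["water"] += 1          # water molecules
        else:
            counts["other_hetero"] += 1   # ligands, ions, modified residues ("H_...")
    return counts


def summarize_chain(pdb_path, chain_id="A", model_index=0):
    """Parse a PDB file with Biopython and tally one chain of one model."""
    # Imported here so count_residue_types() has no Biopython dependency.
    from Bio.PDB import PDBParser

    structure = PDBParser(QUIET=True).get_structure("s", pdb_path)
    return count_residue_types(structure[model_index][chain_id])
```

For the file in the question, `summarize_chain("5dj7.pdb")` would show how many of the chain's entries are waters. The "standard" count can still differ from the FASTA length, since residues that were not resolved in the structure have no coordinates in the file.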
