Question

Matching PDB and PFAM sequences for contact mapping

2.5 years ago

Evan • 0

I am trying to generate a contact prediction from PFAM MSAs, but I need to reliably map a given protein family sequence (a specific sequence from the PFAM MSA) to its corresponding PDB sequence.

Take as an example PF00011:

The PFAM reference sequence is: ['D' 'W' 'K' 'E' 'T' 'P' 'E' 'A' 'H' 'V' 'F' 'K' 'A' 'D' 'L' 'P' 'G' 'V' 'K' 'K' 'E' 'E' 'V' 'K' 'V' 'E' 'V' 'E' 'D' 'G' 'N' 'v' 'L' 'V' 'V' 'S' 'G' 'E' 'R' 'T' 'k' 'e' 'K' 'E' 'D' 'K' 'N' 'D' 'K' 'W' 'H' 'R' 'V' 'E' 'R' 'S' 'S' 'G' 'K' 'F' 'V' 'R' 'R' 'F' 'R' 'L' 'L' 'E' 'D' 'A' 'K' 'V' 'E' 'E' 'V' 'K' 'A' 'G' 'L' 'E' 'N' 'G' 'V' 'L' 'T' 'V' 'T' 'V' 'P' 'K' 'A' 'E' 'V' 'K' 'K' 'P' 'E' 'V' 'K' 'A' 'I' 'Q' 'I' 'S']

... and loading the PDB sequence using the PFAM-provided PDB-id '2BYU' I get the following sequence: ['N', 'A', 'R', 'M', 'D', 'W', 'K', 'E', 'T', 'P', 'E', 'A', 'H', 'V', 'F', 'K', 'A', 'D', 'L', 'P', 'G', 'V', 'K', 'K', 'E', 'E', 'V', 'K', 'V', 'E', 'V', 'E', 'D', 'G', 'N', 'V', 'L', 'V', 'V', 'S', 'G', 'E', 'R', 'T', 'K', 'E', 'K', 'E', 'D', 'K', 'N', 'D', 'K', 'W', 'H', 'R', 'V', 'E', 'R', 'S', 'S', 'G', 'K', 'F', 'V', 'R', 'R', 'F', 'R', 'L', 'L', 'E', 'D', 'A', 'K', 'V', 'E', 'E', 'V', 'K', 'A', 'G', 'L', 'E', 'N', 'G', 'V', 'L', 'T', 'V', 'T', 'V', 'P', 'K', 'A', 'A', 'I', 'Q', 'I', 'S', 'G']

Both sequences are nearly identical, with the exception of the additional 'N', 'A', 'R', 'M' at the beginning of the PDB sequence. Is there some reference that allows us to extract the exactly matching sequence from the PDB database?
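As a minimal sketch of such a mapping, assuming both sequences are already loaded as lists of one-letter codes like the ones above, the standard-library `difflib.SequenceMatcher` can find the shared blocks and give an index-to-index correspondence. The function name `map_pfam_to_pdb` and the `min_block` parameter are hypothetical, introduced only for illustration. This is plain block matching, not a scored sequence alignment, and the indices refer to positions in the loaded sequences, not to the residue numbers stored in the PDB file. A curated residue-level cross-reference such as the SIFTS resource from PDBe may be a more authoritative option.

```python
from difflib import SequenceMatcher


def map_pfam_to_pdb(pfam_seq, pdb_seq, min_block=3):
    """Return {pfam_index: pdb_index} for residues in shared blocks.

    pfam_seq, pdb_seq: strings or lists of one-letter codes.
    min_block: hypothetical cutoff; shorter matching blocks are ignored
               to reduce spurious matches.
    """
    # Pfam alignments may contain gap characters ('-' or '.') and lowercase
    # insert-state residues. Drop the gaps, remember each residue's original
    # position, and compare in uppercase.
    cols = [i for i, c in enumerate(pfam_seq) if c not in "-."]
    query = "".join(pfam_seq[i] for i in cols).upper()
    target = "".join(pdb_seq).upper()

    matcher = SequenceMatcher(None, query, target, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        if block.size < min_block:
            continue  # also skips the zero-length sentinel block
        for k in range(block.size):
            mapping[cols[block.a + k]] = block.b + k
    return mapping


# Toy example: the PDB sequence carries four extra leading residues.
pfam = list("DWKeTP-EAH")
pdb = list("NARMDWKETPEAHG")
mapping = map_pfam_to_pdb(pfam, pdb)
print(mapping[0], mapping[9])  # 4 12
```

Residues that are present in the Pfam sequence but missing from the structure (for example, unresolved loops) simply get no entry in the mapping, so contact pairs involving them can be skipped.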

Thanks in advance, Evan

PDB sequence prediction PFAM contact • 581 views

updated 2.5 years ago by Mensur Dlakic ★ 27k • written 2.5 years ago by Evan • 0

I don't know what MSA you plan to use - Pfam has several of them for each family - but they may not be diverse enough for reliable contact prediction.
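One rough way to gauge diversity is to count "effective" sequences by down-weighting near-duplicates. The sketch below assumes the MSA is a list of equal-length aligned strings; the function name `effective_sequences` and the 0.8 identity threshold are illustrative assumptions, not something Pfam prescribes, and gap characters are treated like any other symbol.

```python
def effective_sequences(msa, identity_threshold=0.8):
    """Sum of 1/n_i, where n_i is the number of sequences (including
    sequence i itself) that are at least `identity_threshold` identical
    to sequence i over the aligned columns."""
    total = 0.0
    for seq_i in msa:
        neighbours = 0
        for seq_j in msa:
            same = sum(a == b for a, b in zip(seq_i, seq_j))
            if same / len(seq_i) >= identity_threshold:
                neighbours += 1
        total += 1.0 / neighbours
    return total


# Three near-identical sequences collapse to roughly one effective sequence.
toy_msa = ["ACDEF", "ACDEF", "ACDEY", "WWWWW"]
print(effective_sequences(toy_msa))
```

A value far below the raw number of sequences suggests the alignment is dominated by close homologs. The loop is quadratic in the number of sequences, so this is only practical as written for small alignments.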

2.5 years ago by Mensur Dlakic ★ 27k