SIFTS (uniprot <-> pdb chain) has SP_END < SP_BEG for some mappings
6.3 years ago
mtscales ▴ 20

Heya,

In the SIFTS flat file I downloaded (pdb_chain_uniprot.tsv.gz from https://www.ebi.ac.uk/pdbe/docs/sifts/quick.html), 153 of the (UniProt, PDB chain) mappings have SP_BEG > SP_END, and I do not understand why.

e.g.

PDB   1bxh
CHAIN   A
SP_PRIMARY   P02866
RES_BEG   1
RES_END   237
PDB_BEG   1
PDB_END   237
SP_BEG   164
SP_END   148

Is this an error, or is it simply that the UniProt sequence and the PDB sequence share the same stretch of 17 residues (148 to 164) but run in opposite directions w.r.t. the backbone?

Also, it seems odd to me that the PDB chain appears to have 237 AAs, yet it only 'covers' 17 AAs of the mapped UniProt sequence.
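For anyone wanting to reproduce the count, here is a minimal sketch of how such rows could be listed. It assumes the flat file is tab-separated, that any comment lines start with `#`, and that the header uses the column names shown above. The function name `find_reversed_ranges` and the second sample row (`9zzz`) are made up for illustration.

```python
import csv
import io

def find_reversed_ranges(lines):
    """Yield rows whose UniProt start (SP_BEG) exceeds the UniProt end (SP_END)."""
    # Assumption: comment lines start with '#'; the first remaining line is the header.
    data = (ln for ln in lines if not ln.startswith("#"))
    for row in csv.DictReader(data, delimiter="\t"):
        if int(row["SP_BEG"]) > int(row["SP_END"]):
            yield row

# Tiny in-memory stand-in for the real file; the 9zzz row is hypothetical.
sample = io.StringIO(
    "# hypothetical comment line\n"
    "PDB\tCHAIN\tSP_PRIMARY\tRES_BEG\tRES_END\tPDB_BEG\tPDB_END\tSP_BEG\tSP_END\n"
    "1bxh\tA\tP02866\t1\t237\t1\t237\t164\t148\n"
    "9zzz\tA\tP00000\t1\t10\t1\t10\t5\t14\n"
)
reversed_rows = list(find_reversed_ranges(sample))
for row in reversed_rows:
    print(row["PDB"], row["CHAIN"], row["SP_BEG"], row["SP_END"])
```

For the downloaded file, the same function should work on a handle from `gzip.open(path, "rt")` in place of `sample`.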

Any ideas? I am quite confused.

Thank you!


Hi, did you check the .pdb file for your PDB ID? The DBREF section of the PDB file gives the mapping between your PDB chain and the corresponding UniProt sequence.

DBREF  1BXH A    1   118  UNP    P02866   CONA_CANEN     164    281             
DBREF  1BXH A  119   237  UNP    P02866   CONA_CANEN      30    148
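As a rough sketch, DBREF records like these could be pulled out of a .pdb file as follows. It assumes the fields are separated by whitespace, which holds for the records shown here. Real PDB files are fixed-width, so fields can run together or carry insertion codes, and a column-based parser would be more robust. `Segment` and `parse_dbref` are hypothetical names.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    pdb_id: str
    chain: str
    pdb_beg: int
    pdb_end: int
    accession: str
    sp_beg: int
    sp_end: int

def parse_dbref(lines):
    """Collect UniProt (UNP) DBREF records as Segment tuples."""
    segments = []
    for line in lines:
        # "DBREF " with a trailing space skips DBREF1/DBREF2 records.
        if not line.startswith("DBREF "):
            continue
        f = line.split()
        # Assumption: exactly 10 whitespace-separated fields, database == UNP.
        if len(f) != 10 or f[5] != "UNP":
            continue
        segments.append(
            Segment(f[1].lower(), f[2], int(f[3]), int(f[4]), f[6], int(f[8]), int(f[9]))
        )
    return segments

records = [
    "DBREF  1BXH A    1   118  UNP    P02866   CONA_CANEN     164    281",
    "DBREF  1BXH A  119   237  UNP    P02866   CONA_CANEN      30    148",
]
segments = parse_dbref(records)
for seg in segments:
    print(seg)
```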

Thanks Pallab, I had not looked in the .pdb file!

From .pdb file:

DBREF  1BXH A    1   118  UNP    P02866   CONA_CANEN     164    281             
DBREF  1BXH A  119   237  UNP    P02866   CONA_CANEN      30    148             
DBREF  1BXH B    1   118  UNP    P02866   CONA_CANEN     164    281             
DBREF  1BXH B  119   237  UNP    P02866   CONA_CANEN      30    148             
DBREF  1BXH C    1   118  UNP    P02866   CONA_CANEN     164    281             
DBREF  1BXH C  119   237  UNP    P02866   CONA_CANEN      30    148             
DBREF  1BXH D    1   118  UNP    P02866   CONA_CANEN     164    281             
DBREF  1BXH D  119   237  UNP    P02866   CONA_CANEN      30    148

From my SIFTS mapping file:

PDB,CHAIN,SP_PRIMARY,RES_BEG,RES_END,PDB_BEG,PDB_END,SP_BEG,SP_END
1bxh,A,P02866,1,237,1,237,164,148
1bxh,B,P02866,1,237,1,237,164,148
1bxh,C,P02866,1,237,1,237,164,148
1bxh,D,P02866,1,237,1,237,164,148

So it would seem that the SIFTS file is _wrong_, and that the mapping should be like:

1bxh,A,P02866,1,118,1,118,164,281
1bxh,A,P02866,119,237,119,237,30,148
... + other chains
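A quick sanity check on the numbers, as a sketch. Each proposed segment covers the same number of residues on the PDB side and the UniProt side, while the single SIFTS row does not. The `collapse` function is only my guess at how a one-row-per-chain summary might be produced (begin of the first segment, end of the last). It reproduces the SIFTS row, but I have not seen this confirmed in the SIFTS documentation.

```python
# (res_beg, res_end, sp_beg, sp_end) for chain A, taken from the DBREF records above
segments = [(1, 118, 164, 281), (119, 237, 30, 148)]

def lengths_match(res_beg, res_end, sp_beg, sp_end):
    """True if the PDB range and the UniProt range span the same number of residues."""
    return res_end - res_beg + 1 == sp_end - sp_beg + 1

def collapse(segs):
    """Hypothetical collapse: begin of the first segment, end of the last segment."""
    first, last = segs[0], segs[-1]
    return (first[0], last[1], first[2], last[3])

per_segment_ok = all(lengths_match(*s) for s in segments)
collapsed = collapse(segments)
collapsed_ok = lengths_match(*collapsed)

print(per_segment_ok)  # both segments are internally consistent
print(collapsed)       # same numbers as the SIFTS row for chain A
print(collapsed_ok)    # the collapsed row is not a consistent range
```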

Would you say it is safer to trust the .pdb file than the SIFTS flat file? It is understandable that SIFTS made a mistake here, as it looks like an edge case: consecutive segments [x][y] in the PDB chain map to a UniProt sequence laid out as ---[y]--[x]---, i.e. in the opposite order.

Thank you very much for your reply, Pallab. I have been a bit busy, so I am only just revisiting this.
