**I have analyzed the PDB files for many PDB IDs and found inconsistencies in the residue numbering. The inconsistencies are:**
- The residue numbering in a given chain does not always start at 1.
- There are sometimes gaps in the residue numbering. For example, the numbering jumps from 50 to 58, with no residues in between.
- The number of residues sometimes does not match the given FASTA sequence.
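For anyone who wants to check the first two points on their own files, here is a minimal sketch that reads the fixed-width `ATOM` records of a legacy-format PDB file and reports the first residue number and any numbering gaps per chain. The function names are made up for illustration, `make_atom_line` only exists to build toy input, and the sketch assumes the first model only and ignores insertion codes and `HETATM` residues.

```python
def residue_numbers_by_chain(pdb_lines):
    """Return {chain_id: [residue numbers in file order]} from ATOM records."""
    chains = {}
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]         # column 22 of the PDB format
        res_seq = int(line[22:26])  # columns 23-26 of the PDB format
        numbers = chains.setdefault(chain_id, [])
        # Each residue has many atoms; record its number only once.
        if not numbers or numbers[-1] != res_seq:
            numbers.append(res_seq)
    return chains


def find_gaps(numbers):
    """Return (before, after) pairs where the numbering jumps by more than 1."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


def make_atom_line(serial, chain_id, res_seq, res_name="ALA"):
    """Hypothetical helper: build a minimal ATOM line for the toy example."""
    return f"ATOM  {serial:5d}  CA  {res_name} {chain_id}{res_seq:4d}"


# Toy chain A that starts at 48 and jumps from 50 to 58.
demo = [make_atom_line(i + 1, "A", n) for i, n in enumerate([48, 49, 50, 58, 59])]
numbers = residue_numbers_by_chain(demo)["A"]
print("first residue:", numbers[0])
print("gaps:", find_gaps(numbers))
```

On a real file you would pass the lines of the opened `.pdb` file instead of `demo`.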
I also have questions about PDB IDs and UniProt IDs.
Why are multiple PDB IDs grouped under a single UniProt ID? Why are their protein sequences different?
Thank you for answering. I will look at the links you provided. Your answers mostly addressed my questions. I will be more careful about giving examples in future posts.
Regarding question #3: see 3CBH as an example. The FASTA has 104 residues but the PDB has 103; it skips the first residue of the FASTA. For the UniProt ID and PDB ID question: PDB IDs 5N9G, 8ITY, 8IUE, and 8IUH are grouped under UniProt ID A6H8Y1. What is the difference between UniProt proteins and RCSB PDB proteins? Why are multiple PDB IDs grouped under a single UniProt ID? Why do their FASTA sequences not match?
Which FASTA? From where? And what do you mean by 'pdb has 103'? The structure file?
I suspect the discrepancy comes from the initial methionine. If you scroll down at https://www.rcsb.org/structure/3CHB, you will see a gray box all the way over, next to each 'UNMODELED' label, under the initial M (Met). That residue is part of the deposited sequence but has no coordinates in the structure.
(Please post links as actual links to make things specific, so others don't need to look up what you presumably already did.)
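To make the unmodeled-residue point concrete, here is a rough sketch that compares the deposited sequence length in the `SEQRES` records with the number of residues that actually have `ATOM` coordinates. It uses toy data, not the real 3CHB file; the function names and the two `make_*` helpers are invented for this example, and it assumes a legacy-format PDB file and looks at the first model only.

```python
def seqres_lengths(pdb_lines):
    """Return {chain_id: deposited length} from the numRes field of SEQRES."""
    lengths = {}
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            # Column 12 is the chain ID, columns 14-17 the residue count.
            lengths[line[11]] = int(line[13:17])
    return lengths


def modeled_residue_counts(pdb_lines):
    """Return {chain_id: number of distinct residues with ATOM coordinates}."""
    seen = {}
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # first model only
        if line.startswith("ATOM"):
            # A residue is identified by its number plus insertion code.
            seen.setdefault(line[21], set()).add((line[22:26], line[26:27]))
    return {chain: len(residues) for chain, residues in seen.items()}


def make_seqres_line(chain_id, names, ser_num=1):
    """Hypothetical helper: build one SEQRES line for the toy example."""
    return f"SEQRES {ser_num:3d} {chain_id} {len(names):4d}  " + " ".join(names)


def make_atom_line(serial, chain_id, res_seq, res_name):
    """Hypothetical helper: build a minimal ATOM line for the toy example."""
    return f"ATOM  {serial:5d}  CA  {res_name} {chain_id}{res_seq:4d}"


# Toy entry: four residues deposited, but the initial MET has no coordinates.
toy = [
    make_seqres_line("A", ["MET", "ALA", "GLY", "SER"]),
    make_atom_line(1, "A", 2, "ALA"),
    make_atom_line(2, "A", 3, "GLY"),
    make_atom_line(3, "A", 4, "SER"),
]
deposited = seqres_lengths(toy)
modeled = modeled_residue_counts(toy)
for chain in deposited:
    missing = deposited[chain] - modeled.get(chain, 0)
    print(f"chain {chain}: {deposited[chain]} deposited, {missing} unmodeled")
```

A nonzero difference means some residues of the deposited sequence were not modeled, which would produce exactly the kind of off-by-one count described in the question.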
UniProt ID A6H8Y1 is the human transcription factor TFIIIB component B'' homolog.
In PDB entry 5N9G it is present as chains C and H, as you can see by scrolling down to the 'Macromolecules' section.
Similarly for 8ITY where it is present as chain W.
Similarly for 8IUE where it is present as chain Y (or possibly W if it is using Author designation).
Similarly for 8IUH where it is present as chain W.
Indeed, for the last three, if you look under the 'Literature' section of an entry such as 8IUH, you'll see that the paper reported all three structures, as indicated by them being listed under 'Primary Citation of Related Structures'.
The sequences probably differ because the authors got different experimental results for the different structures, which were solved in different ways and combinations. This is very common in the course of solving a structure experimentally and documenting it in the literature: a complex of proteins is often solved under various conditions and in various combinations.
Thank you for taking the time to reply. It helped a lot.