5 weeks ago
John Ma
I am currently reading older (late 1990s to early 2000s) protein crystallography papers (about the interleukins, to be precise), and I noticed that the residue positions referred to in these papers no longer match the sequences currently in the sequence databases (both NCBI and Ensembl), up to and including the oldest versions available in those databases. While I am not surprised by the differences in sequence, the lack of older sequences makes it hard to reconcile these results with modern sequencing data, specifically mutational data from gnomAD. Is there any way I can reconcile the two?
I am curious as to what has changed here (I am not a structural biologist). Is it the sequence that has changed, or something else? Is it possible that the old crystal structures don't contain all of the amino acids, and that this is the source of the discrepancy?
Your last comment is definitely one possibility. Because John doesn't share any PDB IDs, we are left to speculate on the specifics here, as there are many possibilities.
See 'Unusual sequence numbering' at Proteopedia for many other possibilities. Although that list is long, it isn't exhaustive either.
In older literature I've seen some investigators simply make up the numbering. The protein was expressed from a construct related to the biological molecule, but the construct didn't exactly match it because of limitations inherent in the experimental process needed to get any insight into the structure. The discrepancies, and the reasons for them, increase dramatically once you bring in nucleic acid molecules and analogs of co-factors and binding partners.
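One way to check whether a construct mismatch of this kind explains a numbering discrepancy is to locate the crystallized sequence within the current reference sequence and build a lookup from the paper's numbers to reference positions. What follows is only a minimal sketch under stated assumptions: the function name `map_numbering` and both sequences are invented for illustration, the construct is assumed to be numbered consecutively from `construct_start`, and Python's `difflib` stands in for a proper pairwise aligner, so it will only behave sensibly when the two sequences are nearly identical. Curated residue-level mappings such as SIFTS are likely a better source where an entry exists.

```python
from difflib import SequenceMatcher


def map_numbering(construct_seq, construct_start, reference_seq):
    """Map author residue numbers to 1-based positions in a reference sequence.

    construct_seq:   sequence of the crystallized construct
    construct_start: number the paper assigns to the first construct residue
                     (assumes consecutive numbering with no gaps)
    reference_seq:   current full-length database sequence
    """
    matcher = SequenceMatcher(None, construct_seq, reference_seq, autojunk=False)
    mapping = {}
    # Each matching block is a run of identical residues shared by both sequences.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            author_number = construct_start + block.a + k
            reference_number = block.b + k + 1  # convert 0-based index to 1-based
            mapping[author_number] = reference_number
    return mapping


# Toy sequences, NOT real interleukin data: the reference carries a
# 10-residue leader that the hypothetical crystallized construct lacks.
reference = "MKTLLVAGLSAQWERTYIPASDFGHKLCVNM"
construct = "AQWERTYIPASDFGHKLCVNM"

mapping = map_numbering(construct, 1, reference)
print(mapping[1])  # author residue 1 -> reference position 11
```

Residues that differ between the construct and the reference (an engineered substitution, for example) simply have no entry in the returned dictionary, which makes them easy to flag for manual inspection before comparing against variant positions.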
Other resources:
Recently the RCSB Protein Data Bank has improved how well the numbers match the biological protein when you look at the sequence tab, but I doubt it is thorough. I used to have to refer people to PDBsum, because the numbers there matched what was actually used in the structure and not just what the investigators thought they had based it on.
As for reconciling, I need to think about the resources. John seems to be asking about one thing in order to get to another, so that is really a different question. I believe it is already covered elsewhere here, too.