I have a list of UniProt IDs. Based on these IDs, I would like to parse the ordering of amino acid residues in the ATOM field of PDB structures. However, the residue numbers in the ATOM field do not always match the order of residues in the corresponding ResSeq numbers. After searching Biostars, I found a post about the SIFTS database.
The residue number information in the SIFTS database is stored in xml.gz files, and I don't know how to read these files using either R or Python.
I tried some solutions from Biostars itself, but they don't work in my case. I would like to supply UniProt IDs (or PDB IDs) one by one and parse the XML files to get the residue numbers in the PDB and the corresponding residue numbers in the ResSeq field.
I would appreciate suggestions from both R and Python experts, because I would like to know both approaches.
Link to the SIFTS database: https://www.ebi.ac.uk/pdbe/docs/sifts/quick.html
The XML file repository is here: ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/split_xml/
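To make the goal concrete, here is a rough Python sketch of the kind of output I am after. It is only my guess at an approach, not a working solution I have verified against real SIFTS files. It assumes each residue is a `residue` element whose `crossRefDb` children carry `dbSource`, `dbResNum`, `dbChainId` and `dbAccessionId` attributes; those element and attribute names are my reading of the SIFTS XML and may need adjusting. The function name `parse_sifts` and the dictionary keys are my own invention.

```python
import gzip
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def parse_sifts(source):
    """Yield one dict per residue from a gzipped SIFTS XML file.

    `source` is a path or a binary file object. The element and
    attribute names used below are assumptions about the SIFTS schema.
    """
    with gzip.open(source, "rb") as handle:
        for _, elem in ET.iterparse(handle, events=("end",)):
            if local_name(elem.tag) != "residue":
                continue
            row = {
                "pdbe_num": elem.get("dbResNum"),   # numbering on the residue itself
                "res_name": elem.get("dbResName"),
                "chain": None,
                "pdb_num": None,                    # numbering from the PDB cross-reference
                "uniprot_id": None,
                "uniprot_num": None,
            }
            for child in elem:
                if local_name(child.tag) != "crossRefDb":
                    continue
                source_db = child.get("dbSource")
                if source_db == "PDB":
                    row["pdb_num"] = child.get("dbResNum")
                    row["chain"] = child.get("dbChainId")
                elif source_db == "UniProt":
                    row["uniprot_id"] = child.get("dbAccessionId")
                    row["uniprot_num"] = child.get("dbResNum")
            yield row
            elem.clear()  # free memory once the residue has been reported
```

With a file already downloaded from the repository above (for example a hypothetical local copy named `1abc.xml.gz`), `list(parse_sifts("1abc.xml.gz"))` would give one row per residue, which could then be filtered by `uniprot_id` or `chain`. All numbers are kept as strings because I am not sure every entry holds a plain integer.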
Thank you in advance.