Can anyone recommend some methods for parsing data from PubChem Compound records? I can get a complete dump of the database from the PubChem FTP site.
The data is available in ASN.1, SDF, and XML formats.
For demonstration purposes, imagine that I want to reproduce a subset of the information displayed for a particular drug on the website. For example, the record here: Sunitinib.
More specifically, imagine that for this CID (5329102), I want to determine the drug name, the names listed under 'also known as', and the 'Depositor-Supplied Synonyms'.
Ultimately, I want to be able to perform this kind of query for every record in PubChem, not just that one.
It sounds like the PubChem Power User Gateway (PUG) might be helpful? If so, can someone provide a description of how I would get going on the example problem I outlined?
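To make the example concrete, here is a rough sketch of the kind of single-record lookup I have in mind. It assumes that the REST-style PUG interface serves synonyms at the URL pattern shown and returns JSON shaped as `InformationList` / `Information` / `Synonym`; both are assumptions on my part that should be checked against the PUG documentation. The function names (`synonyms_url`, `extract_synonyms`, `fetch_synonyms`) are hypothetical helpers of my own, not part of any PubChem library.

```python
import json
import urllib.request

# Assumed base URL for the REST-style PUG interface; verify against the docs.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def synonyms_url(cid):
    """Build the (assumed) synonyms URL for a single compound CID."""
    return f"{PUG_REST_BASE}/compound/cid/{cid}/synonyms/JSON"


def extract_synonyms(payload):
    """Map each CID in a decoded response to its list of synonyms.

    Assumes the layout:
    {"InformationList": {"Information": [{"CID": ..., "Synonym": [...]}]}}
    Missing keys yield an empty result rather than an exception.
    """
    info = payload.get("InformationList", {}).get("Information", [])
    return {entry["CID"]: entry.get("Synonym", []) for entry in info}


def fetch_synonyms(cid, timeout=30):
    """Download and parse the synonyms for one CID (makes a network request)."""
    with urllib.request.urlopen(synonyms_url(cid), timeout=timeout) as resp:
        return extract_synonyms(json.load(resp))
```

Calling `fetch_synonyms(5329102)` would, if those assumptions hold, return a dictionary keyed by CID with the synonym list for Sunitinib. One request per record is presumably impractical for all of PubChem, which is why I am also asking about parsing the FTP dump.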