I am trying to do text classification on PMC (PubMed Central) data. I downloaded the XML files from http://www.ncbi.nlm.nih.gov/pmc/tools/ftp/
I am not sure how to parse these files and extract fields such as the PMID, abstract, and affiliations from the .nxml data. Any help would be appreciated.
I hope my question is clear.
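
To make the goal concrete, here is a minimal sketch of the kind of extraction I mean, using Python's standard xml.etree.ElementTree. It assumes the .nxml files follow the JATS tag set, with the PMID in an article-id element carrying pub-id-type="pmid", the abstract in abstract, and affiliations in aff. The function name extract_fields and the sample document are made up for illustration and are not taken from a real PMC file.

```python
import xml.etree.ElementTree as ET


def _flatten(element):
    """Join all text inside an element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def extract_fields(xml_text):
    """Return PMID, abstract and affiliations from one article (assumed JATS layout)."""
    root = ET.fromstring(xml_text)

    # Assumption: the PMID is stored as <article-id pub-id-type="pmid">.
    pmid_el = root.find(".//article-id[@pub-id-type='pmid']")
    # Assumption: the first <abstract> element is the one we want.
    abstract_el = root.find(".//abstract")

    return {
        "pmid": pmid_el.text.strip() if pmid_el is not None and pmid_el.text else None,
        "abstract": _flatten(abstract_el) if abstract_el is not None else None,
        # Assumption: every <aff> element is one affiliation.
        "affiliations": [_flatten(aff) for aff in root.iter("aff")],
    }


# Hypothetical, hand-written sample standing in for a downloaded .nxml file.
SAMPLE = """
<article>
  <front>
    <article-meta>
      <article-id pub-id-type="pmid">12345678</article-id>
      <aff id="aff1">Example University, Example City</aff>
      <abstract><p>First sentence. <italic>Second</italic> sentence.</p></abstract>
    </article-meta>
  </front>
</article>
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

For a file on disk, the same idea should work by reading the file contents and passing them to extract_fields, or by using ET.parse(path).getroot() instead of ET.fromstring. Articles without a PMID or abstract would give None for those fields.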