I am converting data from LycoCyc 3.3 into a BridgeDb file, and the compounds.dat of LycoCyc has names in HTML encoding, such as:
<i>D</i>-glycerate
However, the BridgeDb file should hold names without HTML, e.g. in UTF-8, though I would be happy to have the italics simply removed. Is there a Java library that can convert an HTML string like the one above into a non-HTML string, for example in ASCII, or even in UTF-8 so that superscripted and subscripted digits can be kept?
Using XSLT?
a possibility...
Forget it, I thought you only wanted to grab the HTML pages.
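For what it's worth, the conversion described in the question can be sketched with the Java standard library alone. This is only a rough illustration, not a tested solution for the whole compounds.dat file: the class name `HtmlNames` and the method `toPlainText` are made up for this example, it assumes the names contain only simple inline tags such as <i>, <sub> and <sup>, and it decodes just three common entities. A real HTML parser (for example jsoup, where `Jsoup.parse(html).text()` returns the text content) would be more robust for anything messier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of LycoCyc or BridgeDb.
final class HtmlNames {
    // Unicode superscript and subscript digits 0-9, indexed by digit value.
    private static final String SUP =
        "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079";
    private static final String SUB =
        "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089";

    // <sub>123</sub> or <sup>123</sup> containing only ASCII digits.
    private static final Pattern SUP_SUB =
        Pattern.compile("<(su[bp])>(\\d+)</\\1>", Pattern.CASE_INSENSITIVE);
    // Any remaining tag, e.g. <i> or </i>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String toPlainText(String html) {
        // Step 1: turn digit-only <sub>/<sup> runs into Unicode digits.
        Matcher m = SUP_SUB.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String table = m.group(1).equalsIgnoreCase("sup") ? SUP : SUB;
            StringBuilder digits = new StringBuilder();
            for (char c : m.group(2).toCharArray()) {
                digits.append(table.charAt(c - '0'));
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(digits.toString()));
        }
        m.appendTail(sb);

        // Step 2: drop every other tag, keeping the text inside it.
        String text = TAG.matcher(sb).replaceAll("");

        // Step 3: decode a few common entities (&amp; last, to avoid double decoding).
        return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("<i>D</i>-glycerate")); // prints D-glycerate
    }
}
```

If plain ASCII is wanted instead of UTF-8, step 1 can be skipped so that the digits are kept as ordinary characters once the tags are stripped.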