I want to annotate all the enzyme names and their kinetic data in a given text. Is there an enzyme dictionary that I can use directly for annotation? And how should I annotate kinetic expressions in the text?
For easy natural language processing, I recommend having a look at the Annotator from the NCBO. You can choose ChEBI as the enzyme/chemical entities dictionary. You can also use the BRENDA ontology, which will annotate the organism and should normally annotate enzymes too (PubMed article).
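To give an idea of what a dictionary-based annotation does, here is a minimal Python sketch. It does not call the NCBO Annotator; the two-entry `ENZYME_LEXICON` and the `annotate_enzymes` function are made up for illustration, and a real lexicon would be exported from an ontology such as ChEBI or BRENDA.

```python
import re

# Toy lexicon for illustration only (an assumption, not a real resource);
# in practice the terms would come from an ontology export.
ENZYME_LEXICON = ["kynurenine 3-hydroxylase", "acetylcholinesterase"]


def annotate_enzymes(text, lexicon=ENZYME_LEXICON):
    """Return (start, end, matched_text) for each lexicon term found in text."""
    # Try longer terms first so a long name wins over any shorter name it contains.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


sample = "Compound 1 selectively inhibited the activity of kynurenine 3-hydroxylase."
for start, end, name in annotate_enzymes(sample):
    print(start, end, name)
```

Exact string matching like this misses synonyms and spelling variants, which is the main reason to prefer an ontology-backed annotator for real work.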
The kinetic expressions will be harder to extract. If there is no ontology for them, you will have to use a more advanced technique. In that case you do not really need a dictionary-based approach but rather general-purpose text-mining software. You can have a look at the tm package in R (it's fast and relatively powerful). You can also write your own Perl script with regular expressions, but it's like re-inventing the wheel and is likely to become a clusterf**k of if-else conditions. There are tons of text-mining solutions out there and a lot of literature to read.
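For what it's worth, the regular-expression route can be sketched in a few lines of Python instead of Perl. This is only a rough illustration: the `KINETIC_PATTERN` regex and the `extract_kinetics` function are my own assumptions, they cover a handful of parameter names (IC50, EC50, Km, Ki, kcat, Vmax) and molar units, and they will miss many of the phrasings found in real abstracts.

```python
import re

# Hypothetical pattern: parameter name, optional "value(s)", a linking word,
# a number, and a molar unit. \u03bc and \u00b5 are the two common "micro" signs.
KINETIC_PATTERN = re.compile(
    r"(?<!\w)(?P<param>IC\(?50\)?|EC\(?50\)?|K[mi]|kcat|Vmax)"
    r"(?:\s+values?)?\s*(?:of|=|was|is)\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[pn\u03bc\u00b5m]?M)(?!\w)"
)


def extract_kinetics(text):
    """Return one dict per kinetic expression matched in text."""
    return [
        {
            "param": m.group("param"),
            "value": float(m.group("value")),
            "unit": m.group("unit"),
        }
        for m in KINETIC_PATTERN.finditer(text)
    ]


sentence = (
    "Compound 1 selectively inhibited the activity of kynurenine "
    "3-hydroxylase with an IC(50) value of 1.5 \u03bcM."
)
print(extract_kinetics(sentence))
```

Every new phrasing ("IC50 values ranging from ...", "Ki = 3 ± 0.2 nM") needs another branch in the pattern, which is exactly how the if-else mess mentioned above tends to start.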
Can you provide a sample of the "given text"?
Ianthellamide A (1), a novel octopamine derivative, was isolated from the Australian marine sponge Ianthella quadrangulata. Compound 1 selectively inhibited the activity of kynurenine 3-hydroxylase with an IC(50) value of 1.5 μM. It also significantly increased the level of endogenous kynurenic acid in rat brain and hence has the potential as a neuroprotective agent in the treatment of neurodegenerative disorders.
From the above text I want to annotate the enzyme names and their kinetic data.
There is an excellent review article in PLoS Computational Biology on text mining for translational bioinformatics: http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003044