I have little direct experience with text-mining tools. Can anyone suggest a good tool or approach for text-mining drug-gene relationships from the clinical trials available at clinicaltrials.gov? They provide XML files for each clinical trial record, but unfortunately gene information is not a standard field (though it is often mentioned in free-form descriptive fields). I would have a list of genes and a list of drugs and want to know when they co-occur in a clinical trial record. However, it would be nice to get more than just simple co-occurrence. Is anyone aware of a tool that could rank co-occurrences in some reasonable way based on term incidence, proximity, natural language processing concepts, etc.? Here is an example record to give some context.
What you are trying to do reminds me of the XplorMed tool: http://www.ogic.ca/projects/xplormed/
It used to have more features, but it might still work for you with your input data. It used to be able to start with a keyword, ID, or PubMed query and look for co-occurrence of terms, with ranking. Currently it asks for abstracts, but you might be able to fake it out with the clinical trials XML records instead. At least it's worth a try.
Even if the abstract trick doesn't work, it would probably help to read their publications about how they did it, and you might be able to get the software and tweak it yourself.
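If you end up rolling your own, plain co-occurrence with a crude proximity ranking is not much code. Here is a minimal sketch in Python, not what XplorMed does. The assumptions are mine: the function names, the `window` parameter, and the sample record are all made up for illustration, terms are matched as single case-insensitive words (no synonyms, no multi-word drug names), and all text in the record is pooled regardless of which field it came from.

```python
import re
import xml.etree.ElementTree as ET
from itertools import product


def record_text(xml_string):
    """Concatenate the text of every element in one trial record."""
    root = ET.fromstring(xml_string)
    return " ".join(t.strip() for t in root.itertext() if t.strip())


def term_positions(tokens, terms):
    """Map each single-word term to the token indexes where it occurs."""
    wanted = {t.lower() for t in terms}
    hits = {}
    for i, tok in enumerate(tokens):
        if tok in wanted:
            hits.setdefault(tok, []).append(i)
    return hits


def score_pairs(xml_string, genes, drugs, window=20):
    """Return (gene, drug, close_mentions, min_distance) tuples, best first.

    close_mentions counts gene/drug mention pairs at most `window` tokens
    apart; min_distance is the smallest token distance between any two
    mentions. Both are rough heuristics, not a validated relevance score.
    """
    # Lowercase and split on anything that is not a letter or digit,
    # so "EGFR-mutant" yields the tokens "egfr" and "mutant".
    tokens = re.findall(r"[a-z0-9]+", record_text(xml_string).lower())
    gene_hits = term_positions(tokens, genes)
    drug_hits = term_positions(tokens, drugs)

    results = []
    for (g, gpos), (d, dpos) in product(gene_hits.items(), drug_hits.items()):
        distances = [abs(i - j) for i in gpos for j in dpos]
        close = sum(1 for x in distances if x <= window)
        results.append((g, d, close, min(distances)))

    # More close mentions first; ties broken by the nearest mention.
    results.sort(key=lambda r: (-r[2], r[3]))
    return results


# Hypothetical record; real clinicaltrials.gov files have many more fields.
sample = """<clinical_study>
  <brief_title>Erlotinib in EGFR-mutant lung cancer</brief_title>
  <detailed_description><textblock>Patients with EGFR mutations receive
  erlotinib. KRAS status is recorded at baseline.</textblock></detailed_description>
</clinical_study>"""

for row in score_pairs(sample, ["EGFR", "KRAS"], ["erlotinib"], window=5):
    print(row)
```

Running this over every record and summing the counts per gene-drug pair would give you a first-pass ranking across trials. Anything smarter (synonym handling, negation, sentence boundaries) would need a real NLP toolkit on top.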