Question: How Would You Classify Small Molecules Into Biologically Meaningful Categories?
Whenever you have a set of small molecules (with database identifiers from PubChem, ChEBI, KEGG, etc.), you commonly reach a point where you want to classify them into meaningful biological or non-biological categories (metabolite, secondary metabolite, pesticide, lipid, steroid, terpenoid, food additive, adhesive, etc.). I currently do this using the ChEBI ontology (especially the Role ontology), the chemical branch of the MeSH terms (with PubChem assignments retrieved through Eutils), and the chemical branches of KEGG BRITE. I also use all available cross-references between KEGG, ChEBI, PubChem, etc. so that each database can borrow the others' classifications. Does anybody have other suggested routes for classifying small molecules into functional categories? Thanks!!
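
The cross-referencing step described above can be sketched roughly as follows. This is only a minimal illustration, assuming the cross-references and category assignments have already been downloaded into plain Python structures. The function name `pool_categories`, the identifiers, and the category labels are made-up placeholders, not real database records or any database's API.

```python
from collections import defaultdict


def pool_categories(xrefs, annotations):
    """Group cross-referenced identifiers and pool their category labels.

    xrefs: iterable of (id_a, id_b) pairs meaning "same compound".
    annotations: dict mapping an identifier to a set of category labels.
    Returns a dict mapping every identifier to the sorted labels of its group.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in xrefs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Register identifiers that are annotated but never cross-referenced.
    for ident in annotations:
        find(ident)

    # Collect the labels of every group under its representative.
    pooled = defaultdict(set)
    for ident, labels in annotations.items():
        pooled[find(ident)].update(labels)

    return {ident: sorted(pooled[find(ident)]) for ident in list(parent)}


# Placeholder data (hypothetical identifiers and labels, for illustration only).
xrefs = [("CHEBI:0001", "KEGG:C0001"), ("KEGG:C0001", "PUBCHEM:0001")]
annotations = {
    "CHEBI:0001": {"metabolite"},
    "KEGG:C0001": {"pesticide"},
    "PUBCHEM:0002": {"lipid"},
}
result = pool_categories(xrefs, annotations)
print(result["PUBCHEM:0001"])
```

Here `PUBCHEM:0001` has no labels of its own but inherits those attached to its ChEBI and KEGG counterparts. Treating cross-references as transitive is a design choice: it maximises coverage, but a single wrong mapping can merge two unrelated compounds, so it may be worth spot-checking large groups.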

Michael Kuhn
Your list sounds pretty complete to me. One addition would be a mapping to ATC codes: that tells you which of your compounds are drugs and gives you a somewhat meaningful hierarchy. DrugBank also has a list of approved and experimental drugs.
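
To show what the ATC hierarchy gives you, here is a small sketch that splits a full ATC code into its five levels (anatomical main group, therapeutic subgroup, pharmacological subgroup, chemical subgroup, chemical substance). The helper name `atc_levels` is my own, and the sketch assumes you have already mapped your compounds to ATC codes by some other means.

```python
# Prefix lengths of the five ATC levels:
# 1 letter, +2 digits, +1 letter, +1 letter, +2 digits.
ATC_CUTS = (1, 3, 4, 5, 7)


def atc_levels(code):
    """Return the hierarchy prefixes of an ATC code, most general first.

    Accepts full 7-character codes as well as codes for higher levels
    (1, 3, 4 or 5 characters); anything else raises ValueError.
    """
    code = code.strip().upper()
    if len(code) not in ATC_CUTS:
        raise ValueError(f"unexpected ATC code length: {code!r}")
    return [code[:n] for n in ATC_CUTS if n <= len(code)]


# A10BA02 is the ATC code for metformin.
print(atc_levels("A10BA02"))
# ['A', 'A10', 'A10B', 'A10BA', 'A10BA02']
```

Grouping compounds by, say, the second element of that list then gives a coarse therapeutic classification. Note that one drug can carry several ATC codes, so a compound may land in more than one group.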

PubChem actually has a very long "Classification" section, e.g. for phenol.

ATC Codes... thanks Michael, haven't heard of that!!

If you're considering the drug angle as Michael suggests, then you may also need to take into account SIDER, a database of drug side effects.

SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts. The available information includes side effect frequency, drug and side effect classifications, as well as links to further information, for example drug–target relations.

Added in edit on 15 Mar 2012: I would add the work of Altman et al. to this. They have just published a paper on the effects and interactions of various commonly taken drugs, based on mining the hundreds of thousands of adverse events reported to the US Food and Drug Administration (FDA) each year. You'll probably be most interested in their comprehensive database of drug effects (Offsides) and their database of drug-drug interaction side effects (Twosides).

Very interesting, never heard of this before Larry. Thanks! Do they have a controlled vocabulary/ontology for the side effects?

I'm not sure. You'll have to take a look. SIDER is something I have heard about but have not used in any serious manner. My angle on this would be food items, like poly-unsaturated fatty acids, that act like drugs, and vice versa. However, my work has not gone much further than the idea.

Thanks, Larry, for suggesting my database! :) In SIDER, we don't show an ontology, but there is an ontology of side effects in UMLS. Actually, the UMLS might also be a good classification for compounds.

