Question: Converting between drug identifier formats
2.9 years ago, EverInEarnest wrote:

I have two CSV files of drug-related data. One has the drug info specified with CHEMBL identifiers, whereas the second file contains DrugBank and PubChem IDs. I need to compare these two files for overlap in their drug contents. Both files contain drug names in string format, but working with those is tricky, since often a single row/drug will contain several synonyms, and accurately matching between the two files seems like it will be challenging, especially since both files are unlikely to contain the same synonyms for a particular drug.

I'm looking for a simple way (e.g. an existing function or website) to convert between the CHEMBL IDs in my first file and the DrugBank and PubChem IDs in my second file. I have searched fairly extensively, but I'm surprised not to find an R or Python function, or a web-based tool, that does this. One site I found is similar to what I need, with lots of options for the "From" format but, unfortunately, no useful options for the "To" format. I also located a Jupyter Notebook that matches DrugBank compounds to external resources using UniChem, but it seems far too complex for the simple conversion I'm seeking.

Any suggestions about resources that might assist with this drug ID conversion task will be much appreciated. Thanks!!

conversion drug database • 3.6k views
2.9 years ago, Zhilong Jia wrote:

Convert the PubChem IDs to CHEMBL IDs (in the Output IDs section, choose Registry IDs - CHEMBL) via


Many thanks, Zhilong! That is exactly what I needed!

written 2.9 years ago by EverInEarnest

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.


written 2.5 years ago by Pierre Lindenbaum
2.9 years ago, Wolf Ihlenfeldt wrote:

This is easily done with the Cactvs Cheminformatics Toolkit (free academic packages are available, including both a loadable Python module and a stand-alone Python interpreter with chemistry extensions). The toolkit can decode the three IDs you are using (and many more) into structure objects, and the fastest way to compare these is by computing a structure hashcode. No name or synonym matching is involved; this works purely on structural connectivity.

Here are some interactive commands in the Python version, comparing aspirin via its different database IDs and also directly computing the database IDs for structures from a different source:


There is a chemistry-aware table object which helps you with the processing of table data files. I'd be surprised if this required more than 10 lines of script code.


Thanks, Wolf! That resource looks very useful; I will check it out.

written 2.9 years ago by EverInEarnest
3 months ago, hsiaoyi0504 wrote:

Alternatively, use the ID mapping provided by UniChem. More than 50 databases are processed to provide a full source mapping.
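Once a CHEMBL-to-DrugBank mapping table has been obtained from a service like this, the overlap check itself is a small join. Below is a minimal Python sketch of that step, not an official client for any service: it assumes the mapping has been saved as a two-column, tab-separated table (CHEMBL ID, DrugBank ID), and the column names `chembl_id` and `drugbank_id`, the helper names, and the sample rows are all hypothetical placeholders for your own files.

```python
import csv
import io


def load_mapping(handle):
    """Read a two-column TSV (CHEMBL ID, DrugBank ID) into {chembl: {drugbank, ...}}.

    The two-column layout is an assumption about how the mapping was exported.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        mapping.setdefault(row[0].strip(), set()).add(row[1].strip())
    return mapping


def read_ids(handle, column):
    """Collect the non-empty values of one named column from a CSV file."""
    return {row[column].strip() for row in csv.DictReader(handle) if row[column].strip()}


def overlap(chembl_ids, drugbank_ids, mapping):
    """Return {chembl_id: [matching DrugBank IDs]} for drugs present in both files."""
    shared = {}
    for chembl_id in chembl_ids:
        hits = mapping.get(chembl_id, set()) & drugbank_ids
        if hits:
            shared[chembl_id] = sorted(hits)
    return shared


# Illustrative in-memory stand-ins for the mapping table and the two CSV files.
mapping_tsv = io.StringIO("CHEMBL25\tDB00945\nCHEMBL113\tDB00201\n")
file_a = io.StringIO("chembl_id,name\nCHEMBL25,aspirin\nCHEMBL113,caffeine\n")
file_b = io.StringIO(
    "drugbank_id,pubchem_cid,name\n"
    "DB00945,2244,acetylsalicylic acid\n"
    "DB01050,3672,ibuprofen\n"
)

shared = overlap(
    read_ids(file_a, "chembl_id"),
    read_ids(file_b, "drugbank_id"),
    load_mapping(mapping_tsv),
)
print(shared)
```

With real data, replace the `io.StringIO` objects with `open(...)` handles on your files. Matching on identifiers this way sidesteps the synonym problem in the question, because the drug name columns are never compared.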
