Question: Converting between drug identifier formats
EverInEarnest wrote, 2.1 years ago:

I have two CSV files of drug-related data. The first identifies drugs by CHEMBL identifiers, whereas the second contains DrugBank and PubChem IDs. I need to compare these two files for overlap in their drug contents. Both files contain drug names as strings, but working with those is tricky: a single row (drug) often contains several synonyms, and the two files are unlikely to list the same synonyms for a given drug, so accurate matching by name looks challenging.

I'm looking for a simple way (e.g. an existing function or website) to convert between the CHEMBL IDs in the first file and the DrugBank and PubChem IDs in the second. I have searched fairly extensively, but am surprised not to find an R or Python function, or a web-based tool, that does this. One site comes close, with lots of options for the "From" format but, unfortunately, no useful options for the "To" format (http://cts.fiehnlab.ucdavis.edu/conversion/batch). I also found a Jupyter Notebook (http://nbviewer.jupyter.org/url/git.dhimmel.com/drugbank/unichem-map.ipynb) that matches DrugBank compounds to external resources using UniChem, but it seems far too complex for the simple conversion I'm seeking.

Any suggestions about resources that might assist with this drug ID conversion task will be much appreciated. Thanks!!

Zhilong Jia (London) wrote, 2.1 years ago:

Convert the PubChem IDs to ChEMBL IDs via https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi (in the Output IDs section, choose Registry IDs, then ChEMBL).
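
Once the PubChem IDs have been translated, the overlap check itself is a set intersection. Here is a minimal sketch, assuming the conversion result was saved as a two-column, tab-separated file (input CID, then ChEMBL ID); that layout is an assumption, so check the actual download. The function names and the CSV column names are made up for illustration.

```python
import csv
import io


def read_column(handle, column):
    """Collect the non-empty values of one named column from an open CSV file."""
    values = set()
    for row in csv.DictReader(handle):
        value = (row.get(column) or "").strip()
        if value:
            values.add(value)
    return values


def load_mapping(handle):
    """Read 'CID<TAB>ChEMBL ID' lines into {cid: {chembl_id, ...}}.

    Assumed layout: one pair per line; CIDs without a match have an empty
    second column and are skipped.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or not row[1].strip():
            continue
        mapping.setdefault(row[0].strip(), set()).add(row[1].strip())
    return mapping


def overlapping_chembl_ids(chembl_ids, pubchem_cids, mapping):
    """Return the ChEMBL IDs from file 1 that some CID from file 2 maps to."""
    translated = set()
    for cid in pubchem_cids:
        translated |= mapping.get(str(cid).strip(), set())
    return set(chembl_ids) & translated


# In-memory stand-ins for the real files; column names are hypothetical.
file_1 = io.StringIO("chembl_id,name\nCHEMBL25,aspirin\n")
file_2 = io.StringIO("pubchem_cid,name\n2244,acetylsalicylic acid\n")
exchange_result = io.StringIO("2244\tCHEMBL25\n")

mapping = load_mapping(exchange_result)
shared = overlapping_chembl_ids(
    read_column(file_1, "chembl_id"),
    read_column(file_2, "pubchem_cid"),
    mapping,
)
print(shared)  # {'CHEMBL25'}
```

With real data, replace the io.StringIO objects with open(path, newline="") handles. Keeping a set of ChEMBL IDs per CID means a CID that maps to more than one ChEMBL record still matches.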


EverInEarnest replied, 2.1 years ago:

Many thanks, Zhilong! That is exactly what I needed!

Pierre Lindenbaum replied, 20 months ago:

If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.
Wolf Ihlenfeldt wrote, 2.1 years ago:

This is easily done with the Cactvs Cheminformatics Toolkit (free academic packages are available at www.xemistry.com/academic; they include both a loadable Python module and a stand-alone Python interpreter with chemistry extensions). The toolkit can decode the three IDs you are using (and many more) into structure objects, and the fastest way to compare these is by computing a structure hash code. No name or synonym matching is involved; the comparison works purely on structural connectivity.

Here are some interactive commands in the Python version, comparing Aspirin via its different DB IDs, and also directly computing the database IDs for structures from a different source:

cspy
pycactvs>e1=Ens('CID:2244')
pycactvs>e2=Ens('CHEMBL:25')
pycactvs>e3=Ens('DRUGBANK:DB00945')
pycactvs>e1.E_ISOTOPE_STEREO_HASH128
'8e1a0233-a328-045d-e61d-32db15c50d00'
pycactvs>e2.E_ISOTOPE_STEREO_HASH128
'8e1a0233-a328-045d-e61d-32db15c50d00'
pycactvs>e3.E_ISOTOPE_STEREO_HASH128
'8e1a0233-a328-045d-e61d-32db15c50d00'
pycactvs>e1.E_CHEMBL_ID
'CHEMBL:25'
pycactvs>e1.E_DRUGBANK_ID
'DB00945'
pycactvs>e2.E_CID
2244
pycactvs>e1.E_SMILES
'CC(=O)OC1=CC=CC=C1C(=O)O'

There is a chemistry-aware table object which helps you with the processing of table data files. I'd be surprised if this required more than 10 lines of script code.
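
To illustrate the hash-based comparison without depending on the toolkit, here is a rough sketch in plain Python. It groups identifiers by a structure hash and reports the hashes that occur in both files. The function names are made up, and the hash lookup is passed in as a parameter; inside cspy it would presumably be something like lambda i: Ens(i).E_ISOTOPE_STEREO_HASH128, mirroring the session above, but that is an assumption and is not exercised here.

```python
from collections import defaultdict


def group_by_structure(identifiers, structure_hash):
    """Map each structure hash to the set of identifiers that produce it."""
    groups = defaultdict(set)
    for ident in identifiers:
        groups[structure_hash(ident)].add(ident)
    return groups


def shared_structures(ids_a, ids_b, structure_hash):
    """Return {hash: (ids from file A, ids from file B)} for hashes found in both."""
    groups_a = group_by_structure(ids_a, structure_hash)
    groups_b = group_by_structure(ids_b, structure_hash)
    common = groups_a.keys() & groups_b.keys()
    return {h: (groups_a[h], groups_b[h]) for h in common}


# Stand-in for the toolkit lookup. The Aspirin hash is the one shown in the
# session above; the last entry is a placeholder, not a real compound.
ASPIRIN_HASH = "8e1a0233-a328-045d-e61d-32db15c50d00"
demo_hashes = {
    "CHEMBL:25": ASPIRIN_HASH,
    "CID:2244": ASPIRIN_HASH,
    "DRUGBANK:DB00945": ASPIRIN_HASH,
    "CHEMBL:PLACEHOLDER": "placeholder-hash",
}

matches = shared_structures(
    ["CHEMBL:25", "CHEMBL:PLACEHOLDER"],   # IDs from the first file
    ["CID:2244", "DRUGBANK:DB00945"],      # IDs from the second file
    demo_hashes.__getitem__,
)
for structure, (from_a, from_b) in matches.items():
    print(structure, sorted(from_a), sorted(from_b))
```

Grouping by hash rather than comparing pairwise keeps the work linear in the number of IDs, and it also surfaces cases where several IDs within one file describe the same structure.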


EverInEarnest replied, 2.1 years ago:

Thanks, Wolf! That resource looks very useful; I will check it out.