SMILES String Comparison Algorithms
12.8 years ago
Biogeek ▴ 170

Which similarity algorithms are normally used to compare slightly different but related SMILES strings (e.g. Oc1ccc(cc1)\C=C\C(=O)c2ccc(O)cc2O vs O=C(/C=C/c1ccccc1)c2ccccc2)?

chemoinformatics similarity • 8.6k views
12.8 years ago
brentp 24k

See this by Andrew Dalke.

In it, he references:

"Lingos, Finite State Machines, and Fast Similarity Searching", J. A. Grant, J. A. Haigh, B. T. Pickup, A. Nicholls, and R. A. Sayle, J. Chem. Inf. Model. 46(5) (2006), pp. 1912-1918.
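As a rough illustration of the LINGO idea (overlapping fixed-length substrings of the SMILES text, compared with a Tanimoto-style score), here is a minimal Python sketch. The substring length of 4, the digit normalization, and the min/max multiset Tanimoto are simplifying assumptions on my part, not necessarily the exact formula from the paper, and the function names are made up for this example.

```python
import re
from collections import Counter


def lingos(smiles, q=4):
    """Count the overlapping length-q substrings ("LINGOs") of a SMILES string."""
    # Assumption: map every digit to '0' so that different ring-closure
    # numbering does not produce different substrings. This is crude: it
    # also rewrites digits in isotopes and charges.
    normalized = re.sub(r"\d", "0", smiles)
    return Counter(normalized[i:i + q] for i in range(len(normalized) - q + 1))


def lingo_tanimoto(smiles_a, smiles_b, q=4):
    """Multiset Tanimoto over LINGO counts: sum of minima / sum of maxima."""
    counts_a, counts_b = lingos(smiles_a, q), lingos(smiles_b, q)
    shared = sum((counts_a & counts_b).values())  # per-LINGO minimum
    total = sum((counts_a | counts_b).values())   # per-LINGO maximum
    # Strings shorter than q yield no LINGOs; report 0.0 in that case.
    return shared / total if total else 0.0


# The two SMILES strings from the question.
a = r"Oc1ccc(cc1)\C=C\C(=O)c2ccc(O)cc2O"
b = "O=C(/C=C/c1ccccc1)c2ccccc2"
print(lingo_tanimoto(a, b))
```

Note that this works on the text only, so it is sensitive to how the SMILES were written.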

He also looks at using zlib compression as a way to estimate how similar two SMILES strings are.
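Compression can be turned into a similarity score in several ways. One common choice is the normalized compression distance (NCD), sketched below with Python's zlib module. Whether this matches the approach in the linked post is an assumption, and the function names are hypothetical.

```python
import zlib


def compressed_size(text):
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(text_a, text_b):
    """Normalized compression distance: lower means more similar."""
    size_a = compressed_size(text_a)
    size_b = compressed_size(text_b)
    size_ab = compressed_size(text_a + text_b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


a = r"Oc1ccc(cc1)\C=C\C(=O)c2ccc(O)cc2O"
b = "O=C(/C=C/c1ccccc1)c2ccccc2"
print(ncd(a, b))
```

For strings this short the fixed zlib overhead is a large part of the compressed size, so the values are noisy and best treated as a rough ranking signal.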

12.8 years ago

Comparing SMILES directly only makes sense when you use canonical SMILES. It is more common to parse the SMILES into a chemical graph and compare the actual graphs, so that it does not matter that there can be multiple SMILES for the same molecule. From there, I suggest using a fingerprint as the representation, for which you can calculate the similarity with the Tanimoto coefficient.

Example code using the CDK and R can be found in this vignette using the rcdk package.
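For reference, the Tanimoto coefficient itself is simple once you have fingerprints. The sketch below, in Python rather than R, represents each fingerprint as the set of its "on" bit positions. The bit positions are invented for illustration; real fingerprints would come from a toolkit such as the CDK.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    # Two empty fingerprints have no bits to compare; report 0.0 by convention.
    return shared / union if union else 0.0


# Hypothetical on-bit positions, not computed from real molecules.
fp_1 = {3, 17, 42, 101, 256}
fp_2 = {3, 17, 99, 101}
print(tanimoto(fp_1, fp_2))  # 3 shared bits out of 6 distinct bits -> 0.5
```

The corresponding distance is 1 minus this value.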


To expand: there can be many SMILES strings for the same chemical structure, so it doesn't make sense to compare the raw strings themselves.

12.8 years ago
Gilleain ▴ 30

You can use SMSD (Small Molecule Subgraph Detector) to compare molecules given as SMILES; it reports various similarity measures, including Tanimoto.
