I had a quick look but they do not seem to be free
We have a database of 5000 mol2 ligand structures from the ZINC database. We are wondering what the best approach would be for the following:
a) process the database and cluster the ligands by similarity, both as-is and after flexible alignment
b) after a), extract the mol2 files of the most common fragments
Thanks
There are a number of free tools for clustering and fragment-based analysis of small molecules. You could use the CDK to compute numerical descriptors or binary fingerprints and then cluster on that data (in R, WEKA, Matlab, Python, etc.). Other options include RDKit and Open Babel.
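To make the fingerprint-then-cluster idea concrete, here is a minimal sketch in plain Python. It assumes you have already exported each ligand's binary fingerprint as a set of "on" bit positions (from the CDK, RDKit or Open Babel). The greedy leader clustering used here is just one simple choice, not the method any of those toolkits prescribes, and the names `tanimoto`, `leader_cluster` and the toy ligand IDs are made up for illustration.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit positions."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: Dict[str, FrozenSet[int]],
                   threshold: float = 0.7) -> List[List[str]]:
    """Greedy leader clustering.

    Each ligand joins the first cluster whose leader (first member) is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for name, fp in fps.items():
        for cluster in clusters:
            leader = cluster[0]
            if tanimoto(fp, fps[leader]) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing leader was similar enough.
            clusters.append([name])
    return clusters


# Toy fingerprints with hypothetical bit positions (not real ZINC data).
toy_fps = {
    "lig_a": frozenset({1, 2, 3, 4}),
    "lig_b": frozenset({1, 2, 3, 5}),
    "lig_c": frozenset({10, 11, 12}),
    "lig_d": frozenset({10, 11, 12, 13}),
}
clusters = leader_cluster(toy_fps, threshold=0.5)
print(clusters)
```

The result depends on input order and on the threshold, so for a real set of 5000 ligands you would probably want to compare it against hierarchical or Butina-style clustering from one of the toolkits above.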
For fragmentation, you could also use the CDK, RDKit or a tool from NCGC to generate molecular fragments. R is particularly handy for fragment-based analysis and works directly with the CDK (see slides 158-169, though the API has since been updated to be easier to use).
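Once a fragmenter has run, finding the "most common fragments" is mostly a counting exercise. The sketch below assumes you have a mapping from ligand ID to the fragment SMILES that your tool of choice produced; the function name `most_common_fragments`, the ligand IDs and the fragment strings are all hypothetical. Each fragment is counted once per ligand, so a fragment that occurs three times in one molecule does not outrank one that occurs in three different molecules.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def most_common_fragments(fragments_by_ligand: Dict[str, Iterable[str]],
                          top_n: int = 3) -> List[Tuple[str, int]]:
    """Return the `top_n` fragments ranked by how many ligands contain them."""
    counts: Counter = Counter()
    for frags in fragments_by_ligand.values():
        # set() ensures each fragment is counted once per ligand.
        counts.update(set(frags))
    # Sort by descending ligand count, then by fragment string so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy input: hypothetical fragment SMILES per ligand (not real ZINC data).
toy_fragments = {
    "lig_a": ["c1ccccc1", "C(=O)N", "c1ccccc1"],
    "lig_b": ["c1ccccc1", "C(=O)O"],
    "lig_c": ["C(=O)N", "c1ccncc1"],
}
top = most_common_fragments(toy_fragments, top_n=2)
print(top)
```

This only works if the fragment SMILES are canonical, so the same fragment always gets the same string. The top-ranked fragments can then be written back out as mol2 with whichever toolkit generated them.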
And given that you're working with just 5000 molecules, the whole thing could be done quite easily within R (assuming a decent amount of RAM), though other environments such as Pipeline Pilot and KNIME are also pretty straightforward.
Both of the tasks you describe are very easy to do with Canvas (by Schrödinger). I'm not sure whether you have access to it, but perhaps you can get an evaluation copy. Here's a link to the Canvas product page.