6
$\begingroup$

The only program I have found is Plexus Suite from ChemAxon. (I asked for a trial, but have had no answer yet.)

The task is to create variations of a compound: there is a scaffold with different substituents:

[Image: scaffold with variable substituent positions]

I need to create a dataset/database of all possible variations.

How can I deal with this problem? Or do I need to use some scripting to build the database?

UPD1: The best option is to generate SMILES (Simplified Molecular-Input Line-Entry System). Generating the 3D structure is an easy, automated step once the SMILES, i.e. the connectivity of the molecule, is correct. Using XYZ would mean building a substituent database with each substituent already in the correct position for attachment. I am asking for software with this functionality:

https://www.youtube.com/watch?v=s4CgtUgGuzk

Vega ZZ has a module for database enumeration, but it works as a Python script.

$\endgroup$
7
  • $\begingroup$ The Plexus Suite looks absolutely awesome! Is it available under an academic or academic commercial license? If not, it still is the most elegant, but probably expensive, solution. In addition, it might take some time and money for a dedicated machine to set it up. Is it worth the effort for some 48 derivatives? A workflow consisting of a 3D molecule editor that is able to expand superatoms (Ph, Ac) to real atoms, followed by a tool that generates InChI and InChI Keys (to be used as keys in a DB), looks like the next best approach to generate the structures. $\endgroup$ Commented Feb 15, 2016 at 17:11
  • 1
    $\begingroup$ @KlausWarzecha Just got the trial - it is really the only one. I can't find anything similar to it. For how it works, better ask the support team - [email protected]. I believe that the academic licence will not be very expensive; otherwise I will create my own script, without any elegance, but working. $\endgroup$
    – XuMuK
    Commented Feb 15, 2016 at 17:15
  • $\begingroup$ If such an editor isn't available, generating the skeleton with Avogadro, marking the substituted positions by replacing $\ce{H}$ with different halogen atoms, and then manually replacing all of these with the real substituents is the cheapest option sans the labour costs. From the files thus generated, 'openbabel' can generate the InChIs. The rest is setting up the respective tables in an RDBMS of your choice. $\endgroup$ Commented Feb 15, 2016 at 17:15
  • $\begingroup$ I have used MarvinSketch in the past under an academic license at no cost, thanks to ChemAxon, but the Plexus Suite is definitely another league :) $\endgroup$ Commented Feb 15, 2016 at 17:17
  • $\begingroup$ @KlausWarzecha also, this is a server/web-based application, so they are hiding the source code of the chemistry magic. :) $\endgroup$
    – XuMuK
    Commented Feb 15, 2016 at 17:42

2 Answers

10
$\begingroup$

This is typically called library (or scaffold) enumeration. Doing it in SMILES is usually pretty easy by script, but there are a few other options:

But it's very easy to write a script like this, e.g. in Python:


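# Substituent SMILES fragments per position ("" leaves the position unsubstituted)
r1 = ["", "C", "N"]
r2 = ["[H]", "C", "c3ccccc3", "C(=O)O"]  # an explicit H must be bracketed in SMILES
r3 = ["", "C", "C#N", "O"]
# X, Y and Z mark the attachment points on the scaffold
scaffold = "Xc(cc1c2)ccc1c(Y)cc2-C(C=C1)C=C1Z"

for x in r1:
    for y in r2:
        for z in r3:
            print(scaffold.replace("X", x).replace("Y", y).replace("Z", z))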

You can then process the resulting SMILES strings with your program of choice (e.g., Open Babel).
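To hand the enumerated strings to another program, it may help to give each one a name and write them out in a simple SMILES file layout. The following is only a sketch under some assumptions: the function names `enumerate_library` and `write_smi`, the `cmpd_NNN` naming scheme, and the file name `library.smi` are all hypothetical, and `"H"` is written here as `"[H]"` because SMILES expects brackets around an explicit hydrogen.

```python
import io
from itertools import product

# Substituent lists and scaffold as in the script above, with "H" written as
# "[H]" (assumption: an explicit hydrogen needs brackets in SMILES).
R1 = ["", "C", "N"]
R2 = ["[H]", "C", "c3ccccc3", "C(=O)O"]
R3 = ["", "C", "C#N", "O"]
SCAFFOLD = "Xc(cc1c2)ccc1c(Y)cc2-C(C=C1)C=C1Z"


def enumerate_library(scaffold, r1, r2, r3):
    """Return (name, smiles) pairs for every R1 x R2 x R3 combination."""
    records = []
    for i, (x, y, z) in enumerate(product(r1, r2, r3), start=1):
        smiles = scaffold.replace("X", x).replace("Y", y).replace("Z", z)
        records.append((f"cmpd_{i:03d}", smiles))  # hypothetical naming scheme
    return records


def write_smi(records, handle):
    """Write one 'SMILES<tab>name' line per compound."""
    for name, smiles in records:
        handle.write(f"{smiles}\t{name}\n")


library = enumerate_library(SCAFFOLD, R1, R2, R3)
buffer = io.StringIO()  # stands in for open("library.smi", "w")
write_smi(library, buffer)
print(len(library), "compounds")
print(buffer.getvalue().splitlines()[0])
```

If the lines are written to a real file instead of the in-memory buffer, something like `obabel library.smi -O library.sdf --gen3d` should then be able to generate 3D coordinates, assuming every enumerated string parses as valid SMILES.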

$\endgroup$
5
  • 1
    $\begingroup$ The Python SMILES approach is much more interesting - I will play with substituents tomorrow. Sorry, I don't understand how to load the scaffold molecule into KNIME; I haven't worked with it. $\endgroup$
    – XuMuK
    Commented Feb 16, 2016 at 1:15
  • $\begingroup$ @XuMuK - Indeed, it's very easy to do this kind of thing with SMILES. $\endgroup$ Commented Feb 16, 2016 at 3:45
  • $\begingroup$ I have tested it: there are some problems with getting the input for the scaffold and substituents correct, but it is working!!! $\endgroup$
    – XuMuK
    Commented Feb 16, 2016 at 14:22
  • $\begingroup$ Very nice and instructive. Regarding the openbabel smiles->xyz conversion, it tends to produce molecular knots even for small macrocyclic structures. $\endgroup$
    – ssavec
    Commented Mar 14, 2016 at 14:34
  • $\begingroup$ @ssavec If you have some examples, please e-mail me. We are working substantially on coordinate generation algorithms. $\endgroup$ Commented Mar 14, 2016 at 15:52
1
$\begingroup$

There are no doubt many software tools out there to do this kind of scaffold enumeration; however, the problem is surely simple enough to do by scripting - or even by hand!

e.g.:

  • Compound 1 = {R1=H, R2=H, R3=H}
  • Compound 2 = {R1=H, R2=H, R3=Me}
  • Compound 3 = {R1=H, R2=H, R3=CN}
  • ...
  • Compound N = {R1=NH2, R2=Ac, R3=OH}

So this is just the Cartesian product R1 x R2 x R3, except with unordered output ... not sure what this is called. There is this Math Stack Exchange question about this.

Oh, and there is this Stack Overflow Python code to do this.
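As a minimal sketch of that Cartesian product in Python, the standard-library `itertools.product` can generate the numbered list directly. The substituent labels below are illustrative assumptions, chosen only so that the first and last compounds match the list above; they are not a confirmed set.

```python
from itertools import product

# Illustrative substituent labels (assumed; the list above only shows a few).
substituents = {
    "R1": ["H", "Me", "NH2"],
    "R2": ["H", "Me", "Ph", "Ac"],
    "R3": ["H", "Me", "CN", "OH"],
}

positions = list(substituents)
# One dict per compound; the last position (R3) varies fastest.
compounds = [
    dict(zip(positions, combo))
    for combo in product(*(substituents[p] for p in positions))
]

for number, compound in enumerate(compounds, start=1):
    print(f"Compound {number} = {compound}")
```

With these lists the product has 3 x 4 x 4 = 48 members, and adding a substituent to any list extends the enumeration without changing the code.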

$\endgroup$
1
  • $\begingroup$ XYZ is not needed; I need 2D structures or SMILES, or 3D with correct connectivity records. Now I am thinking about creating my own script to deal with the problem. "are no doubt many software tools out there to do this kind of scaffold enumeration" - I am not sure; could you give a list? $\endgroup$
    – XuMuK
    Commented Feb 15, 2016 at 15:55
