5
$\begingroup$

The documentation for the Indigo module can be found here:

http://ggasoftware.com/opensource/indigo/api#inputoutput

For instance, given a molecule object for some SMILES string, e.g. "[C](=[O])", I want to calculate the valency of each atom. Here the desired output would be [atom=C, unbound_electrons=2], [atom=O, valency=0].

Considering just the atom "[C]", can anyone explain why this code prints [atom=C, unbound_electrons=0] and not [atom=C, unbound_electrons=4]?

from indigo import *
indigo = Indigo()

mol=indigo.loadMolecule("[C]")

print(mol.grossFormula(),"\n")

for atom in mol.iterateAtoms():
    print([atom.symbol(), atom.radicalElectrons()])

EDIT: I could work it out if I could generate a list of the types of bonds on each atom in conjunction with atom.atomicNumber(). E.g. if I could say [C] has a double bond, I could take its atomic number - 2 (inner shell) - 2 (double bond).

EDIT#2: This might be useful for visualising what I'm talking about:
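A minimal sketch of the bond-counting idea from the EDIT, in plain Python so it runs without Indigo. It swaps the atomic-number arithmetic for a small lookup table of typical valences, because that reproduces the desired output for both C and O. The table `TYPICAL_VALENCE`, the helper `unused_valence`, and the hand-written bond lists are all assumptions made for illustration. With Indigo, the bond orders could presumably be collected by walking `atom.iterateNeighbors()` and reading each neighbor's `bond().bondOrder()`, but that is untested here. Formal charges are ignored.

```python
# Typical valences for a few common elements. This table is an assumption
# made for illustration; it is not something Indigo provides.
TYPICAL_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def unused_valence(symbol, bond_orders):
    """Typical valence of the element minus the sum of its bond orders."""
    return TYPICAL_VALENCE[symbol] - sum(bond_orders)


# Hand-written bond lists for "[C](=[O])" and "[C]". In practice these
# would come from walking the bonds of the loaded molecule.
carbonyl = [("C", [2]), ("O", [2])]
lone_carbon = [("C", [])]

for symbol, orders in carbonyl + lone_carbon:
    print([symbol, unused_valence(symbol, orders)])
```

Under these assumptions the carbon in "[C](=[O])" comes out with 2, the oxygen with 0, and the lone "[C]" with 4.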

from indigo_renderer import *
renderer = IndigoRenderer(indigo)
renderer.renderToFile(mol,"mol.png")

EDIT#3: I am not a chemist, so I might have got some concepts wrong.

$\endgroup$

2 Answers

5
$\begingroup$

I looked into this, and the results I got seemed anomalous. Iterating a single carbon atom from a charge of +4 down to -4 gives the following output (Python code at the bottom of the post):

[C+4]
C valence= 4 radicalE= 0 charge= 4 implicitH= 0
[C+3] |^1:0|
C valence= 4 radicalE= 1 charge= 3 implicitH= 0
[C+2] |^3:0|
C valence= 4 radicalE= 2 charge= 2 implicitH= 0
[C+]
C valence= 0 radicalE= 0 charge= 1 implicitH= 0
[C]
C valence= 0 radicalE= 0 charge= 0 implicitH= 0
[C-]
C valence= 0 radicalE= 0 charge= -1 implicitH= 0
[C-2] |^3:0|
C valence= 4 radicalE= 2 charge= -2 implicitH= 0
[C-3] |^1:0|
C valence= 4 radicalE= 1 charge= -3 implicitH= 0
[C-4]
C valence= 4 radicalE= 0 charge= -4 implicitH= 0

I printed them in canonical SMILES form, using canonicalSmiles(). Once I noticed that something strange was going on (the neutral atom and the +/-1 charges report a valence of 0, while every other charge looks fine), I dug into the SMILES documentation and found that currently only radicals up to divalent appear to be supported. A lone carbon atom is technically a tetravalent radical, so it defaults to nothing, because it cannot be expressed in the SMILES radical notation.

I pulled the following from the ChemAxon documentation, which explains what the notation after the atom means. I'm assuming that |^1:0| is the notation for a monovalent radical. The 0 is the index of the atom, which is always zero in this example.

Radical numbers
Atom indexes with:
- divalent radical center are written after "^2:"
- divalent singlet radical center are written after "^3:"
- divalent triplet radical center are written after "^4:"
- trivalent radical center are written after "^5:"

And the Indigo website itself states, under the supported extensions:

Radical numbers: monovalent, divalent singlet, and divalent triplet
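To make the notation concrete, here is a rough sketch of pulling the radical annotations out of strings such as "[C+3] |^1:0|". It is plain Python and does not use Indigo. The labels for ^2 to ^5 come from the ChemAxon list quoted above; reading ^1 as monovalent is the same assumption made earlier in this answer. The helper name `parse_radicals` and the comma-separated layout inside the bars are also assumptions.

```python
import re

# Labels for ^2..^5 follow the ChemAxon list quoted above. Treating ^1 as
# "monovalent" is an assumption, not part of the quoted excerpt.
RADICAL_CODES = {
    1: "monovalent",
    2: "divalent",
    3: "divalent singlet",
    4: "divalent triplet",
    5: "trivalent",
}


def parse_radicals(cxsmiles):
    """Return {atom_index: radical_label} for a 'SMILES |...|' string."""
    match = re.search(r"\|(.*)\|", cxsmiles)
    if match is None:
        return {}  # no extension block, so no radicals recorded
    radicals = {}
    code = None
    for token in match.group(1).split(","):
        token = token.strip()
        start = re.fullmatch(r"\^(\d+):(\d+)", token)
        if start:
            # "^code:index" opens a new radical group
            code = int(start.group(1))
            radicals[int(start.group(2))] = RADICAL_CODES.get(code, "unknown")
        elif token.isdigit() and code is not None:
            # a bare index continues the current radical group
            radicals[int(token)] = RADICAL_CODES.get(code, "unknown")
        else:
            code = None  # some other extension field; ignored here
    return radicals


for line in ["[C+3] |^1:0|", "[C+2] |^3:0|", "[C]"]:
    print(line, parse_radicals(line))
```

The plain "[C]" string has no block between bars, so the parser returns an empty dictionary for it, which mirrors the radicalE= 0 reported for the neutral atom.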

So I don't think you've done anything wrong; you've simply hit a limitation of the SMILES radical notation and of Indigo's interpreter. Here is the code I used:

from indigo import *
from indigo_renderer import *

indigo = Indigo()
renderer = IndigoRenderer(indigo)

# A single carbon atom at formal charges from +4 down to -4
smiles = ["[C+4]", "[C+3]", "[C+2]", "[C+1]", "[C]", "[C-]", "[C-2]", "[C-3]", "[C-4]"]
mols = [indigo.loadMolecule(s) for s in smiles]

for i, mol in enumerate(mols):
    print(mol.canonicalSmiles())
    for atom in mol.iterateAtoms():
        print(atom.symbol(), "valence=", atom.valence(), "radicalE=", atom.radicalElectrons(),
              "charge=", atom.charge(), "implicitH=", atom.countImplicitHydrogens())
    # Render each molecule to mol0.png, mol1.png, ...
    renderer.renderToFile(mol, "mol" + str(i) + ".png")
$\endgroup$
1
  • $\begingroup$ Thanks so much for the level of detail here! This is extremely helpful and confirms my worries. I've developed a workaround that examines the bond network and counts bonds per atom. It's a bit messy and roundabout, but it seems to work! $\endgroup$
    – Freeman
    Commented Jul 22, 2013 at 16:11
4
$\begingroup$

I have never used the indigo framework, but there's more than one way to process SMILES strings in Python. Open Babel and Pybel might be an alternative:

import pybel  # with Open Babel 3.x: from openbabel import pybel

# SMILES for flurazepam, taken from  
# http://chemspider.com/chemical-structure.3276
flurazepam = 'CCN(CC)CCN1C(=O)CN=C(C2=C1C=CC(=C2)Cl)C3=CC=CC=C3F'

# SMILES for trimethylamine, taken from 
# http://chemspider.com/chemical-structure.1114
trimethylamine = 'CN(C)C'

# SMILES for nitromethane, taken from 
# http://chemspider.com/chemical-structure.6135
nitromethane = 'C[N+](=O)[O-]'

mol = pybel.readstring('smi', flurazepam)

for atom in mol.atoms:
    # Columns: atom type, formal charge, implicit valence, valence
    print('{:<5} {:3} {:3} {:3}'.format(atom.type, atom.formalcharge,
                                        atom.implicitvalence, atom.valence))
$\endgroup$
