$\begingroup$

I took this molfile of benzoic acid:


  ACD/Labs0708041726  

  9  9  0  0  0  0  0  0  0  0  1 V2000
   13.2076   -5.2994    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   14.3539   -4.6250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   13.2076   -6.6480    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.5003   -5.2994    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   14.3539   -3.2763    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   14.3539   -7.3224    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.5003   -6.6480    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   13.2076   -2.6020    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   15.5003   -2.6020    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  2  4  1  0  0  0  0
  2  5  1  0  0  0  0
  3  6  2  0  0  0  0
  4  7  2  0  0  0  0
  5  8  1  0  0  0  0
  5  9  2  0  0  0  0
  6  7  1  0  0  0  0
M  END

and used the inchi-1 program to produce this InChI identifier:

InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)

When I edit the molfile and make all atom coordinates identical, e.g. 13.2076 -5.2994 0.0000, inchi-1 produces the same InChI identifier.

Is this a fluke or normal behavior? I see that the atom coordinates are placed into the AuxInfo=... string. Are there cases where correct atom coordinates are required to produce a valid InChI=1S... string?

$\endgroup$
  • $\begingroup$ Apparently, the InChI is derived from the connectivity table and not from the coordinates. $\endgroup$ Commented Mar 6, 2021 at 7:42
  • $\begingroup$ It seems so. I found a definitive answer, which I will post. $\endgroup$ Commented Mar 6, 2021 at 7:54
  • $\begingroup$ For those interested: the anatomy of a MOL file (which happens to describe benzoic acid); there is a relevant entry on Wikipedia, too. $\endgroup$
    – Buttonwood
    Commented Mar 6, 2021 at 8:50

1 Answer

$\begingroup$

Quoting from the article "InChI, the IUPAC International Chemical Identifier" [Heller et al. Journal of Cheminformatics (2015) 7:23]:

Each atom is described by a number of properties: its chemical element name; x,y,z-coordinates (all or any of them may be zero);

When I edit all coordinates to zero, inchi-1 produces a valid InChI identifier:

InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)

and auxiliary info that reflects the absence of valid atom coordinates:

AuxInfo=1/1/N:6,3,7,1,4,2,5,8,9/E:(2,3)(4,5)(8,9)/rA:9nCCCCCCCOO/rB:d1;s1;s2;s2;d3;d4s6;s5;d5;/rC:;;;;;;;;;
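
To check this programmatically, here is a minimal Python sketch that splits an AuxInfo string into its layers and tests whether the `rC` layer holds anything besides separators. It assumes that layers are separated by `/`, that each layer looks like `prefix:body`, and that `rC` is the layer carrying the coordinates; the function names `auxinfo_layers` and `has_coordinates` are made up for illustration and are not part of the InChI software.

```python
def auxinfo_layers(auxinfo: str) -> dict:
    """Split an AuxInfo string into {prefix: body} for its 'prefix:body' layers.

    Assumption: layers are separated by '/' and no layer body contains '/'.
    """
    layers = {}
    for part in auxinfo.split("/"):
        prefix, sep, body = part.partition(":")
        if sep:  # pieces without ':' (e.g. 'AuxInfo=1') are skipped
            layers[prefix] = body
    return layers


def has_coordinates(auxinfo: str) -> bool:
    """Return True if the rC layer contains anything besides ';' separators."""
    rc = auxinfo_layers(auxinfo).get("rC", "")
    return rc.strip(";") != ""


aux = ("AuxInfo=1/1/N:6,3,7,1,4,2,5,8,9/E:(2,3)(4,5)(8,9)/rA:9nCCCCCCCOO"
       "/rB:d1;s1;s2;s2;d3;d4s6;s5;d5;/rC:;;;;;;;;;")
print(has_coordinates(aux))
```

For the AuxInfo string above, the `rC` body consists only of `;` characters, so the check reports that no coordinates were stored.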

I did a similar test earlier, but I didn't retain the strict 10-character column width for each coordinate, which produced an error:

Error 34 (no InChI; Cannot interpret atom block line:     0.0000  0.0000    0.0000
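
To avoid this kind of column-width mistake when editing by hand, the coordinates can be rewritten with a short script. The following is only a sketch, assuming a V2000 molfile with a three-line header, the counts line on line 4, and x, y, z in the first 30 columns of each atom line (10 columns each, 4 decimals); `set_coordinates` is a hypothetical helper name.

```python
def set_coordinates(molfile: str, x: float = 0.0, y: float = 0.0, z: float = 0.0) -> str:
    """Return a copy of a V2000 molfile with every atom placed at (x, y, z)."""
    lines = molfile.splitlines()
    # Line 4 (index 3) is the counts line; its first 3-character field is the atom count.
    n_atoms = int(lines[3][0:3])
    # Each coordinate gets exactly 10 columns with 4 decimals, e.g. '    0.0000'.
    coords = f"{x:10.4f}{y:10.4f}{z:10.4f}"
    for i in range(4, 4 + n_atoms):
        # Replace columns 1-30 only; the element symbol and flags stay untouched.
        lines[i] = coords + lines[i][30:]
    return "\n".join(lines) + "\n"
```

Calling `set_coordinates(text)` zeroes all coordinates, and `set_coordinates(text, 13.2076, -5.2994, 0.0)` reproduces the "all atoms at the same point" experiment from the question; in both cases the bond block is left unchanged.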
$\endgroup$
  • $\begingroup$ For a simple compound like benzoic acid, OpenBabel also accepts a SMILES string, which carries no coordinate information at all, as sufficient input: obabel -:'O=C(O)c1ccccc1' -oinchi returns InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9). Of course, SMILES is a simplification and has limitations in representing a structure uniquely and unambiguously, hence the qualifier «for a simple compound» at the beginning. $\endgroup$
    – Buttonwood
    Commented Mar 6, 2021 at 8:40
