6
$\begingroup$

Applications like Vesta show bonds when viewing a CIF file; how could one algorithmically find all bond pairs?

Here is an example CIF file:

# generated using pymatgen
data_LiF
_symmetry_space_group_name_H-M   Fm-3m
_cell_length_a   4.08342738
_cell_length_b   4.08342738
_cell_length_c   4.08342738
_cell_angle_alpha   90.00000000
_cell_angle_beta   90.00000000
_cell_angle_gamma   90.00000000
_symmetry_Int_Tables_number   225
_chemical_formula_structural   LiF
_chemical_formula_sum   'Li4 F4'
_cell_volume   68.08861619
_cell_formula_units_Z   4
loop_
 _symmetry_equiv_pos_site_id
 _symmetry_equiv_pos_as_xyz
  1  'x, y, z'
  2  '-x, -y, -z'
  3  'z, y, -x'
  4  '-z, -y, x'
  5  '-x, y, -z'
  6  'x, -y, z'
  7  '-z, y, x'
  8  'z, -y, -x'
  9  'x, -y, -z'
  10  '-x, y, z'
  11  'z, -y, x'
  12  '-z, y, -x'
  13  '-x, -y, z'
  14  'x, y, -z'
  15  '-z, -y, -x'
  16  'z, y, x'
  17  'y, -z, -x'
  18  '-y, z, x'
  19  'y, x, -z'
  20  '-y, -x, z'
  21  'y, z, x'
  22  '-y, -z, -x'
  23  'y, -x, z'
  24  '-y, x, -z'
  25  '-y, z, -x'
  26  'y, -z, x'
  27  '-y, -x, -z'
  28  'y, x, z'
  29  '-y, -z, x'
  30  'y, z, -x'
  31  '-y, x, z'
  32  'y, -x, -z'
  33  '-z, x, -y'
  34  'z, -x, y'
  35  'x, z, -y'
  36  '-x, -z, y'
  37  'z, -x, -y'
  38  '-z, x, y'
  39  '-x, -z, -y'
  40  'x, z, y'
  41  'z, x, y'
  42  '-z, -x, -y'
  43  '-x, z, y'
  44  'x, -z, -y'
  45  '-z, -x, y'
  46  'z, x, -y'
  47  'x, -z, y'
  48  '-x, z, -y'
  49  'x+1/2, y+1/2, z'
  50  '-x+1/2, -y+1/2, -z'
  51  'z+1/2, y+1/2, -x'
  52  '-z+1/2, -y+1/2, x'
  53  '-x+1/2, y+1/2, -z'
  54  'x+1/2, -y+1/2, z'
  55  '-z+1/2, y+1/2, x'
  56  'z+1/2, -y+1/2, -x'
  57  'x+1/2, -y+1/2, -z'
  58  '-x+1/2, y+1/2, z'
  59  'z+1/2, -y+1/2, x'
  60  '-z+1/2, y+1/2, -x'
  61  '-x+1/2, -y+1/2, z'
  62  'x+1/2, y+1/2, -z'
  63  '-z+1/2, -y+1/2, -x'
  64  'z+1/2, y+1/2, x'
  65  'y+1/2, -z+1/2, -x'
  66  '-y+1/2, z+1/2, x'
  67  'y+1/2, x+1/2, -z'
  68  '-y+1/2, -x+1/2, z'
  69  'y+1/2, z+1/2, x'
  70  '-y+1/2, -z+1/2, -x'
  71  'y+1/2, -x+1/2, z'
  72  '-y+1/2, x+1/2, -z'
  73  '-y+1/2, z+1/2, -x'
  74  'y+1/2, -z+1/2, x'
  75  '-y+1/2, -x+1/2, -z'
  76  'y+1/2, x+1/2, z'
  77  '-y+1/2, -z+1/2, x'
  78  'y+1/2, z+1/2, -x'
  79  '-y+1/2, x+1/2, z'
  80  'y+1/2, -x+1/2, -z'
  81  '-z+1/2, x+1/2, -y'
  82  'z+1/2, -x+1/2, y'
  83  'x+1/2, z+1/2, -y'
  84  '-x+1/2, -z+1/2, y'
  85  'z+1/2, -x+1/2, -y'
  86  '-z+1/2, x+1/2, y'
  87  '-x+1/2, -z+1/2, -y'
  88  'x+1/2, z+1/2, y'
  89  'z+1/2, x+1/2, y'
  90  '-z+1/2, -x+1/2, -y'
  91  '-x+1/2, z+1/2, y'
  92  'x+1/2, -z+1/2, -y'
  93  '-z+1/2, -x+1/2, y'
  94  'z+1/2, x+1/2, -y'
  95  'x+1/2, -z+1/2, y'
  96  '-x+1/2, z+1/2, -y'
  97  'x+1/2, y, z+1/2'
  98  '-x+1/2, -y, -z+1/2'
  99  'z+1/2, y, -x+1/2'
  100  '-z+1/2, -y, x+1/2'
  101  '-x+1/2, y, -z+1/2'
  102  'x+1/2, -y, z+1/2'
  103  '-z+1/2, y, x+1/2'
  104  'z+1/2, -y, -x+1/2'
  105  'x+1/2, -y, -z+1/2'
  106  '-x+1/2, y, z+1/2'
  107  'z+1/2, -y, x+1/2'
  108  '-z+1/2, y, -x+1/2'
  109  '-x+1/2, -y, z+1/2'
  110  'x+1/2, y, -z+1/2'
  111  '-z+1/2, -y, -x+1/2'
  112  'z+1/2, y, x+1/2'
  113  'y+1/2, -z, -x+1/2'
  114  '-y+1/2, z, x+1/2'
  115  'y+1/2, x, -z+1/2'
  116  '-y+1/2, -x, z+1/2'
  117  'y+1/2, z, x+1/2'
  118  '-y+1/2, -z, -x+1/2'
  119  'y+1/2, -x, z+1/2'
  120  '-y+1/2, x, -z+1/2'
  121  '-y+1/2, z, -x+1/2'
  122  'y+1/2, -z, x+1/2'
  123  '-y+1/2, -x, -z+1/2'
  124  'y+1/2, x, z+1/2'
  125  '-y+1/2, -z, x+1/2'
  126  'y+1/2, z, -x+1/2'
  127  '-y+1/2, x, z+1/2'
  128  'y+1/2, -x, -z+1/2'
  129  '-z+1/2, x, -y+1/2'
  130  'z+1/2, -x, y+1/2'
  131  'x+1/2, z, -y+1/2'
  132  '-x+1/2, -z, y+1/2'
  133  'z+1/2, -x, -y+1/2'
  134  '-z+1/2, x, y+1/2'
  135  '-x+1/2, -z, -y+1/2'
  136  'x+1/2, z, y+1/2'
  137  'z+1/2, x, y+1/2'
  138  '-z+1/2, -x, -y+1/2'
  139  '-x+1/2, z, y+1/2'
  140  'x+1/2, -z, -y+1/2'
  141  '-z+1/2, -x, y+1/2'
  142  'z+1/2, x, -y+1/2'
  143  'x+1/2, -z, y+1/2'
  144  '-x+1/2, z, -y+1/2'
  145  'x, y+1/2, z+1/2'
  146  '-x, -y+1/2, -z+1/2'
  147  'z, y+1/2, -x+1/2'
  148  '-z, -y+1/2, x+1/2'
  149  '-x, y+1/2, -z+1/2'
  150  'x, -y+1/2, z+1/2'
  151  '-z, y+1/2, x+1/2'
  152  'z, -y+1/2, -x+1/2'
  153  'x, -y+1/2, -z+1/2'
  154  '-x, y+1/2, z+1/2'
  155  'z, -y+1/2, x+1/2'
  156  '-z, y+1/2, -x+1/2'
  157  '-x, -y+1/2, z+1/2'
  158  'x, y+1/2, -z+1/2'
  159  '-z, -y+1/2, -x+1/2'
  160  'z, y+1/2, x+1/2'
  161  'y, -z+1/2, -x+1/2'
  162  '-y, z+1/2, x+1/2'
  163  'y, x+1/2, -z+1/2'
  164  '-y, -x+1/2, z+1/2'
  165  'y, z+1/2, x+1/2'
  166  '-y, -z+1/2, -x+1/2'
  167  'y, -x+1/2, z+1/2'
  168  '-y, x+1/2, -z+1/2'
  169  '-y, z+1/2, -x+1/2'
  170  'y, -z+1/2, x+1/2'
  171  '-y, -x+1/2, -z+1/2'
  172  'y, x+1/2, z+1/2'
  173  '-y, -z+1/2, x+1/2'
  174  'y, z+1/2, -x+1/2'
  175  '-y, x+1/2, z+1/2'
  176  'y, -x+1/2, -z+1/2'
  177  '-z, x+1/2, -y+1/2'
  178  'z, -x+1/2, y+1/2'
  179  'x, z+1/2, -y+1/2'
  180  '-x, -z+1/2, y+1/2'
  181  'z, -x+1/2, -y+1/2'
  182  '-z, x+1/2, y+1/2'
  183  '-x, -z+1/2, -y+1/2'
  184  'x, z+1/2, y+1/2'
  185  'z, x+1/2, y+1/2'
  186  '-z, -x+1/2, -y+1/2'
  187  '-x, z+1/2, y+1/2'
  188  'x, -z+1/2, -y+1/2'
  189  '-z, -x+1/2, y+1/2'
  190  'z, x+1/2, -y+1/2'
  191  'x, -z+1/2, y+1/2'
  192  '-x, z+1/2, -y+1/2'
loop_
 _atom_site_type_symbol
 _atom_site_label
 _atom_site_symmetry_multiplicity
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_occupancy
  Li  Li0  4  0.000000  0.000000  0.000000  1
  F  F1  4  0.000000  0.000000  0.500000  1
$\endgroup$
1

4 Answers

9
$\begingroup$

Your question does not spell out what you mean by "how is it done". Indeed, it may be answered in two ways:

  • determine the distances between atoms and compare each pairwise distance with the sum of the van der Waals radii; if the distance is equal to or less than a threshold (previously extracted from experiments and tabulated, e.g., in the International Tables for X-ray Crystallography), the pair is drawn as bonded, as in @Greg's answer; or

  • given a .cif file, establish a file with a connection table and with information about bond orders. In this case, cod-tools by the Crystallography Open Database, freely available (e.g., repackaged for Linux Debian and related distributions like Ubuntu (tracker for Debian)), contains a nice command-line tool, codcif2sdf, to convert a .cif into a .sdf file.

    Note that some databases (e.g., the CSD by the CCDC) allow the download of their data not only as .cif, but equally in formats like .mol2 (reference), .sdf, .pdb, or as SMILES strings from their more advanced interfaces (in the case of the CSD, ConQuest, or their Python API (an example)).


The content of your example .cif file was copied into the file test.cif. Because the data are not the result of an X-ray diffraction experiment with subsequent structure solution and structure refinement, one may pass over some of the problems listed by checkcif, though PLAT113_ALERT_2_B should not be dropped.

The file was processed by codcif2sdf test.cif > test.sdf to yield a .sdf file. Jmol, running with the automatic computation of bonds (cf. the GUI, Edit -> Preferences -> Bonds) will assign a bond, while the .sdf file lacks the connectivity table typically seen for organic molecules.
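To check whether a given .sdf record carries a connectivity table at all, one can read the counts line of the molfile block. The following is a minimal sketch, assuming the V2000 layout (three header lines, then a counts line whose first two three-character fields hold the number of atoms and of bonds). The record in `SAMPLE` is a hypothetical hand-written example, not the actual output of codcif2sdf, and `count_atoms_bonds` is a name introduced here for illustration only.

```python
# Hypothetical V2000 record for LiF with two atoms and no bond block.
SAMPLE = "\n".join([
    "LiF (hand-written example, not codcif2sdf output)",
    "",
    "",
    "  2  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 Li  0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0000    0.0000    2.0417 F   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "$$$$",
])


def count_atoms_bonds(molblock):
    """Return (n_atoms, n_bonds) read from the V2000 counts line (4th line)."""
    counts_line = molblock.splitlines()[3]
    n_atoms = int(counts_line[0:3])   # columns 1-3: number of atoms
    n_bonds = int(counts_line[3:6])   # columns 4-6: number of bonds
    return n_atoms, n_bonds


n_atoms, n_bonds = count_atoms_bonds(SAMPLE)
print(f"{n_atoms} atoms, {n_bonds} bonds")
if n_bonds == 0:
    print("no connectivity table; a viewer has to compute bonds from distances")
```

A bond count of zero means that any struts shown by a viewer were computed by the viewer itself, not read from the file.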

At the level of codcif2sdf, in comparison to performing such a transformation for data about an organic molecule like benzene, the example of $\ce{LiF}$ might indicate a lack of encoded data. However, what do the struts drawn between the atoms represent? In crystallography, they indicate that the distance between two atoms is less than the sum of their van der Waals radii. As an example, see the drawing in this question on chemistry.se, and an answer tangentially relevant here:

enter image description here

In other fields, when you write one or multiple dashes between two atoms, you state that there is non-zero electron density between the two atoms. In this covalent bond, two atoms participate (by analogy, they share electrons with each other). Such a directed bond may be polarized, i.e., one partner participates more in the electron density than the other. However, this contrasts with $\ce{LiF}$, where the difference in electronegativity is so large that one partner (here: $\ce{F}$) has practically withdrawn all electron density of this bond, while the other (here: $\ce{Li}$) has practically given up all of its share of valence electrons.

As a result, $\ce{LiF}$ is not a covalent molecule, but an ionic salt. If you lower the level of description and describe electrons as countable spheres, it consists of $\ce{Li^+}$ and $\ce{F^-}$. Their interactions in the sample resemble those of point charges. Have a look at the illustrations in my answer here for a comparison of, e.g., $\ce{N#N}$, $\ce{NaCl}$, and $\ce{Cu}$.

$\endgroup$
6
  • $\begingroup$ I am definitely looking for an actual implementation that I can use. codcif2sdf does not seem to work though. It produces "ERROR, the maximum number of polymer atom repetitions 100 was hit for the atom O11 (1), to get around this limit, please increase --max-polymer-atoms, to say, --max-polymer-atoms=200 or decrease --max-polymer-span (e.g. --max-polymer-span=2)", only to then throw a "unknown option '--max-polymer-atoms=200'." error when I try to do that. $\endgroup$
    – maxbear123
    Commented Feb 2, 2022 at 1:37
  • 1
    $\begingroup$ codcif2sdf (presumably) was written with the small molecule models on COD encoded in .cif in mind and, in my experience, works reasonably well, especially for entries about organic molecules without positional/orientational disorder. The .mmcif format you use is different from that of .cif (this wasn't mentioned in your question). A test file of Jmol, however, is processed by OpenBabel obabel -immcif 114D.mmcif -ocif -O test.cif and codcif2sdf test.cif > test.sdf with some success (552 atoms). $\endgroup$
    – Buttonwood
    Commented Feb 2, 2022 at 8:30
  • 1
    $\begingroup$ So far, I have not had to rewrite .mmcif into other formats. However, would a service like mmcif.pdbj.org/converter be a suitable converter for you? $\endgroup$
    – Buttonwood
    Commented Feb 2, 2022 at 8:34
  • 1
    $\begingroup$ @maxbear123 One option may be to edit the original question and to add the problematic file's content in a code block (enclosed by three backticks in the line before it starts and three backticks in the line after it ends). Alternatively, add to the OP a link which points to a reference outside chemistry.se (e.g., on zenodo, or pastebin). Then, it is easier for those interested to attempt a conversion to yield, e.g., a .sdf file. $\endgroup$
    – Buttonwood
    Commented Feb 2, 2022 at 22:48
  • 1
    $\begingroup$ @maxbear123 For clarification, I edited the answer. The .sdf is deposited on pastebin.com, with a minor edit of the time stamp. $\endgroup$
    – Buttonwood
    Commented Feb 3, 2022 at 14:52
7
$\begingroup$

You didn't ask about macromolecules, but I'll write about it anyway.

Macromolecular CIF files (mmCIF) also don't contain bonds. This information is stored in separate dictionaries. In particular, the PDB maintains the Chemical Component Dictionary, a huge CIF file which contains, among other things, a list of bonds for each residue and small molecule found in PDB entries.

Connectivity between monomers in a polymer can be inferred from the sequence information.
Other connections are listed explicitly in the struct_conn category.

It'd be possible to just use distances between atom pairs, but this is less reliable, because atomic coordinates in macromolecules may not be precisely determined.

$\endgroup$
1
  • 2
    $\begingroup$ +1 for commenting on the component dictionary for mmCIF. $\endgroup$ Commented Feb 2, 2022 at 4:39
6
$\begingroup$

Most software finds bonds in a structure from Cartesian coordinates by finding atom pairs that are closer than a certain threshold distance (something shorter than the sum of the vdW radii of the given atoms). Generally, you can set this distance yourself if you need to.
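A minimal sketch of such a distance test for an isolated molecule is given here. It rests on assumptions that are not part of this answer: the threshold is taken as the sum of two covalent radii plus a tolerance, the radii in `RADII` are illustrative values in angstrom, and `find_bonds` and `tolerance` are names introduced only for this example. Real programs use their own tables and defaults.

```python
import itertools
import math

# Illustrative covalent radii in angstrom (assumed values for this sketch).
RADII = {"H": 0.31, "C": 0.76, "O": 0.66}


def find_bonds(atoms, tolerance=0.4):
    """Return index pairs (i, j) with distance <= r_i + r_j + tolerance."""
    bonds = []
    for (i, (el_i, pos_i)), (j, (el_j, pos_j)) in itertools.combinations(
        enumerate(atoms), 2
    ):
        # User-adjustable threshold for this pair of elements.
        cutoff = RADII[el_i] + RADII[el_j] + tolerance
        if math.dist(pos_i, pos_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# Water molecule, Cartesian coordinates in angstrom.
water = [
    ("O", (0.000, 0.000, 0.000)),
    ("H", (0.757, 0.586, 0.000)),
    ("H", (-0.757, 0.586, 0.000)),
]

print(find_bonds(water))  # the two O-H pairs; the H...H pair is too far apart
```

The all-pairs loop scales quadratically with the number of atoms, which is fine for small molecules; larger systems usually call for a cell list or a k-d tree.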

$\endgroup$
6
$\begingroup$

how could one algorithmically find all bond pairs

Here is a sketch of an algorithm that takes crystal symmetry into account:

  1. Move all atoms into a single asymmetric unit
  2. Apply all symmetry operators (including centering) to the asymmetric unit to generate the content of the unit cell
  3. Duplicate the unit cell content to also create the unit cell contents on the left, right, top, bottom, front, back of the unit cell.
  4. For each atom in the asymmetric unit, find the neighbors within bonding distance, using a fixed-radius near neighbors algorithm and appropriate distance cutoff.
  5. Filter the list of neighboring pairs to exclude duplicates and unrealistic geometries.

This sketch of an algorithm might be slow for large structures, and there might be simple optimizations, especially for space groups where the asymmetric unit is a rectangular prism along the Cartesian axes.
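Steps 1 and 2 could look roughly like the following for the LiF file in the question. This is a sketch under assumptions: only six of the 192 operators are used (ids 1, 2, 3, 49, 97, 145), which happens to suffice because both atoms sit on special positions, and the names `apply_symop` and `expand_cell` are made up for this example. The operator strings are evaluated with `eval`, which is acceptable only for trusted input.

```python
# Subset of the symmetry operators listed in the LiF CIF (assumption: enough
# for these two special positions; a general structure needs the full list).
SYMOPS = [
    "x, y, z",
    "-x, -y, -z",
    "z, y, -x",
    "x+1/2, y+1/2, z",
    "x+1/2, y, z+1/2",
    "x, y+1/2, z+1/2",
]

# Atoms of the asymmetric unit as (element, fractional coordinates).
ASYMMETRIC_UNIT = [("Li", (0.0, 0.0, 0.0)), ("F", (0.0, 0.0, 0.5))]


def apply_symop(op, frac):
    """Apply an operator such as '-x+1/2, y, z' and wrap into [0, 1)."""
    x, y, z = frac
    env = {"x": x, "y": y, "z": z}
    coords = []
    for part in op.split(","):
        # Trusted input only: evaluates e.g. '-x+1/2' with x, y, z bound.
        value = eval(part.strip(), {"__builtins__": {}}, env)
        coords.append(round(value, 6) % 1.0)
    return tuple(coords)


def expand_cell(asym_unit, symops):
    """Generate the unit cell content, dropping symmetry-equivalent duplicates."""
    seen = set()
    cell = []
    for element, frac in asym_unit:
        for op in symops:
            site = (element, apply_symop(op, frac))
            if site not in seen:
                seen.add(site)
                cell.append(site)
    return cell


unit_cell = expand_cell(ASYMMETRIC_UNIT, SYMOPS)
for element, frac in unit_cell:
    print(element, frac)
```

The result contains four Li and four F sites, in line with `_chemical_formula_sum 'Li4 F4'` in the file.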

Example

enter image description here

The CIF file for sodium chloride contains two atoms, one sodium (purple) on the origin and one chloride (green) in the center of the unit cell. The crystal is face-centered. To find the closest neighbors of the chloride, you have to generate all the sodium ions in or touching the unit cell, using the centering operation and various unit-cell vector translations. Then, you can find the six sodium ions that are closest neighbors of the chloride ion (above, below, left, right, in front and behind the chloride ion on the faces of the unit cell).

To find which chloride ions make up the (inner) coordination sphere of the sodium ion, you have to generate the symmetry mates of the chloride ion within the unit cell. Then, you have to translate the unit cell in multiple directions (and combinations of directions) to make sure you also find bonding partners outside of the unit cell you started with.
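The neighbor search with translated unit cells (steps 3 and 4) might be sketched as follows for this sodium chloride example. Several things are assumptions made for illustration: the lattice parameter of about 5.64 angstrom, the cutoff of 3.0 angstrom, the shortcut of scaling fractional distances by `A` (valid only for a cubic cell; a general cell needs the full lattice matrix), and the function name `neighbours`.

```python
import itertools
import math

A = 5.64       # approximate NaCl lattice parameter in angstrom (assumed)
CUTOFF = 3.0   # angstrom (assumed), slightly above the Na-Cl contact of A / 2

# Content of the conventional unit cell in fractional coordinates.
UNIT_CELL = [
    ("Na", (0.0, 0.0, 0.0)), ("Na", (0.5, 0.5, 0.0)),
    ("Na", (0.5, 0.0, 0.5)), ("Na", (0.0, 0.5, 0.5)),
    ("Cl", (0.5, 0.5, 0.5)), ("Cl", (0.5, 0.0, 0.0)),
    ("Cl", (0.0, 0.5, 0.0)), ("Cl", (0.0, 0.0, 0.5)),
]


def neighbours(center, cell, a, cutoff):
    """Neighbours of one site, searching the cell and its 26 translated copies."""
    found = []
    for element, frac in cell:
        for shift in itertools.product((-1, 0, 1), repeat=3):
            image = tuple(f + s for f, s in zip(frac, shift))
            # Cubic cell only: Cartesian distance = a * fractional distance.
            distance = a * math.dist(center, image)
            if 1e-6 < distance <= cutoff:  # skip the central atom itself
                found.append((element, image, round(distance, 3)))
    return found


around_cl = neighbours((0.5, 0.5, 0.5), UNIT_CELL, A, CUTOFF)
around_na = neighbours((0.0, 0.0, 0.0), UNIT_CELL, A, CUTOFF)

print(len(around_cl), "neighbours of Cl:", around_cl)
print(len(around_na), "neighbours of Na:", around_na)
```

For the chloride in the center, the six sodium ions on the faces are found; for the sodium at the origin, the six chlorides are only found because of the translated copies, since several of them lie at negative fractional coordinates.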

$\endgroup$
3
  • $\begingroup$ The VESTA program allows one to try different cutoff distances for the bonds it displays. Sometimes it has a distance by default (that works and produces plausible bonds). Other times it shows no bonds (and inputting any cutoff distance produces clearly implausible bond pairs). How does one pick an "appropriate distance cutoff"? $\endgroup$
    – maxbear123
    Commented Feb 2, 2022 at 1:54
  • 1
    $\begingroup$ @maxbear123 Jmol (the program used for illustration) equally has this option. In the GUI, Edit -> Preferences -> Bonds allows you to switch off/on the automatic computation of bonds (if these are not stored in a connectivity table of the file read, like in a .sdf file (and in contrast to .cif/.xyz)). It then allows you to adjust (independently of each other) a minimal bond distance (0 to 1 Angstrom), and a bond tolerance (sum of van der Waals radii of the atoms in question + x), equally with an x in the range of 0 to 1 Angstrom. $\endgroup$
    – Buttonwood
    Commented Feb 2, 2022 at 8:39
  • 1
    $\begingroup$ @maxbear123 "Chemical bonds" make the most sense in covalent compounds, but you are most probably dealing with inorganic (ionic or metallic) structures, right? In those materials, connecting atoms is not much more than an illustration of close interactions, and you shouldn't take it literally. $\endgroup$
    – Greg
    Commented Feb 3, 2022 at 4:54
