RWTH Aachen, May 10 and 11, 2023
For the next release, the developers of InChI are working on a revision of the chemical representation of inorganics and organometallics. The goal of this InChI workshop was to identify methods that let InChI handle the stereochemistry of organometallic and coordination compounds, which no InChI version to date has addressed.
A chemical identifier
In his talk “My journey from nomenclature and terminology to data infrastructures” or “What can we learn from old line notation systems?”, Jeremy G. Frey (University of Southampton) set out the framework in which InChI has to position itself as a chemical identifier: the universe of chemical compounds, the history of chemical representations from Wiswesser line notation to SMILES, the needs of FAIR data handling, and the requirements of AI/ML.
In his talk “Polyhedral Symbols, Configuration Indices, and What can be Learned from Applying Classical Nomenclature” Richard Hartshorn (University of Canterbury, New Zealand) described the requirements a chemical identifier has to fulfil to become generally accepted within the chemistry community, drawing on his experience with the development of the IUPAC nomenclature for inorganics and organometallics. He walked through the separate naming convention for ligands, the geometries around the central atom, and the unique ordering of ligands around the centre using CIP priorities, and arrived at the problem of identifying higher-order symmetries that differ only slightly from each other.
Based on Richard Hartshorn’s blueprint, Andrey Yerin (ACD/Labs) showed in his talk “Generation of systematic stereodescriptors for coordination structures and possible application for InChI” how ACD/Labs has implemented the rules in its automatic IUPAC naming libraries, which are widely used by other software vendors as well. He pointed out that chemical name generation is a very challenging area in need of further work and improvement. He also mentioned that it would be easier to replace systematic names with InChI, but that systematic names are still in demand and in use in various areas.
Each coordination type is handled by its own rule set, which derives the configuration descriptors from the priority of the ligands. The priority of the ligands is explained with hierarchical digraphs based on the CIP rules. The recognition of specific coordination types is mostly based on the detection of axes, taking the presence of stereobonds and their spatial arrangement into account. Besides 2D structures, the configurations of 2.5D and 3D structures are also recognized.
Based on this experience, he proposed ways to transfer these concepts into the InChI environment, using the priority of the ligand atoms and their geometry around the central atom. However, the analysis of cisplatin already showed that numbering the four donor atoms by their InChI atom numbers may not be sufficient to represent this complex uniquely. In other words, the current InChI numbering is not stereospecific and does not recognize diastereotopic groups.
Implementation of the stereochemistry
For the implementation of stereochemistry for organometallic and inorganic compounds, Andrey Yerin proposed the following requirements:
- Establish a format that defines the configuration and chirality
- Support for different representations of coordination bonds (coordinative bonds, haptic bonds, etc.)
- Recognition of coordination types such as T-4, SP-4, TBPY-5 and OC-6
- Recognition of spatial arrangements in 2D, 2.5D and 3D
- Procedures to determine configurationally inequivalent ligands
- Bond model for delocalized ligands – in general and in relation to configuration
Ulrich Schatzschneider (University of Würzburg, Germany) proposed a graph-based implementation of the stereochemistry of organometallic and inorganic compounds based on “stereographs”. The coordination polyhedron is represented as an undirected graph with atoms as nodes and the atom-atom connection vectors as edges. Accordingly, all bonds are edges, but the links between atoms that are not directly bonded become edges as well. Nodes are found inside, outside, or at the corners of polyhedra; graph edges are located inside, outside, within the faces, or along the edges of polyhedra. The geometry of the organometallic compound is mapped onto the related polyhedra. For each polyhedron type, Ulrich Schatzschneider described the route to a unique representation of the stereochemistry. For these representations he uses a tuple format based on the numbers of the atoms that form the nodes of the polyhedron in question.
The concept of stereographs feeds into TUCAN, an identifier and descriptor for all domains of chemistry.
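The stereograph idea described above can be made concrete with a small sketch. It assumes a square-planar complex with idealized coordinates and only illustrates that every atom pair becomes an edge, whether bonded or not. The function name `build_stereograph`, the dictionary layout and the atom numbering are hypothetical and are not taken from the talk or from TUCAN.

```python
from itertools import combinations

def build_stereograph(coords, bonds):
    """Build an undirected graph in which every atom pair is an edge.

    coords: dict mapping atom number -> (x, y, z)
    bonds:  iterable of (i, j) pairs that are directly bonded
    Each edge stores the connection vector and whether it is a real bond.
    (Hypothetical layout, for illustration only.)
    """
    bonded = {frozenset(b) for b in bonds}
    edges = {}
    for i, j in combinations(sorted(coords), 2):
        vector = tuple(b - a for a, b in zip(coords[i], coords[j]))
        edges[(i, j)] = {"vector": vector, "bonded": frozenset((i, j)) in bonded}
    return edges

# Idealized square-planar centre: atom 1 is the metal, atoms 2-5 are donors.
coords = {
    1: (0.0, 0.0, 0.0),
    2: (1.0, 0.0, 0.0),
    3: (0.0, 1.0, 0.0),
    4: (-1.0, 0.0, 0.0),
    5: (0.0, -1.0, 0.0),
}
bonds = [(1, 2), (1, 3), (1, 4), (1, 5)]
graph = build_stereograph(coords, bonds)
```

In this sketch the trans relationship between donors 2 and 4 shows up as a non-bonded edge that passes through the metal, which is the kind of geometric information a plain connection table does not carry.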
In “Stereoselective organometallic catalysts – how do we identify them in InChi?” Per-Ola Norrby (R&D, AstraZeneca, Gothenburg) follows the stereochemistry definitions of InChI by proposing the following rules:
- Encode geometric isomerism by denoting trans pairs in a new layer
- Differentiate enantiomers by encoding central atom chirality in the /t layer
- Represent π-systems by assigning one coordination position to the centroid
- Rigidly coordinated π-systems also need trans assignment
The geometry is described by tuples of atom numbers, beginning with atom 1 (as highest atomic number) and its trans counterpart, followed by the atom pair with the next priority. Atoms in π-systems are listed in brackets in order of increasing atom number, with hyphens as separators.
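A minimal sketch of how such trans-pair tuples could be assembled is given below. The report does not specify the concrete syntax, so the separators (`,` within a pair, `;` between pairs), the function names and the rule that a π-system is ranked by its lowest atom number are all assumptions made for illustration only.

```python
def format_donor(donor):
    """Render a donor: a single atom as '4', a pi-system as '(5-6-7)'."""
    atoms = sorted(donor)
    if len(atoms) == 1:
        return str(atoms[0])
    return "(" + "-".join(str(a) for a in atoms) + ")"

def encode_trans_pairs(pairs):
    """Order trans pairs by lowest atom number and join them into one string.

    pairs: list of (donor_a, donor_b); each donor is a tuple of atom numbers,
    with more than one entry for a pi-system bound through its centroid.
    The separators and the ranking of pi-systems are hypothetical.
    """
    ordered = []
    for a, b in pairs:
        # Within a pair, the donor containing the lower atom number comes first.
        first, second = sorted((tuple(sorted(a)), tuple(sorted(b))), key=min)
        ordered.append((first, second))
    # Pairs are listed by increasing lowest atom number.
    ordered.sort(key=lambda pair: min(pair[0]))
    return ";".join(f"{format_donor(x)},{format_donor(y)}" for x, y in ordered)

# Octahedral example: atoms 1/4 and 2/3 are trans; a pi-system 5-6-7 is trans to atom 8.
layer = encode_trans_pairs([((3,), (2,)), ((1,), (4,)), ((7, 5, 6), (8,))])
```

For the example input, `layer` evaluates to `1,4;2,3;(5-6-7),8`.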
Inorganic Stereochemistry Ecosystem
John Mayfield (NextMove Software, Cambridge, UK) gave a general overview of the stereochemistry representation of inorganic systems in his talk “Inorganic Stereochemistry Ecosystem”. For the stereochemistry of higher-order systems, he proposes to transfer the idea of permutation tables for tetrahedral centres to higher symmetries. A tetrahedral centre has two parities, which sort the 24 permutations of its four neighbours into two groups of 12 permutations each. Any given permutation belongs to one of the two groups, so that it is assigned @TH1 or @TH2 accordingly.
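The split of the 24 neighbour permutations into two groups of 12 can be reproduced with a short sketch based on permutation parity. Which parity is labelled @TH1 and which @TH2 depends on the reference order and viewing convention; the mapping used here (even permutation gives @TH1) is an arbitrary assumption for illustration, and the function names are hypothetical.

```python
from itertools import permutations

def parity(perm):
    """Return 0 for an even permutation and 1 for an odd one (inversion count mod 2)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2

def tetrahedral_class(neighbour_order):
    """Assign a neighbour ordering to one of the two tetrahedral classes.

    Assumption for illustration: even permutations of the reference
    order (0, 1, 2, 3) are labelled @TH1, odd ones @TH2.
    """
    return "@TH1" if parity(neighbour_order) == 0 else "@TH2"

groups = {"@TH1": [], "@TH2": []}
for p in permutations(range(4)):
    groups[tetrahedral_class(p)].append(p)
```

Swapping any two neighbours changes the parity and therefore the class, which matches the familiar behaviour of tetrahedral stereocentres.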
In the case of octahedral geometry, there are 30 different groups if all ligands differ from each other. If two or more ligands are identical, the number of distinct groups is reduced accordingly, but each arrangement can still be related to one of the octahedral groups. In these cases several octahedral groups become identical, and the lowest one is taken as the canonical representation. Geometries with a lower coordination number than an octahedral centre (e.g. square planar, square pyramidal) can be described as degenerate octahedra, so that their configuration can be described by the related octahedral group as well.
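The count of 30 octahedral groups follows from dividing the 720 orderings of six distinct ligands by the 24 proper rotations of the octahedron. The sketch below checks this by brute force and also shows how choosing the lowest rotated arrangement gives a canonical representative. The vertex numbering, the generator choice and the function names are assumptions made for illustration and are not taken from the talk.

```python
from itertools import permutations

# Vertex positions of the octahedron: 0=+z, 1=+x, 2=+y, 3=-x, 4=-y, 5=-z.
# A rotation is a tuple r where position i is moved to position r[i].
ROT_Z = (0, 2, 3, 4, 1, 5)  # quarter turn about z: +x -> +y -> -x -> -y
ROT_X = (4, 1, 0, 3, 5, 2)  # quarter turn about x: +y -> +z -> -y -> -z

def rotation_group():
    """Generate all proper rotations by closing the two generators under composition."""
    identity = tuple(range(6))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for gen in (ROT_Z, ROT_X):
            h = tuple(gen[g[i]] for i in range(6))  # apply g, then gen
            if h not in group:
                group.add(h)
                frontier.append(h)
    return sorted(group)

def rotate(arrangement, rot):
    """Move the ligand at each position to its rotated position."""
    out = [None] * 6
    for pos, ligand in enumerate(arrangement):
        out[rot[pos]] = ligand
    return tuple(out)

def canonical(arrangement, group):
    """Pick the lowest of all rotated arrangements as the representative."""
    return min(rotate(arrangement, r) for r in group)

GROUP = rotation_group()
distinct_all_different = {canonical(a, GROUP) for a in permutations("ABCDEF")}
distinct_ma4b2 = {canonical(a, GROUP) for a in set(permutations("AAAABB"))}
```

With six different ligands the sketch finds 30 classes. With four identical and two identical ligands it finds 2, corresponding to the cis and trans isomers.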
Christoph Bannwarth and Nils van Staalduinen (RWTH Aachen) presented in their contribution “Molecular Barcode” the new molecular identifier MolBar for organic and inorganic molecules, with full support for stereoisomerism. Molecular graphs can be represented as Hückel-inspired extended adjacency matrices whose spectrum (the ordered set of eigenvalues) is a priori permutation invariant. This spectrum forms the basis of the “molbar”, which is further refined by the introduction of Coulomb matrices, the fragmentation of the matrix into rigid molecular parts, and an additional structure unification using a specialized force field to handle the stereochemistry uniquely. That makes MolBar especially suitable for handling the stereochemistry of inorganics and organometallics.
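The permutation invariance mentioned above can be checked with a small sketch. To stay within the Python standard library and avoid floating-point comparisons, it compares characteristic polynomials (whose roots are the eigenvalues) instead of computing the spectrum directly, using the Faddeev-LeVerrier recurrence. The matrix used here, with atomic numbers on the diagonal and 1 for each bond, is only an assumed stand-in; it is not the actual MolBar matrix, and all function names are hypothetical.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(a):
    """Coefficients [c0, ..., cn] of det(x*I - a) via Faddeev-LeVerrier."""
    n = len(a)
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        am = matmul(a, m)
        m = [[am[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        product = matmul(a, m)
        trace = sum(product[i][i] for i in range(n))
        coeffs[n - k] = -trace / k
    return coeffs

def build_matrix(elements, bonds):
    """Assumed toy matrix: atomic numbers on the diagonal, 1 for each bond."""
    n = len(elements)
    mat = [[0] * n for _ in range(n)]
    for i, z in enumerate(elements):
        mat[i][i] = z
    for i, j in bonds:
        mat[i][j] = mat[j][i] = 1
    return mat

def relabel(elements, bonds, perm):
    """Renumber atoms: old index i becomes perm[i]."""
    new_elements = [None] * len(elements)
    for old, new in enumerate(perm):
        new_elements[new] = elements[old]
    return new_elements, [(perm[i], perm[j]) for i, j in bonds]

# Heavy atoms of PtCl2(NH3)2: Pt (78), two Cl (17), two N (7).
elements = [78, 17, 17, 7, 7]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
original = char_poly(build_matrix(elements, bonds))
shuffled = char_poly(build_matrix(*relabel(elements, bonds, [3, 0, 4, 1, 2])))
```

Both polynomials are identical, so the atom numbering does not affect the result. A matrix built from connectivity alone carries no geometric information, which is consistent with the further refinements described for MolBar.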
At the beginning of the workshop, Djordje Baljozovic, Jan Brammer, Nauman Kahn, and Frank Lage presented the status of the technical developments around the InChI code. This included the test environments for test batches and for web browsers.
During the discussion, Thomas Dörner suggested using spherical projections to represent the stereochemistry of inorganics. Later in the discussion, the introduction of polar coordinates around the centre of the chemical complex was brought up and is being investigated further.
The organometallics and inorganics working group of the IUPAC subcommittee is consolidating the results of this workshop. As the next step, it will have to define the syntax for the stereochemistry representation of inorganics within the InChI string.