TUCAN is a canonical serialisation format that is independent of domain-specific concepts of structure and bonding.
The atomic number is the only chemical feature used to derive a TUCAN string; beyond that, the format relies solely on the molecular topology. The serialisation procedure generates a canonical, “tuple-style” output that is bidirectional: the molecular graph can be reconstructed from the string, so a TUCAN string serves as both an identifier and a descriptor. Building on the Python graph library NetworkX kept the implementation compact and easily extensible. An online version (Figure 1) is now available, where chemists can try converting their own molecules into the TUCAN format, either by drawing them directly or from a mol-file.
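To make "canonical" and "bidirectional" concrete, here is a toy sketch in plain Python. It is not the TUCAN algorithm and does not produce TUCAN strings. The functions `canonical_serialise` and `parse` and the `"Z,Z,.../(a-b)(a-b)"` string layout are hypothetical, and NetworkX is deliberately not used so that the example runs with the standard library alone. The sketch borrows the same two ingredients: atomic numbers and connectivity. Atoms are sorted by atomic number, every ordering within each element group is tried, and the lexicographically smallest edge list is kept. The result is therefore independent of the input atom order and can be parsed back into a graph. The brute-force search is exponential, so it only suits very small molecules.

```python
from itertools import permutations, product


def canonical_serialise(atomic_numbers, bonds):
    """Toy canonical string for a molecular graph (hypothetical, NOT TUCAN).

    Only atomic numbers and bonds (pairs of atom indices) are used.
    """
    # Group original atom indices by atomic number.
    groups = {}
    for idx, z in enumerate(atomic_numbers):
        groups.setdefault(z, []).append(idx)
    best_edges = None
    # Try every ordering of atoms within each element group, groups in ascending Z.
    for choice in product(*(permutations(groups[z]) for z in sorted(groups))):
        order = [idx for group in choice for idx in group]  # new position -> old index
        new_pos = {old: new for new, old in enumerate(order)}
        # Relabel bonds and sort them so the edge list has a fixed form.
        edges = tuple(sorted(tuple(sorted((new_pos[a], new_pos[b]))) for a, b in bonds))
        # Keep the lexicographically smallest edge list: the canonical labelling.
        if best_edges is None or edges < best_edges:
            best_edges = edges
    zs = ",".join(str(z) for z in sorted(atomic_numbers))
    return zs + "/" + "".join(f"({a}-{b})" for a, b in best_edges)


def parse(serialised):
    """Invert canonical_serialise: rebuild atomic numbers and bonds."""
    z_part, edge_part = serialised.split("/")
    atomic_numbers = [int(z) for z in z_part.split(",")] if z_part else []
    bonds = []
    for token in edge_part.split(")"):
        if token:
            a, b = token.lstrip("(").split("-")
            bonds.append((int(a), int(b)))
    return atomic_numbers, bonds


# Water entered with two different atom orders gives the same string.
print(canonical_serialise([8, 1, 1], [(0, 1), (0, 2)]))  # 1,1,8/(0-2)(1-2)
print(canonical_serialise([1, 8, 1], [(1, 0), (2, 1)]))  # 1,1,8/(0-2)(1-2)
```

Because the string encodes the full labelled graph, `canonical_serialise(*parse(s)) == s` holds for any string it produces. This mirrors, in miniature, how one string can serve as both identifier and descriptor. Production approaches such as TUCAN avoid the exhaustive search with far more efficient canonicalisation.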
Furthermore, the work on the TUCAN identifier and descriptor has just been published in the Journal of Cheminformatics.