Five years of just hoping?

Did you know? Hard disks have a maximum warranty of 5 years. But the DFG requires 10 years for data storage. 

Do the math: that can’t work. So if you rely only on your PC hard drive, you’ll soon end up in trouble. Yet in our recent survey of chemists, 69% (down from 75% in 2019) still said they use personal or shared computers for long-term data storage.

In today’s digital world, safe data storage is indispensable because data is the digital currency of today’s professional world. There are numerous reasons why you should not solely rely on hard disks or similar storage media:

  • Hard disk defects are one of the most common causes of data loss. Every hard disk has a limited lifespan, and even high-quality drives can fail suddenly due to overheating, software errors or partition errors. And the warranty will only get you a new disk; it will not return your lost data.
  • USB flash drives may seem convenient, but they are small, easily misplaced, and prone to physical damage. If you store important files only on a USB stick and it gets lost or damaged, your data will be irretrievably lost. 
  • Cloud solutions are undoubtedly convenient and offer the advantage of remote access to your data. Nevertheless, cloud providers can suffer security breaches that lead to data leaks. In addition, there is the risk that the provider could discontinue its service. 

This can have a disastrous impact on your research work or funding.

And the solution? 

A well-thought-out data strategy can ensure that your data is safe even when your hard drive gives up the ghost.

The best way to do this is to create a thoroughly thought out data management plan (DMP). You can find all the important info about it here: Data Management Plan.

If you need assistance, sign up for our Research Data Management Basic Workshop.

During research

During the research phase, you need robust data storage. To do this, create backups on different storage media to ensure that your data is backed up in multiple locations.

The 3-2-1 rule is best: 3 copies on 2 different storage media, with 1 backup at an off-site location.
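
The 3-2-1 rule can be written down as a simple check. The following is a minimal sketch in Python; the `BackupCopy` record, its fields and the example storage locations are hypothetical and only illustrate the rule, not any particular backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a dataset (hypothetical record, for illustration only)."""
    label: str     # free-text description of where the copy lives
    medium: str    # type of storage medium, e.g. "internal disk" or "tape"
    offsite: bool  # True if the copy is kept at a different site


def satisfies_3_2_1(copies):
    """Return True if the copies fulfil the 3-2-1 rule."""
    enough_copies = len(copies) >= 3                     # 3 copies
    enough_media = len({c.medium for c in copies}) >= 2  # 2 different media
    has_offsite = any(c.offsite for c in copies)         # 1 off-site backup
    return enough_copies and enough_media and has_offsite


copies = [
    BackupCopy("office PC", "internal disk", offsite=False),
    BackupCopy("group NAS", "network storage", offsite=False),
    BackupCopy("computing centre archive", "tape", offsite=True),
]
print(satisfies_3_2_1(copies))
```

A check like this could be part of a data management plan review, but it says nothing about whether the copies are actually readable; restoring from a backup now and then remains necessary.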

After research

After the research or publication is complete, you need a sound strategy for providing the data for publication and ensuring its archiving in line with funder requirements. You meet these demands by storing the data in a repository. Info about this can be found here.

By the way, this is a good opportunity to store data in a FAIR way. FAIR? FAIR.

Remember: it’s not a question of if data loss will occur, but when. 

Be prepared.

Another ELN made available for testing

The website https://demo.chemotion.scc.kit.edu/ provides a collection of test versions of various electronic laboratory notebooks (ELNs). These test versions are of great value when it comes to choosing an optimal ELN solution for scientific laboratories. 

The advantage of these trial ELNs is that they allow researchers and laboratory managers to test the different ELN systems and evaluate them for their specific needs. Ease of use, functionality and seamless integration into existing laboratory workflows can be tested. 

A new addition is the ELN openBIS. With this, five different ELNs are now available for testing, helping researchers make an informed choice when selecting the ELN system best suited to their requirements.

Screenshot of the test-eln website

This is no Data

The importance of raw data publishing and the FAIR criteria

Picture of an NMR spectrum with the text “This is no data” under it.

Many people know the world-famous painting “Ceci n’est pas une pipe”, an iconic work by René Magritte from 1929, which shows the image of a pipe and the written statement “Ceci n’est pas une pipe”, meaning “This is not a pipe”. This phrase draws attention to the fact that the image itself is not the physical pipe, but only an image or representation of it. Thus, the image challenges the viewer to engage with the relationship between reality and its representations.

In science, this discussion is sometimes missing. In publications, the underlying measurement data are usually attached in the form of graphs. Many think they have thereby shared their data. But they haven’t. Graphs are representations of the data: they allow other researchers to understand the publication more deeply, but those researchers cannot use them for further work.

Raw data are the original, unfiltered data collected directly from the instruments, such as the NMR spectrometer. They contain information that allows the data to be evaluated and analyzed, possibly in the future using other methods (e.g., AI).

The FAIR criteria (Findable, Accessible, Interoperable, Reusable) emphasize the importance of publishing raw data in scientific research. By making their raw data publicly available, researchers promote

  • the transparency and reproducibility of studies, and thus
  • the trust in the research results,
  • scientific integrity,
  • opportunities for collaboration,
  • new insights / potentially unexpected discoveries, and
  • access for future technologies.

According to a study by TIB, Open Access publications have significant advantages, and common reservations about them could be empirically refuted. The future of science is FAIR. Be part of it.

RDM-Workshop in Stuttgart – soon at your institute?

Did you know that you can book a free in-house workshop on research data management at NFDI4Chem, e.g. for two days?

Our last RDM-Workshop took place in mid-June in Stuttgart, at CRC 1333. There, our two trainers taught a total of 24 participants on the topic of “FAIR RDM: Basics for chemists”.

The workshop can also include content that has been previously agreed with the institute. In Stuttgart, for example, our partner Prof. Dr. Jürgen Pleiss gave a lecture on a bottom-up RDM concept, and many best practices from CRC 1333 were taken up. In addition, ELNs already available in CRC 1333 were tested together with ELN experts.

Are you also interested? We have a permanent team of 4 trainers, train groups of up to 100 people and can implement ready-made concepts as well as individual arrangements. The duration of the workshop can be flexibly adapted to your needs.

If you are interested, simply contact us.

NFDI4Chem at ACS Spring and Fall

Talks, Workshops and Networking in Indianapolis and San Francisco.

Earlier this year, NFDI4Chem attended the American Chemical Society (ACS) Spring Meeting in Indianapolis (March 26-30). It was great to see so many experts from around the world at the sessions organised by the ACS Division of Chemical Information (CINF). NFDI4Chem gave presentations to inform the wider community, but was also able to expand its international network of chemical information management stakeholders. The workshops organised by IUPAC proved to be particularly productive.

Fall Meeting

We are particularly looking forward to the ACS Fall Meeting in San Francisco (August 13-17), as the theme of the entire congress is “harnessing the power of data”, which means that this year there will be a special focus on chemical data issues. NFDI4Chem is co-chairing two sessions at the fall meeting: “Helping chemists manage their data” and “Enhancing your data – smart ways to use metadata and knowledge graphs”. 

Logo of ACS Fall Meeting 2023

In the first symposium, researchers will tackle challenges related to data management and publication, including the adoption of electronic lab notebooks (ELNs). Manual data collection, errors in data generation, and limited accessibility to matched datasets will be explored. The symposium aims to address these issues and promote the implementation of FAIR principles, overcoming cultural and psychosocial barriers, considering factors such as training, workload, funding, and publishers’ requirements. It will also delve into topics such as ELN integration with instruments, data processing software, transferring data to repositories, and developer/researcher practices in data acquisition and storage.

The second symposium will explore various aspects of data and metadata, including formats, standards and terminologies, to enable the creation and use of machine-actionable data in applications such as electronic lab notebooks and knowledge graphs. Rich metadata and ontologies are essential to provide context for meaningful knowledge generation from data.

Please join us either in person or virtually to learn about and/or improve tools & standards for data management in chemistry: www.acs.org/meetings/acs-meetings/fall-2023.html 

Relaunch of NFDI4Chem Knowledge Base

The NFDI4Chem Knowledge Base has been restructured and now also includes a separate section on data publication.

We are proud to announce the launch of the restructured NFDI4Chem Knowledge Base.

Screenshot of the new NFDI4Chem Knowledge Base.

Entry Points

You can now enter the Knowledge Base via six different entry points:

  • Choose Get started to jump to the introduction, with child pages on the FAIR Data Principles as well as the Data Life Cycle. 
  • Select the Domains category and you will be led to selected sub-disciplines of chemistry. The articles each contain key information on analytical methods employed and how to deal with the data they produce in a FAIR manner. 
  • Choose Roles such as group leader or student to find out what topics are particularly relevant depending on your respective role.
  • In Handling Data, you will find information on data management plans and data organisation, documentation as well as storage and archiving. 
  • Electronic Lab Notebooks will lead you to the concept of the Smart Lab, including child pages on ELNs and Chemotion ELN.
  • The completely new entry point, Data Publication, introduces the concept and benefits of research data publishing and highlights two primary methods for data publication. Dedicated child pages on research data repositories assist you in understanding and choosing the right repository for your research data. Further pages provide information on data articles, data availability statements, and machine-readable chemical structures. The best practice page now also includes information on how to use dataset persistent identifiers (PIDs) in scientific articles.

Further features

A very brief overview of recommended, trusted, and chemistry-friendly repositories has been distilled from our Knowledge Base and is now also accessible on an updates page on chemistry repositories at www.nfdi4chem.de.

The Knowledge Base pages are all curated in a GitHub repository and everyone is welcome to contribute to the NFDI4Chem Knowledge Base! All articles so far have been authored by an appropriate expert and subsequently reviewed by our editorial team. 

Should you wish to author a new article or contribute to an existing one, please get in touch with us via helpdesk@nfdi4chem.de.

We have been requested by the chemistry community in Germany to provide a German-language edition of the NFDI4Chem Knowledge Base, and we are pleased to announce that it will be available soon.

Stay tuned! 

InChI Workshop on Inorganic Stereochemistry

RWTH Aachen, May 10 and 11, 2023

Organised by Sonja Herres-Pawlis (RWTH Aachen, Germany)
and Gerd Blanke (InChI Trust)

For the next release, the developers of InChI are working on a revision of the chemical representation of inorganics and organometallics. The goal of this InChI Workshop was to identify methods that let InChI handle the stereochemistry of organometallic and coordination compounds, which no version up to the current one has addressed.

A chemical identifier

In his talk “My journey from nomenclature and terminology to data infrastructures” or “What can we learn from old line notation systems?”, Jeremy G. Frey (University of Southampton) set out the framework in which InChI has to position itself as a chemical identifier: the universe of chemical compounds, the history of chemical representations from Wiswesser line notation to SMILES, the needs of FAIR data handling and the requirements of AI/ML.

Requirements

In his talk “Polyhedral Symbols, Configuration Indices, and What can be Learned from Applying Classical Nomenclature”, Richard Hartshorn (University of Canterbury, New Zealand) described the requirements a chemical identifier has to fulfil to become generally accepted within the community of chemists, based on his experience with the development of the IUPAC nomenclature for inorganics and organometallics. He described the path from a separate naming convention for ligands, via the geometries around the central atom, to the unique ordering of ligands around the centre using CIP priorities, and arrived at the problem of identifying higher-order symmetries that differ only slightly from each other.

Library implementation

Based on Richard Hartshorn’s blueprint, Andrey Yerin (ACD/Labs) showed in his talk titled “Generation of systematic stereodescriptors for coordination structures and possible application for InChI” how ACD/Labs has implemented the rules into their automatic IUPAC naming libraries that are widely used by other software vendors as well. He pointed out that chemical name generation is a very challenging area in need of further work and improvements and mentioned that it would be easier to replace systematic names with InChI, but they are still demanded and used in various areas.

Each coordination type is handled by its own rule set that derives the configuration descriptors specified by the priority of the ligands. The priority of ligands is explained with hierarchical digraphs based on the CIP rules. The recognition of specific coordination types is mostly based on the detection of axes, taking the presence of stereobonds and their spatial arrangement into account. Besides 2D structures, the configurations of 2.5D and 3D structures are also recognized.

Based on his experience, he proposed ways in which the concepts can be transferred into the InChI environment, based on the priority of the ligand atoms and their geometry around the central atom. But the analysis of cis-platinum has already shown that the unique atom numbering of the 4 donor atoms based on the InChI atom numbers may not be sufficient for the unique representation of this complex. That means the current InChI numbering is not stereospecific and does not recognize diastereotopic groups.

Implementation of the stereochemistry

For the implementation of the stereochemistry for organometallics and inorganic compounds Andrey Yerin proposed the following requirements:

  • Establish a format that defines the configuration and chirality
  • Support of different representations of coordination bonds (coordinative bonds, haptic bonds, etc.)
  • Recognition of coordination types such as T-4, SP-4, TBPY-5 and OC-6
  • Recognition of spatial arrangements in 2D, 2.5D and 3D 
  • Procedures to determine configurationally inequivalent ligands
  • Bond model for delocalized ligands – in general and in relation to configuration
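
The coordination types named in the list are IUPAC polyhedral symbols: an abbreviation of the polyhedron followed by the coordination number. A minimal sketch in Python of how such symbols could be looked up and split is shown below; the function name and the returned dictionary are hypothetical and not part of InChI or of the ACD/Labs libraries.

```python
# IUPAC polyhedral symbols from the list above and the polyhedra they denote.
POLYHEDRAL_SYMBOLS = {
    "T-4": "tetrahedron",
    "SP-4": "square plane",
    "TBPY-5": "trigonal bipyramid",
    "OC-6": "octahedron",
}


def parse_polyhedral_symbol(symbol):
    """Split a known polyhedral symbol into polyhedron and coordination number."""
    if symbol not in POLYHEDRAL_SYMBOLS:
        raise ValueError(f"unknown polyhedral symbol: {symbol}")
    # The number after the last hyphen is the coordination number.
    _, number = symbol.rsplit("-", 1)
    return {
        "symbol": symbol,
        "polyhedron": POLYHEDRAL_SYMBOLS[symbol],
        "coordination_number": int(number),
    }


for symbol in POLYHEDRAL_SYMBOLS:
    print(parse_polyhedral_symbol(symbol))
```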

Graph-based implementation

Ulrich Schatzschneider (University of Würzburg, Germany) proposed a graph-based implementation of the stereochemistry of organometallics and inorganic compounds based on “Stereographs”. The coordination polyhedron is represented as an undirected graph with atoms as nodes and the atom-atom connection vectors as edges. Accordingly, all bonds are edges, but the links between atoms that are not directly bound become edges as well. Nodes are found inside, outside, or at corners of polyhedrons; the graph edges are located inside, outside, within faces, or at the edges of polyhedrons. The geometry of the organometallic compound is integrated into the related polyhedrons. For each of the polyhedron types, Ulrich Schatzschneider described the way to a unique representation of the stereochemistry. For these representations he uses a tuple format based on the atom numbers that form the nodes of the actual polyhedron.
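
To make the idea of such a graph more concrete, here is a minimal sketch in Python for an idealised octahedral complex. It is not the stereograph algorithm presented at the workshop: the atom labels, coordinates, edge categories and the function `build_stereograph` are assumptions chosen for illustration. Every pair of atoms becomes an edge, and each edge is classified as a metal-ligand bond, a ligand-ligand connection through the inside of the polyhedron (trans), or a ligand-ligand connection along a polyhedron edge (cis).

```python
from collections import Counter
from itertools import combinations
from math import isclose

# Idealised octahedral complex: central atom "M" at the origin,
# six donor atoms at unit distance along the coordinate axes.
ATOMS = {
    "M": (0.0, 0.0, 0.0),
    "L1": (1.0, 0.0, 0.0), "L2": (-1.0, 0.0, 0.0),
    "L3": (0.0, 1.0, 0.0), "L4": (0.0, -1.0, 0.0),
    "L5": (0.0, 0.0, 1.0), "L6": (0.0, 0.0, -1.0),
}


def is_midpoint(centre, a, b):
    """True if `centre` lies exactly halfway between points a and b."""
    return all(isclose((x + y) / 2, c, abs_tol=1e-9)
               for x, y, c in zip(a, b, centre))


def build_stereograph(atoms, centre="M"):
    """Return {(atom, atom): kind} for every pair of atoms (undirected edges)."""
    edges = {}
    for a, b in combinations(sorted(atoms), 2):
        if centre in (a, b):
            kind = "bond"             # metal-ligand bond
        elif is_midpoint(atoms[centre], atoms[a], atoms[b]):
            kind = "inside"           # trans pair, passes through the centre
        else:
            kind = "polyhedron edge"  # cis pair, runs along an octahedron edge
        edges[(a, b)] = kind
    return edges


stereograph = build_stereograph(ATOMS)
print(Counter(stereograph.values()))
```

The midpoint test only works for this idealised geometry; real structures would need tolerances and a more general classification.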

TUCAN

The concept of stereographs feeds into TUCAN, an identifier and descriptor for all domains of chemistry.

In “Stereoselective organometallic catalysts – how do we identify them in InChI?”, Per-Ola Norrby (R&D, AstraZeneca, Gothenburg) follows the stereochemistry definitions of InChI by proposing the following rules:

  • Encode geometric isomerism by denoting trans pairs in a new layer
  • Differentiate enantiomers by encoding central atom chirality in /t layer
  • Represent π-systems by assigning one coordination position to centroid
  • Rigidly coordinated π-systems also need trans assignment

The geometry is described by tuples of atom numbers, beginning with atom 1 (the atom with the highest atomic number) and its trans counterpart, followed by the atom pair with the next priority. Atoms in π-systems are listed in brackets with hyphens as separators, using increasing atom numbers.
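
The ordering rules described above can be illustrated with a small Python sketch. This is a hypothetical serialisation, not the syntax proposed in the talk: the separators ("," within a pair, ";" between pairs), the use of round brackets for π-systems and the function name `encode_trans_pairs` are all assumptions.

```python
def encode_trans_pairs(pairs):
    """Serialise trans pairs of an octahedral or square planar centre.

    Each pair has two members; a member is either an atom number (int)
    or a tuple of atom numbers for a pi-system occupying one position.
    """
    def lowest(member):
        # A pi-system is ranked by its lowest atom number (assumption).
        return min(member) if isinstance(member, tuple) else member

    def fmt(member):
        if isinstance(member, tuple):
            # Pi-system: increasing atom numbers, hyphen-separated, in brackets.
            return "(" + "-".join(str(n) for n in sorted(member)) + ")"
        return str(member)

    # Within each pair the lower-numbered member comes first,
    # then the pairs are ordered by their first member.
    ordered = sorted(
        (tuple(sorted(pair, key=lowest)) for pair in pairs),
        key=lambda pair: lowest(pair[0]),
    )
    return ";".join(",".join(fmt(m) for m in pair) for pair in ordered)


# Hypothetical octahedral centre: atom 2 is trans to a pi-system (atoms 6, 7, 8).
print(encode_trans_pairs([(3, 5), (4, 1), (2, (8, 6, 7))]))
```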

Inorganic Stereochemistry Ecosystem

John Mayfield (NextMove Software, Cambridge, UK) gave a general overview of the stereochemistry representation of inorganic systems in his talk “Inorganic Stereochemistry Ecosystem”. For the stereochemistry of higher-order systems, he proposes to transfer the idea of permutation tables for tetrahedral centres to higher symmetries. A tetrahedral centre has 2 parities, which divide the 24 permutations of its neighbours into 2 groups of 12 permutations each. A given permutation can be assigned to one of the groups, so that you obtain @TH1 or @TH2 depending on the assignment.
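
The split of the 24 permutations into two groups of 12 can be reproduced in a few lines of Python. This is only a sketch of the underlying counting argument; which parity class corresponds to @TH1 and which to @TH2 is a matter of convention and is deliberately left open here. The function names are hypothetical.

```python
from itertools import permutations


def parity(order):
    """Return 0 for an even permutation and 1 for an odd one (inversion count)."""
    inversions = sum(
        1
        for i in range(len(order))
        for j in range(i + 1, len(order))
        if order[i] > order[j]
    )
    return inversions % 2


def split_by_parity(neighbours=(1, 2, 3, 4)):
    """Group all orderings of the four neighbours by permutation parity."""
    classes = {0: [], 1: []}
    for order in permutations(neighbours):
        classes[parity(order)].append(order)
    return classes


classes = split_by_parity()
print(len(classes[0]), len(classes[1]))
```

Swapping any two neighbours moves an ordering from one class to the other, which is why exchanging two ligands inverts a tetrahedral centre.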

In the case of octahedral geometry, you get 30 different groups if all ligands differ from each other. If two or more ligands are identical, the number of potential groups is reduced accordingly, but each arrangement can still be related to one of the octahedral groups. In these cases, octahedral groups become identical and the lowest one is taken as the canonical representation. Geometries of lower coordination number than octahedral centres (e.g. square planar, square pyramidal, etc.) can be described as degenerate octahedra. This allows their configuration to be described by the related octahedral group as well.
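
The number 30 follows from dividing the 720 orderings of six different ligands by the 24 proper rotations of the octahedron. The Python sketch below checks this by brute force and also shows how the count shrinks when ligands are identical. The position numbering and all function names are assumptions for this illustration; it counts arrangements that cannot be rotated into each other, so mirror images are counted separately.

```python
from itertools import permutations

# Positions: 0 = +x, 1 = -x, 2 = +y, 3 = -y, 4 = +z, 5 = -z.
# Each generator maps position i to position generator[i].
ROT_Z = (2, 3, 1, 0, 4, 5)  # 90 degree rotation about the z axis
ROT_X = (0, 1, 4, 5, 3, 2)  # 90 degree rotation about the x axis


def compose(a, b):
    """Permutation obtained by applying b first, then a."""
    return tuple(a[b[i]] for i in range(6))


def rotation_group():
    """Generate all proper rotations of the octahedron from two generators."""
    group = {tuple(range(6))}
    frontier = list(group)
    while frontier:
        g = frontier.pop()
        for generator in (ROT_Z, ROT_X):
            h = compose(generator, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def canonical(labelling, group):
    """Lowest representative among all rotated versions of a labelling."""
    return min(tuple(labelling[g[i]] for i in range(6)) for g in group)


def count_octahedral_arrangements(ligands):
    """Count ligand arrangements that are distinct under rotation."""
    group = rotation_group()
    return len({canonical(p, group) for p in set(permutations(ligands))})


print(len(rotation_group()))
print(count_octahedral_arrangements("ABCDEF"))
print(count_octahedral_arrangements("AAAABB"))
```

Taking the lowest representative of each class mirrors the idea in the text that the lowest group serves as the canonical representation.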

MolBar

Christoph Bannwarth and Nils van Staalduinen (RWTH Aachen) presented in their contribution “Molecular Barcode” the new molecular identifier MolBar for organic and inorganic molecules with full support of stereoisomerism. Molecular graphs can be represented as Hückel-inspired extended adjacency matrices whose spectrum (= set of ordered eigenvalues) is a priori permutation invariant. That becomes the basis of the “molbar”, which is further refined by the introduction of Coulomb matrices, fragmentation of the matrix into rigid molecular parts, and an additional structure unification using a specialized force field to uniquely handle the stereochemistry. That makes MolBar especially suitable for handling the stereochemistry of inorganics and organometallics.
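
Why the spectrum does not depend on the atom numbering can be shown with a small, dependency-free Python sketch. It is not MolBar: it uses a plain 0/1 adjacency matrix instead of the extended matrices mentioned above, and it compares characteristic polynomials (computed exactly with the Faddeev-LeVerrier recurrence), which determine the eigenvalues, instead of computing the eigenvalues numerically. All function names are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def characteristic_polynomial(matrix):
    """Coefficients c[0..n] of det(x*I - A) = sum(c[k] * x**k)."""
    n = len(matrix)
    a = [[Fraction(x) for x in row] for row in matrix]
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        m = matmul(a, m)
        for i in range(n):
            m[i][i] += coeffs[n - k + 1]
        am = matmul(a, m)
        coeffs[n - k] = -sum(am[i][i] for i in range(n)) / k
    return coeffs


def adjacency_matrix(n_atoms, bonds):
    """Symmetric 0/1 matrix for a molecular graph."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        matrix[i][j] = matrix[j][i] = 1
    return matrix


def renumber(matrix, new_index):
    """Relabel atoms: old atom i becomes atom new_index[i]."""
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[new_index[i]][new_index[j]] = matrix[i][j]
    return out


chain = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])     # linear C4 skeleton
shuffled = renumber(chain, [2, 0, 3, 1])                  # same graph, renumbered
branched = adjacency_matrix(4, [(0, 1), (0, 2), (0, 3)])  # branched C4 skeleton

print(characteristic_polynomial(chain) == characteristic_polynomial(shuffled))
print(characteristic_polynomial(chain) == characteristic_polynomial(branched))
```

Different graphs can share a spectrum, so an invariant of this kind alone does not make a unique identifier; the refinements described in the talk go beyond this sketch.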

Discussion

At the beginning of the workshop, Djordje Baljozovic, Jan Brammer, Nauman Kahn, and Frank Lage presented the status of the technical developments around the InChI code. That included the test environments for test batches and for use in web browsers.

During the discussion, Thomas Dörner came up with the idea of using spherical projections to represent the stereochemistry of inorganics. In the further discussion, the introduction of polar coordinates around the centre of the chemical complex was brought up and is being investigated further.

The organometallics and inorganic working group of the IUPAC subcommittee is consolidating the results of this workshop. As the next step, it will have to define the syntax for the stereochemistry representation of inorganics within the InChI string.

CDD 23 – Focus on data in chemistry

On the 6th and 7th of June this year, the Chemistry Data Days took place in Mainz for the first time. Speakers examined data in chemistry from different angles. Guillermo Restrepo, for example, looked at the historical development of data spaces from a mathematical, computational and ethical point of view, Paul Czodrowski showed that no AI can work meaningfully without data (because AI still primarily means machine learning) and Kevin Jablonka demonstrated how large language models (such as ChatGPT) will change chemistry.

But the bulk of the congress was dedicated to talks and workshops concerning data handling principles such as data management and smart lab tools. After all, the conference had a hands-on mission that it had to live up to with its participants. Workshops on starting a career for young chemists and on successful publishing as well as a bar camp rounded off the programme. The evening poster session with buffet and beer tasting offered plenty of space and opportunity for networking and thus marked an unofficial highlight of the Chemistry Data Days.

Participants of the Chemistry Data Days 2023

Almost 90 interested participants followed the talks and workshops over two days in the historic refectory building of the University of Mainz. The presentations by several electronic lab notebook providers, who were able to answer further questions with live demonstrations at exhibition stands, were particularly well received.

The questions submitted by participants and discussed in the bar camp were also met with great interest. Here, a small group of experts had spontaneously come together to discuss deeper questions and problems around metadata, ontologies and machine readability.

All research chemists are noticing the increasing pressure from national and international funders and publishers to make the raw data of funded or published projects available according to the FAIR data principles. This also explains the interest in learning about data handling, from data management, ELNs, metadata and repositories to publishing, in order to be able to use it (better) in their own research.

The format of a conference with expert lectures and hands-on workshops on data (management) has already proven its worth at the first Chemistry Data Days. The planning for the next iteration has begun.

FAIR raw data in the Journal of Natural Products

At the end of April 2023, the US-based Journal of Natural Products announced in an editorial that, starting in June 2023, it will require authors to make NMR data publicly available as FAIR raw data. The journal specifies that the repository must follow FAIR principles. In addition, the journal recommends four repositories where NMR data can be meaningfully stored.

The journal does not specify which repository the data must be stored in. Nor does it make any broader specifications. Thus, it seems to be important to the journal to give authors as much freedom as possible in choosing repositories. However, the fact that at least the raw data on which a published article is based must be made available FAIR is a clear step towards more FAIRness in scientific discourse.

Trend towards FAIR data

Even though the Journal of Natural Products’ decision is not yet part of the mainstream, a tendency can be identified that has already been scientifically studied. There is a clear trend among publishers toward requirements for FAIRer data publication in scientific publishing.

We are particularly pleased that nmrXiv.org, a repository developed and operated by NFDI4Chem together with the University of Jena and NMRium, is recommended.

Display of FAIR raw data- Findable - Accessible - Interoperable - Reusable - NFDI4Chem

Image Attribution: SangyaPundir, CC BY-SA 4.0.

FAIR data mandatory as of August 2023

Did you know? As of 01.08.2023, the publication of FAIR data, i.e. Findable, Accessible, Interoperable, Reusable, is mandatory for scientists at universities and non-university research institutions that (want to) receive funding from the DFG.
This is regulated in the DFG’s “Guidelines for Safeguarding Good Scientific Practice” codex (https://doi.org/10.5281/zenodo.6472827), which already came into force on 01.08.2019 and whose implementation period (after already being extended once) now finally expires on 31.07.2023.
In it, “Guideline 13: Establishing public access to research results” formulates that not only research results are to be included in the scientific discourse, but also, among other things, the underlying research data. The explanations also explicitly refer to the FAIR principles.

So what does this mean?

A data management plan (DMP) has long been an integral part of the DFG’s requirements for research proposals. The requirements on FAIR data storage formulated in the Code now go beyond this. The narrowly formulated grounds for exceptions (e.g. for patent applications) leave little room for maneuver. Those who do not comply must reckon with the loss of funding as of August.

Chemists are therefore well advised to be prepared for this regulation. However, in order to take into account “subject-specific relevant recommendations on standards, methods and infrastructures”, as it says on the DFG website, you also need to know them, and this is not about chemistry but about data storage. ELNs, “recognized” repositories, metadata, terminologies, ontologies: anyone who stumbles here should get up to speed quickly.

Help is at hand

The NFDI4Chem consortium offers regular workshops, both online for individual participation and exclusively on-site at your institute. Our extensive Knowledge Base will help you with all questions regarding chemical research data.
We are also happy to advise you on the workgroup/institute/department-wide use of the open source ELN software Chemotion. Please contact us!

FAIR = Findable, Accessible, Interoperable, Reusable - NFDI4Chem

Image Attribution: SangyaPundir, CC BY-SA 4.0.