NFDI4Chem @ RDA Plenary 20

The Research Data Alliance (RDA) held its 20th plenary on March 21-23 in Gothenburg, Sweden. More than 700 participants enjoyed 53 sessions and several co-located events.

Plenty of coverage and outcomes are available on the RDA 20th Plenary Meeting – Gothenburg site. Of particular interest to (and co-organised by!) NFDI4Chem was the session on “Describing diverse chemistry datasets across distributed data resources” (the recording can be found here).

Felix Bach and Christian Bonatto Minella of Task Area 3 “Repositories” also attended the co-located event ‘The WorldFAIR Project’s Cross-Domain Interoperability Framework’ on March 20, presented by Simon Hodson (Executive Director CODATA) and Arofan Gregory (CODATA). The event aimed to explain the reasoning behind the Cross-Domain Interoperability Framework (CDIF), present its current vision, including activities, functions, and standards, and provide an update on the work’s progress.

Ann-Christin Andres and Daniela Hausen from Task Area 5 participated in the co-located DCC event “Machine-Actionable DMPs”. Here, the focus was not only on contributions from individual operating universities, but also on discussions in parallel sessions. Topics were of a technical, content-related and awareness-raising nature. What content should be mapped in DMPs? With which tools should DMPs exchange data automatically? Which standard should be chosen for this? How can researchers be persuaded to use DMPs? It became apparent that groups internationally are working on topics similar to those of the infra-DMP group, so collaborations are now being planned.

After P20 is before P21. Save the date: October 23-26, 2023, Salzburg, Austria.

Author Guideline Landscape

As the primary method of communicating research results, journals have an enormous impact on data sharing practices in the scientific community. Ideally, journals and their publishers place recommendations or even requirements into their author guidelines and data policies.

As part of the activities around our Editors4Chem workshop series, Nicole Parks, Tillmann Fischer, and Claudia Blankenburg waded twice through the author guidelines and data policies of 42 journals in chemistry, released by 13 publishers. The goal was to check the requirements and recommendations in several categories regarding research data publication, open science practices, and machine-readable chemistry. 

Details unspecified

While the majority of journals recommend publishing data in a research data repository, they do not always specify how and where. The majority of journals require authors to deposit crystallographic data in the Cambridge Structural Database (CSD), but there are no recommendations for other data types such as NMR. Machine-readable chemical structures (and we are not talking about drawings) are mentioned in only one of the guidelines. Recommendations on minimum information standards and metadata were rarely encountered. Nonetheless, the majority of journals recommend or even require a data availability statement and also provide pointers to authors on the concepts of generic and field-specific research data repositories.
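To illustrate what “machine-readable” means here, the following is a minimal Python sketch; it is not taken from any journal guideline. It carries ethanol as SMILES, InChI, and InChIKey strings and checks only the textual shape of an InChIKey (blocks of 14, 10, and 1 uppercase letters separated by hyphens). The function name looks_like_inchikey is hypothetical, and a format check of this kind cannot prove that a key actually belongs to the structure.

```python
import re

# Layout of a standard InChIKey: 14 uppercase letters, a hyphen,
# 10 uppercase letters, a hyphen, and 1 final uppercase letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text: str) -> bool:
    """Return True if the text has the textual shape of an InChIKey.

    This is a format check only (hypothetical helper); it does not
    verify that the key was computed from a real structure.
    """
    return INCHIKEY_PATTERN.fullmatch(text.strip()) is not None


# Ethanol expressed as machine-readable identifiers instead of a drawing.
ethanol = {
    "smiles": "CCO",
    "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
}

for label, value in ethanol.items():
    print(f"{label}: {value} (InChIKey-shaped: {looks_like_inchikey(value)})")
```

Unlike an image of a structure, strings like these can be indexed, compared, and searched by software.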

Results

The survey shows that publishers and journals are starting to include aspects of research data sharing in their guidelines. Authors should accept and embrace the guidelines with increasing requirements for data availability, data interoperability, and re-usability to improve chemistry research, and not interpret these guidelines as a “minimum requirement to pass the bar”. Instead, these aspects are indicators of good scientific practice.

diagramme of development of author guidelines from journals - nfdi4chem
As the charts show, the share of “not named” is decreasing in some areas of the author guidelines.

The study

The full study is published in Pure and Applied Chemistry (DOI: 10.1515/pac-2022-1001), and, of course, we’re putting our money where our mouth is: the data behind the survey and figures is available on RADAR4Chem (DOI: 10.22000/702). To help researchers enact the recommendations, we have a growing number of guidance articles in the NFDI4Chem Knowledgebase.

Hacking4Chem

In December the first BioHackathon Germany was organised by the team of the German ELIXIR node. Leyla Jael Garcia-Castro (ZBMed and NFDI4Microbiota) and Steffen Neumann (IPB Halle and NFDI4Chem) used the opportunity and successfully submitted a proposal on (Bio)Schemas4NFDI.

Bioschemas is a community effort to improve the FAIRness of resources in the life sciences by defining specific metadata schemas as JSON-LD and exposing that metadata from participating resources. The name Bioschemas is a bit misleading, since from the early days the metadata schemas also covered concepts from chemistry or even information about training material and events.
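As a rough illustration of such metadata, the Python sketch below builds a small JSON-LD record for a chemical compound and wraps it in the script tag that web pages typically use to expose it. It assumes the schema.org type MolecularEntity with the properties inChIKey and smiles; the function name molecular_entity_jsonld and the example.org URL are hypothetical placeholders, and the properties a given Bioschemas profile actually requires should be checked against that profile.

```python
import json


def molecular_entity_jsonld(name, inchikey, smiles, url):
    """Build a minimal JSON-LD description of a compound (illustrative only)."""
    return {
        "@context": "https://schema.org",
        "@type": "MolecularEntity",  # assumed schema.org type
        "name": name,
        "inChIKey": inchikey,
        "smiles": smiles,
        "url": url,
    }


record = molecular_entity_jsonld(
    name="ethanol",
    inchikey="LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    smiles="CCO",
    url="https://example.org/compounds/ethanol",  # placeholder URL
)

# Resources usually embed JSON-LD in their HTML pages inside a script tag.
OPEN_TAG = '<script type="application/ld+json">'
CLOSE_TAG = "</script>"
snippet = OPEN_TAG + json.dumps(record, indent=2) + CLOSE_TAG
print(snippet)
```

Because the record is plain JSON with a shared vocabulary, a harvester does not need to know anything about the page layout to pick it up.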

The aim of the hacking workshop was to bring together metadata experts from Bioschemas and several NFDI consortia to adopt and adapt Bioschemas to NFDI use cases. We had participants from, e.g., NFDI4DataScience, GHGA, NFDI4Microbiota, and seven participants from NFDI4Chem.

The actual hacking was divided into departments. In the Data Provider Department, we increased the number of supporting resources and the coverage of the schema metadata they provide. In the New & Improved Schemas Department, we worked on improving the analytical DataSet and the new chemical reaction type. The Querying / Harvesting / Exploitation Department had participants from IPB and TIB and worked on making all that metadata useful for you. We were successful in fetching data from Bioschemas providers (MassBank) and gathering them into the NFDI4Chem Search service. The presentation by Denitsa Eckweiler (TU Braunschweig) showed how PubPharm currently collects information useful in pharmaceutical chemistry, and we discussed how metadata on chemical data could be included in the future.
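To give an idea of what harvesting can look like, here is a minimal Python sketch that pulls JSON-LD blocks out of an HTML page using only the standard library. It is not the code used at the hackathon or in the NFDI4Chem Search service; the class JsonLdExtractor, the function extract_jsonld, and the sample page are made up for illustration, and a real harvester would additionally fetch pages over HTTP and handle malformed markup.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.records.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html_text):
    """Return all JSON-LD records embedded in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.records


# Invented landing page standing in for a data provider's record page.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset", "name": "Example spectrum record"}
</script>
</head><body><p>Landing page</p></body></html>"""

records = extract_jsonld(SAMPLE_PAGE)
for item in records:
    print(item["@type"], "-", item["name"])
```

The extracted dictionaries could then be passed on to whatever index a search service uses.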

Finally, you are currently reading the first output from the Dissemination & Outreach Department, with more to come. We will contribute our experience to the NFDI Section Metadata, and if things work out, you’ll see more in 2023 at NFDI’s own metadata hackathon or the NFDI conference in September in Karlsruhe.

Tweets by de.NBI including some pictures can be found here.

FAIR4Chem Award 2022: And the winner is … Chemistry!

We are currently seeing (and supporting!) the cultural change towards modern, digital research data management (RDM) in chemistry. An important set of criteria for good RDM are the FAIR guiding principles, which state that research data should be Findable, Accessible, Interoperable, and Reusable. The FAIR4Chem Award 2022 celebrates published chemistry research datasets that best meet the FAIR principles and thus make a significant contribution to increasing transparency in research and the reuse of scientific knowledge.

The contest was open for submission from 15.10. to 15.12.2021. The winners were selected in a two-stage process, using the criteria of the FAIR assessment tool developed by the Australian Research Data Commons infrastructure followed by a joint jury evaluation of the top-scoring submissions. 

The FAIR4Chem Award 2022 honors the datasets of:

Niels Krausch and Robert T. Giessmann: Collection of UV/Vis spectra acquired while monitoring reaction progress of thymidine phosphorolysis with varying reactant concentrations (DOI: 10.5281/zenodo.3243352)

and

Christopher Kessler et al.: Supplementary material for ‘Adsorption of Light Gases in Covalent Organic Frameworks: Comparison of Classical Density Functional Theory and Grand Canonical Monte Carlo Simulations’ (DOI: 10.18419/darus-1775)

Congratulations to the winners! The award included prize money of 500€, kindly provided by the Fonds der Chemischen Industrie (FCI). It was presented during our NFDI4Chem session at the JCF Frühjahrssymposium on March 25th, 2022, in Hannover.

Award ceremony at the JCF 2022 in Hannover. Oliver Koepler, Niels Krausch, Christopher Keßler, Johannes Liermann (left-to-right). Photo: Stephan Siroky.

And remember: after the contest is before the contest! We will keep you posted on next year’s FAIR4Chem contest.

Get to Know the Consortium: TA 4!

Overview of the Task Areas (TA):

  • TA1 – Management: Christoph Steinbeck (Friedrich Schiller University Jena)
  • TA2 – Smart Lab: Nicole Jung (Karlsruhe Institute of Technology)
  • TA3 – Repository: Felix Bach (Karlsruhe Institute of Technology) & Matthias Razum (FIZ Karlsruhe – Leibniz Institute for Information Infrastructure)
  • TA4 – Metadata, Data Standards and Publication Standards: Steffen Neumann (Leibniz Institute of Plant Biochemistry) & Christoph Steinbeck (Friedrich Schiller University Jena)
  • TA5 – Community Involvement and Training: Sonja Herres-Pawlis (RWTH Aachen University) & Johannes Liermann (Johannes Gutenberg University Mainz)
  • TA6 – Synergies: Oliver Koepler (Leibniz Information Center for Science and Technology TIB Hannover)

Steffen Neumann
Lead of TA 4

Leibniz Institute of Plant Biochemistry

I am Steffen Neumann, head of the Research Group Bioinformatics and Scientific Data at the Leibniz Institute of Plant Biochemistry in Halle (Saale). I have prior experience in the area of statistical mass spectrometry data analysis and metabolite identification. In this context, I continuously advocate open data and open standards, leading to community-wide e-infrastructures, and exploiting these for functional annotation through computational metabolomics analyses. I am a member of the scientific advisory board of the French metabolomics infrastructure MetaboHUB, and associate editor for BMC Bioinformatics, MDPI Metabolites, and Nature Scientific Data.

Here in NFDI4Chem, I co-lead Task Area 4 on Metadata, Data Standards and Publication Standards with Chris Steinbeck. This is a truly vast and long-term undertaking, and previous experience has shown that persistence and perseverance are required to ensure progress.

I particularly enjoy coordinating the Lead-by-example efforts. Here, we work with the community (you!) to collect research data associated with recent or even upcoming manuscripts to gather, organise, and publish your data sets for and with you. Upcoming manuscripts will then have the benefit of being able to include a “Data availability statement” which points to citable data sets in a suitable resource, such as the Chemotion Repository or RADAR, depending on data types and content.

MassBank on Ice

Did you know that MassBank data is now literally stored on ice for a thousand years?

At the beginning of February last year (precisely on 02.02.2020!) GitHub took a snapshot of all project repositories, stored everything as QR codes on durable film, and deposited the film in an old coal mine in Svalbard, Norway. Never heard of that place? It is also known as Spitsbergen, was declared a demilitarised zone a hundred years ago, has permafrost, and hosts the Global Seed Vault and now also 21 terabytes of open source and open data. For a brief video (2:27 min) of that endeavour, head over to YouTube.


MassBank EU, hosted at UFZ (Leipzig), is a public repository of mass spectral data for the scientific community. MassBank data are useful for the annotation of chemical compounds detected by mass spectrometry. The MassBank records are managed in the version control system “git” on GitHub, with all spectral data and the corresponding metadata in a human-readable record format. Programmers have used such version control systems for decades to organise software source code, and they are increasingly being adopted for research data management. The collaboration between GitHub and Zenodo makes it possible to archive MassBank releases on Zenodo with a Digital Object Identifier. Within NFDI4Chem, we are working on simplifying data submission and on better integration with the other NFDI4Chem services.
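To show why a human-readable, line-based record format works well with version control and simple tooling, here is a minimal Python sketch that reads “TAG: value” lines into a dictionary. The sample record is invented for illustration (including its accession) and is heavily simplified: it uses a few tags in the style of MassBank records, stops at the “//” terminator, and ignores multi-line sections such as peak lists. The function parse_record is hypothetical and is not part of any MassBank software.

```python
from collections import defaultdict

# Invented, simplified record; not an actual MassBank entry.
SAMPLE_RECORD = """\
ACCESSION: MSBNK-Example-EX000001
RECORD_TITLE: Ethanol; illustrative record
CH$NAME: Ethanol
CH$NAME: Ethyl alcohol
CH$FORMULA: C2H6O
CH$SMILES: CCO
//
"""


def parse_record(text):
    """Parse single-line 'TAG: value' fields; repeated tags become lists."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        # Skip indented continuation lines and anything without a tag.
        if line.startswith(" ") or ": " not in line:
            continue
        tag, value = line.split(": ", 1)
        fields[tag].append(value.strip())
    return dict(fields)


record = parse_record(SAMPLE_RECORD)
for tag, values in record.items():
    print(tag, "->", values)
```

Because every field sits on its own line, a change to a single value shows up as a one-line difference in git.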

P.S.: Other developments in NFDI4Chem which are available from GitHub, such as the Chemotion ELN, are of course preserved on ice as well!

Chemistry? NFDI4Chem! Poster at HeFDI 2020

We will present an NFDI4Chem poster at this year’s HeFDI Plenary 2020 on 17/12/2020. Virtually, of course. But that gives us the opportunity to also share the poster in this post: NFDI4Chem@HeFDI2020 (click to advance frames). The video is also available on YouTube, including the making-of in the video description. Enjoy, and in case of questions, contact us on the website or by email.

Abstract:

More and more digital research data are being generated, so new concepts are essential: In which data formats can data be stored in the long term? How and where can data be stored? Which information about the experiment or the simulation should be recorded in the metadata? How can these data be made accessible to group members and other researchers? How can these data be made findable for researchers and AI algorithms?

In order to implement the FAIR data principles (findable, accessible, interoperable and re-usable),[1] the DFG funds a national research data management infrastructure with members of university and non-university research, infrastructure facilities, learned societies as well as publishing houses and industry. This long-term project is supported with up to 85 million euros per year from 2020 for a period of ten years.

In chemistry, NFDI4Chem tackles the challenges of research data management nationally but also internationally (e.g. in the context of IUPAC, EOSC and RDA).[2,3] NFDI4Chem is a consortium consisting of representatives from universities and non-university research institutions, infrastructure facilities and the German Chemical Society, the Bunsen Society and the German Pharmaceutical Society. The focus of the NFDI4Chem consortium is on the molecule itself, its properties and reactions. The project started on 1 October 2020.[4]

References:

[1] M. Wilkinson, M. Dumontier, I. Aalbersberg, et al., Sci Data 2016, 3, 160018.

[2] S. Herres-Pawlis, O. Koepler, C. Steinbeck, Angew. Chem. Int. Ed. 2019, 58, 10766-10768.

[3] C. Steinbeck, O. Koepler, F. Bach, S. Herres-Pawlis, N. Jung, J. C. Liermann, S. Neumann, M. Razum, C. Baldauf, F. Biedermann, T. W. Bocklitz, F. Boehm, F. Broda, P. Czodrowski, T. Engel, M. G. Hicks, S. M. Kast, C. Kettner, W. Koch, G. Lanza, A. Link, R. A. Mata, W. E. Nagel, A. Porzel, N. Schlörer, T. Schulze, H.-G. Weinig, W. Wenzel, L. A. Wessjohann, S. Wulle, Research Ideas and Outcomes 2020, 6, e55852.

Data pledge: Let’s lead by example!

NFDI4Chem has a wonderful vision of how we want to collect, manage and share research data in the end. But all of us have publications and corresponding research data already today. So one of the things we do right now is to collect these data sets from the consortium and the community.

NFDI4Chem has the expertise and will provide support to prepare and submit data sets with the tools and repositories that exist today. We ask you to pledge your data sets from previous, current or even upcoming publications through our survey form, and we will be in touch to support the process. Based on this process we will create documentation and training material for the future, and provide input for the other activities in NFDI4Chem.