Search Results

  • Item
    Integrating data and analysis technologies within leading environmental research infrastructures: Challenges and approaches
    (Amsterdam [u.a.] : Elsevier, 2021) Huber, Robert; D'Onofrio, Claudio; Devaraju, Anusuriya; Klump, Jens; Loescher, Henry W.; Kindermann, Stephan; Guru, Siddeswara; Grant, Mark; Morris, Beryl; Wyborn, Lesley; Evans, Ben; Goldfarb, Doron; Genazzio, Melissa A.; Ren, Xiaoli; Magagna, Barbara; Thiemann, Hannes; Stocker, Markus
    When researchers analyze data, significant effort typically goes into data preparation to make the data analysis-ready. This often involves cleaning, pre-processing, harmonizing, or integrating data from one or multiple sources and placing them into a computational environment in a form suitable for analysis. Research infrastructures and their data repositories host data and make them available to researchers, but rarely offer a computational environment for data analysis. Published data are often persistently identified, but such identifiers resolve to landing pages that must be (manually) navigated to identify how data are accessed. This navigation is typically challenging or impossible for machines. This paper surveys existing approaches for improving environmental data access to facilitate more rapid data analyses in computational environments, and thus contribute to a more seamless integration of data and analysis. By analysing current state-of-the-art approaches and solutions being implemented by world-leading environmental research infrastructures, we highlight the existing practices to interface data repositories with computational environments and the challenges moving forward. We found that while the level of standardization has improved during recent years, it is still challenging for machines to discover and access data based on persistent identifiers. This is problematic in regard to the emerging requirements for FAIR (Findable, Accessible, Interoperable, and Reusable) data, in general, and problematic for seamless integration of data and analysis, in particular. There are a number of promising approaches that would improve the state of the art. A key approach presented here involves software libraries that streamline reading data and metadata into computational environments. We describe this approach in detail for two research infrastructures.
    We argue that developing and maintaining specialized libraries for each research infrastructure (RI) and for a range of programming languages used in data analysis does not scale well. Based on this observation, we propose a set of established standards and web practices that, if implemented by environmental research infrastructures, will enable the development of RI- and programming-language-independent software libraries, with much less effort required for library implementation and maintenance and considerably lower learning requirements for users. To catalyse such advancement, we propose a roadmap and key action points for technology harmonization among RIs that we argue will build the foundation for efficient and effective integration of data and analysis.
  • Item
    AtMoDat: Improving the reusability of ATmospheric MOdel DATa with DataCite DOIs paving the path towards FAIR data
    (München : European Geosciences Union, 2020) Neumann, Daniel; Ganske, Anette; Voss, Vivien; Kraft, Angelina; Höck, Heinke; Peters, Karsten; Quaas, Johannes; Schluenzen, Heinke; Thiemann, Hannes
    The generation of high-quality research data is expensive. The FAIR principles were established to foster the reuse of such data for the benefit of the scientific community and beyond. Publishing research data with metadata and DataCite DOIs in public repositories makes them findable and accessible (FA of FAIR). However, DOIs and basic metadata do not guarantee that the data are actually reusable without discipline-specific knowledge. Reuse is hindered if data are saved in proprietary or undocumented file formats, if detailed discipline-specific metadata are missing, or if quality information on the data and metadata is not provided. In this contribution, we present ongoing work in the AtMoDat project, a consortium of atmospheric scientists and infrastructure providers, which aims at improving the reusability of atmospheric model data. Consistent standards are necessary to simplify the reuse of research data. Although standardization of file structure and metadata is well established for some subdomains of the earth system modeling community (e.g. CMIP), several other subdomains are lacking such standardization. Hence, scientists from the Universities of Hamburg and Leipzig and infrastructure operators cooperate in the AtMoDat project in order to advance standardization for model output files in specific subdomains of the atmospheric modeling community. Starting from the demanding CMIP6 standard, the aim is to establish an easy-to-use standard that is at least compliant with the Climate and Forecast (CF) conventions. In parallel, an existing netCDF file convention checker is extended to check for the new standards. This enhanced checker is designed to support the creation of compliant files and thus lower the hurdle for data producers to comply with the new standard. The transfer of this approach to further sub-disciplines of the earth system modeling community will be supported by a best-practice guide and other documentation.
    A showcase of a standard for the urban atmospheric modeling community will be presented in this session. The standard is based on CF Conventions and adapts several global attributes and controlled vocabularies from the well-established CMIP6 standard. Additionally, the AtMoDat project aims at introducing a generic quality indicator into the DataCite metadata schema to foster further reuse of data. This quality indicator should require a discipline-specific implementation of a quality standard linked to the indicator. We will present the concept of the generic quality indicator in general and in the context of urban atmospheric modeling data.
  • Item
    ATMODAT Standard v3.0
    (Hamburg : DKRZ, 2020) Ganske, Anette; Kraft, Angelina; Kaiser, Amandine; Heydebreck, Daniel; Lammert, Andrea; Höck, Heinke; Thiemann, Hannes; Voss, Vivien; Grawe, David; Leitl, Bernd; Schlünzen, K. Heinke; Kretzschmar, Jan; Quaas, Johannes
    Within the AtMoDat project (Atmospheric Model Data), a standard has been developed to improve the FAIRness of atmospheric model data published in repositories. The ATMODAT standard includes concrete recommendations related to the maturity, publication and enhanced FAIRness of atmospheric model data. The suggestions include requirements for rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF) and the structure within files. Human- and machine-readable landing pages are a core element of this standard, and should hold and present discipline-specific metadata at the simulation and variable levels. This standard is an updated and translated version of "Bericht über initialen Kernstandard und Kurationskriterien des AtMoDat Projektes (v2.4)".