Search Results

  • Item
    The Research Core Dataset (KDSF) in the Linked Data context
    (Amsterdam [u.a.] : Elsevier, 2019) Walther, Tatiana; Hauschke, Christian; Kasprzik, Anna; Sicilia, Miguel-Angel; Simons, Ed; Clements, Anna; de Castro, Pablo; Bergström, Johan
    This paper describes our efforts to implement the Research Core Dataset (“Kerndatensatz Forschung”; KDSF) as an ontology in VIVO. KDSF is used in VIVO to record the required metadata on incoming data and to produce reports as an output. While both processes need an elaborate adaptation of the KDSF specification, this paper focusses on the adaptation of the KDSF basic data model for recording data in VIVO. In this context, the VIVO and KDSF ontologies were compared with respect to domain, syntax, structure, and granularity in order to identify correspondences and mismatches. To produce an alignment, different matching approaches were applied. Furthermore, we made necessary modifications and extensions to KDSF classes and properties.
  • Item
    Integrating data and analysis technologies within leading environmental research infrastructures: Challenges and approaches
    (Amsterdam [u.a.] : Elsevier, 2021) Huber, Robert; D'Onofrio, Claudio; Devaraju, Anusuriya; Klump, Jens; Loescher, Henry W.; Kindermann, Stephan; Guru, Siddeswara; Grant, Mark; Morris, Beryl; Wyborn, Lesley; Evans, Ben; Goldfarb, Doron; Genazzio, Melissa A.; Ren, Xiaoli; Magagna, Barbara; Thiemann, Hannes; Stocker, Markus
    When researchers analyze data, significant effort in data preparation is typically required to make the data analysis-ready. This often involves cleaning, pre-processing, harmonizing, or integrating data from one or multiple sources and placing them into a computational environment in a form suitable for analysis. Research infrastructures and their data repositories host data and make them available to researchers, but rarely offer a computational environment for data analysis. Published data are often persistently identified, but such identifiers resolve to landing pages that must be (manually) navigated to identify how data are accessed. This navigation is typically challenging or impossible for machines. This paper surveys existing approaches for improving environmental data access to facilitate more rapid data analyses in computational environments, and thus contribute to a more seamless integration of data and analysis. By analysing current state-of-the-art approaches and solutions being implemented by world-leading environmental research infrastructures, we highlight the existing practices to interface data repositories with computational environments and the challenges moving forward. We found that while the level of standardization has improved during recent years, it is still challenging for machines to discover and access data based on persistent identifiers. This is problematic in regard to the emerging requirements for FAIR (Findable, Accessible, Interoperable, and Reusable) data, in general, and problematic for seamless integration of data and analysis, in particular. There are a number of promising approaches that would improve the state-of-the-art. A key approach presented here involves software libraries that streamline reading data and metadata into computational environments. We describe this approach in detail for two research infrastructures.
    We argue that the development and maintenance of specialized libraries for each research infrastructure (RI) and a range of programming languages used in data analysis does not scale well. Based on this observation, we propose a set of established standards and web practices that, if implemented by environmental research infrastructures, will enable the development of RI- and programming-language-independent software libraries with much reduced effort required for library implementation and maintenance as well as considerably lower learning requirements on users. To catalyse such advancement, we propose a roadmap and key action points for technology harmonization among RIs that we argue will build the foundation for efficient and effective integration of data and analysis.
  • Item
    Research Information Infrastructure in Ukraine: First steps towards building a national CRIS
    (Amsterdam [u.a.] : Elsevier, 2022) Kaliuzhna, Nataliia; Auhunas, Sabina
    Development and implementation of Current Research Information Systems (CRIS) is one of the most transparent and practical approaches to curating research information on a national level. The process of building and implementing such systems is complex and time-consuming, and successful results heavily depend on the established research information infrastructure of a country, the interoperability of the systems, and the quality of the information that resides in them. The purpose of this paper is to analyse the existing Ukrainian Research Information Infrastructure and identify which databases could be reused and integrated with a national Ukrainian Current Research Information System (URIS). The analysis showed that there are functional databases and registries that collect data on research activities and could be used as data sources for the URIS. In particular, the Unified State Electronic Database on Education is a potential data source on higher educational institutions; the National Repository of Academic Texts is a potential source of metadata on research output; and the internal database of the National Research Foundation of Ukraine and the database on research projects maintained by the Ukrainian Institute of Scientific Technical and Economic Information are potential sources on projects. Secondly, it was identified that the Ukrainian research infrastructure lacks a complete, up-to-date registry of researchers. Finally, we discussed the challenges and solutions for further steps in building a national CRIS.
  • Item
    Development of a Domain-Specific Ontology to Support Research Data Management for the Tailored Forming Technology
    (Amsterdam [u.a.] : Elsevier, 2020) Sheveleva, Tatyana; Koepler, Oliver; Mozgova, Iryna; Lachmayer, Roland; Auer, Sören
    The global trend towards the comprehensive digitisation of technologies in product manufacturing is leading to radical changes in engineering processes and requires a new extended understanding of data handling. The amounts of data to be considered are becoming larger and more complex. Data can originate from process simulations, machines used or subsequent analyses, which together with the resulting components serve as a complete and reproducible description of the process. Within the Collaborative Research Centre "Process Chain for Manufacturing of Hybrid High Performance Components by Tailored Forming", interdisciplinary work is being carried out on the development of process chains for the production of hybrid components. The management of the generated data and descriptive metadata, the support of the process steps and preliminary and subsequent data analysis are fundamental challenges. The objective is a continuous, standardised data management according to the FAIR Data Principles so that process-specific data and parameters can be transferred together with the components or samples to subsequent processes, individual process designs can take place and processes of machine learning can be accelerated. A central element is the collaborative development of a domain-specific ontology for a semantic description of data and processes of the entire process chain.
  • Item
    SemSur: A Core Ontology for the Semantic Representation of Research Findings
    (Amsterdam [u.a.] : Elsevier, 2018) Fathalla, Said; Vahdati, Sahar; Auer, Sören; Lange, Christoph; Fensel, Anna; de Boer, Victor; Pellegrini, Tassilo; Kiesling, Elmar; Haslhofer, Bernhard; Hollink, Laura; Schindler, Alexander
    The way research is communicated using text publications has not changed much over the past decades. We have the vision that ultimately researchers will work on a common structured knowledge base comprising comprehensive semantic and machine-comprehensible descriptions of their research, thus making research contributions more transparent and comparable. We present the SemSur ontology for semantically capturing the information commonly found in survey and review articles. SemSur is able to represent scientific results and to publish them in a comprehensive knowledge graph, which provides an efficient overview of a research field, and to compare research findings with related works in a structured way, thus saving researchers a significant amount of time and effort. The new release of SemSur covers more domains and defines better alignment with external ontologies and rules for eliciting implicit knowledge. We discuss possible applications and present an evaluation of our approach with the retrospective, exemplary semantification of a survey. We demonstrate the utility of the SemSur ontology to answer queries about the different research contributions covered by the survey. SemSur is currently used and maintained at OpenResearch.org.
  • Item
    Latent Class Cluster Analysis: Selecting the number of clusters
    (Amsterdam [u.a.] : Elsevier, 2022) Lezhnina, Olga; Kismihók, Gábor
    Latent Class Cluster Analysis (LCCA) is an advanced model-based clustering method, which is increasingly used in social, psychological, and educational research. Selecting the number of clusters in LCCA is a challenging task involving inevitable subjectivity of analytical choices. Researchers often rely excessively on fit indices, as model fit is the main selection criterion in model-based clustering; it was shown, however, that a wider spectrum of criteria needs to be taken into account. In this paper, we suggest an extended analytical strategy for selecting the number of clusters in LCCA based on model fit, cluster separation, and stability of partitions. The suggested procedure is illustrated on simulated data and a real-world dataset from the International Computer and Information Literacy Study (ICILS) 2018. For the latter, we provide an example of end-to-end LCCA including data preprocessing. The researcher can use our R script to conduct LCCA in a few easily reproducible steps, or implement the strategy with any other software suitable for clustering. We show that the extended strategy, in comparison to a strategy based on fit indices alone, facilitates the selection of more stable and well-separated clusters in the data.
      • The suggested strategy helps researchers select the number of clusters in LCCA.
      • It is based on model fit, cluster separation, and stability of partitions.
      • The strategy is useful for finding separable, generalizable clusters in the data.
  • Item
    Phenotyping in the era of genomics: MaTrics—a digital character matrix to document mammalian phenotypic traits
    (Amsterdam [u.a.] : Elsevier, 2021) Stefen, Clara; Wagner, Franziska; Asztalos, Marika; Giere, Peter; Grobe, Peter; Hiller, Michael; Hofmann, Rebecca; Jähde, Maria; Lächele, Ulla; Lehmann, Thomas; Ortmann, Sylvia; Peters, Benjamin; Ruf, Irina; Schiffmann, Christian; Thier, Nadja; Unterhitzenberger, Gabriele; Vogt, Lars; Rudolf, Matthias; Wehner, Peggy; Stuckas, Heiko
    A new and uniquely structured matrix of mammalian phenotypes in digital form, MaTrics (Mammalian Traits for Comparative Genomics), is presented. By focussing on mammalian species for which genome assemblies are available, MaTrics provides an interface between mammalogy and comparative genomics. MaTrics was developed within a project aimed at finding genetic causes of phenotypic traits of mammals using Forward Genomics. This approach requires genomes and comprehensive, recorded information on homologous phenotypes that are coded as discrete categories in a matrix. MaTrics is an evolving online resource providing information on phenotypic traits in numeric code; traits are coded either as absent/present or with several states as multistate. The state record for each species is linked to at least one reference (e.g., literature, photographs, histological sections, CT scans, or museum specimens), and so MaTrics contributes to the digitalization of museum collections. Currently, MaTrics covers 147 mammalian species and includes 231 characters related to structure, morphology, physiology, ecology, and ethology, and is available in a machine-actionable NEXUS format. Filling MaTrics revealed substantial knowledge gaps, highlighting the need for phenotyping efforts. Studies based on selected data from MaTrics and using Forward Genomics identified associations between genes and certain phenotypes ranging from lifestyles (e.g., aquatic) to dietary specializations (e.g., herbivory, carnivory). These findings motivate the expansion of phenotyping in MaTrics by filling research gaps and by adding taxa and traits. Only databases like MaTrics will provide machine-actionable information on phenotypic traits, the lack of which is an important limitation to genomics. MaTrics is available within the data repository Morph·D·Base (www.morphdbase.de).