Search Results

  • Item
    Identifying and correcting invalid citations due to DOI errors in Crossref data
    (Dordrecht [etc.] : Springer Science + Business Media B.V., 2022) Cioffi, Alessia; Coppini, Sara; Massari, Arcangelo; Moretti, Arianna; Peroni, Silvio; Santini, Cristian; Shahidzadeh Asadi, Nooshin
    This work aims to identify classes of DOI mistakes by analysing the open bibliographic metadata available in Crossref, highlighting which publishers were responsible for such mistakes and how many of these incorrect DOIs could be corrected through automatic processes. Using a list of invalid cited DOIs gathered by OpenCitations while processing the OpenCitations Index of Crossref open DOI-to-DOI citations (COCI) over the past two years, we retrieved the citations to such invalid DOIs from the January 2021 Crossref dump. We processed these citations, keeping track of their validity and of the publishers responsible for uploading the related citation data to Crossref. Finally, we identified patterns of factual errors in the invalid DOIs and the regular expressions needed to catch and correct them. The outcomes of this research show that only a few publishers were responsible for and/or affected by the majority of invalid citations. We extended the taxonomy of DOI name errors proposed in past studies and defined more elaborate regular expressions that can clean a higher number of mistakes in invalid DOIs than prior approaches. The data gathered in our study can enable the investigation of possible reasons for DOI mistakes from a qualitative point of view, helping publishers identify the problems underlying their production of invalid citation data. Also, the DOI cleaning mechanism we present could be integrated into existing processes (e.g. in COCI) to add citations whose wrong DOIs have been corrected automatically. This study was run strictly following Open Science principles; as such, our research outcomes are fully reproducible.
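    The abstract does not reproduce the paper's regular expressions. As a minimal sketch of the general idea only, the Python below validates a DOI against the pattern Crossref has suggested for matching most modern DOIs and applies a few hypothetical cleaning rules: stripping a "doi:" or "doi.org" prefix, removing embedded whitespace, and dropping trailing punctuation or an unbalanced closing parenthesis. The names clean_doi, VALID_DOI and PREFIX_JUNK and all of the cleaning rules are assumptions for illustration, not the authors' taxonomy or expressions.

```python
import re
from typing import Optional

# Pattern Crossref has suggested for matching most modern DOIs (case-insensitive).
VALID_DOI = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$", re.IGNORECASE)

# Hypothetical cleaning rules, for illustration only (not the paper's expressions).
PREFIX_JUNK = re.compile(r"^(?:(?:https?://)?(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)
WHITESPACE = re.compile(r"\s+")


def _strip_trailing(doi: str) -> str:
    """Drop trailing '.', ',', ';' and any closing ')' that has no matching '('."""
    while doi and (
        doi[-1] in ".,;" or (doi[-1] == ")" and doi.count(")") > doi.count("("))
    ):
        doi = doi[:-1]
    return doi


def clean_doi(raw: str) -> Optional[str]:
    """Return a normalized DOI, or None if it is still invalid after cleaning."""
    doi = PREFIX_JUNK.sub("", raw.strip())  # remove "doi:" / resolver URL prefix
    doi = WHITESPACE.sub("", doi)           # remove spaces and line-break debris
    doi = _strip_trailing(doi)              # remove punctuation glued to the end
    doi = doi.lower()                       # DOI names are case-insensitive
    return doi if VALID_DOI.match(doi) else None


if __name__ == "__main__":
    for raw in ["DOI: 10.1000/XYZ123.", "10.1000/ab c", "not a doi"]:
        print(repr(raw), "->", clean_doi(raw))
```

Lowercasing is a design choice: because DOI names are case-insensitive, normalizing case makes cleaned DOIs directly comparable. A string that still fails validation after cleaning is returned as None rather than guessed at.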
  • Item
    The Concept of Identifiability in ML Models
    (Setúbal : SciTePress - Science and Technology Publications, Lda., 2022) von Maltzan, Stephanie; Bastieri, Denis; Wills, Gary; Kacsuk, Péter; Chang, Victor
    Recent research indicates that the machine learning process can be reversed by adversarial attacks. These attacks can be used to derive personal information from the training data. The supposedly anonymising machine learning process therefore represents a process of pseudonymisation and is, consequently, subject to technical and organisational measures. The unexamined belief in anonymisation as a guarantor of privacy thus cannot easily be upheld. It is therefore crucial to measure privacy through the lens of adversarial attacks, to distinguish precisely between personal and non-personal data and, above all, to determine whether ML models represent pseudonyms of the training data.
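    The abstract does not name a specific attack. One widely studied way of deriving information about training data is membership inference, which asks whether a given record was part of a model's training set. The sketch below assumes the simple loss-threshold variant and that per-record losses have already been computed; the function names, the threshold and the toy loss values are hypothetical.

```python
from typing import List, Sequence


def loss_threshold_attack(losses: Sequence[float], threshold: float) -> List[bool]:
    """Guess 'was in the training set' for every record whose loss is below threshold."""
    return [loss < threshold for loss in losses]


def membership_advantage(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> float:
    """True-positive rate on known members minus false-positive rate on non-members."""
    tpr = sum(loss_threshold_attack(member_losses, threshold)) / len(member_losses)
    fpr = sum(loss_threshold_attack(nonmember_losses, threshold)) / len(nonmember_losses)
    return tpr - fpr


if __name__ == "__main__":
    # Toy, made-up losses: training records tend to have lower loss.
    members = [0.1, 0.2, 0.3, 0.9]
    non_members = [0.8, 1.2, 0.4, 1.5]
    print(membership_advantage(members, non_members, threshold=0.5))  # 0.5
```

An advantage near 0 means the attacker cannot tell members from non-members at that threshold; a clearly positive value indicates that the model leaks the membership of individual records, which bears directly on the identifiability question the abstract raises.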
  • Item
    Concept for Setting up an LTA Working Group in the NFDI Section "Common Infrastructures"
    (Zenodo, 2022-04-12) Bach, Felix; Degkwitz, Andreas; Horstmann, Wolfram; Leinen, Peter; Puchta, Michael; Stäcker, Thomas
    NFDI consortia have a variety of disparate and distributed information infrastructures, many of which are as yet only loosely or poorly connected. A major goal is to create a Research Data Commons (RDC). The RDC concept includes, for example, shared cloud services, an application layer with access to high-performance computing (HPC), collaborative workspaces, terminology services, and a common authentication and authorization infrastructure (AAI). The necessary interoperability of services requires, in particular, agreement on protocols and standards, the specification of workflows and interfaces, and the definition of long-term, sustainable responsibilities for overarching services and deliverables. Infrastructure components in the NFDI are often well tested on a domain-specific basis, but are quite heterogeneous across domains. Long-term archiving (LTA) of digital resources has been a recurring problem for well over 30 years and has not been conclusively solved to date; it is becoming more urgent with the exponential growth of research data, whether driven by funder requirements (the DFG requires 10 years of retention) or by digital artifacts that must be preserved indefinitely as digital cultural heritage. Against this background, integrating LTA into the RDC of the NFDI is an urgent desideratum if the permanent usability of research data is to be guaranteed. A distinction must be made between archiving digital objects as bitstreams (numeric or textual data, or complex objects such as models), which is a first step towards long-term usability, and archiving the semantic and software-technical context of the original digital objects, which entails far more effort.
    Beyond the technical embedding of LTA in the system environment of a multi-cloud-based infrastructure, the development of a basic service for LTA and for the re-use of research data must address a number of discipline-specific requirements of the NFDI's subject consortia. Funding is needed to develop a basic LTA service for the NFDI consortia primarily because of the additional costs of the technical and organizational development of a cross-NFDI, decentralized network structure for LTA and for the sustainable subsequent use of research data. It is imperative that the technical actors are able to act within the network as a technology-oriented community and that they can also provide their own services within a federated infrastructure. The "Long Term Archiving" (LTA) working group is to gather the subject consortia's requirements for LTA and, on this basis, develop strategic approaches for implementing a basic LTA service. The working group consists of members of various NFDI consortia covering the humanities, natural sciences and engineering disciplines, as well as experts from a variety of pertinent infrastructures with strong connections to the nestor long-term archiving competence network. The close linkage of NFDI consortia with experienced partners in the field of LTA ensures that a) the relevant technical state of the art is present in the group and b) data producers, with their knowledge of the contexts of origin, and data users interact directly. This composition enables the team to take an overarching view that spans the requirements of the disciplines and consortia, takes interdisciplinary needs into account, and at the same time brings in existing know-how from the infrastructure sector.
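    The bitstream level of archiving mentioned in this abstract is the part that can be checked mechanically. A common preservation practice there is fixity checking: recording a cryptographic checksum per file and re-verifying it later. The sketch below assumes SHA-256 and a simple in-memory manifest; the function names are hypothetical, it is not part of the working group's concept, and it deliberately says nothing about preserving semantic or software context, which the abstract notes takes far more effort.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large bitstreams need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's POSIX-style path relative to root to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> Dict[str, str]:
    """Return {relative path: problem} for files that are missing or altered."""
    problems: Dict[str, str] = {}
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            problems[rel] = "missing"
        elif sha256_of(path) != expected:
            problems[rel] = "checksum mismatch"
    return problems
```

In use, build_manifest would run at ingest time and verify_manifest on a schedule; an empty result means every archived bitstream is still bit-identical to what was ingested.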
  • Item
    Audio Ontologies for Intangible Cultural Heritage
    (Bramhall, Stockport : EasyChair Ltd., 2022-04-12) Tan, Mary Ann; Posthumus, Etienne; Sack, Harald
    Cultural heritage portals often contain intangible objects digitized as audio files. This paper presents and discusses the adaptation of existing audio ontologies originally intended for applications outside cultural heritage. The resulting alignment of the German Digital Library-Europeana Data Model (DDB-EDM) with the Music Ontology (MO) and the Audio Commons Ontology (ACO) is presented.