Search Results

  • Item
    Crowdsourcing Scholarly Discourse Annotations
    (New York, NY : ACM, 2021) Oelen, Allard; Stocker, Markus; Auer, Sören
    The number of scholarly publications grows steadily every year, and it is becoming harder to find, assess, and compare scholarly knowledge effectively. Scholarly knowledge graphs have the potential to address these challenges. However, creating such graphs remains a complex task. We propose a method to crowdsource structured scholarly knowledge from paper authors with a web-based user interface supported by artificial intelligence. The interface enables authors to select key sentences for annotation. It integrates multiple machine learning algorithms to assist authors during annotation, including class recommendation and key sentence highlighting. We envision the interface being integrated into paper submission processes, for which we define three main task requirements: The task has to be . We evaluated the interface with a user study in which participants were asked to annotate one of their own articles. With the resulting data, we determined whether the participants were able to perform the task successfully. Furthermore, we evaluated the interface’s usability and the participants’ attitudes towards the interface with a survey. The results suggest that sentence annotation is a feasible task for researchers and that they do not object to annotating their articles during the submission process.
  • Item
    The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources
    (Paris : European Language Resources Association, 2020) D'Souza, Jennifer; Hoppe, Anett; Brack, Arthur; Jaradeh, Mohamad Yaser; Auer, Sören; Ewerth, Ralph
    We introduce the STEM (Science, Technology, Engineering, and Medicine) Dataset for Scientific Entity Extraction, Classification, and Resolution, version 1.0 (STEM-ECR v1.0). The STEM-ECR v1.0 dataset has been developed to provide a benchmark for the evaluation of scientific entity extraction, classification, and resolution tasks in a domain-independent fashion. It comprises abstracts in 10 STEM disciplines that were found to be the most prolific on a major publishing platform. We describe the creation of such a multidisciplinary corpus and highlight the obtained findings in terms of the following features: 1) a generic conceptual formalism for scientific entities in a multidisciplinary scientific context; 2) the feasibility of the domain-independent human annotation of scientific entities under such a generic formalism; 3) a performance benchmark obtainable for automatic extraction of multidisciplinary scientific entities using BERT-based neural models; 4) a delineated 3-step entity resolution procedure for human annotation of the scientific entities via encyclopedic entity linking and lexicographic word sense disambiguation; and 5) human evaluations of Babelfy-returned encyclopedic links and lexicographic senses for our entities. Our findings cumulatively indicate that human annotation and automatic learning of multidisciplinary scientific concepts, as well as their semantic disambiguation, are feasible in a setting as wide-ranging as STEM.
  • Item
    Semantic Representation of Physics Research Data
    (Setúbal, Portugal : Science and Technology Publications, Lda, 2020) Say, Aysegul; Fathalla, Said; Vahdati, Sahar; Lehmann, Jens; Auer, Sören; Aveiro, David; Dietz, Jan; Filipe, Joaquim
    Improvements in web technologies and artificial intelligence enable novel, more data-driven research practices for scientists. However, scientific knowledge generated from data-intensive research practices is disseminated in unstructured formats, thus hindering scholarly communication in various respects. The traditional document-based representation of scholarly information hampers the reusability of research contributions. To address this concern, we developed the Physics Ontology (PhySci) to represent physics-related scholarly data in a machine-interpretable format. PhySci facilitates knowledge exploration, comparison, and organization of such data by representing it as knowledge graphs. It establishes a unique conceptualization to increase the visibility and accessibility of the digital content of physics publications. We present the iterative design principles by outlining a methodology for its development and applying three different evaluation approaches: data-driven and criteria-based evaluation, as well as ontology testing.
  • Item
    Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions - A Trial Dataset
    (Beijing : National Science Library, Chinese Academy of Sciences, 2021) D’Souza, Jennifer; Auer, Sören
    This work aims to normalize the NlpContributions scheme (henceforward, NlpContributionGraph) for structuring the contribution information in Natural Language Processing (NLP) scholarly articles directly from article sentences, via a two-stage annotation methodology: 1) a pilot stage, to define the scheme (described in prior work); and 2) an adjudication stage, to normalize the graphing model (the focus of this paper). We re-annotate the contribution-pertinent information across 50 previously annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. To this end, care was taken in the adjudication stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contribution structuring and graphing scheme. Applying NlpContributionGraph to the 50 articles yielded a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1-score, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements, indicating that annotation decisions vary more as the information becomes more granular. NlpContributionGraph is limited in scope for structuring scholarly contributions compared with STEM (Science, Technology, Engineering, and Medicine) scholarly knowledge at large. Further, the annotation scheme in this work is based only on intra-annotator consensus: a single annotator first annotated the data to propose the initial scheme, and the same annotator then re-annotated the data to normalize the annotations in an adjudication stage. However, the ultimate goal of this work is a standardized retrospective model for capturing NLP contributions from scholarly articles.
This would entail a larger initiative of enlisting multiple annotators to reconcile different worldviews into a “single” set of structures and relationships as the final scheme. Given that the scheme is proposed here for the first time, and given the complexity of the annotation task within a realistic timeframe, our intra-annotation procedure is well suited. Nevertheless, the model proposed in this work is presently limited because it does not incorporate multiple annotator worldviews. Incorporating them is planned as future work to produce a robust model. We demonstrate NlpContributionGraph data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks. NlpContributionGraph is a novel scheme for annotating research contributions from NLP articles and integrating them into a knowledge graph; to the best of our knowledge, no such scheme exists in the community. Furthermore, our quantitative evaluations over the two-stage annotation tasks offer insights into task difficulty.
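The stage-to-stage agreement reported above is an F1-score between two annotation passes over the same articles. As a minimal sketch, assuming each annotation (a sentence ID, a phrase string, or a subject-predicate-object tuple) is compared by exact match, which the abstract does not state, the score reduces to 2|A ∩ B| / (|A| + |B|). The function name `f1_agreement` and all sample annotations are hypothetical, for illustration only.

```python
from collections.abc import Hashable, Iterable


def f1_agreement(first_pass: Iterable[Hashable], second_pass: Iterable[Hashable]) -> float:
    """Return the F1-score between two annotation passes.

    Treating the first pass as reference (A) and the second as prediction (B):
    precision = |A & B| / |B| and recall = |A & B| / |A|, so their harmonic
    mean simplifies to 2 * |A & B| / (|A| + |B|), which is symmetric in A and B.
    Annotations are compared by exact match (an assumption of this sketch).
    """
    a, b = set(first_pass), set(second_pass)
    if not a and not b:
        # Convention assumed here: two empty passes agree perfectly.
        return 1.0
    overlap = len(a & b)
    return 2 * overlap / (len(a) + len(b))


# Hypothetical sentence IDs selected in each stage: 3 shared out of 4 and 5.
stage1_sentences = {"s1", "s2", "s3", "s4"}
stage2_sentences = {"s2", "s3", "s4", "s5", "s6"}
print(round(f1_agreement(stage1_sentences, stage2_sentences), 4))  # 0.6667

# Hypothetical triples: a one-word wording change breaks the exact match.
stage1_triples = {("model", "has", "12 layers")}
stage2_triples = {("model", "has", "12 transformer layers")}
print(f1_agreement(stage1_triples, stage2_triples))  # 0.0
```

Under exact matching, a triple counts as agreeing only if all three parts are identical, which is one plausible reason why agreement would fall from sentences to phrases to triples.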
  • Item
    Experience: Open fiscal datasets, common issues, and recommendations
    (Zenodo, 2018) Musyaffa, Fathoni A.; Engels, Christiane; Vidal, Maria-Esther; Orlandi, Fabrizio; Auer, Sören
    A preprint detailing recommendations for publishing fiscal data, including an assessment framework for fiscal datasets. This paper was accepted at the ACM Journal of Data and Information Quality (JDIQ) in 2018.
  • Item
    ORKG: Facilitating the Transfer of Research Results with the Open Research Knowledge Graph
    (Sofia : Pensoft, 2021) Auer, Sören; Stocker, Markus; Vogt, Lars; Fraumann, Grischa; Garatzogianni, Alexandra
    This document is an edited version of the original funding proposal entitled 'ORKG: Facilitating the Transfer of Research Results with the Open Research Knowledge Graph' that was submitted to the European Research Council (ERC) Proof of Concept (PoC) Grant in September 2020 (https://erc.europa.eu/funding/proof-concept). The proposal was evaluated by five reviewers and was placed on the reserve list after evaluation. The main document of the original proposal did not contain an abstract.
  • Item
    OpenBudgets.eu: A platform for semantically representing and analyzing open fiscal data
    (Zenodo, 2018) Musyaffa, Fathoni A.; Halilaj, Lavdim; Li, Yakun; Orlandi, Fabrizio; Jabeen, Hajira; Auer, Sören; Vidal, Maria-Esther
    A paper describing the implementation details of the OpenBudgets.eu platform. Pre-print version of the paper accepted at the International Conference on Web Engineering (ICWE) 2018 in Cáceres, Spain.
  • Item
    Towards an Open Research Knowledge Graph
    (Zenodo, 2018) Auer, Sören; Blümel, Ina; Ewerth, Ralph; Garatzogianni, Alexandra; Heller, Lambert; Hoppe, Anett; Kasprzik, Anna; Koepler, Oliver; Nejdl, Wolfgang; Plank, Margret; Sens, Irina; Stocker, Markus; Tullney, Marco; Vidal, Maria-Esther; van Wezenbeek, Wilma
    The document-oriented workflows in science have reached (or already exceeded) the limits of adequacy, as highlighted, for example, by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. Despite improved digital access to scientific publications in recent decades, the exchange of scholarly knowledge continues to be primarily document-based: researchers produce essays and articles that are made available in online and offline publication media as coarse-grained text documents. With current developments in areas such as knowledge representation, semantic search, human-machine interaction, natural language processing, and artificial intelligence, it is possible to completely rethink this dominant paradigm of document-centered knowledge exchange and transform it into knowledge-based information flows by representing and expressing knowledge through semantically rich, interlinked knowledge graphs. At the core of establishing knowledge-based information flows are two elements: the distributed, decentralized, and collaborative creation and evolution of information models, vocabularies, ontologies, and knowledge graphs that establish a common understanding of data and information among the various stakeholders; and the integration of these technologies into the infrastructure and processes of search and knowledge exchange in the research library of the future. By integrating these information models into existing and new research infrastructure services, the information structures that are currently still implicit and deeply hidden in documents can be made explicit and directly usable. This revolutionizes scientific work because information and research results can be seamlessly interlinked and better mapped to complex information needs. As a result, scientific work becomes more effective and efficient, since results become directly comparable and easier to reuse.
To realize the vision of knowledge-based information flows in scholarly communication, comprehensive long-term technological infrastructure development and accompanying research are required. To secure information sovereignty, it is also of paramount importance to science, and a matter of urgency for science policymakers, that scientific infrastructures establish an open counterweight to emerging commercial developments in this area. The aim of this position paper is to facilitate discussion of the requirements, design decisions, and a minimum viable product for an Open Research Knowledge Graph infrastructure. TIB aims to start developing this infrastructure in an open collaboration with interested partner organizations and individuals.
  • Item
    NFDI4Ing - the National Research Data Infrastructure for Engineering Sciences
    (Meyrin : CERN, 2020-09-25) Schmitt, Robert H.; Anthofer, Verena; Auer, Sören; Başkaya, Sait; Bischof, Christian; Bronger, Torsten; Claus, Florian; Cordes, Florian; Demandt, Évariste; Eifert, Thomas; Flemisch, Bernd; Fuchs, Matthias; Fuhrmans, Marc; Gerike, Regine; Gerstner, Eva-Maria; Hanke, Vanessa; Heine, Ina; Huebser, Louis; Iglezakis, Dorothea; Jagusch, Gerald; Klinger, Axel; Krafczyk, Manfred; Kraft, Angelina; Kuckertz, Patrick; Küsters, Ulrike; Lachmayer, Roland; Langenbach, Christian; Mozgova, Iryna; Müller, Matthias S.; Nestler, Britta; Pelz, Peter; Politze, Marius; Preuß, Nils; Przybylski-Freund, Marie-Dominique; Rißler-Pipka, Nanette; Robinius, Martin; Schachtner, Joachim; Schlenz, Hartmut; Schwarz, Annett; Schwibs, Jürgen; Selzer, Michael; Sens, Irina; Stäcker, Thomas; Stemmer, Christian; Stille, Wolfgang; Stolten, Detlef; Stotzka, Rainer; Streit, Achim; Strötgen, Robert; Wang, Wei Min
    NFDI4Ing brings together the engineering communities and fosters the management of engineering research data. The consortium represents engineers from all walks of the profession. It offers a unique method-oriented and user-centred approach to making engineering research data FAIR: findable, accessible, interoperable, and re-usable. NFDI4Ing was founded in 2017. The consortium has actively engaged engineers across all five engineering research areas of the DFG classification. Leading figures have teamed up with experienced infrastructure providers. As one important step, NFDI4Ing has taken on the task of structuring the wealth of concrete needs in research data management. A broad consensus on typical methods and workflows in engineering research has been established: the archetypes. So far, seven archetypes harmonise the methodological needs: Alex, bespoke experiments with high variability of setups; Betty, engineering research software; Caden, provenance tracking of physical samples & data samples; Doris, high-performance measurement & computation; Ellen, extensive and heterogeneous data requirements; Frank, many participants & simultaneous devices; and Golo, field data & distributed systems. A survey of the entire engineering research landscape in Germany confirms that the concept of engineering archetypes has been very well received: 95% of the research groups identify with at least one of the NFDI4Ing archetypes. NFDI4Ing plans to further coordinate its engagement along the gateways provided by the DFG classification of engineering research areas. Consequently, NFDI4Ing will support five community clusters. In addition, an overarching task area will provide seven base services to be accessed by both the community clusters and the archetype task areas.
Base services address quality assurance & metrics, research software development, terminologies & metadata, repositories & storage, data security & sovereignty, training, and data & knowledge discovery. With the archetype approach, NFDI4Ing’s work programme is modular and distinctly method-oriented. With the community clusters and base services, NFDI4Ing’s work programme remains firmly user-centred and highly integrated. NFDI4Ing has put in place an internal organisational structure that ensures viability, operational efficiency, and openness to new partners during the course of the consortium’s development. NFDI4Ing’s management team brings experience from two applicant institutions and from two years of actively engaging with the engineering communities. Eleven applicant institutions and over fifty participants have committed to carrying out NFDI4Ing’s work programme. Moreover, NFDI4Ing’s connections with consortia from nearby disciplinary fields are strong. Collaboration on cross-cutting topics is well prepared and planned. As a result, NFDI4Ing is ready to join the National Research Data Infrastructure.