Search Results

  • Item
    SciBERT-based Semantification of Bioassays in the Open Research Knowledge Graph
    (Aachen : RWTH, 2020) Anteghini, Marco; D'Souza, Jennifer; Martins dos Santos, Vitor A.P.; Auer, Sören
    As a novel contribution to the problem of semantifying biological assays, in this paper, we propose a neural-network-based approach to automatically semantify, and thereby structure, unstructured bioassay text descriptions. Experimental evaluations show promise: the neural semantification significantly outperforms a naive frequency-based baseline. Specifically, the neural method attains 72% F1 versus 47% F1 for the frequency-based method. The work in this paper aligns with the current trend in scholarly knowledge digitalization, which aims to convert the long-standing document-based format of scholarly content into knowledge graphs (KGs). To this end, our selected data domain of bioassays is a prime candidate for structuring into KGs.
  • Item
    NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature
    (Aachen : RWTH, 2020) D'Souza, Jennifer; Auer, Sören
    We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks: 1. machine translation, 2. named entity recognition, 3. question answering, 4. relation classification, and 5. text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise, we obtained an annotation methodology and found ten core information units that reflect the contribution of the NLP-ML scholarly investigations. The resulting annotation scheme, which we developed based on these information units, is called NLPContributions. The overarching goal of our endeavor is four-fold: 1) to find a systematic set of patterns of subject-predicate-object statements for the semantic structuring of scholarly contributions that are more or less generically applicable to NLP-ML research articles; 2) to apply the discovered patterns in the creation of a larger annotated dataset for training machine readers [18] of research contributions; 3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; 4) to integrate the machine readers into the ORKG to assist users in the manual curation of their respective article contributions. We envision that the NLPContributions methodology will engender a wider discussion on the topic toward its further refinement and development. Our pilot dataset of 50 NLP-ML scholarly articles, annotated according to the NLPContributions scheme, is openly available to the research community at https://doi.org/10.25835/0019761.