Search Results

Now showing 1 - 3 of 3
Loading...
Thumbnail Image
Item

TinyGenius: Intertwining natural language processing with microtask crowdsourcing for scholarly knowledge graph creation

2022, Oelen, Allard, Stocker, Markus, Auer, Sören, Aizawa, Akiko

As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be more efficiently discovered and used. Natural Language Processing (NLP) techniques are able to autonomously process scholarly articles at scale and to create machine readable representations of the article content. However, autonomous NLP methods are by far not sufficiently accurate to create a high-quality knowledge graph. Yet quality is crucial for the graph to be useful in practice. We present TinyGenius, a methodology to validate NLP-extracted scholarly knowledge statements using microtasks performed with crowdsourcing. The scholarly context in which the crowd workers operate has multiple challenges. The explainability of the employed NLP methods is crucial to provide context in order to support the decision process of crowd workers. We employed TinyGenius to populate a paper-centric knowledge graph, using five distinct NLP methods. In the end, the resulting knowledge graph serves as a digital library for scholarly articles.

Loading...
Thumbnail Image
Item

Easy Semantification of Bioassays

2022, Anteghini, Marco, D’Souza, Jennifer, dos Santos, Vitor A. P. Martins, Auer, Sören

Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts the problem of automated semantification as labeling versus clustering where the two methods are on opposite ends of the method complexity spectrum. Characteristically modeling our problem, we find the clustering solution significantly outperforms a deep neural network state-of-the-art labeling approach. This novel contribution is based on two factors: 1) a learning objective closely modeled after the data outperforms an alternative approach with sophisticated semantic modeling; 2) automatically semantifying biological assays achieves a high performance F1 of nearly 83%, which to our knowledge is the first reported standardized evaluation of the task offering a strong benchmark model.

Loading...
Thumbnail Image
Item

Clustering Semantic Predicates in the Open Research Knowledge Graph

2022, Arab Oghli, Omar, D’Souza, Jennifer, Auer, Sören

When semantically describing knowledge graphs (KGs), users have to make a critical choice of a vocabulary (i.e. predicates and resources). The success of KG building is determined by the convergence of shared vocabularies so that meaning can be established. The typical lifecycle for a new KG construction can be defined as follows: nascent phases of graph construction experience terminology divergence, while later phases of graph construction experience terminology convergence and reuse. In this paper, we describe our approach tailoring two AI-based clustering algorithms for recommending predicates (in RDF statements) about resources in the Open Research Knowledge Graph (ORKG) https://orkg.org/. Such a service to recommend existing predicates to semantify new incoming data of scholarly publications is of paramount importance for fostering terminology convergence in the ORKG. Our experiments show very promising results: a high precision with relatively high recall in linear runtime performance. Furthermore, this work offers novel insights into the predicate groups that automatically accrue loosely as generic semantification patterns for semantification of scholarly knowledge spanning 44 research fields.