TinyGenius: Intertwining natural language processing with microtask crowdsourcing for scholarly knowledge graph creation

Date
2022
Publisher
New York, NY, United States: Association for Computing Machinery
Abstract

As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be discovered and used more efficiently. Natural Language Processing (NLP) techniques can autonomously process scholarly articles at scale and create machine-readable representations of the article content. However, autonomous NLP methods are far from accurate enough to create a high-quality knowledge graph, and quality is crucial for the graph to be useful in practice. We present TinyGenius, a methodology for validating NLP-extracted scholarly knowledge statements using crowdsourced microtasks. The scholarly context in which the crowd workers operate poses multiple challenges. Explainability of the employed NLP methods is crucial for providing the context that supports the crowd workers' decision process. We employed TinyGenius to populate a paper-centric knowledge graph using five distinct NLP methods. The resulting knowledge graph serves as a digital library for scholarly articles.

Keywords
Crowdsourcing Microtasks, Intelligent User Interfaces, Knowledge Graph Validation, Scholarly Knowledge Graphs
Citation
Oelen, A., Stocker, M., & Auer, S. (2022). TinyGenius: Intertwining natural language processing with microtask crowdsourcing for scholarly knowledge graph creation (A. Aizawa, Ed.). New York, NY, United States: Association for Computing Machinery. https://doi.org/10.1145/3529372.3533285
License
This document may be downloaded, read, stored and printed for your own use within the limits of § 53 UrhG but it may not be distributed on other websites via the internet or passed on to external parties.