Domain-Independent Extraction of Scientific Concepts from Research Articles

Brack, Arthur; D'Souza, Jennifer; Hoppe, Anett; Auer, Sören; Ewerth, Ralph

doi:https://doi.org/10.34657/5226

dc.bibliographicCitation.bookTitle	Advances in Information Retrieval	eng
dc.bibliographicCitation.journalTitle	Lecture Notes in Computer Science	eng
dc.contributor.author	Brack, Arthur
dc.contributor.author	D'Souza, Jennifer
dc.contributor.author	Hoppe, Anett
dc.contributor.author	Auer, Sören
dc.contributor.author	Ewerth, Ralph
dc.contributor.editor	Jose, Joemon M.
dc.contributor.editor	Yilmaz, Emine
dc.contributor.editor	Magalhães, João
dc.contributor.editor	Castells, Pablo
dc.contributor.editor	Ferro, Nicola
dc.contributor.editor	Silva, Mário J.
dc.contributor.editor	Martins, Flávio
dc.date.accessioned	2021-06-04T08:40:40Z
dc.date.available	2021-06-04T08:40:40Z
dc.date.issued	2020
dc.description.abstract	We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task and (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose an active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, and (3) active learning enables us to nearly halve the amount of required training data.	eng
dc.description.version	submittedVersion	eng
dc.identifier.uri	https://oa.tib.eu/renate/handle/123456789/6179
dc.identifier.uri	https://doi.org/10.34657/5226
dc.language.iso	eng	eng
dc.publisher	Cham : Springer	eng
dc.relation.doi	https://doi.org/10.1007/978-3-030-45439-5_17
dc.relation.essn	1611-3349
dc.relation.isbn	978-3-030-45438-8
dc.relation.isbn	978-3-030-45439-5
dc.relation.issn	0302-9743
dc.rights.license	German copyright law applies. The document may be used free of charge for personal use, but may not be made available on the Internet or passed on to third parties.	eng
dc.subject.ddc	020	eng
dc.subject.gnd	Konferenzschrift	ger
dc.subject.other	Sequence labelling	eng
dc.subject.other	Information extraction	eng
dc.subject.other	Scientific articles	eng
dc.subject.other	Active learning	eng
dc.subject.other	Scholarly communication	eng
dc.subject.other	Research knowledge graph	eng
dc.title	Domain-Independent Extraction of Scientific Concepts from Research Articles	eng
dc.type	BookPart	eng
dcterms.event	European Conference on Information Retrieval, 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020
tib.accessRights	openAccess	eng
wgl.contributor	TIB	eng
wgl.subject	Informatik	eng
wgl.type	Buchkapitel / Sammelwerksbeitrag	eng
wgl.type	Konferenzbeitrag	eng

Files

Original bundle

Name: Brack2020, Preprint.pdf
Size: 825.86 KB
Format: Adobe Portable Document Format

Collections

Informationswissenschaften