Search Results

Now showing 1 - 10 of 43
  • Item
    Knowledge Graphs - Working Group Charter (NFDI section-metadata) (1.2)
    (Genève : CERN, 2023) Stocker, Markus; Rossenova, Lozana; Shigapov, Renat; Betancort, Noemi; Dietze, Stefan; Murphy, Bridget; Bölling, Christian; Schubotz, Moritz; Koepler, Oliver
    Knowledge Graphs are a key technology for implementing the FAIR principles in data infrastructures by ensuring interoperability for both humans and machines. The Working Group "Knowledge Graphs" in Section "(Meta)data, Terminologies, Provenance" of the German National Research Data Infrastructure (Nationale Forschungsdateninfrastruktur (NFDI) e.V.) aims to promote the use of knowledge graphs in all NFDI consortia, to facilitate cross-domain data interlinking and federation following the FAIR principles, and to contribute to the joint development of tools and technologies that enable transformation of structured and unstructured data into semantically reusable knowledge across different domains.
  • Item
    Collaborative annotation and semantic enrichment of 3D media
    (New York,NY,United States : Association for Computing Machinery, 2022) Rossenova, Lozana; Schubert, Zoe; Vock, Richard; Sohmen, Lucia; Günther, Lukas; Duchesne, Paul; Blümel, Ina; Aizawa, Akiko
    A new FOSS (free and open source software) toolchain and associated workflow is being developed in the context of NFDI4Culture, a German consortium of research- and cultural heritage institutions working towards a shared infrastructure for research data that meets the needs of 21st century data creators, maintainers and end users across the broad spectrum of the digital libraries and archives field, and the digital humanities. This short paper and demo present how the integrated toolchain connects: 1) OpenRefine - for data reconciliation and batch upload; 2) Wikibase - for linked open data (LOD) storage; and 3) Kompakkt - for rendering and annotating 3D models. The presentation is aimed at librarians, digital curators and data managers interested in learning how to manage research datasets containing 3D media, and how to make them available within an open data environment with 3D-rendering and collaborative annotation features.
  • Item
    Context-Based Entity Matching for Big Data
    (Cham : Springer, 2020) Tasnim, Mayesha; Collarana, Diego; Graux, Damien; Vidal, Maria-Esther; Janev, Valentina; Graux, Damien; Jabeen, Hajira; Sallinger, Emanuel
    In the Big Data era, where variety is the most dominant dimension, the RDF data model enables the creation and integration of actionable knowledge from heterogeneous data sources. However, the RDF data model allows for describing entities under various contexts, e.g., people can be described from its demographic context, but as well from their professional contexts. Context-aware description poses challenges during entity matching of RDF datasets—the match might not be valid in every context. To perform a contextually relevant entity matching, the specific context under which a data-driven task, e.g., data integration is performed, must be taken into account. However, existing approaches only consider inter-schema and properties mapping of different data sources and prevent users from selecting contexts and conditions during a data integration process. We devise COMET, an entity matching technique that relies on both the knowledge stated in RDF vocabularies and a context-based similarity metric to map contextually equivalent RDF graphs. COMET follows a two-fold approach to solve the problem of entity matching in RDF graphs in a context-aware manner. In the first step, COMET computes the similarity measures across RDF entities and resorts to the Formal Concept Analysis algorithm to map contextually equivalent RDF entities. Finally, COMET combines the results of the first step and executes a 1-1 perfect matching algorithm for matching RDF entities based on the combined scores. We empirically evaluate the performance of COMET on testbed from DBpedia. The experimental results suggest that COMET accurately matches equivalent RDF graphs in a context-dependent manner.
  • Item
    Building Scholarly Knowledge Bases with Crowdsourcing and Text Mining
    (Aachen : RWTH, 2020) Stocker, Markus; Zhang, Chengzhi; Mayr, Philipp; Lu, Wei; Zhang, Yi
    For centuries, scholarly knowledge has been buried in documents. While articles are great to convey the story of scientific work to peers, they make it hard for machines to process scholarly knowledge. The recent proliferation of the scholarly literature and the increasing inability of researchers to digest, reproduce, reuse its content are constant reminders that we urgently need a transformative digitalization of the scholarly literature. Building on the Open Research Knowledge Graph (http://orkg.org) as a concrete research infrastructure, in this talk we present how using crowdsourcing and text mining humans and machines can collaboratively build scholarly knowledge bases, i.e. systems that acquire, curate and publish data, information and knowledge published in the scholarly literature in structured and semantic form. We discuss some key challenges that human and technical infrastructures face as well as the possibilities scholarly knowledge bases enable.
  • Item
    NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature
    (Aachen : RWTH, 2020) D'Souza, Jennifer; Auer, Sören
    We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly, for the articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks 1. machine translation, 2. named entity recognition, 3. Question answering, 4. relation classification, and 5. text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise we have obtained an annotation methodology; and found ten core information units that reflect the contribution of the NLP-ML scholarly investigations. The resulting annotation scheme we developed based on these information units is called NLPContributions. The overarching goal of our endeavor is four-fold: 1) to find a systematic set of patterns of subject-predicate-object statements for the semantic structuring of scholarly contributions that are more or less generically applicable for NLP-ML research articles; 2) to apply the discovered patterns in the creation of a larger annotated dataset for training machine readers [18] of research contributions; 3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; 4) to integrate the machine readers into the ORKG to assist users in the manual curation of their respective article contributions. We envision that the NLPContributions methodology engenders a wider discussion on the topic toward its further refinement and development. Our pilot annotated dataset of 50 NLP-ML scholarly articles according to the NLPContributions scheme is openly available to the research community at https://doi.org/10.25835/0019761.
  • Item
    Check square at CheckThat! 2020: Claim Detection in Social Media via Fusion of Transformer and Syntactic Features
    (Aachen, Germany : RWTH Aachen, 2020) Cheema, Gullasl S.; Hakimov, Sherzod; Ewerth, Ralph; Cappellato, Linda; Eickhoff, Carsten; Ferro, Nicola; Névéol, Aurélie
    In this digital age of news consumption, a news reader has the ability to react, express and share opinions with others in a highly interactive and fast manner. As a consequence, fake news has made its way into our daily life because of very limited capacity to verify news on the Internet by large companies as well as individuals. In this paper, we focus on solving two problems which are part of the fact-checking ecosystem that can help to automate fact-checking of claims in an ever increasing stream of content on social media. For the first prob-lem, claim check-worthiness prediction, we explore the fusion of syntac-tic features and deep transformer Bidirectional Encoder Representations from Transformers (BERT) embeddings, to classify check-worthiness of a tweet, i.e. whether it includes a claim or not. We conduct a detailed feature analysis and present our best performing models for English and Arabic tweets. For the second problem, claim retrieval, we explore the pre-trained embeddings from a Siamese network transformer model (sentence-transformers) specifically trained for semantic textual similar-ity, and perform KD-search to retrieve verified claims with respect to a query tweet.
  • Item
    Combining Textual Features for the Detection of Hateful and Offensive Language
    (Aachen, Germany : RWTH Aachen, 2021) Hakimov, Sherzod; Ewerth, Ralph; Mehta, Parth; Mandl, Thomas; Majumder, Prasenjit; Mitra, Mandar
    The detection of offensive, hateful and profane language has become a critical challenge since many users in social networks are exposed to cyberbullying activities on a daily basis. In this paper, we present an analysis of combining different textual features for the detection of hateful or offensive posts on Twitter. We provide a detailed experimental evaluation to understand the impact of each building block in a neural network architecture. The proposed architecture is evaluated on the English Subtask 1A: Identifying Hate, offensive and profane content from the post datasets of HASOC-2021 dataset under the team name TIB-VA. We compared different variants of the contextual word embeddings combined with the character level embeddings and the encoding of collected hate terms.
  • Item
    Optimizing Federated Queries Based on the Physical Design of a Data Lake
    (Aachen : RWTH, 2020) Rohde, Philipp D.; Vidal, Maria-Esther
    The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of these federations are Semantic Data Lakes where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using sourcespecific heuristics, the query engine is able to generate more efficient query execution plans by exploiting the knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.
  • Item
    Meetings and Mood-Related or Not? Insights from Student Software Projects
    (New York : Association for Computing Machinery, 2022) Klünder, Jil; Karras, Oliver; Madeiral, Fernanda; Lassenius, Casper
    [Background:] Teamwork, coordination, and communication are a prerequisite for the timely completion of a software project. Meetings as a facilitator for coordination and communication are an established medium for information exchange. Analyses of meetings in software projects have shown that certain interactions in these meetings, such as proactive statements followed by supportive ones, influence the mood and motivation of a team, which in turn affects its productivity. So far, however, research has focused only on certain interactions at a detailed level, requiring a complex and fine-grained analysis of a meeting itself. [Aim:] In this paper, we investigate meetings from a more abstract perspective, focusing on the polarity of the statements, i.e., whether they appear to be positive, negative, or neutral. [Method:] We analyze the relationship between the polarity of statements in meetings and different social aspects, including conflicts as well as the mood before and after a meeting. [Results:] Our results emerge from 21 student software project meetings and show some interesting insights: (1) Positive mood before a meeting is both related to the amount of positive statements in the beginning, as well as throughout the whole meeting, (2) negative mood before the meeting only influences the amount of negative statements in the first quarter of the meeting, but not the whole meeting, and (3) the amount of positive and negative statements during the meeting has no influence on the mood afterwards. [Conclusions:] We conclude that the behaviour in meetings might rather influence short-term emotional states (feelings) than long-term emotional states (mood), which are more important for the project.
  • Item
    On the Impact of Features and Classifiers for Measuring Knowledge Gain during Web Search - A Case Study
    (Aachen, Germany : RWTH Aachen, 2021) Gritz, Wolfgang; Hoppe, Anett; Ewerth, Ralph; Cong, Gao; Ramanath, Maya
    Search engines are normally not designed to support human learning intents and processes. The ÿeld of Search as Learning (SAL) aims to investigate the characteristics of a successful Web search with a learning purpose. In this paper, we analyze the impact of text complexity of Web pages on predicting knowledge gain during a search session. For this purpose, we conduct an experimental case study and investigate the in˝uence of several text-based features and classiÿers on the prediction task. We build upon data from a study of related work, where 104 participants were given the task to learn about the formation of lightning and thunder through Web search. We perform an extensive evaluation based on a state-of-the-art approach and extend it with additional features related to textual complexity of Web pages. In contrast to prior work, we perform a systematic search for optimal hyperparameters and show the possible in˝uence of feature selection strategies on the knowledge gain prediction. When using the new set of features, state-of-the-art results are noticeably improved. The results indicate that text complexity of Web pages could be an important feature resource for knowledge gain prediction.