Search Results

Collaborative annotation and semantic enrichment of 3D media

2022, Rossenova, Lozana, Schubert, Zoe, Vock, Richard, Sohmen, Lucia, Günther, Lukas, Duchesne, Paul, Blümel, Ina, Aizawa, Akiko

A new FOSS (free and open source software) toolchain and associated workflow are being developed in the context of NFDI4Culture, a German consortium of research and cultural heritage institutions working towards a shared infrastructure for research data that meets the needs of 21st century data creators, maintainers and end users across the broad spectrum of the digital libraries and archives field, and the digital humanities. This short paper and demo present how the integrated toolchain connects: 1) OpenRefine - for data reconciliation and batch upload; 2) Wikibase - for linked open data (LOD) storage; and 3) Kompakkt - for rendering and annotating 3D models. The presentation is aimed at librarians, digital curators and data managers interested in learning how to manage research datasets containing 3D media, and how to make them available within an open data environment with 3D-rendering and collaborative annotation features.

Combining Textual Features for the Detection of Hateful and Offensive Language

2021, Hakimov, Sherzod, Ewerth, Ralph, Mehta, Parth, Mandl, Thomas, Majumder, Prasenjit, Mitra, Mandar

The detection of offensive, hateful and profane language has become a critical challenge since many users in social networks are exposed to cyberbullying activities on a daily basis. In this paper, we present an analysis of combining different textual features for the detection of hateful or offensive posts on Twitter. We provide a detailed experimental evaluation to understand the impact of each building block in a neural network architecture. The proposed architecture is evaluated on English Subtask 1A (identifying hate, offensive and profane content in posts) of the HASOC-2021 dataset, under the team name TIB-VA. We compared different variants of the contextual word embeddings combined with the character level embeddings and the encoding of collected hate terms.

On the Impact of Features and Classifiers for Measuring Knowledge Gain during Web Search - A Case Study

2021, Gritz, Wolfgang, Hoppe, Anett, Ewerth, Ralph, Cong, Gao, Ramanath, Maya

Search engines are normally not designed to support human learning intents and processes. The field of Search as Learning (SAL) aims to investigate the characteristics of a successful Web search with a learning purpose. In this paper, we analyze the impact of text complexity of Web pages on predicting knowledge gain during a search session. For this purpose, we conduct an experimental case study and investigate the influence of several text-based features and classifiers on the prediction task. We build upon data from a study of related work, where 104 participants were given the task to learn about the formation of lightning and thunder through Web search. We perform an extensive evaluation based on a state-of-the-art approach and extend it with additional features related to textual complexity of Web pages. In contrast to prior work, we perform a systematic search for optimal hyperparameters and show the possible influence of feature selection strategies on the knowledge gain prediction. When using the new set of features, state-of-the-art results are noticeably improved. The results indicate that text complexity of Web pages could be an important feature resource for knowledge gain prediction.

The Concept of Identifiability in ML Models

2022, von Maltzan, Stephanie, Bastieri, Denis, Wills, Gary, Kacsuk, Péter, Chang, Victor

Recent research indicates that the machine learning process can be reversed by adversarial attacks. These attacks can be used to derive personal information from the training data. The supposedly anonymising machine learning process represents a process of pseudonymisation and is, therefore, subject to technical and organisational measures. Consequently, the unexamined belief in anonymisation as a guarantor for privacy cannot be easily upheld. It is, therefore, crucial to measure privacy through the lens of adversarial attacks, to distinguish precisely what is meant by personal data and non-personal data and, above all, to determine whether ML models represent pseudonyms from the training data.

Contextual Language Models for Knowledge Graph Completion

2021, Russa, Biswas, Sofronova, Radina, Alam, Mehwish, Sack, Harald, Mehwish, Alam, Ali, Medi, Groth, Paul, Hitzler, Pascal, Lehmann, Jens, Paulheim, Heiko, Rettinger, Achim, Sack, Harald, Sadeghi, Afshin, Tresp, Volker

Knowledge Graphs (KGs) have become the backbone of various machine learning based applications over the past decade. However, KGs are often incomplete and inconsistent. Several representation learning based approaches have been introduced to complete the missing information in KGs. In addition, Neural Language Models (NLMs) have gained huge momentum in NLP applications. However, exploiting contextual NLMs to tackle the Knowledge Graph Completion (KGC) task is still an open research problem. In this paper, a GPT-2 based KGC model is proposed and evaluated on two benchmark datasets. The initial results obtained from fine-tuning the GPT-2 model for triple classification underline the importance of using NLMs for KGC. The impact of contextual language models on KGC is also discussed.

Modelling Archival Hierarchies in Practice: Key Aspects and Lessons Learned

2021, Vafaie, Mahsa, Bruns, Oleksandra, Pilz, Nastasja, Dessì, Danilo, Sack, Harald, Sumikawa, Yasunobu, Ikejiri, Ryohei, Doucet, Antoine, Pfanzelter, Eva, Hasanuzzaman, Mohammed, Dias, Gaël, Milligan, Ian, Jatowt, Adam

An increasing number of archival institutions aim to provide public access to historical documents. Ontologies have been designed, developed and utilised to model the archival description of historical documents and to enable interoperability between different information sources. However, due to the heterogeneous nature of archives and archival systems, current ontologies for the representation of archival content do not always cover all existing structural organisation forms equally well. After briefly contextualising the heterogeneity in the hierarchical structure of German archives, this paper describes and evaluates differences between two archival ontologies, ArDO and RiC-O, and their approaches to modelling hierarchy levels and archive dynamics.

DDB-KG: The German Bibliographic Heritage in a Knowledge Graph

2021, Tan, Mary Ann, Tietz, Tabea, Bruns, Oleksandra, Oppenlaender, Jonas, Dessì, Danilo, Sack, Harald, Sumikawa, Yasunobu, Ikejiri, Ryohei, Doucet, Antoine, Pfanzelter, Eva, Hasanuzzaman, Mohammed, Dias, Gaël, Milligan, Ian, Jatowt, Adam

Under the German government’s initiative “NEUSTART Kultur”, the German Digital Library or Deutsche Digitale Bibliothek (DDB) is undergoing improvements to enhance the user experience. As an initial step, emphasis is placed on creating a knowledge graph from the bibliographic record collection of the DDB. This paper discusses the challenges facing the DDB in terms of retrieval and the solutions for addressing them. In particular, limitations of the current data model or ontology in representing bibliographic metadata are analyzed through concrete examples. This study presents the complete ontological mapping from the DDB-Europeana Data Model (DDB-EDM) to FaBiO, and a prototype of the DDB-KG made available as a SPARQL endpoint. The suitability of the target ontology is demonstrated with SPARQL queries formulated from competency questions.

Check square at CheckThat! 2020: Claim Detection in Social Media via Fusion of Transformer and Syntactic Features

2020, Cheema, Gullal S., Hakimov, Sherzod, Ewerth, Ralph, Cappellato, Linda, Eickhoff, Carsten, Ferro, Nicola, Névéol, Aurélie

In this digital age of news consumption, a news reader has the ability to react, express and share opinions with others in a highly interactive and fast manner. As a consequence, fake news has made its way into our daily life because of very limited capacity to verify news on the Internet by large companies as well as individuals. In this paper, we focus on solving two problems which are part of the fact-checking ecosystem that can help to automate fact-checking of claims in an ever increasing stream of content on social media. For the first problem, claim check-worthiness prediction, we explore the fusion of syntactic features and deep transformer Bidirectional Encoder Representations from Transformers (BERT) embeddings, to classify check-worthiness of a tweet, i.e. whether it includes a claim or not. We conduct a detailed feature analysis and present our best performing models for English and Arabic tweets. For the second problem, claim retrieval, we explore the pre-trained embeddings from a Siamese network transformer model (sentence-transformers) specifically trained for semantic textual similarity, and perform KD-search to retrieve verified claims with respect to a query tweet.

Leveraging Literals for Knowledge Graph Embeddings

2021, Gesese, Genet Asefa, Tamma, Valentina, Fernandez, Miriam, Poveda-Villalón, María

Knowledge Graphs (KGs) have become invaluable for various applications such as named entity recognition, entity linking, and question answering. However, there is a huge computational and storage cost associated with these KG-based applications. It is therefore necessary to transform the high dimensional KGs into low dimensional vector spaces, i.e., to learn representations for the KGs. Since a KG represents facts both as interrelations between entities and as attributes of entities, the semantics present in both forms should be preserved when transforming the KG into a vector space. Hence, the main focus of this thesis is to deal with the multimodality and multilinguality of literals when utilizing them for the representation learning of KGs. The other task is to extract benchmark datasets with a high level of difficulty for tasks such as link prediction and triple classification. These datasets could be used for evaluating both kinds of KG embeddings, those using literals and those which do not include literals.

PowerDuck: A GOOSE Data Set of Cyberattacks in Substations

2022-08-08, Zemanek, Sven, Hacker, Immanuel, Wolsing, Konrad, Wagner, Eric, Henze, Martin, Serror, Martin

Power grids worldwide are increasingly victims of cyberattacks, where attackers can cause immense damage to critical infrastructure. The growing digitalization and networking in power grids combined with insufficient protection against cyberattacks further exacerbate this trend. Hence, security engineers and researchers must counter these new risks by continuously improving security measures. Data sets of real network traffic during cyberattacks play a decisive role in analyzing and understanding such attacks. Therefore, this paper presents PowerDuck, a publicly available security data set containing network traces of GOOSE communication in a physical substation testbed. The data set includes recordings of various scenarios with and without the presence of attacks. Furthermore, all network packets originating from the attacker are clearly labeled to facilitate their identification. We thus envision PowerDuck improving and complementing existing data sets of substations, which are often generated synthetically, thus enhancing the security of power grids.