Search Results

Now showing 1 - 10 of 22
  • Item
    Digital Humanities Handbuch
    (2015-08-12) Hahn, Helene; Kalman, Tibor; Pielström, Steffen; Puhl, Johanna; Kolbmann, Wibke; Kollatz, Thomas; Neuschäfer, Markus; Stiller, Juliane; Tonne, Danah
    Um das Handbuch möglichst praxisnah zu gestalten, haben wir uns entschieden, zuerst einzelne DH-Projekte vorzustellen, um die Möglichkeiten der DH den Lebringen und ihnen zu zeigen, was in der Praxis in dem Bereich derzeit schon umgesetzt wurde. So zeigen wir in Kapitel 2, wie mit TextGrid Texte editiert und meCodicology Handschriften analysiert werden. Die folgenden drei Kapitel beschäftigen sich mit den drei Säulen, die jedes Projekt in den Digital Humanities trag Methoden und Werkzeuge, und Infrastruktur. Die Kapitel bieten erste Einführungen in die jeweilige Thematik und vermitteln den Lesern an die Praxis angelehntsie in eigenen DH-Projekten anwenden können. Die Kapitel Daten und Alles was Recht ist - Urheberrecht und Lizenzierung von Forschungsdaten weisen in die Grundlage wissenschaftlichen Forschens ein und bieten Hilfestellungen im Umgang mit Lizenzen und Dateiformaten. Das Kapitel Methoden und Werkzeuge ze Digital Humanities auf und verweist beispielhaft auf digitale Werkzeuge, die für die Beantwortung geisteswissenschaftlicher Forschungsfragen herangezogen weKapitel Infrastruktur werden Digitale Infrastrukturen, deren Komponenten und Zielstellungen näher beschrieben. Sie sind unerlässlich, um die digitale Forschunund nachhaltig zu gestalten.
  • Item
    Analyzing social media for measuring public attitudes toward controversies and their driving factors: a case study of migration
    (Wien : Springer, 2022) Chen, Yiyi; Sack, Harald; Alam, Mehwish
    Among other ways of expressing opinions on media such as blogs, and forums, social media (such as Twitter) has become one of the most widely used channels by populations for expressing their opinions. With an increasing interest in the topic of migration in Europe, it is important to process and analyze these opinions. To this end, this study aims at measuring the public attitudes toward migration in terms of sentiments and hate speech from a large number of tweets crawled on the decisive topic of migration. This study introduces a knowledge base (KB) of anonymized migration-related annotated tweets termed as MigrationsKB (MGKB). The tweets from 2013 to July 2021 in the European countries that are hosts of immigrants are collected, pre-processed, and filtered using advanced topic modeling techniques. BERT-based entity linking and sentiment analysis, complemented by attention-based hate speech detection, are performed to annotate the curated tweets. Moreover, external databases are used to identify the potential social and economic factors causing negative public attitudes toward migration. The analysis aligns with the hypothesis that the countries with more migrants have fewer negative and hateful tweets. To further promote research in the interdisciplinary fields of social sciences and computer science, the outcomes are integrated into MGKB, which significantly extends the existing ontology to consider the public attitudes toward migrations and economic indicators. This study further discusses the use-cases and exploitation of MGKB. Finally, MGKB is made publicly available, fully supporting the FAIR principles.
  • Item
    Multimodal news analytics using measures of cross-modal entity and context consistency
    (London : Springer, 2021) Müller-Budack, Eric; Theiner, Jonas; Diering, Sebastian; Idahl, Maximilian; Hakimov, Sherzod; Ewerth, Ralph
    The World Wide Web has become a popular source to gather information and news. Multimodal information, e.g., supplement text with photographs, is typically used to convey the news more effectively or to attract attention. The photographs can be decorative, depict additional details, but might also contain misleading information. The quantification of the cross-modal consistency of entity representations can assist human assessors’ evaluation of the overall multimodal message. In some cases such measures might give hints to detect fake news, which is an increasingly important topic in today’s society. In this paper, we present a multimodal approach to quantify the entity coherence between image and text in real-world news. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate the cross-modal similarity of the entities in text and photograph by exploiting state-of-the-art computer vision approaches. In contrast to previous work, our system automatically acquires example data from the Web and is applicable to real-world news. Moreover, an approach that quantifies contextual image-text relations is introduced. The feasibility is demonstrated on two datasets that cover different languages, topics, and domains.
  • Item
    Knowledge Graphs - Working Group Charter (NFDI section-metadata) (1.2)
    (Genève : CERN, 2023) Stocker, Markus; Rossenova, Lozana; Shigapov, Renat; Betancort, Noemi; Dietze, Stefan; Murphy, Bridget; Bölling, Christian; Schubotz, Moritz; Koepler, Oliver
    Knowledge Graphs are a key technology for implementing the FAIR principles in data infrastructures by ensuring interoperability for both humans and machines. The Working Group "Knowledge Graphs" in Section "(Meta)data, Terminologies, Provenance" of the German National Research Data Infrastructure (Nationale Forschungsdateninfrastruktur (NFDI) e.V.) aims to promote the use of knowledge graphs in all NFDI consortia, to facilitate cross-domain data interlinking and federation following the FAIR principles, and to contribute to the joint development of tools and technologies that enable transformation of structured and unstructured data into semantically reusable knowledge across different domains.
  • Item
    Modelling Archival Hierarchies in Practice: Key Aspects and Lessons Learned
    (Aachen, Germany : RWTH Aachen, 2021) Vafaie, Mahsa; Bruns, Oleksandra; Pilz, Nastasja; Dessì, Danilo; Sack, Harald; Sumikawa, Yasunobu; Ikejiri, Ryohei; Doucet, Antoine; Pfanzelter, Eva; Hasanuzzaman, Mohammed; Dias, Gaël; Milligan, Ian; Jatowt, Adam
    An increasing number of archival institutions aim to provide public access to historical documents. Ontologies have been designed, developed and utilised to model the archival description of historical documents and to enable interoperability between different information sources. However, due to the heterogeneous nature of archives and archival systems, current ontologies for the representation of archival content do not always cover all existing structural organisation forms equallywell. After briefly contextualising the heterogeneity in the hierarchical structure of German archives, this paper describes and evaluates differences between two archival ontologies, ArDO and RiC-O, and their approaches to modelling hierarchy levels and archive dynamics.
  • Item
    DDB-KG: The German Bibliographic Heritage in a Knowledge Graph
    (Aachen, Germany : RWTH Aachen, 2021) Tan, Mary Ann; Tietz, Tabea; Bruns, Oleksandra; Oppenlaender, Jonas; Dessì, Danilo; Harald, Sack; Sumikawa, Yasunobu; Ikejiri, Ryohei; Doucet, Antoine; Pfanzelter, Eva; Hasanuzzaman, Mohammed; Dias, Gaël; Milligan, Ian; Jatowt, Adam
    Under the German government’s initiative “NEUSTART Kultur”, the German Digital Library or Deutsche Digitale Bibliothek (DDB) is undergoing improvements to enhance user-experience. As an initial step, emphasis is placed on creating a knowledge graph from the bibliographic record collection of the DDB. This paper discusses the challenges facing the DDB in terms of retrieval and the solutions in addressing them. In particular, limitations of the current data model or ontology to represent bibliographic metadata is analyzed through concrete examples. This study presents the complete ontological mapping from DDB-Europeana Data Model (DDB-EDM) to FaBiO, and a prototype of the DDB-KG made available as a SPARQL endpoint. The suitabiliy of the target ontology is demonstrated with SPARQL queries formulated from competency questions.
  • Item
    TinyGenius: Intertwining natural language processing with microtask crowdsourcing for scholarly knowledge graph creation
    (New York,NY,United States : Association for Computing Machinery, 2022) Oelen, Allard; Stocker, Markus; Auer, Sören; Aizawa, Akiko
    As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be more efficiently discovered and used. Natural Language Processing (NLP) techniques are able to autonomously process scholarly articles at scale and to create machine readable representations of the article content. However, autonomous NLP methods are by far not sufficiently accurate to create a high-quality knowledge graph. Yet quality is crucial for the graph to be useful in practice. We present TinyGenius, a methodology to validate NLP-extracted scholarly knowledge statements using microtasks performed with crowdsourcing. The scholarly context in which the crowd workers operate has multiple challenges. The explainability of the employed NLP methods is crucial to provide context in order to support the decision process of crowd workers. We employed TinyGenius to populate a paper-centric knowledge graph, using five distinct NLP methods. In the end, the resulting knowledge graph serves as a digital library for scholarly articles.
  • Item
    On the Impact of Temporal Representations on Metaphor Detection
    (Paris : European Language Resources Association (ELRA), 2022) Giorgio Ottolina; Matteo Palmonari; Manuel Vimercati; Mehwish Alam; Calzolari, Nicoletta; Béchet, Frédéric; Blache, Philippe; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Goggi, Sara; Isahara, Hitoshi; Maegaard, Bente; Mariani, Joseph; Mazo, Hélène; Odijk, Jan; Piperidis, Stelios
    State-of-the-art approaches for metaphor detection compare their literal - or core - meaning and their contextual meaning using metaphor classifiers based on neural networks. However, metaphorical expressions evolve over time due to various reasons, such as cultural and societal impact. Metaphorical expressions are known to co-evolve with language and literal word meanings, and even drive, to some extent, this evolution. This poses the question of whether different, possibly time-specific, representations of literal meanings may impact the metaphor detection task. To the best of our knowledge, this is the first study that examines the metaphor detection task with a detailed exploratory analysis where different temporal and static word embeddings are used to account for different representations of literal meanings. Our experimental analysis is based on three popular benchmarks used for metaphor detection and word embeddings extracted from different corpora and temporally aligned using different state-of-the-art approaches. The results suggest that the usage of different static word embedding methods does impact the metaphor detection task and some temporal word embeddings slightly outperform static methods. However, the results also suggest that temporal word embeddings may provide representations of the core meaning of the metaphor even too close to their contextual meaning, thus confusing the classifier. Overall, the interaction between temporal language evolution and metaphor detection appears tiny in the benchmark datasets used in our experiments. This suggests that future work for the computational analysis of this important linguistic phenomenon should first start by creating a new dataset where this interaction is better represented.
  • Item
    Designing Intelligent Systems for Online Education: Open Challenges and Future Directions
    (Aachen, Germany : RWTH Aachen, 2021) Dessì, Danilo; Käser, Tanja; Marras, Mirko; Popescu, Elvira; Sack, Harald; Dessì, Danilo; Käser, Tanja; Marras, Mirko; Popescu, Elvira; Sack, Harald
    The design and delivering of platforms for online education is fostering increasingly intense research. Scaling up education online brings new emerging needs related with hardly manageable classes, overwhelming content alternatives, and academic dishonesty while interacting remotely, as examples. However, with the impressive progress of the data mining and machine learning fields, combined with the large amounts of learning-related data and high-performance computing, it has been possible to gain a deeper understanding of the nature of learning and teaching online. Methods at the analytical and algorithmic levels are constantly being developed and hybrid approaches are receiving an increasing attention. Recent methods are analyzing not only the online traces left by students a posteriori, but also the extent to which this data can be turned into actionable insights and models, to support the above needs in a computationally efficient, adaptive and timely way. In this paper, we present relevant open challenges lying at the intersection between the machine learning and educational communities, that need to be addressed to further develop the field of intelligent systems for online education. Several areas of research in this field are identified, such as data availability and sharing, time-wise and multi-modal data modelling, generalizability, fairness, explainability, interpretability, privacy, and ethics behind models delivered for supporting education. Practical challenges and recommendations for possible research directions are provided for each of them, paving the way for future advances in this field.
  • Item
    Detecting Cross-Language Plagiarism using Open Knowledge Graphs
    (Aachen, Germany : RWTH Aachen, 2021) Stegmüller, Johannes; Bauer-Marquart, Fabian; Meuschke, Norman; Ruas, Terry; Schubotz, Moritz; Gipp, Bela; Zhang, Chengzhi; Mayr, Philipp; Lu, Wie; Zhang, Yi
    Identifying cross-language plagiarism is challenging, especially for distant language pairs and sense-for-sense translations. We introduce the new multilingual retrieval model Cross-Language Ontology-Based Similarity Analysis (CL-OSA) for this task. CL-OSA represents documents as entity vectors obtained from the open knowledge graph Wikidata. Opposed to other methods, CL-OSA does not require computationally expensive machine translation, nor pre-training using comparable or parallel corpora. It reliably disambiguates homonyms and scales to allow its application toWebscale document collections. We show that CL-OSA outperforms state-of-the-art methods for retrieving candidate documents from five large, topically diverse test corpora that include distant language pairs like Japanese-English. For identifying cross-language plagiarism at the character level, CL-OSA primarily improves the detection of sense-for-sense translations. For these challenging cases, CL-OSA’s performance in terms of the well-established PlagDet score exceeds that of the best competitor by more than factor two. The code and data of our study are openly available.