Search Results

Now showing 1 - 10 of 75
  • Item
    Compact representations for efficient storage of semantic sensor data
    (Dordrecht : Springer Science + Business Media B.V, 2021) Karim, Farah; Vidal, Maria-Esther; Auer, Sören
    Nowadays, the amount of data generated by a wide variety of sensors and devices is increasing rapidly. Data semantics facilitate information exchange, adaptability, and interoperability among sensors and devices. Sensor data and their meaning can be described using ontologies, e.g., the Semantic Sensor Network (SSN) Ontology. However, once semantically enriched, semantic sensor data are substantially larger than the raw sensor data, and because the same measurement values may be observed by sensors many times, a huge number of repeated facts can be produced. We propose a compact, or factorized, representation of semantic sensor data in which repeated measurement values are described only once. These compact representations enhance the storage and processing of semantic sensor data. To scale up to large datasets, factorization-based tabular representations are exploited to store and manage factorized semantic sensor data using Big Data technologies. We empirically study the effectiveness of the proposed compact representations of semantic sensor data and their impact on query processing, and we evaluate the effects of storing the proposed representations on diverse RDF implementations. Results suggest that the proposed compact representations improve the storage and query processing of sensor data over diverse RDF implementations and can reduce query execution time by up to two orders of magnitude.
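    The factorization idea above can be illustrated in a few lines. A minimal sketch using rdflib, assuming an invented vocabulary (the paper builds on the SSN ontology; every term and value below is illustrative, not taken from the paper): repeated measurement values are materialized once as shared value nodes that observations point to.

      from rdflib import Graph, Literal, Namespace
      from rdflib.namespace import RDF, XSD

      # Hypothetical vocabulary; the paper builds on the SSN ontology, but the
      # exact terms below are illustrative only.
      EX = Namespace("http://example.org/sensor#")
      g = Graph()
      g.bind("ex", EX)

      # Three observations that happen to share the same measured value.
      observations = [("obs1", "sensorA", 21.5),
                      ("obs2", "sensorB", 21.5),
                      ("obs3", "sensorC", 21.5)]

      # Factorized form: the repeated value 21.5 is materialized once as a
      # shared value node; each observation links to that node instead of
      # repeating the literal, so the repeated fact is described only once.
      value_nodes = {}
      for obs_id, sensor_id, value in observations:
          if value not in value_nodes:
              node = EX["value%d" % len(value_nodes)]
              g.add((node, RDF.type, EX.MeasurementValue))
              g.add((node, EX.numericValue, Literal(value, datatype=XSD.double)))
              value_nodes[value] = node
          g.add((EX[obs_id], RDF.type, EX.Observation))
          g.add((EX[obs_id], EX.observedBy, EX[sensor_id]))
          g.add((EX[obs_id], EX.hasValue, value_nodes[value]))

      print(g.serialize(format="turtle"))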
  • Item
    Digital Humanities Handbuch
    (2015-08-12) Hahn, Helene; Kalman, Tibor; Pielström, Steffen; Puhl, Johanna; Kolbmann, Wibke; Kollatz, Thomas; Neuschäfer, Markus; Stiller, Juliane; Tonne, Danah
    To make the handbook as practice-oriented as possible, we decided to begin by presenting individual DH projects, in order to introduce readers to the possibilities of DH and to show them what has already been put into practice in the field. In Chapter 2 we show how texts are edited with TextGrid and how manuscripts are analysed with eCodicology. The following three chapters deal with the three pillars that support every project in the Digital Humanities: data, methods and tools, and infrastructure. These chapters offer first introductions to their respective topics and impart practice-oriented knowledge that readers can apply in their own DH projects. The chapters "Daten" and "Alles was Recht ist - Urheberrecht und Lizenzierung von Forschungsdaten" introduce the foundations of scholarly research and offer guidance on handling licences and file formats. The chapter "Methoden und Werkzeuge" outlines methods of the Digital Humanities and points, by way of example, to digital tools that can be drawn on to answer research questions in the humanities. The chapter "Infrastruktur" describes digital infrastructures, their components, and their objectives in more detail; they are indispensable for making digital research sustainable.
  • Item
    Deutschsprachige Game Studies 2021 – 2031: eine Vorausschau
    (München : Ludwig-Maximilians-Universität München, Institut für Deutsche Philologie, 2021) Inderst, Rudolf; Heller, Lambert
    Rudolf Inderst and Lambert Heller pose the fundamental question of whether text is the right form at all for engaging academically with digital games. They argue for establishing and adopting the video essay as a format whose audiovisual materiality is, they contend, already better suited to the subject matter.
  • Item
    Easy Semantification of Bioassays
    (Heidelberg : Springer, 2022) Anteghini, Marco; D’Souza, Jennifer; dos Santos, Vitor A. P. Martins; Auer, Sören
    Biological data and knowledge bases increasingly rely on Semantic Web technologies and on knowledge graphs for data integration, retrieval, and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts two framings of automated semantification, labeling versus clustering, which lie at opposite ends of the method-complexity spectrum. Modeling the problem to match the characteristics of our data, we find that the clustering solution significantly outperforms a state-of-the-art deep neural network labeling approach. This novel contribution rests on two factors: (1) a learning objective closely modeled after the data outperforms an alternative approach with sophisticated semantic modeling; (2) automatically semantifying biological assays achieves an F1 score of nearly 83%, which, to our knowledge, is the first reported standardized evaluation of the task, offering a strong benchmark model.
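    A rough sketch of the clustering framing described above, with invented toy data and labels (the paper's corpus, features, and semantic vocabulary are different): similar assay descriptions are clustered, and a label known for some cluster members is propagated to the rest, instead of training a per-assay classifier.

      from collections import Counter
      from sklearn.cluster import KMeans
      from sklearn.feature_extraction.text import TfidfVectorizer

      # Hypothetical inputs: free-text bioassay descriptions and, for a
      # labeled subset, their semantic annotations.
      assays = [
          "inhibition of kinase activity measured by luminescence",
          "kinase inhibition assay, luminescent readout",
          "cell viability measured by fluorescence after compound treatment",
          "fluorescent cell viability screen",
      ]
      known_labels = {0: "kinase-inhibition", 2: "cell-viability"}

      # A learning objective "closely modeled after the data" here simply
      # means clustering the raw text features directly.
      X = TfidfVectorizer(stop_words="english").fit_transform(assays)
      clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

      # Propagate the majority known label within each cluster.
      for c in set(clusters):
          members = [i for i, k in enumerate(clusters) if k == c]
          labels = [known_labels[i] for i in members if i in known_labels]
          majority = Counter(labels).most_common(1)[0][0] if labels else None
          for i in members:
              print(f"assay {i}: cluster {c}, label {majority}")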
  • Item
    Understanding image-text relations and news values for multimodal news analysis
    (Lausanne : Frontiers Media, 2023) Cheema, Gullal S.; Hakimov, Sherzod; Müller-Budack, Eric; Otto, Christian; Bateman, John A.; Ewerth, Ralph
    The analysis of news dissemination is of utmost importance, since the credibility of information and the identification of disinformation and misinformation affect society as a whole. Given the large amounts of news data published daily on the Web, the empirical analysis of news with regard to research questions and the detection of problematic news content require computational methods that work at scale. Today's online news is typically disseminated in multimodal form, including presentation modalities such as text, image, audio, and video. Recent developments in multimodal machine learning make it possible to capture basic “descriptive” relations between modalities, such as correspondences between words and phrases on the one hand and visual depictions of the verbally expressed information on the other. Although such advances have enabled tremendous progress in tasks like image captioning, text-to-image generation, and visual question answering, domains such as news dissemination require going further. In this paper, we introduce a novel framework for the computational analysis of multimodal news. We motivate a set of more complex image-text relations as well as multimodal news values based on real examples of news reports, and we consider their realization by computational approaches. To this end, we provide (a) an overview of existing literature from semiotics, where detailed proposals have been made for taxonomies covering diverse image-text relations generalisable to any domain; (b) an overview of computational work that derives models of image-text relations from data; and (c) an overview of a particular class of news-centric attributes developed in journalism studies, called news values. The result is a novel framework for multimodal news analysis that closes gaps in previous work while maintaining and combining the strengths of existing accounts. We assess and discuss the elements of the framework with real-world examples and use cases, setting out research directions at the intersection of multimodal learning, multimodal analytics, and computational social science that can benefit from our approach.
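    As an illustration of the basic “descriptive” image-text relations mentioned above, a generic cross-modal encoder such as the openly available CLIP model can score how well captions describe an image. This is only a building block, not the framework proposed in the paper; the file name and captions below are invented.

      import torch
      from PIL import Image
      from transformers import CLIPModel, CLIPProcessor

      # Generic cross-modal scoring: how well does each caption "describe"
      # the image? The paper's framework covers richer image-text relations
      # and news values beyond such similarity.
      model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
      processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

      image = Image.open("news_photo.jpg")  # hypothetical local file
      captions = [
          "Flood waters surround houses after heavy rain.",
          "A politician gives a speech at a press conference.",
      ]

      inputs = processor(text=captions, images=image,
                         return_tensors="pt", padding=True)
      with torch.no_grad():
          logits = model(**inputs).logits_per_image  # shape: (1, n_captions)
      scores = logits.softmax(dim=-1).squeeze(0)

      for caption, score in zip(captions, scores.tolist()):
          print(f"{score:.3f}  {caption}")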
  • Item
    Enhancing Virtual Ontology Based Access over Tabular Data with Morph-CSV
    (Amsterdam : IOS Press, 2020) Chaves-Fraga, David; Ruckhaus, Edna; Priyatna, Freddy; Vidal, Maria-Esther; Corcho, Oscar
    Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets, either by materializing integrated data into RDF or by performing on-the-fly querying via SPARQL query translation. In the specific case of tabular datasets represented as several CSV or Excel files, query translation approaches have been applied by considering each source as a single table that can be loaded into a relational database management system (RDBMS). Nevertheless, constraints over these tables are not represented; thus, neither consistency among attributes nor indexes over tables are enforced. As a consequence, the efficiency of the SPARQL-to-SQL translation process may be affected, as well as the completeness of the answers produced during the evaluation of the generated SQL query. Our work focuses on applying implicit constraints during the OBDA query translation process over tabular data. We propose Morph-CSV, a framework for querying tabular data that exploits information from typical OBDA inputs (e.g., mappings, queries) to enforce constraints, and that can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV relies on both a constraint component and a set of constraint operators. For a given set of constraints, the operators are applied to each type of constraint with the aim of enhancing query completeness and performance. We evaluate Morph-CSV in several domains: e-commerce with the BSBM benchmark; transportation with a benchmark using the GTFS dataset from the Madrid subway; and biology with a use case extracted from the Bio2RDF project. We compare and report the performance of two SPARQL-to-SQL OBDA engines, with and without the incorporation of Morph-CSV. The observed results suggest that Morph-CSV is able to speed up the total query execution time by up to two orders of magnitude while producing the complete set of query answers.
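    The constraint-enforcement idea can be sketched in miniature: load a CSV into an RDBMS with explicit types, integrity constraints, and an index, so that the SQL later generated by a SPARQL-to-SQL engine runs over well-constrained tables. This toy sketch is not Morph-CSV itself; the schema and file name are illustrative, loosely modelled on a GTFS stops file.

      import csv
      import sqlite3

      # Instead of loading CSV columns as untyped text, enforce types,
      # NOT NULL constraints, and an index before querying. (Morph-CSV
      # derives such constraints from OBDA mappings and queries; here
      # they are simply hard-coded for illustration.)
      conn = sqlite3.connect(":memory:")
      conn.execute("""
          CREATE TABLE stops (
              stop_id   TEXT PRIMARY KEY,   -- uniqueness enforced
              stop_name TEXT NOT NULL,
              stop_lat  REAL NOT NULL,      -- typed, not raw text
              stop_lon  REAL NOT NULL
          )
      """)
      conn.execute("CREATE INDEX idx_stops_name ON stops(stop_name)")

      with open("stops.csv", newline="", encoding="utf-8") as f:  # hypothetical file
          rows = [(r["stop_id"], r["stop_name"],
                   float(r["stop_lat"]), float(r["stop_lon"]))
                  for r in csv.DictReader(f)]
      conn.executemany("INSERT INTO stops VALUES (?, ?, ?, ?)", rows)

      # A SPARQL-to-SQL engine would now generate SQL against a table whose
      # constraints and index support complete and fast translated queries.
      query = "SELECT stop_id, stop_lat, stop_lon FROM stops WHERE stop_name = ?"
      for row in conn.execute(query, ("Sol",)):
          print(row)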
  • Item
    The quest for research information
    (Amsterdam : Elsevier, 2014) Blümel, Ina; Dietze, Stefan; Heller, Lambert; Jäschke, Robert; Mehlberg, Martin
    Research information, i.e., data about research projects, organisations, researchers, or research outputs such as publications or patents, is spread across the Web, usually residing in institutional and personal web pages or in semi-open databases and information systems. While a wealth of unstructured information exists, structured data is limited and often exposed through proprietary or less-established schemas and interfaces. A holistic and consistent view of research information across organisational and national boundaries is therefore not feasible. On the other hand, web crawling and information extraction techniques have matured over the last decade, allowing for automated approaches to harvesting, extracting, and consolidating research information into a more coherent knowledge graph. In this work, we give an overview of the current state of the art in research information sharing on the Web and present initial ideas towards a more holistic approach for bootstrapping research information from available web sources.
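    As one concrete flavour of the harvesting step sketched above, publicly documented APIs such as Crossref's works endpoint expose structured publication metadata; the following minimal sketch turns a few fields into naive subject-predicate-object statements. The endpoint is real, but this is a toy stand-in, not the consolidation pipeline the article envisions.

      import requests

      # Query the public Crossref works API for publication metadata and
      # emit simple triples, as a tiny stand-in for harvesting research
      # information into a knowledge graph.
      resp = requests.get(
          "https://api.crossref.org/works",
          params={"query.author": "Lambert Heller", "rows": 5},
          timeout=30,
      )
      resp.raise_for_status()

      triples = []
      for work in resp.json()["message"]["items"]:
          doi = work.get("DOI")
          title = (work.get("title") or [""])[0]
          triples.append((f"doi:{doi}", "hasTitle", title))
          for author in work.get("author", []):
              name = f"{author.get('given', '')} {author.get('family', '')}".strip()
              triples.append((f"doi:{doi}", "hasAuthor", name))

      for s, p, o in triples:
          print(s, p, o)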
  • Item
    Why reinvent the wheel: Let's build question answering systems together
    (New York City : Association for Computing Machinery, 2018) Singh, K.; Radhakrishna, A.S.; Both, A.; Shekarpour, S.; Lytra, I.; Usbeck, R.; Vyas, A.; Khikmatullaev, A.; Punjani, D.; Lange, C.; Vidal, Maria-Esther; Lehmann, J.; Auer, Sören
    Modern question answering (QA) systems need to flexibly integrate a number of components specialised to fulfil specific tasks in a QA pipeline. Key QA tasks include Named Entity Recognition and Disambiguation, Relation Extraction, and Query Building. Since a number of different software components exist that implement different strategies for each of these tasks, selecting and combining the most suitable components into a QA system, given the characteristics of a question, is a major challenge. We study this optimisation problem and train classifiers that take the features of a question as input and optimise the selection of QA components based on those features. We then devise a greedy algorithm to identify the pipelines that include the suitable components and can effectively answer the given question. We implement this model within Frankenstein, a QA framework able to select QA components and compose QA pipelines. We evaluate the effectiveness of the pipelines generated by Frankenstein using the QALD and LC-QuAD benchmarks. The results suggest that Frankenstein not only precisely solves the QA optimisation problem but also enables the automatic composition of optimised QA pipelines, which outperform the static baseline QA pipeline. Thanks to this flexible and fully automated pipeline-generation process, new QA components can easily be included in Frankenstein, further improving the performance of the generated pipelines.
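    The greedy composition step can be made concrete with a small schematic, assuming per-component scores are already available (in Frankenstein these come from trained classifiers over question features; the component names and numbers below are placeholders).

      # Hypothetical per-task candidate components with classifier-predicted
      # performance scores for a given question.
      candidates = {
          "NER":                [("TagMe", 0.81), ("DBpediaSpotlight", 0.74)],
          "RelationExtraction": [("RelMatch", 0.66), ("ReMatch", 0.71)],
          "QueryBuilder":       [("SQG", 0.78), ("NLIWOD-QB", 0.69)],
      }

      def greedy_pipeline(candidates):
          """Pick, for each QA task in pipeline order, the component with
          the highest predicted performance for this question."""
          pipeline = []
          for task in ("NER", "RelationExtraction", "QueryBuilder"):
              best = max(candidates[task], key=lambda c: c[1])
              pipeline.append((task, *best))
          return pipeline

      for task, component, score in greedy_pipeline(candidates):
          print(f"{task:>18}: {component} (predicted {score:.2f})")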
  • Item
    Ontology-Based Representation for Accessible OpenCourseWare Systems
    (Basel : MDPI Publ., 2018-11-29) Elias, Mirette; Lohmann, Steffen; Auer, Sören
    OpenCourseWare (OCW) systems have been established to provide open educational resources that are accessible to anyone, including learners with special accessibility needs and preferences. A formal and interoperable way of describing these preferences is needed in order to use them in OCW systems and retrieve relevant educational resources. This formal representation should build on standard accessibility definitions that can be reused by other OCW systems to represent accessibility concepts. In this article, we present an ontology to represent the accessibility needs of learners with respect to the IMS AfA specifications. The ontology definitions, together with rule-based queries, are used to retrieve relevant educational resources. Related to this, we developed a user interface component that enables users to create accessibility profiles representing their individual needs and preferences based on our ontology. We evaluated the approach with five example profiles.
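    The pattern of an accessibility profile plus rule-based retrieval can be sketched with rdflib, using an invented vocabulary rather than the actual IMS AfA-based ontology from the article.

      from rdflib import Graph, Namespace
      from rdflib.namespace import RDF

      ACC = Namespace("http://example.org/accessibility#")  # illustrative terms

      g = Graph()
      # A learner profile: this learner requires captioned video content.
      g.add((ACC.learner1, RDF.type, ACC.AccessibilityProfile))
      g.add((ACC.learner1, ACC.requiresAdaptation, ACC.Captions))

      # Two course resources, only one of which provides captions.
      g.add((ACC.video1, RDF.type, ACC.EducationalResource))
      g.add((ACC.video1, ACC.providesAdaptation, ACC.Captions))
      g.add((ACC.video2, RDF.type, ACC.EducationalResource))

      # Rule-like SPARQL query: retrieve resources satisfying the learner's needs.
      results = g.query("""
          PREFIX acc: <http://example.org/accessibility#>
          SELECT ?resource WHERE {
              acc:learner1 acc:requiresAdaptation ?need .
              ?resource a acc:EducationalResource ;
                        acc:providesAdaptation ?need .
          }
      """)
      for (resource,) in results:
          print(resource)  # -> http://example.org/accessibility#video1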
  • Item
    Ranking facts for explaining answers to elementary science questions
    (Cambridge : Cambridge University Press, 2023) D’Souza, Jennifer; Mulang, Isaiah Onando; Auer, Sören
    In multiple-choice exams, students select one answer from among typically four choices and can explain why they made that particular choice. Students are good at understanding natural language questions and, based on their domain knowledge, can easily infer a question's answer by “connecting the dots” across various pertinent facts. Considering automated reasoning for elementary science question answering, we address the novel task of generating explanations for answers from human-authored facts. For this, we examine the practically scalable framework of feature-rich support vector machines leveraging domain-targeted, hand-crafted features. Explanations are created from a human-annotated set of nearly 5,000 candidate facts in the WorldTree corpus. Our aim is to obtain better matches for the valid facts of an explanation for the correct answer of a question over the available fact candidates. To this end, our features offer a comprehensive linguistic and semantic unification paradigm. The machine learning problem is the preference ordering of facts, for which we test pointwise regression versus pairwise learning-to-rank. Our contributions, originating from comprehensive evaluations against nine existing systems, are: (1) a case study in which two preference-ordering approaches are systematically compared and the pointwise approach is shown to outperform the pairwise approach, adding to the existing body of observations on this topic; (2) a demonstration of the remaining room for improvement on this task (e.g., via an efficient learning algorithm or semantic features), since our system outperforms a highly effective TF-IDF-based IR technique by 3.5 and 4.9 points on the development and test sets, respectively; (3) a practically competent approach that can outperform some variants of BERT-based reranking models; and (4) an interpretable machine learning model for the task, thanks to its human-engineered features.
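    A compact sketch of the pointwise preference-ordering setup that the abstract reports as the stronger option, with invented toy features and data (the paper's linguistic and semantic feature set is far richer): each (question, fact) pair gets a feature vector and a relevance label, a regressor scores facts independently, and the ranking is by predicted score.

      import numpy as np
      from sklearn.svm import SVR

      # Toy pointwise learning-to-rank over facts. The features below are
      # invented stand-ins for the paper's rich hand-crafted features.
      def features(question, fact):
          q, f = set(question.lower().split()), set(fact.lower().split())
          overlap = len(q & f)
          return [overlap, overlap / len(f), len(f)]

      question = "why does a rock thrown in a pond sink"
      train = [  # (fact, relevance) pairs for this question
          ("a rock is usually more dense than water", 1.0),
          ("denser objects sink in water", 1.0),
          ("a pond contains water", 0.5),
          ("plants need sunlight to grow", 0.0),
      ]
      X = np.array([features(question, f) for f, _ in train])
      y = np.array([rel for _, rel in train])

      # Pointwise: fit a regressor on relevance labels, then rank candidate
      # facts by their independently predicted scores.
      model = SVR(kernel="linear").fit(X, y)

      candidates = ["density determines whether an object sinks",
                    "the sun rises in the east"]
      scores = model.predict(np.array([features(question, f) for f in candidates]))
      for fact, s in sorted(zip(candidates, scores), key=lambda t: -t[1]):
          print(f"{s:.2f}  {fact}")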