Search Results

Now showing 1 - 10 of 91

DDB-KG: The German Bibliographic Heritage in a Knowledge Graph

2021, Tan, Mary Ann, Tietz, Tabea, Bruns, Oleksandra, Oppenlaender, Jonas, Dessì, Danilo, Sack, Harald, Sumikawa, Yasunobu, Ikejiri, Ryohei, Doucet, Antoine, Pfanzelter, Eva, Hasanuzzaman, Mohammed, Dias, Gaël, Milligan, Ian, Jatowt, Adam

Under the German government’s initiative “NEUSTART Kultur”, the German Digital Library or Deutsche Digitale Bibliothek (DDB) is undergoing improvements to enhance the user experience. As an initial step, emphasis is placed on creating a knowledge graph from the bibliographic record collection of the DDB. This paper discusses the challenges facing the DDB in terms of retrieval and the solutions for addressing them. In particular, limitations of the current data model or ontology in representing bibliographic metadata are analyzed through concrete examples. This study presents the complete ontological mapping from DDB-Europeana Data Model (DDB-EDM) to FaBiO, and a prototype of the DDB-KG made available as a SPARQL endpoint. The suitability of the target ontology is demonstrated with SPARQL queries formulated from competency questions.


Survey on Big Data Applications

2020, Janev, Valentina, Pujić, Dea, Jelić, Marko, Vidal, Maria-Esther, Janev, Valentina, Graux, Damien, Jabeen, Hajira, Sallinger, Emanuel

The goal of this chapter is to shed light on different types of big data applications needed in various industries including healthcare, transportation, energy, banking and insurance, digital media and e-commerce, environment, safety and security, telecommunications, and manufacturing. In response to the problems of analyzing large-scale data, different tools, techniques, and technologies have been developed and are available for experimentation. In our analysis, we focused on literature (review articles) accessible via the Elsevier ScienceDirect service and the Springer Link service from more recent years, mainly from the last two decades. For the selected industries, this chapter also discusses challenges that can be addressed and overcome using the semantic processing approaches and knowledge reasoning approaches discussed in this book.


Improving Zero-Shot Text Classification with Graph-based Knowledge Representations

2022, Hoppe, Fabian, Hartig, Olaf, Seneviratne, Oshani

Insufficient training data is a key challenge for text classification. In particular, long-tail class distributions and emerging, new classes do not provide any training data for specific classes. Therefore, such a zero-shot setting must incorporate additional, external knowledge to enable transfer learning by connecting the external knowledge of previously unseen classes to texts. Recent zero-shot text classifiers utilize only distributional semantics defined by large language models and based on class names or natural language descriptions. This implicit knowledge contains ambiguities, cannot capture logical relations, and is not an efficient representation of factual knowledge. These drawbacks can be avoided by introducing explicit, external knowledge. In particular, knowledge graphs provide such explicit, unambiguous, complementary, and domain-specific knowledge. Hence, this thesis explores graph-based knowledge as an additional modality for zero-shot text classification. Besides a general investigation of this modality, the thesis explores how including domain-specific knowledge influences the ability to deal with domain shifts.


Ontology Modelling for Materials Science Experiments

2021, Alam, Mehwish, Birkholz, Henk, Dessì, Danilo, Eberl, Christoph, Fliegl, Heike, Gumbsch, Peter, von Hartrott, Philipp, Mädler, Lutz, Niebel, Markus, Sack, Harald, Thomas, Akhil, Tiddi, Ilaria, Maleshkova, Maria, Pellegrini, Tassilo, de Boer, Victor

Materials are either an enabler or a bottleneck for the vast majority of technological innovations. The digitization of materials and processes is mandatory to create live production environments that represent physical entities and their aggregations and thus make it possible to represent, share, and understand changes in materials. However, a common standard formalization for materials knowledge in the form of taxonomies, ontologies, or knowledge graphs has not been achieved yet. This paper sketches the efforts in modelling an ontology prototype to describe Materials Science experiments. It describes what is expected from the ontology by introducing a use case where a process chain driven by the ontology enables the curation and understanding of experiments.


Unveiling Relations in the Industry 4.0 Standards Landscape Based on Knowledge Graph Embeddings

2020, Rivas, Ariam, Grangel-González, Irlán, Collarana, Diego, Lehmann, Jens, Vidal, Maria-Esther, Hartmann, Sven, Küng, Josef, Kotsis, Gabriele, Tjoa, A Min, Khalil, Ismail

Industry 4.0 (I4.0) standards and standardization frameworks have been proposed with the goal of empowering interoperability in smart factories. These standards enable the description and interaction of the main components, systems, and processes inside a smart factory. Due to the growing number of frameworks and standards, there is an increasing need for approaches that automatically analyze the landscape of I4.0 standards. Standardization frameworks classify standards according to their functions into layers and dimensions. However, similar standards can be classified differently across the frameworks, thus producing interoperability conflicts among them. Semantic-based approaches that rely on ontologies and knowledge graphs have been proposed to represent standards, known relations among them, as well as their classification according to existing frameworks. Albeit informative, the structured modeling of the I4.0 landscape only provides the foundations for detecting interoperability issues. Thus, graph-based analytical methods able to exploit the knowledge encoded by these approaches are required to uncover alignments among standards. We study the relatedness among standards and frameworks based on community analysis to discover knowledge that helps to cope with interoperability conflicts between standards. We use knowledge graph embeddings to automatically create these communities, exploiting the meaning of the existing relationships. In particular, we focus on the identification of similar standards, i.e., communities of standards, and analyze their properties to detect unknown relations. We empirically evaluate our approach on a knowledge graph of I4.0 standards using the Trans* family of embedding models for knowledge graph entities. Our results are promising and suggest that relations among standards can be detected accurately.


Understanding Class Representations: An Intrinsic Evaluation of Zero-Shot Text Classification

2021, Hoppe, Fabian, Dessì, Danilo, Sack, Harald, Alam, Mehwish, Buscaldi, Davide, Cochez, Michael, Osborne, Francesco, Reforgiato Recupero, Diego, Sack, Harald

Frequently, Text Classification is limited by insufficient training data. This problem is addressed by Zero-Shot Classification through the inclusion of external class definitions and then exploiting the relations between classes seen during training and unseen classes (Zero-shot). However, it requires a class embedding space capable of accurately representing the semantic relatedness between classes. This work defines an intrinsic evaluation based on greater-than constraints to provide a better understanding of this relatedness. The results imply that textual embeddings are able to capture more semantics than Knowledge Graph embeddings, but combining both modalities yields the best performance.
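One way to picture an evaluation based on greater-than constraints is the minimal sketch below. It is not the authors' implementation: the triple layout (anchor, closer, farther), the use of cosine similarity, and the toy class vectors are all assumptions made for illustration. A constraint counts as satisfied when the anchor class is more similar to the "closer" class than to the "farther" class.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def constraint_accuracy(embeddings, constraints):
    """Fraction of (anchor, closer, farther) triples for which
    sim(anchor, closer) > sim(anchor, farther) holds."""
    satisfied = 0
    for anchor, closer, farther in constraints:
        sim_close = cosine(embeddings[anchor], embeddings[closer])
        sim_far = cosine(embeddings[anchor], embeddings[farther])
        if sim_close > sim_far:
            satisfied += 1
    return satisfied / len(constraints)

# Hypothetical 2-d class embeddings, invented for this example.
embeddings = {
    "soccer": [1.0, 0.1],
    "tennis": [0.9, 0.3],
    "politics": [0.0, 1.0],
}

# Hypothetical constraints: the first matches the toy geometry, the second does not.
constraints = [
    ("soccer", "tennis", "politics"),
    ("politics", "soccer", "tennis"),
]

accuracy = constraint_accuracy(embeddings, constraints)
print(accuracy)
```

Running the same function on textual, knowledge graph, and combined class embeddings would give one comparable score per embedding space, which is the kind of comparison the abstract reports.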


Interaction Network Analysis Using Semantic Similarity Based on Translation Embeddings

2019, Manzoor Bajwa, Awais, Collarana, Diego, Vidal, Maria-Esther, Acosta, Maribel, Cudré-Mauroux, Philippe, Maleshkova, Maria, Pellegrini, Tassilo, Sack, Harald, Sure-Vetter, York

Biomedical knowledge graphs such as STITCH, SIDER, and DrugBank provide the basis for the discovery of associations between biomedical entities, e.g., interactions between drugs and targets. Link prediction is a paramount task and represents a building block for supporting knowledge discovery. Although several approaches have been proposed for effectively predicting links, the role of semantics has not been studied in depth. In this work, we tackle the problem of discovering interactions between drugs and targets, and propose SimTransE, a machine learning-based approach that solves this problem effectively. SimTransE relies on translating embeddings to model drug-target interactions and the similarity values across them. Grounded on the vectorial representation of drug-target interactions, SimTransE is able to discover novel drug-target interactions. We empirically study SimTransE using state-of-the-art benchmarks and approaches. Experimental results suggest that SimTransE is competitive with the state of the art and thus represents an effective alternative for knowledge discovery in the biomedical domain.
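For readers unfamiliar with translating embeddings, the sketch below shows the generic TransE idea only: a triple (head, relation, tail) is plausible when head + relation lies close to tail. The abstract does not describe how SimTransE combines this with similarity values, so nothing here should be read as the SimTransE method; the vectors, the `interacts_with` relation, and the target names are hypothetical.

```python
import math

def transe_distance(head, relation, tail):
    """Euclidean distance ||head + relation - tail||; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def rank_tails(head, relation, candidates):
    """Return candidate names ordered from most to least plausible tail."""
    return sorted(
        candidates,
        key=lambda name: transe_distance(head, relation, candidates[name]),
    )

# Hypothetical 2-d embeddings; real models learn these vectors from the graph.
drug = [0.2, 0.5]
interacts_with = [0.3, -0.1]
targets = {
    "target_a": [0.5, 0.4],
    "target_b": [0.9, 0.9],
    "target_c": [0.4, 0.5],
}

ranking = rank_tails(drug, interacts_with, targets)
print(ranking)
```

In a link prediction setting, the top-ranked candidates that are not already connected to the drug in the graph would be the proposed novel interactions.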


IPAL: Breaking up Silos of Protocol-dependent and Domain-specific Industrial Intrusion Detection Systems

2022-10-26, Wolsing, Konrad, Wagner, Eric, Saillard, Antoine, Henze, Martin

The increasing interconnection of industrial networks exposes them to an ever-growing risk of cyber attacks. To reveal such attacks early and prevent any damage, industrial intrusion detection searches for anomalies in otherwise predictable communication or process behavior. However, current efforts mostly focus on specific domains and protocols, leading to a research landscape broken up into isolated silos. Thus, existing approaches cannot be applied to other industries that would equally benefit from powerful detection. To better understand this issue, we survey 53 detection systems and find no fundamental reason for their narrow focus. Although they are often coupled to specific industrial protocols in practice, many approaches could generalize to new industrial scenarios in theory. To unlock this potential, we propose IPAL, our industrial protocol abstraction layer, to decouple intrusion detection from domain-specific industrial protocols. After proving IPAL's correctness in a reproducibility study of related work, we showcase its unique benefits by studying the generalizability of existing approaches to new datasets and conclude that they are indeed not restricted to specific domains or protocols and can perform outside their restricted silos.


A Data-Driven Approach for Analyzing Healthcare Services Extracted from Clinical Records

2020, Scurti, Manuel, Menasalvas-Ruiz, Ernestina, Vidal, Maria-Esther, Torrente, Maria, Vogiatzis, Dimitrios, Paliouras, George, Provencio, Mariano, Rodríguez-González, Alejandro, Seco de Herrera, Alba García, Rodríguez González, Alejandro, Santosh, K.C., Temesgen, Zelalem, Soda, Paolo

Cancer remains one of the major public health challenges worldwide. After cardiovascular diseases, cancer is one of the leading causes of death and morbidity in Europe, with more than 4 million new cases and 1.9 million deaths per year. The suboptimal management of cancer patients during treatment and subsequent follow-ups is a major obstacle to achieving better patient outcomes, especially regarding cost and quality of life. In this paper, we present an initial data-driven approach to analyze the resources and services that are used most frequently by lung-cancer patients. The aim is to identify where the care process can be improved, paying special attention to services used before diagnosis in order to identify possible lung-cancer patients before they are diagnosed, and reducing the length of stay in the hospital. Our approach has been built by analyzing the clinical notes of those oncological patients to extract this information and its relationships with other patient variables. Although the approach shown in this manuscript is very preliminary, it shows that quite interesting outcomes can be derived from further analysis. © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.


Temporal Role Annotation for Named Entities

2018, Koutraki, Maria, Bakhshandegan-Moghaddam, Farshad, Sack, Harald, Fensel, Anna, de Boer, Victor, Pellegrini, Tassilo, Kiesling, Elmar, Haslhofer, Bernhard, Hollink, Laura, Schindler, Alexander

Natural language understanding tasks are key to extracting structured and semantic information from text. One of the most challenging problems in natural language is ambiguity, and resolving such ambiguity based on context, including temporal information. This paper focuses on the task of extracting temporal roles from text, e.g. CEO of an organization or head of a state. A temporal role has a domain, which may resolve to different entities depending on the context and especially on temporal information, e.g. CEO of Microsoft in 2000. We focus on temporal role extraction as a precursor for temporal role disambiguation. We propose a structured prediction approach based on Conditional Random Fields (CRF) to annotate temporal roles in text and rely on a rich feature set, which extracts syntactic and semantic information from text. We perform an extensive evaluation of our approach based on two datasets. In the first dataset, we extract nearly 400k instances from Wikipedia through distant supervision, whereas in the second dataset, a manually curated ground truth consisting of 200 instances is extracted from a sample of The New York Times (NYT) articles. Lastly, the proposed approach is compared against baselines, and significant improvements are shown for both datasets.