Search Results

Now showing 1 - 10 of 20

Context-Based Entity Matching for Big Data

2020, Tasnim, Mayesha, Collarana, Diego, Graux, Damien, Vidal, Maria-Esther, Janev, Valentina, Graux, Damien, Jabeen, Hajira, Sallinger, Emanuel

In the Big Data era, where variety is the most dominant dimension, the RDF data model enables the creation and integration of actionable knowledge from heterogeneous data sources. However, the RDF data model allows for describing entities under various contexts, e.g., a person can be described in a demographic context as well as in a professional context. Context-aware description poses challenges during entity matching of RDF datasets because a match might not be valid in every context. To perform contextually relevant entity matching, the specific context under which a data-driven task, e.g., data integration, is performed must be taken into account. However, existing approaches only consider inter-schema and property mappings between data sources and prevent users from selecting contexts and conditions during a data integration process. We devise COMET, an entity matching technique that relies on both the knowledge stated in RDF vocabularies and a context-based similarity metric to map contextually equivalent RDF graphs. COMET follows a two-fold approach to solve the problem of entity matching in RDF graphs in a context-aware manner. In the first step, COMET computes the similarity measures across RDF entities and resorts to the Formal Concept Analysis algorithm to map contextually equivalent RDF entities. Finally, COMET combines the results of the first step and executes a 1-1 perfect matching algorithm for matching RDF entities based on the combined scores. We empirically evaluate the performance of COMET on a testbed from DBpedia. The experimental results suggest that COMET accurately matches equivalent RDF graphs in a context-dependent manner.
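The idea of context-dependent 1-1 matching can be illustrated with a minimal sketch. This is not COMET's implementation: the Jaccard similarity, the representation of a context as a set of property names, and the brute-force search over permutations are all assumptions made for illustration, and the Formal Concept Analysis step is omitted entirely. All function names and the sample data are hypothetical.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity between two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def context_view(entity, context_props):
    """Keep only the (property, value) pairs whose property is in the context."""
    return {(p, v) for p, v in entity if p in context_props}


def match_one_to_one(left, right, context_props):
    """Return the 1-1 assignment with the highest total similarity.

    Brute force over permutations (assumption: small inputs only);
    a real system would use a polynomial-time assignment algorithm.
    """
    assert len(left) == len(right), "a perfect matching needs equal sizes"
    l_keys, r_keys = list(left), list(right)
    sim = {
        (l, r): jaccard(context_view(left[l], context_props),
                        context_view(right[r], context_props))
        for l in l_keys for r in r_keys
    }
    best = max(permutations(r_keys),
               key=lambda perm: sum(sim[l, r] for l, r in zip(l_keys, perm)))
    return dict(zip(l_keys, best))


# Hypothetical entities: same name, different professions.
left = {
    "a1": {("name", "Ada"), ("job", "engineer")},
    "a2": {("name", "Ada"), ("job", "teacher")},
}
right = {
    "b1": {("name", "Ada"), ("job", "teacher")},
    "b2": {("name", "Ada"), ("job", "engineer")},
}
professional = match_one_to_one(left, right, {"job"})
```

Under the professional context only the `job` property is compared, so the two entities named Ada are told apart by profession; under a name-only context all pairs would score equally and the matching would be arbitrary.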

Survey on Big Data Applications

2020, Janev, Valentina, Pujić, Dea, Jelić, Marko, Vidal, Maria-Esther, Janev, Valentina, Graux, Damien, Jabeen, Hajira, Sallinger, Emanuel

The goal of this chapter is to shed light on different types of big data applications needed in various industries including healthcare, transportation, energy, banking and insurance, digital media and e-commerce, environment, safety and security, telecommunications, and manufacturing. In response to the problems of analyzing large-scale data, different tools, techniques, and technologies have been developed and are available for experimentation. In our analysis, we focused on literature (review articles) accessible via the Elsevier ScienceDirect service and the Springer Link service, mainly from the last two decades. For the selected industries, this chapter also discusses challenges that can be addressed and overcome using the semantic processing approaches and knowledge reasoning approaches discussed in this book.

Falcon 2.0: An Entity and Relation Linking Tool over Wikidata

2020, Sakor, Ahmad, Singh, Kuldeep, Patel, Anery, Vidal, Maria-Esther

The Natural Language Processing (NLP) community has significantly contributed to solutions for recognizing entities and relations in natural language text, and possibly linking them to proper matches in Knowledge Graphs (KGs). Considering Wikidata as the background KG, there are still few tools that link knowledge within text to Wikidata. In this paper, we present Falcon 2.0, the first joint entity and relation linking tool over Wikidata. It receives a short natural language text in English and outputs a ranked list of entities and relations annotated with the proper candidates in Wikidata. The candidates are represented by their Internationalized Resource Identifier (IRI) in Wikidata. Falcon 2.0 resorts to the English language model for the recognition task (e.g., N-Gram tiling and N-Gram splitting), and then to an optimization approach for the linking task. We have empirically studied the performance of Falcon 2.0 on Wikidata and concluded that it outperforms all the existing baselines. Falcon 2.0 is open source and can be reused by the community; all the required instructions for Falcon 2.0 are documented in our GitHub repository (https://github.com/SDM-TIB/falcon2.0). We also demonstrate an online API, which can be run without any technical expertise. Falcon 2.0 and its background knowledge bases are available as resources at https://labs.tib.eu/falcon/falcon2/.

Experience: Open fiscal datasets, common issues, and recommendations

2018, Musyaffa, Fathoni A., Engels, Christiane, Vidal, Maria-Esther, Orlandi, Fabrizio, Auer, Sören

A pre-print paper detailing recommendations for publishing fiscal data, including an assessment framework for fiscal datasets. This paper was accepted by the ACM Journal of Data and Information Quality (JDIQ) in 2018.

Optimizing Federated Queries Based on the Physical Design of a Data Lake

2020, Rohde, Philipp D., Vidal, Maria-Esther

The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of these federations is the Semantic Data Lake, where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using source-specific heuristics, the query engine is able to generate more efficient query execution plans by exploiting the knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.
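A heuristic that takes physical design into account might look like the following sketch. It is purely illustrative and does not reproduce Ontario's heuristics: the rule (push a join down to a relational source only when both join columns are indexed there), the dictionary keys, and the sample catalog are all assumptions.

```python
def plan_join(left, right, indexed_columns):
    """Decide where to execute a join between two sub-queries.

    left, right: dicts with the hypothetical keys 'source' and 'join_column'.
    indexed_columns: maps a source name to the set of its indexed columns.
    """
    same_source = left["source"] == right["source"]
    indexed = indexed_columns.get(left["source"], set())
    if (same_source
            and left["join_column"] in indexed
            and right["join_column"] in indexed):
        # The source can evaluate the join efficiently using its indexes.
        return "push down to source"
    # Otherwise fetch both sides and join inside the federated engine.
    return "join at query engine"


# Hypothetical catalog of indexed columns per relational source.
catalog = {"clinical_db": {"patient_id", "drug_id"}}

indexed_case = plan_join(
    {"source": "clinical_db", "join_column": "patient_id"},
    {"source": "clinical_db", "join_column": "patient_id"},
    catalog,
)
unindexed_case = plan_join(
    {"source": "clinical_db", "join_column": "notes"},
    {"source": "clinical_db", "join_column": "notes"},
    catalog,
)
```

The point of the sketch is only that the plan depends on metadata about indexes, not on the query alone.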

Digital Transformation of Education Credential Processes and Life Cycles – A Structured Overview on Main Challenges and Research Questions

2020, Keck, Ingo R., Vidal, Maria-Esther, Heller, Lambert, Mikroyannidis, Alexander, Chang, Maiga, White, Stephen

In this article, we look at the challenges that arise in the use and management of education credentials, and from the switch from analogue, paper-based education credentials to digital education credentials. We propose a general methodology to capture qualitative descriptions and measurable quantitative results that make it possible to estimate the effectiveness of a digital credential management system in solving these challenges. This methodology is applied to the EU H2020 project QualiChain use case, where five pilots have been selected to study a broad field of digital credential workflows and credential management. Copyright (c) IARIA, 2020

Preface

2019, Kaffee, Lucie-Aimee, Endris, Kemele M., Vidal, Maria-Esther, Comerio, Marco, Sadeghi, Mersedeh, Chaves-Fraga, David, Colpaert, Pieter, Kaffee, Lucie Aimée, Endris, Kemele M., Vidal, María-Esther, Comerio, Marco, Sadeghi, Mersedeh, Chaves-Fraga, David, Colpaert, Pieter

This volume presents the proceedings of the 1st International Workshop on Approaches for Making Data Interoperable (AMAR 2019) and the 1st International Workshop on Semantics for Transport (Sem4Tra), held in Karlsruhe, Germany, on September 9, 2019, co-located with SEMANTiCS 2019. Interoperability of data is an important factor in making transportation data accessible; therefore, we present the topics alongside each other in these proceedings.

Federated Query Processing

2020, Endris, Kemele M., Vidal, Maria-Esther, Graux, Damien, Janev, Valentina, Graux, Damien, Jabeen, Hajira, Sallinger, Emanuel

Big data plays a relevant role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Semantic web technologies have also experienced great progress, and scientific communities and practitioners have contributed to the problem of big data management with ontological models, controlled vocabularies, linked datasets, data models, query languages, as well as tools for transforming big data into knowledge from which decisions can be made. Despite the significant impact of big data and semantic web technologies, we are entering into a new era where domains like genomics are projected to grow very rapidly in the next decade. In this next era, integrating big data demands novel and scalable tools for enabling not only big data ingestion and curation but also efficient large-scale exploration and discovery. Federated query processing techniques provide a solution to scale up to large volumes of data distributed across multiple data sources. Federated query processing techniques resort to source descriptions to identify relevant data sources for a query, as well as to find efficient execution plans that minimize the total execution time of a query and maximize the completeness of the answers. This chapter summarizes the main characteristics of a federated query engine, reviews the current state of the field, and outlines the problems that still remain open and represent grand challenges for the area.
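Source selection from source descriptions can be sketched as follows. This is a simplified illustration, not the method of any particular engine: it assumes that a source description is just the set of predicates a source contains, that a triple pattern is a `(subject, predicate, object)` tuple, and that variables start with `?`. The function name and the sample sources are hypothetical.

```python
def select_sources(triple_patterns, source_descriptions):
    """Map each triple pattern to the sources that may contribute answers.

    A source is relevant when its description lists the pattern's
    predicate, or when the predicate is a variable (any source may match).
    """
    selection = {}
    for pattern in triple_patterns:
        _, predicate, _ = pattern
        selection[pattern] = sorted(
            name
            for name, predicates in source_descriptions.items()
            if predicate.startswith("?") or predicate in predicates
        )
    return selection


# Hypothetical source descriptions: source name -> predicates it contains.
descriptions = {
    "drugs": {"ex:name", "ex:targets"},
    "genes": {"ex:name", "ex:locatedOn"},
}
query = [("?d", "ex:targets", "?g"), ("?g", "ex:name", "?n")]
relevant = select_sources(query, descriptions)
```

Contacting only the selected sources reduces execution time, while including every source that lists the predicate preserves answer completeness; real engines refine this trade-off with richer descriptions.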

Interaction Network Analysis Using Semantic Similarity Based on Translation Embeddings

2019, Manzoor Bajwa, Awais, Collarana, Diego, Vidal, Maria-Esther, Acosta, Maribel, Cudré-Mauroux, Philippe, Maleshkova, Maria, Pellegrini, Tassilo, Sack, Harald, Sure-Vetter, York

Biomedical knowledge graphs such as STITCH, SIDER, and Drugbank provide the basis for the discovery of associations between biomedical entities, e.g., interactions between drugs and targets. Link prediction is a paramount task and represents a building block for supporting knowledge discovery. Although several approaches have been proposed for effectively predicting links, the role of semantics has not been studied in depth. In this work, we tackle the problem of discovering interactions between drugs and targets, and propose SimTransE, a machine learning-based approach that solves this problem effectively. SimTransE relies on translating embeddings to model drug-target interactions and values of similarity across them. Grounded on the vectorial representation of drug-target interactions, SimTransE is able to discover novel drug-target interactions. We empirically study SimTransE using state-of-the-art benchmarks and approaches. Experimental results suggest that SimTransE is competitive with the state of the art, representing, thus, an effective alternative for knowledge discovery in the biomedical domain.
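The translation-embedding idea that SimTransE builds on can be sketched with the standard TransE scoring function, where a triple (head, relation, tail) is plausible when head + relation lies close to tail. This shows plain TransE scoring only, not SimTransE itself: the similarity component is not modeled, and the two-dimensional vectors, entity names, and helper functions below are made up for illustration.

```python
import math


def transe_distance(head, relation, tail):
    """L2 norm of head + relation - tail; smaller means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Order candidate tail entities from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_distance(head, relation, candidates[name]),
    )


# Hypothetical 2-d embeddings for one drug, one relation, two targets.
drug = [1.0, 0.0]
interacts_with = [0.0, 1.0]
targets = {"target_a": [1.0, 1.0], "target_b": [3.0, 1.0]}

ranking = rank_tails(drug, interacts_with, targets)
```

In practice the embeddings are learned from known interactions, and unseen drug-target pairs with a small distance become candidates for novel interactions.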

Responsible Knowledge Management in Energy Data Ecosystems

2022, Janev, Valentina, Vidal, Maria-Esther, Pujić, Dea, Popadić, Dušan, Iglesias, Enrique, Sakor, Ahmad, Čampa, Andrej

This paper analyzes the challenges and requirements of establishing energy data ecosystems (EDEs) as data-driven infrastructures that overcome the limitations of currently fragmented energy applications. It proposes a new data- and knowledge-driven approach for management and processing. This approach aims to extend the analytics services portfolio of various energy stakeholders and achieve two-way flows of electricity and information for optimized generation, distribution, and electricity consumption. The approach is based on semantic technologies to create knowledge-based systems that will aid machines in integrating and processing resources contextually and intelligently. Thus, a paradigm shift in the energy data value chain is proposed towards transparency and the responsible management of data and knowledge exchanged by the various stakeholders of an energy data space. The approach can contribute to innovative energy management and the adoption of new business models in future energy data spaces.