Search Results

Now showing 1 - 10 of 12
  • Item
    Context-Based Entity Matching for Big Data
    (Cham : Springer, 2020) Tasnim, Mayesha; Collarana, Diego; Graux, Damien; Vidal, Maria-Esther; Janev, Valentina; Graux, Damien; Jabeen, Hajira; Sallinger, Emanuel
    In the Big Data era, where variety is the most dominant dimension, the RDF data model enables the creation and integration of actionable knowledge from heterogeneous data sources. However, the RDF data model allows for describing entities under various contexts, e.g., people can be described from its demographic context, but as well from their professional contexts. Context-aware description poses challenges during entity matching of RDF datasets—the match might not be valid in every context. To perform a contextually relevant entity matching, the specific context under which a data-driven task, e.g., data integration is performed, must be taken into account. However, existing approaches only consider inter-schema and properties mapping of different data sources and prevent users from selecting contexts and conditions during a data integration process. We devise COMET, an entity matching technique that relies on both the knowledge stated in RDF vocabularies and a context-based similarity metric to map contextually equivalent RDF graphs. COMET follows a two-fold approach to solve the problem of entity matching in RDF graphs in a context-aware manner. In the first step, COMET computes the similarity measures across RDF entities and resorts to the Formal Concept Analysis algorithm to map contextually equivalent RDF entities. Finally, COMET combines the results of the first step and executes a 1-1 perfect matching algorithm for matching RDF entities based on the combined scores. We empirically evaluate the performance of COMET on testbed from DBpedia. The experimental results suggest that COMET accurately matches equivalent RDF graphs in a context-dependent manner.
  • Item
    Optimizing Federated Queries Based on the Physical Design of a Data Lake
    (Aachen : RWTH, 2020) Rohde, Philipp D.; Vidal, Maria-Esther
    The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of these federations are Semantic Data Lakes where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using sourcespecific heuristics, the query engine is able to generate more efficient query execution plans by exploiting the knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.
  • Item
    Federated Query Processing
    (Cham : Springer, 2020) Endris, Kemele M.; Vidal, Maria-Esther; Graux, Damien; Janev, Valentina; Graux, Damien; Jabeen, Hajira; Sallinger, Emanuel
    Big data plays a relevant role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Semantic web technologies have also experienced great progress, and scientific communities and practitioners have contributed to the problem of big data management with ontological models, controlled vocabularies, linked datasets, data models, query languages, as well as tools for transforming big data into knowledge from which decisions can be made. Despite the significant impact of big data and semantic web technologies, we are entering into a new era where domains like genomics are projected to grow very rapidly in the next decade. In this next era, integrating big data demands novel and scalable tools for enabling not only big data ingestion and curation but also efficient large-scale exploration and discovery. Federated query processing techniques provide a solution to scale up to large volumes of data distributed across multiple data sources. Federated query processing techniques resort to source descriptions to identify relevant data sources for a query, as well as to find efficient execution plans that minimize the total execution time of a query and maximize the completeness of the answers. This chapter summarizes the main characteristics of a federated query engine, reviews the current state of the field, and outlines the problems that still remain open and represent grand challenges for the area.
  • Item
    Survey on Big Data Applications
    (Cham : Springer, 2020) Janev, Valentina; Pujić, Dea; Jelić, Marko; Vidal, Maria-Esther; Janev, Valentina; Graux, Damien; Jabeen, Hajira; Sallinger, Emanuel
    The goal of this chapter is to shed light on different types of big data applications needed in various industries including healthcare, transportation, energy, banking and insurance, digital media and e-commerce, environment, safety and security, telecommunications, and manufacturing. In response to the problems of analyzing large-scale data, different tools, techniques, and technologies have bee developed and are available for experimentation. In our analysis, we focused on literature (review articles) accessible via the Elsevier ScienceDirect service and the Springer Link service from more recent years, mainly from the last two decades. For the selected industries, this chapter also discusses challenges that can be addressed and overcome using the semantic processing approaches and knowledge reasoning approaches discussed in this book.
  • Item
    Digital Transformation of Education Credential Processes and Life Cycles – A Structured Overview on Main Challenges and Research Questions
    ([Wilmington, DE, USA] : IARIA, [2020], 2020) Keck, Ingo R.; Vidal, Maria-Esther; Heller, Lambert; Mikroyannidis, Alexander; Chang, Maiga; White, Stephen
    In this article, we look at the challenges that arise in the use and management of education credentials, and from the switch from analogue, paper-based education credentials to digital education credentials. We propose a general methodology to capture qualitative descriptions and measurable quantitative results that allow to estimate the effectiveness of a digital credential management system in solving these challenges. This methodology is applied to the EU H2020 project QualiChain use case, where five pilots have been selected to study a broad field of digital credential workflows and credential management. Copyright (c) IARIA, 2020
  • Item
    Falcon 2.0: An Entity and Relation Linking Tool over Wikidata
    (New York City, NY : Association for Computing Machinery, 2020) Sakor, Ahmad; Singh, Kuldeep; Patel, Anery; Vidal, Maria-Esther
    The Natural Language Processing (NLP) community has significantly contributed to the solutions for entity and relation recognition from a natural language text, and possibly linking them to proper matches in Knowledge Graphs (KGs). Considering Wikidata as the background KG, there are still limited tools to link knowledge within the text to Wikidata. In this paper, we present Falcon 2.0, the first joint entity and relation linking tool over Wikidata. It receives a short natural language text in the English language and outputs a ranked list of entities and relations annotated with the proper candidates in Wikidata. The candidates are represented by their Internationalized Resource Identifier (IRI) in Wikidata. Falcon 2.0 resorts to the English language model for the recognition task (e.g., N-Gram tiling and N-Gram splitting), and then an optimization approach for the linking task. We have empirically studied the performance of Falcon 2.0 on Wikidata and concluded that it outperforms all the existing baselines. Falcon 2.0 is open source and can be reused by the community; all the required instructions of Falcon 2.0 are well-documented at our GitHub repository (https://github.com/SDM-TIB/falcon2.0). We also demonstrate an online API, which can be run without any technical expertise. Falcon 2.0 and its background knowledge bases are available as resources at https://labs.tib.eu/falcon/falcon2/.
  • Item
    Responsible Knowledge Management in Energy Data Ecosystems
    (Basel : MDPI, 2022) Janev, Valentina; Vidal, Maria-Esther; Pujić, Dea; Popadić, Dušan; Iglesias, Enrique; Sakor, Ahmad; Čampa, Andrej
    This paper analyzes the challenges and requirements of establishing energy data ecosystems (EDEs) as data-driven infrastructures that overcome the limitations of currently fragmented energy applications. It proposes a new data- and knowledge-driven approach for management and processing. This approach aims to extend the analytics services portfolio of various energy stakeholders and achieve two-way flows of electricity and information for optimized generation, distribution, and electricity consumption. The approach is based on semantic technologies to create knowledge-based systems that will aid machines in integrating and processing resources contextually and intelligently. Thus, a paradigm shift in the energy data value chain is proposed towards transparency and the responsible management of data and knowledge exchanged by the various stakeholders of an energy data space. The approach can contribute to innovative energy management and the adoption of new business models in future energy data spaces.
  • Item
    Calibrating mini-mental state examination scores to predict misdiagnosed dementia patients
    (Basel : MDPI, 2021) Vyas, Akhilesh; Aisopos, Fotis; Vidal, Maria-Esther; Garrard, Peter; Paliouras, George
    Mini-Mental State Examination (MMSE) is used as a diagnostic test for dementia to screen a patient’s cognitive assessment and disease severity. However, these examinations are often inaccurate and unreliable either due to human error or due to patients’ physical disability to correctly interpret the questions as well as motor deficit. Erroneous data may lead to a wrong assessment of a specific patient. Therefore, other clinical factors (e.g., gender and comorbidities) existing in electronic health records, can also play a significant role, while reporting her examination results. This work considers various clinical attributes of dementia patients to accurately determine their cognitive status in terms of the Mini-Mental State Examination (MMSE) Score. We employ machine learning models to calibrate MMSE score and classify the correctness of diagnosis among patients, in order to assist clinicians in a better understanding of the progression of cognitive impairment and subsequent treatment. For this purpose, we utilize a curated real-world ageing study data. A random forest prediction model is employed to estimate the Mini-Mental State Examination score, related to the diagnostic classification of patients.This model uses various clinical attributes to provide accurate MMSE predictions, succeeding in correcting an important percentage of cases that contain previously identified miscalculated scores in our dataset. Furthermore, we provide an effective classification mechanism for automatically identifying patient episodes with inaccurate MMSE values with high confidence. These tools can be combined to assist clinicians in automatically finding episodes within patient medical records where the MMSE score is probably miscalculated and estimating what the correct value should be. This provides valuable support in the decision making process for diagnosing potential dementia patients.
  • Item
    Resorting to Context-Aware Background Knowledge for Unveiling Semantically Related Social Media Posts
    (New York, NY : IEEE, 2022) Sakor, Ahmad; Singh, Kuldeep; Vidal, Maria-Esther
    Social media networks have become a prime source for sharing news, opinions, and research accomplishments in various domains, and hundreds of millions of posts are announced daily. Given this wealth of information in social media, finding related announcements has become a relevant task, particularly in trending news (e.g., COVID-19 or lung cancer). To facilitate the search of connected posts, social networks enable users to annotate their posts, e.g., with hashtags in tweets. Albeit effective, an annotation-based search is limited because results will only include the posts that share the same annotations. This paper focuses on retrieving context-related posts based on a specific topic, and presents PINYON, a knowledge-driven framework, that retrieves associated posts effectively. PINYON implements a two-fold pipeline. First, it encodes, in a graph, a CORPUS of posts and an input post; posts are annotated with entities for existing knowledge graphs and connected based on the similarity of their entities. In a decoding phase, the encoded graph is used to discover communities of related posts. We cast this problem into the Vertex Coloring Problem, where communities of similar posts include the posts annotated with entities colored with the same colors. Built on results reported in the graph theory, PINYON implements the decoding phase guided by a heuristic-based method that determines relatedness among posts based on contextual knowledge, and efficiently groups the most similar posts in the same communities. PINYON is empirically evaluated on various datasets and compared with state-of-the-art implementations of the decoding phase. The quality of the generated communities is also analyzed based on multiple metrics. The observed outcomes indicate that PINYON accurately identifies semantically related posts in different contexts. Moreover, the reported results put in perspective the impact of known properties about the optimality of existing heuristics for vertex graph coloring and their implications on PINYON scalability.
  • Item
    Preface of MEPDaW 2020: Managing the evolution and preservation of the data web
    (Aachen, Germany : RWTH Aachen, 2020) Orlandi, Fabrizio; Graux, Damien; Vidal, Maria-Esther; Fernández, Javier D.; Debattista, Jeremy; Orlandi, Fabrizio; Graux, Damien; Vidal, Maria-Esther; Fernández, Javier D.; Debattista, Jeremy
    The MEPDaW workshop series targets one of the emerging and fundamental problems of the Web, specifically the management and preservation of evolving knowledge graphs. During the past six years, the workshop series has been gathering a community of researchers and practitioners around these challenges. To date, the series has successfully published more than 30 articles allowing more than 50 individual authors to present and share their ideas. This 6th edition, virtually co-located with the International Semantic Web Conference (ISWC 2020), gathered the community around nine research publications and one invited keynote presentation. The event took place online on the 1st of November, 2020.