Browsing by Author "Vidal, Maria-Esther"
- Bias in data-driven artificial intelligence systems - An introductory survey (Hoboken, NJ : Wiley-Blackwell, 2020) Ntoutsi, E.; Fafalios, P.; Gadiraju, U.; Iosifidis, V.; Nejdl, W.; Vidal, Maria-Esther; Ruggieri, S.; Turini, F.; Papadopoulos, S.; Krasanakis, E.; Kompatsiaris, I.; Kinder-Kurlanda, K.; Wagner, C.; Karimi, F.; Fernandez, M.; Alani, H.; Berendt, B.; Kruegel, T.; Heinze, C.; Broelemann, K.; Kasneci, G.; Tiropanis, T.; Staab, S.
  Artificial Intelligence (AI)-based systems are widely employed nowadays to make decisions that have far-reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their design, training, and deployment to ensure social good while still benefiting from the huge potential of the AI technology. The goal of this survey is to provide a broad multidisciplinary overview of the area of bias in AI systems, focusing on technical challenges and solutions, as well as to suggest new research directions towards approaches well-grounded in a legal frame. In this survey, we focus on data-driven AI, as a large part of AI is powered nowadays by (big) data and powerful machine learning algorithms. Unless otherwise specified, we use the general term bias to describe problems related to the gathering or processing of data that might result in prejudiced decisions on the basis of demographic features such as race, sex, and so forth. This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Commercial, Legal, and Ethical Issues > Ethical Considerations; Commercial, Legal, and Ethical Issues > Legal Issues.
- Calibrating mini-mental state examination scores to predict misdiagnosed dementia patients (Basel : MDPI, 2021) Vyas, Akhilesh; Aisopos, Fotis; Vidal, Maria-Esther; Garrard, Peter; Paliouras, George
  The Mini-Mental State Examination (MMSE) is used as a diagnostic test for dementia to screen a patient’s cognitive status and disease severity. However, these examinations are often inaccurate and unreliable, either due to human error or due to patients’ physical inability to correctly interpret the questions, as well as motor deficits. Erroneous data may lead to a wrong assessment of a specific patient. Therefore, other clinical factors recorded in electronic health records (e.g., gender and comorbidities) can also play a significant role when reporting a patient’s examination results. This work considers various clinical attributes of dementia patients to accurately determine their cognitive status in terms of the MMSE score. We employ machine learning models to calibrate the MMSE score and classify the correctness of diagnosis among patients, in order to assist clinicians in better understanding the progression of cognitive impairment and subsequent treatment. For this purpose, we utilize curated data from a real-world ageing study. A random forest prediction model is employed to estimate the MMSE score, related to the diagnostic classification of patients. This model uses various clinical attributes to provide accurate MMSE predictions, succeeding in correcting an important percentage of cases that contain previously identified miscalculated scores in our dataset. Furthermore, we provide an effective classification mechanism for automatically identifying patient episodes with inaccurate MMSE values with high confidence. These tools can be combined to assist clinicians in automatically finding episodes within patient medical records where the MMSE score is probably miscalculated and estimating what the correct value should be. This provides valuable support in the decision-making process for diagnosing potential dementia patients.
- Classifying data heterogeneity within budget and spending open data (Zenodo, 2018) Musyaffa, Fathoni A.; Orlandi, Fabrizio; Jabeen, Hajira; Vidal, Maria-Esther
  After a thorough analysis of several budget and spending datasets, we classified several types of heterogeneity among them. Pre-print version of the paper accepted at the International Conference on Theory and Practice of Electronic Governance (ICEGOV) 2018 in Galway, Ireland.
- Compact representations for efficient storage of semantic sensor data (Dordrecht : Springer Science + Business Media B.V, 2021) Karim, Farah; Vidal, Maria-Esther; Auer, Sören
  Nowadays, there is a rapid increase in the amount of sensor data generated by a wide variety of sensors and devices. Data semantics facilitate information exchange, adaptability, and interoperability among several sensors and devices. Sensor data and their meaning can be described using ontologies, e.g., the Semantic Sensor Network (SSN) Ontology. However, once semantically enriched, semantic sensor data is substantially larger than raw sensor data. Moreover, some measurement values can be observed by sensors several times, and a huge number of repeated facts about sensor data can be produced. We propose a compact or factorized representation of semantic sensor data, where repeated measurement values are described only once. Furthermore, these compact representations are able to enhance the storage and processing of semantic sensor data. To scale up to large datasets, factorization-based tabular representations are exploited to store and manage factorized semantic sensor data using Big Data technologies. We empirically study the effectiveness of the proposed compact representations of semantic sensor data and their impact on query processing. Additionally, we evaluate the effects of storing the proposed representations on diverse RDF implementations. Results suggest that the proposed compact representations empower the storage and query processing of sensor data over diverse RDF implementations, and that query execution time can be reduced by up to two orders of magnitude.
- Compacting frequent star patterns in RDF graphs (Dordrecht : Springer Science + Business Media B.V, 2020) Karim, Farah; Vidal, Maria-Esther; Auer, Sören
  Knowledge graphs have become a popular formalism for representing entities and their properties using a graph data model, e.g., the Resource Description Framework (RDF). An RDF graph comprises entities of the same type connected to objects or other entities using labeled edges annotated with properties. RDF graphs usually contain entities that share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects. When the number of these entities or properties in these star patterns is large, the size of the RDF graph and query processing are negatively impacted; we refer to these star patterns as frequent star patterns. We address the problem of identifying frequent star patterns in RDF graphs and devise the concept of factorized RDF graphs, which denote compact representations of RDF graphs where the number of frequent star patterns is minimized. We also develop computational methods to identify frequent star patterns and generate a factorized RDF graph, where compact RDF molecules replace frequent star patterns. A compact RDF molecule of a frequent star pattern denotes an RDF subgraph that instantiates the corresponding star pattern. Instead of having all the entities matching the original frequent star pattern, a surrogate entity is added and related to the properties of the frequent star pattern; it is linked to the entities that originally match the frequent star pattern. Since the edges between the entities and the objects in the frequent star pattern are replaced by edges between these entities and the surrogate entity of the compact RDF molecule, the size of the RDF graph is reduced. We evaluate the performance of our factorization techniques on several RDF graph benchmarks and compare them with a baseline built on top of gSpan, a state-of-the-art algorithm to detect frequent patterns. The outcomes evidence the efficiency of the proposed approach and show that our techniques are able to reduce the execution time of the baseline approach by at least three orders of magnitude. Additionally, RDF graph size can be reduced by up to 66.56% while the data represented in the original RDF graph is preserved.
- Context-Based Entity Matching for Big Data (Cham : Springer, 2020) Tasnim, Mayesha; Collarana, Diego; Graux, Damien; Vidal, Maria-Esther; Janev, Valentina; Graux, Damien; Jabeen, Hajira; Sallinger, Emanuel
  In the Big Data era, where variety is the most dominant dimension, the RDF data model enables the creation and integration of actionable knowledge from heterogeneous data sources. However, the RDF data model allows for describing entities under various contexts, e.g., a person can be described in a demographic context as well as in a professional context. Context-aware description poses challenges during entity matching of RDF datasets: a match might not be valid in every context. To perform a contextually relevant entity matching, the specific context under which a data-driven task, e.g., data integration, is performed must be taken into account. However, existing approaches only consider inter-schema and property mappings of different data sources and prevent users from selecting contexts and conditions during a data integration process. We devise COMET, an entity matching technique that relies on both the knowledge stated in RDF vocabularies and a context-based similarity metric to map contextually equivalent RDF graphs. COMET follows a two-fold approach to solve the problem of entity matching in RDF graphs in a context-aware manner. In the first step, COMET computes the similarity measures across RDF entities and resorts to the Formal Concept Analysis algorithm to map contextually equivalent RDF entities. In the second step, COMET combines the results of the first step and executes a 1-1 perfect matching algorithm for matching RDF entities based on the combined scores. We empirically evaluate the performance of COMET on a testbed from DBpedia. The experimental results suggest that COMET accurately matches equivalent RDF graphs in a context-dependent manner.
- Creating and Capturing Artificial Emotions in Autonomous Robots and Software Agents (Cham : Springer, 2020) Hoffmann, Claus; Vidal, Maria-Esther; Bielikova, Maria; Mikkonen, Tommi; Pautasso, Cesare
  This paper presents ARTEMIS, a control system for autonomous robots or software agents. ARTEMIS is able to create and capture artificial emotions during interactions with its environment, and we describe the underlying mechanisms for this. The control system also realizes the capturing of knowledge about its past artificial emotions. A specific interpretation of a knowledge graph, called an Agent Knowledge Graph, represents these artificial emotions. For this, we devise a formalism which enriches the traditional factual knowledge in knowledge graphs with the representation of artificial emotions. As proof of concept, we realize a concrete software agent based on the ARTEMIS control system. This software agent acts as a user assistant and executes the user’s orders. The environment of this user assistant consists of autonomous service agents. The execution of the user’s orders requires interaction with these autonomous service agents. These interactions lead to artificial emotions within the assistant. The first experiments show that it is possible to realize an autonomous agent with plausible artificial emotions with ARTEMIS and to record these artificial emotions in its Agent Knowledge Graph. In this way, autonomous agents based on ARTEMIS can capture essential knowledge that supports successful planning and decision making in complex dynamic environments and surpass emotionless agents.
- A Data-Driven Approach for Analyzing Healthcare Services Extracted from Clinical Records (Piscataway, NJ : IEEE, 2020) Scurti, Manuel; Menasalvas-Ruiz, Ernestina; Vidal, Maria-Esther; Torrente, Maria; Vogiatzis, Dimitrios; Paliouras, George; Provencio, Mariano; Rodríguez-González, Alejandro; Seco de Herrera, Alba García; Rodríguez González, Alejandro; Santosh, K.C.; Temesgen, Zelalem; Soda, Paolo
  Cancer remains one of the major public health challenges worldwide. After cardiovascular diseases, cancer is one of the leading causes of death and morbidity in Europe, with more than 4 million new cases and 1.9 million deaths per year. The suboptimal management of cancer patients during treatment and subsequent follow-up is a major obstacle to achieving better patient outcomes, especially regarding cost and quality of life. In this paper, we present an initial data-driven approach to analyze the resources and services most frequently used by lung-cancer patients, with the aim of identifying where the care process can be improved. We pay special attention to services used before diagnosis, in order to identify possible lung-cancer patients before they are diagnosed, and to reducing the length of stay in the hospital. Our approach has been built by analyzing the clinical notes of those oncological patients to extract this information and its relationships with other variables of the patient. Although the approach shown in this manuscript is very preliminary, it shows that quite interesting outcomes can be derived from further analysis. © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
- Encoding Knowledge Graph Entity Aliases in Attentive Neural Network for Wikidata Entity Linking (Berlin ; Heidelberg : Springer, 2020) Mulang’, Isaiah Onando; Singh, Kuldeep; Vyas, Akhilesh; Shekarpour, Saeedeh; Vidal, Maria-Esther; Lehmann, Jens; Auer, Sören; Huang, Zhisheng; Beek, Wouter; Wang, Hua; Zhou, Rui; Zhang, Yanchun
  Collaborative knowledge graphs such as Wikidata rely heavily on the crowd to author the information. Since the crowd is not bound to a standard protocol for assigning entity titles, the knowledge graph is populated by non-standard, noisy, long or even sometimes awkward titles. The issue of long, implicit, and nonstandard entity representations is a challenge in Entity Linking (EL) approaches for gaining high precision and recall. The underlying KG is in general the source of target entities for EL approaches; however, it often contains other relevant information, such as aliases of entities (e.g., Obama and Barack Hussein Obama are aliases for the entity Barack Obama). EL models usually ignore such readily available entity attributes. In this paper, we examine the role of knowledge graph context on an attentive neural network approach for entity linking on Wikidata. Our approach contributes by exploiting the sufficient context from a KG as a source of background knowledge, which is then fed into the neural network. This approach demonstrates merit to address challenges associated with entity titles (multi-word, long, implicit, case-sensitive). Our experimental study shows ≈8% improvements over the baseline approach, and significantly outperforms an end-to-end approach for Wikidata entity linking.
- Enhancing Virtual Ontology Based Access over Tabular Data with Morph-CSV (Amsterdam : IOS Press, 2020) Chaves-Fraga, David; Ruckhaus, Edna; Priyatna, Freddy; Vidal, Maria-Esther; Corcho, Oscar
  Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets, either by materializing integrated data into RDF or by performing on-the-fly querying via SPARQL query translation. In the specific case of tabular datasets represented as several CSV or Excel files, query translation approaches have been applied by considering each source as a single table that can be loaded into a relational database management system (RDBMS). Nevertheless, constraints over these tables are not represented; thus, neither consistency among attributes nor indexes over tables are enforced. As a consequence, efficiency of the SPARQL-to-SQL translation process may be affected, as well as the completeness of the answers produced during the evaluation of the generated SQL query. Our work is focused on applying implicit constraints on the OBDA query translation process over tabular data. We propose Morph-CSV, a framework for querying tabular data that exploits information from typical OBDA inputs (e.g., mappings, queries) to enforce constraints that can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV relies on both a constraint component and a set of constraint operators. For a given set of constraints, the operators are applied to each type of constraint with the aim of enhancing query completeness and performance. We evaluate Morph-CSV in several domains: e-commerce with the BSBM benchmark; transportation with a benchmark using the GTFS dataset from the Madrid subway; and biology with a use case extracted from the Bio2RDF project. We compare and report the performance of two SPARQL-to-SQL OBDA engines, without and with the incorporation of Morph-CSV. The observed results suggest that Morph-CSV is able to speed up the total query execution time by up to two orders of magnitude, while it is able to produce all the query answers.
- Experience: Open fiscal datasets, common issues, and recommendations (Zenodo, 2018) Musyaffa, Fathoni A.; Engels, Christiane; Vidal, Maria-Esther; Orlandi, Fabrizio; Auer, Sören
  A pre-print paper detailing recommendations for publishing fiscal data, including an assessment framework for fiscal datasets. This paper has been accepted at the ACM Journal of Data and Information Quality (JDIQ) in 2018.
- Falcon 2.0: An Entity and Relation Linking Tool over Wikidata (New York City, NY : Association for Computing Machinery, 2020) Sakor, Ahmad; Singh, Kuldeep; Patel, Anery; Vidal, Maria-Esther
  The Natural Language Processing (NLP) community has significantly contributed to the solutions for entity and relation recognition from a natural language text, and possibly linking them to proper matches in Knowledge Graphs (KGs). Considering Wikidata as the background KG, there are still limited tools to link knowledge within the text to Wikidata. In this paper, we present Falcon 2.0, the first joint entity and relation linking tool over Wikidata. It receives a short natural language text in the English language and outputs a ranked list of entities and relations annotated with the proper candidates in Wikidata. The candidates are represented by their Internationalized Resource Identifier (IRI) in Wikidata. Falcon 2.0 resorts to the English language model for the recognition task (e.g., N-Gram tiling and N-Gram splitting), and then an optimization approach for the linking task. We have empirically studied the performance of Falcon 2.0 on Wikidata and concluded that it outperforms all the existing baselines. Falcon 2.0 is open source and can be reused by the community; all the required instructions of Falcon 2.0 are well-documented at our GitHub repository (https://github.com/SDM-TIB/falcon2.0). We also demonstrate an online API, which can be run without any technical expertise. Falcon 2.0 and its background knowledge bases are available as resources at https://labs.tib.eu/falcon/falcon2/.
- Federated Query Processing (Cham : Springer, 2020) Endris, Kemele M.; Vidal, Maria-Esther; Graux, Damien; Janev, Valentina; Graux, Damien; Jabeen, Hajira; Sallinger, Emanuel
  Big data plays a relevant role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Semantic web technologies have also experienced great progress, and scientific communities and practitioners have contributed to the problem of big data management with ontological models, controlled vocabularies, linked datasets, data models, query languages, as well as tools for transforming big data into knowledge from which decisions can be made. Despite the significant impact of big data and semantic web technologies, we are entering into a new era where domains like genomics are projected to grow very rapidly in the next decade. In this next era, integrating big data demands novel and scalable tools for enabling not only big data ingestion and curation but also efficient large-scale exploration and discovery. Federated query processing techniques provide a solution to scale up to large volumes of data distributed across multiple data sources. Federated query processing techniques resort to source descriptions to identify relevant data sources for a query, as well as to find efficient execution plans that minimize the total execution time of a query and maximize the completeness of the answers. This chapter summarizes the main characteristics of a federated query engine, reviews the current state of the field, and outlines the problems that still remain open and represent grand challenges for the area.
- Formalizing Gremlin pattern matching traversals in an integrated graph Algebra (Aachen, Germany : RWTH Aachen, 2019) Thakkar, Harsh; Auer, Sören; Vidal, Maria-Esther; Samavi, Reza; Consens, Mariano P.; Khatchadourian, Shahan; Nguyen, Vinh; Sheth, Amit; Giménez-García, José M.; Thakkar, Harsh
  Graph data management (also called NoSQL) has revealed beneficial characteristics in terms of flexibility and scalability by differently balancing between query expressivity and schema flexibility. This peculiar advantage has resulted in an unforeseen race of developing new task-specific graph systems, query languages and data models, such as property graphs, key-value, wide column, resource description framework (RDF), etc. Present-day graph query languages are focused towards flexible graph pattern matching (aka sub-graph matching), whereas graph computing frameworks aim towards providing fast parallel (distributed) execution of instructions. This rapid growth in the variety of graph-based data management systems has resulted in a lack of standardization. Gremlin, a graph traversal language and machine, provides a common platform for supporting any graph computing system (such as an OLTP graph database or OLAP graph processors). In this extended report, we present a formalization of graph pattern matching for Gremlin queries. We also study, discuss and consolidate various existing graph algebra operators into an integrated graph algebra.
- FunMap: Efficient Execution of Functional Mappings for Knowledge Graph Creation (Cham : Springer, 2020) Jozashoori, Samaneh; Chaves-Fraga, David; Iglesias, Enrique; Vidal, Maria-Esther; Corcho, Oscar; Pan, Jeff Z.; Tamma, Valentina; d'Amato, Claudia; Janowicz, Kryztof; Fu, Bo; Polleres, Axel; Seneviratne, Oshani; Kagal, Lalana
  Data has grown exponentially in recent years, and knowledge graphs constitute powerful formalisms to integrate a myriad of existing data sources. Transformation functions, specified with function-based mapping languages like FunUL and RML+FnO, can be applied to overcome interoperability issues across heterogeneous data sources. However, the absence of engines to efficiently execute these mapping languages hinders their global adoption. We propose FunMap, an interpreter of function-based mapping languages; it relies on a set of lossless rewriting rules to push down and materialize the execution of functions in initial steps of knowledge graph creation. Although applicable to any function-based mapping language that supports joins between mapping rules, FunMap's feasibility is shown on RML+FnO. FunMap reduces data redundancy, e.g., duplicates and unused attributes, and converts RML+FnO mappings into a set of equivalent rules executable on RML-compliant engines. We evaluate FunMap performance over real-world testbeds from the biomedical domain. The results indicate that FunMap reduces the execution time of RML-compliant engines by up to a factor of 18, furnishing, thus, a scalable solution for knowledge graph creation.
- Identifying the presence and severity of dementia by applying interpretable machine learning techniques on structured clinical records (London : BioMed Central, 2022) Vyas, Akhilesh; Aisopos, Fotis; Vidal, Maria-Esther; Garrard, Peter; Paliouras, Georgios
  Background: Dementia develops as cognitive abilities deteriorate, and early detection is critical for effective preventive interventions. However, mainstream diagnostic tests and screening tools, such as CAMCOG and MMSE, often fail to detect dementia accurately. Various graph-based or feature-dependent prediction and progression models have been proposed. Whenever these models exploit information in the patients’ Electronic Medical Records, they represent promising options to identify the presence and severity of dementia more precisely. Methods: The methods presented in this paper aim to address two problems related to dementia: (a) Basic diagnosis: identifying the presence of dementia in individuals, and (b) Severity diagnosis: predicting the presence of dementia, as well as the severity of the disease. We formulate these two tasks as classification problems and address them using machine learning models based on random forests and decision trees, analysing structured clinical data from an elderly population cohort. We perform a hybrid data curation strategy in which a dementia expert is involved to verify that curation decisions are meaningful. We then employ the machine learning algorithms that classify individual episodes into a specific dementia class. Decision trees are also used for enhancing the explainability of decisions made by prediction models, allowing medical experts to identify the most crucial patient features and their threshold values for the classification of dementia. Results: Our experimental results prove that baseline arithmetic or cognitive tests, along with demographic features, can predict dementia and its severity with high accuracy. Specifically, our prediction models have reached an average f1-score of 0.93 and 0.81 for problems (a) and (b), respectively. Moreover, the decision trees produced for the two issues empower the interpretability of the prediction models. Conclusions: This study proves that there can be an accurate estimation of the existence and severity of dementia disease by analysing various electronic medical record features and cognitive tests from the episodes of the elderly population. Moreover, a set of decision rules may comprise the building blocks for an efficient patient classification. Relevant clinical and screening test features (e.g. simple arithmetic or animal fluency tasks) represent precise predictors without calculating the scores of mainstream cognitive tests such as MMSE and CAMCOG. Such a predictive model can identify not only meaningful features, but also justifications of classification. As a result, the predictive power of machine learning models over curated clinical data is proved, paving the path for a more accurate diagnosis of dementia.
- Interaction Network Analysis Using Semantic Similarity Based on Translation Embeddings (Berlin ; Heidelberg : Springer, 2019) Manzoor Bajwa, Awais; Collarana, Diego; Vidal, Maria-Esther; Acosta, Maribel; Cudré-Mauroux, Philippe; Maleshkova, Maria; Pellegrini, Tassilo; Sack, Harald; Sure-Vetter, York
  Biomedical knowledge graphs such as STITCH, SIDER, and Drugbank provide the basis for the discovery of associations between biomedical entities, e.g., interactions between drugs and targets. Link prediction is a paramount task and represents a building block for supporting knowledge discovery. Although several approaches have been proposed for effectively predicting links, the role of semantics has not been studied in depth. In this work, we tackle the problem of discovering interactions between drugs and targets, and propose SimTransE, a machine learning-based approach that solves this problem effectively. SimTransE relies on translating embeddings to model drug-target interactions and values of similarity across them. Grounded on the vectorial representation of drug-target interactions, SimTransE is able to discover novel drug-target interactions. We empirically study SimTransE using state-of-the-art benchmarks and approaches. Experimental results suggest that SimTransE is competitive with the state of the art, representing, thus, an effective alternative for knowledge discovery in the biomedical domain.
- A Knowledge Graph for Industry 4.0 (Cham : Springer, 2020) Bader, Sebastian R.; Grangel-Gonzalez, Irlan; Nanjappa, Priyanka; Vidal, Maria-Esther; Maleshkova, Maria; Harth, Andreas; Kirrane, Sabrina; Ngonga Ngomo, Axel-Cyrille; Paulheim, Heiko; Rula, Anisa; Gentile, Anna Lisa; Haase, Peter; Cochez, Michael
  One of the most crucial tasks for today’s knowledge workers is to get and retain a thorough overview of the latest state of the art. Especially in dynamic and evolving domains, the amount of relevant sources is constantly increasing, updating and overruling previous methods and approaches. For instance, the digital transformation of manufacturing systems, called Industry 4.0, currently faces an overwhelming amount of standardization efforts and reference initiatives, resulting in a sophisticated information environment. We propose a structured dataset in the form of a semantically annotated knowledge graph for Industry 4.0 related standards, norms and reference frameworks. The graph provides a Linked Data-conformant collection of annotated, classified reference guidelines supporting newcomers and experts alike in understanding how to implement Industry 4.0 systems. We illustrate the suitability of the graph for various use cases and its already existing applications, present the maintenance process, and evaluate its quality.
- OpenBudgets.eu: A platform for semantically representing and analyzing open fiscal data (Zenodo, 2018) Musyaffa, Fathoni A.; Halilaj, Lavdim; Li, Yakun; Orlandi, Fabrizio; Jabeen, Hajira; Auer, Sören; Vidal, Maria-Esther
  A paper describing the details of the OpenBudgets.eu platform implementation. Pre-print version of the paper accepted at the International Conference on Web Engineering (ICWE) 2018 in Caceres, Spain.
- Optimizing Federated Queries Based on the Physical Design of a Data Lake (Aachen : RWTH, 2020) Rohde, Philipp D.; Vidal, Maria-Esther
  The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of such federation is the Semantic Data Lake, where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using source-specific heuristics, the query engine is able to generate more efficient query execution plans by exploiting the knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.