Search Results

Now showing 1 - 8 of 8

Compact representations for efficient storage of semantic sensor data

2021, Karim, Farah, Vidal, Maria-Esther, Auer, Sören

The volume of data generated by a wide variety of sensors and devices is increasing rapidly. Data semantics facilitate information exchange, adaptability, and interoperability among sensors and devices. Sensor data and their meaning can be described using ontologies, e.g., the Semantic Sensor Network (SSN) Ontology. However, once semantically enriched, sensor data are substantially larger than the raw sensor data. Moreover, sensors can observe the same measurement value several times, which produces a huge number of repeated facts. We propose a compact, or factorized, representation of semantic sensor data in which repeated measurement values are described only once. These compact representations improve the storage and processing of semantic sensor data. To scale up to large datasets, factorization-based tabular representations are used to store and manage factorized semantic sensor data using Big Data technologies. We empirically study the effectiveness of the proposed compact representations of semantic sensor data and their impact on query processing. Additionally, we evaluate the effects of storing the proposed representations on diverse RDF implementations. Results suggest that the proposed compact representations improve the storage and query processing of sensor data over diverse RDF implementations and can reduce query execution time by up to two orders of magnitude.

Compacting frequent star patterns in RDF graphs

2020, Karim, Farah, Vidal, Maria-Esther, Auer, Sören

Knowledge graphs have become a popular formalism for representing entities and their properties using a graph data model, e.g., the Resource Description Framework (RDF). An RDF graph comprises entities of the same type connected to objects or other entities using labeled edges annotated with properties. RDF graphs usually contain entities that share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects. When the number of these entities or properties in these star patterns is large, the size of the RDF graph and query processing are negatively impacted; we refer to these star patterns as frequent star patterns. We address the problem of identifying frequent star patterns in RDF graphs and devise the concept of factorized RDF graphs, which denote compact representations of RDF graphs where the number of frequent star patterns is minimized. We also develop computational methods to identify frequent star patterns and generate a factorized RDF graph, where compact RDF molecules replace frequent star patterns. A compact RDF molecule of a frequent star pattern denotes an RDF subgraph that instantiates the corresponding star pattern. Instead of having all the entities match the original frequent star pattern, a surrogate entity is added and related to the properties of the frequent star pattern; it is linked to the entities that originally matched the frequent star pattern. Since the edges between the entities and the objects in the frequent star pattern are replaced by edges between these entities and the surrogate entity of the compact RDF molecule, the size of the RDF graph is reduced. We evaluate the performance of our factorization techniques on several RDF graph benchmarks and compare them with a baseline built on top of gSpan, a state-of-the-art algorithm to detect frequent patterns. The outcomes provide evidence of the efficiency of the proposed approach and show that our techniques reduce the execution time of the baseline approach by at least three orders of magnitude. Additionally, RDF graph size can be reduced by up to 66.56% while the data represented in the original RDF graph is preserved.
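The surrogate-entity idea described in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes the group of properties is given rather than discovered, it treats a graph as a plain list of (subject, property, object) tuples, and the names `factorize`, `expand`, `ex:inMolecule`, and `ex:moleculeN` are hypothetical. A star shared by n entities over k edges costs n·k edges in the original graph and k + n edges after factorization, so the sketch only factorizes when n·k > n + k.

```python
from collections import defaultdict

LINK = "ex:inMolecule"  # hypothetical linking property, not taken from the paper


def factorize(triples, properties, min_entities=2):
    """Replace each star shared by several subjects with one surrogate entity."""
    props = set(properties)
    # Collect, per subject, its (property, object) pairs over the chosen properties.
    star = defaultdict(set)
    for s, p, o in triples:
        if p in props:
            star[s].add((p, o))

    # Group subjects whose star is identical and covers every chosen property.
    groups = defaultdict(list)
    for s, pairs in star.items():
        if {p for p, _ in pairs} == props:
            groups[frozenset(pairs)].append(s)

    # Keep only groups where factorizing actually removes edges (n*k > n + k).
    frequent = {
        pairs: sorted(subjects)
        for pairs, subjects in groups.items()
        if len(subjects) >= min_entities
        and len(subjects) * len(pairs) > len(subjects) + len(pairs)
    }

    factorized = []
    absorbed = {}  # subject -> pairs now carried by its surrogate
    for i, (pairs, subjects) in enumerate(sorted(frequent.items(), key=lambda kv: kv[1])):
        surrogate = f"ex:molecule{i}"  # hypothetical surrogate naming scheme
        factorized.extend((surrogate, p, o) for p, o in sorted(pairs))
        for s in subjects:
            absorbed[s] = pairs
            factorized.append((s, LINK, surrogate))

    # Every triple that was not absorbed by a surrogate is kept unchanged.
    for s, p, o in triples:
        if s in absorbed and (p, o) in absorbed[s]:
            continue
        factorized.append((s, p, o))
    return factorized


def expand(factorized):
    """Invert factorize(): copy each surrogate's edges back onto its members."""
    surrogates = {o for _, p, o in factorized if p == LINK}
    molecule_edges = defaultdict(list)
    for s, p, o in factorized:
        if s in surrogates:
            molecule_edges[s].append((p, o))
    restored = set()
    for s, p, o in factorized:
        if s in surrogates:
            continue
        if p == LINK:
            restored.update((s, mp, mo) for mp, mo in molecule_edges[o])
        else:
            restored.add((s, p, o))
    return restored


# Toy data (made up): three observations share value, unit, and property.
triples = []
for i, t in enumerate(["t1", "t2", "t3"], start=1):
    obs = f"ex:obs{i}"
    triples += [
        (obs, "ex:value", "21.5"),
        (obs, "ex:unit", "ex:Celsius"),
        (obs, "ex:property", "ex:Temperature"),
        (obs, "ex:time", t),
    ]
triples += [
    ("ex:obs4", "ex:value", "30.0"),
    ("ex:obs4", "ex:unit", "ex:Celsius"),
    ("ex:obs4", "ex:property", "ex:Temperature"),
    ("ex:obs4", "ex:time", "t4"),
]

compact = factorize(triples, ["ex:value", "ex:unit", "ex:property"])
print(len(triples), "->", len(compact))
```

In this toy graph the nine shared edges of the first three observations collapse into three surrogate edges plus three links, while `expand` recovers the original triples, mirroring the abstract's claim that the represented data is preserved.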

Encoding Knowledge Graph Entity Aliases in Attentive Neural Network for Wikidata Entity Linking

2020, Mulang’, Isaiah Onando, Singh, Kuldeep, Vyas, Akhilesh, Shekarpour, Saeedeh, Vidal, Maria-Esther, Lehmann, Jens, Auer, Sören, Huang, Zhisheng, Beek, Wouter, Wang, Hua, Zhou, Rui, Zhang, Yanchun

Collaborative knowledge graphs such as Wikidata rely heavily on the crowd to author information. Since the crowd is not bound to a standard protocol for assigning entity titles, the knowledge graph is populated by non-standard, noisy, long, or sometimes even awkward titles. Long, implicit, and non-standard entity representations make it hard for Entity Linking (EL) approaches to achieve high precision and recall. The underlying KG is generally the source of target entities for EL approaches; however, it often contains other relevant information, such as aliases of entities (e.g., Obama and Barack Hussein Obama are aliases for the entity Barack Obama). EL models usually ignore such readily available entity attributes. In this paper, we examine the role of knowledge graph context in an attentive neural network approach for entity linking on Wikidata. Our approach exploits sufficient context from a KG as a source of background knowledge, which is then fed into the neural network. This approach demonstrates merit in addressing challenges associated with entity titles (multi-word, long, implicit, case-sensitive). Our experimental study shows ≈8% improvement over the baseline approach and significantly outperforms an end-to-end approach for Wikidata entity linking.

OpenBudgets.eu: A platform for semantically representing and analyzing open fiscal data

2018, Musyaffa, Fathoni A., Halilaj, Lavdim, Li, Yakun, Orlandi, Fabrizio, Jabeen, Hajira, Auer, Sören, Vidal, Maria-Esther

A paper describing the implementation details of the OpenBudgets.eu platform. Pre-print version of the paper accepted at the International Conference on Web Engineering (ICWE) 2018 in Cáceres, Spain.

Towards an Open Research Knowledge Graph

2018, Auer, Sören, Blümel, Ina, Ewerth, Ralph, Garatzogianni, Alexandra, Heller, Lambert, Hoppe, Anett, Kasprzik, Anna, Koepler, Oliver, Nejdl, Wolfgang, Plank, Margret, Sens, Irina, Stocker, Markus, Tullney, Marco, Vidal, Maria-Esther, van Wezenbeek, Wilma

The document-oriented workflows in science have reached (or already exceeded) the limits of adequacy as highlighted for example by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. Despite an improved and digital access to scientific publications in the last decades, the exchange of scholarly knowledge continues to be primarily document-based: Researchers produce essays and articles that are made available in online and offline publication media as roughly granular text documents. With current developments in areas such as knowledge representation, semantic search, human-machine interaction, natural language processing, and artificial intelligence, it is possible to completely rethink this dominant paradigm of document-centered knowledge exchange and transform it into knowledge-based information flows by representing and expressing knowledge through semantically rich, interlinked knowledge graphs. The core of the establishment of knowledge-based information flows is the distributed, decentralized, collaborative creation and evolution of information models, vocabularies, ontologies, and knowledge graphs for the establishment of a common understanding of data and information between the various stakeholders as well as the integration of these technologies into the infrastructure and processes of search and knowledge exchange in the research library of the future. By integrating these information models into existing and new research infrastructure services, the information structures that are currently still implicit and deeply hidden in documents can be made explicit and directly usable. This revolutionizes scientific work because information and research results can be seamlessly interlinked with each other and better mapped to complex information needs. As a result, scientific work becomes more effective and efficient, since results become directly comparable and easier to reuse. 
In order to realize the vision of knowledge-based information flows in scholarly communication, comprehensive long-term technological infrastructure development and accompanying research are required. To secure information sovereignty, it is also of paramount importance to science – and urgency to science policymakers – that scientific infrastructures establish an open counterweight to emerging commercial developments in this area. The aim of this position paper is to facilitate the discussion on requirements, design decisions and a minimum viable product for an Open Research Knowledge Graph infrastructure. TIB aims to start developing this infrastructure in an open collaboration with interested partner organizations and individuals.

Formalizing Gremlin pattern matching traversals in an integrated graph Algebra

2019, Thakkar, Harsh, Auer, Sören, Vidal, Maria-Esther, Samavi, Reza, Consens, Mariano P., Khatchadourian, Shahan, Nguyen, Vinh, Sheth, Amit, Giménez-García, José M., Thakkar, Harsh

Graph data management (also called NoSQL) has revealed beneficial characteristics in terms of flexibility and scalability by differently balancing query expressivity and schema flexibility. This peculiar advantage has resulted in an unforeseen race to develop new task-specific graph systems, query languages, and data models, such as property graphs, key-value, wide column, Resource Description Framework (RDF), etc. Present-day graph query languages focus on flexible graph pattern matching (aka sub-graph matching), whereas graph computing frameworks aim to provide fast parallel (distributed) execution of instructions. This rapid growth in the variety of graph-based data management systems has resulted in a lack of standardization. Gremlin, a graph traversal language and machine, provides a common platform for supporting any graph computing system (such as an OLTP graph database or OLAP graph processors). In this extended report, we present a formalization of graph pattern matching for Gremlin queries. We also study, discuss, and consolidate various existing graph algebra operators into an integrated graph algebra.

Why reinvent the wheel: Let's build question answering systems together

2018, Singh, K., Radhakrishna, A.S., Both, A., Shekarpour, S., Lytra, I., Usbeck, R., Vyas, A., Khikmatullaev, A., Punjani, D., Lange, C., Vidal, Maria-Esther, Lehmann, J., Auer, Sören

Modern question answering (QA) systems need to flexibly integrate a number of components specialised to fulfil specific tasks in a QA pipeline. Key QA tasks include Named Entity Recognition and Disambiguation, Relation Extraction, and Query Building. Since a number of different software components exist that implement different strategies for each of these tasks, it is a major challenge to select and combine the most suitable components into a QA system, given the characteristics of a question. We study this optimisation problem and train classifiers that take features of a question as input and aim to optimise the selection of QA components based on those features. We then devise a greedy algorithm to identify the pipelines that include the suitable components and can effectively answer the given question. We implement this model within Frankenstein, a QA framework able to select QA components and compose QA pipelines. We evaluate the effectiveness of the pipelines generated by Frankenstein using the QALD and LC-QuAD benchmarks. The results suggest that Frankenstein not only precisely solves the QA optimisation problem but also enables the automatic composition of optimised QA pipelines, which outperform the static Baseline QA pipeline. Thanks to this flexible and fully automated pipeline generation process, new QA components can be easily included in Frankenstein, thus improving the performance of the generated pipelines.

Experience: Open fiscal datasets, common issues, and recommendations

2018, Musyaffa, Fathoni A., Engels, Christiane, Vidal, Maria-Esther, Orlandi, Fabrizio, Auer, Sören

A pre-print paper detailing recommendations for publishing fiscal data, including an assessment framework for fiscal datasets. This paper was accepted by the ACM Journal of Data and Information Quality (JDIQ) in 2018.