Search Results

Now showing 1 - 8 of 8

Personalised information spaces for chemical digital libraries

2009, Koepler, O., Balke, W.-T., Köncke, B., Tönnies, S.

[No abstract available]


Efficient retrieval of 3D building models using embeddings of attributed subgraphs

2011, Wessel, R., Ochmann, S., Vock, R., Blümel, Ina, Klein, R.

We present a novel method for retrieval and classification of 3D building models that is tailored to the specific requirements of architects. In contrast to common approaches, our algorithm relies on the interior spatial arrangement of rooms instead of the exterior geometric shape. We first represent the internal topological building structure by a Room Connectivity Graph (RCG). Each room is represented by a node; connections between rooms, such as doors, are represented by edges. Nodes and edges are additionally assigned attributes reflecting room and edge properties such as area or window size. To enable fast and efficient retrieval and classification with RCGs, we transform the structured graph representation into a vector-based one. We first decompose the RCG into a set of subgraphs. For each subgraph, we compute the similarity to a set of codebook graphs. Aggregating all similarity values finally provides us with a single vector for each RCG, which enables fast retrieval and classification. For evaluation, we introduce a classification scheme that was carefully developed following common guidelines in architecture. We finally provide comprehensive experiments showing that the introduced subgraph embeddings yield superior performance compared to state-of-the-art graph retrieval approaches.
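The decompose-compare-aggregate pipeline in the abstract can be sketched in a few lines. This is a toy illustration, not the paper's actual formulation: the graph layout, the ego-subgraph decomposition, the area-sequence similarity measure, and the codebook entries are all invented assumptions standing in for the attributed-subgraph machinery.

```python
# Hypothetical toy stand-in for a Room Connectivity Graph (RCG):
# rooms are nodes, door connections are edges, and each room carries
# an attribute (here just its floor area).
rcg = {
    "hall":    {"area": 12.0, "doors": ["kitchen", "living"]},
    "kitchen": {"area": 9.0,  "doors": ["hall"]},
    "living":  {"area": 25.0, "doors": ["hall", "bedroom"]},
    "bedroom": {"area": 14.0, "doors": ["living"]},
}

def ego_subgraph(graph, room):
    """Decompose step: the subgraph of a room and its direct neighbours,
    summarised here as the sorted sequence of room areas."""
    nodes = [room] + graph[room]["doors"]
    return sorted(graph[n]["area"] for n in nodes)

def similarity(sub_a, sub_b):
    """Crude attributed-subgraph similarity on sorted area sequences
    (illustrative assumption; the paper uses a richer measure)."""
    diff = sum(abs(a - b) for a, b in zip(sub_a, sub_b))
    size_penalty = abs(len(sub_a) - len(sub_b))
    return 1.0 / (1.0 + diff + size_penalty)

# A small "codebook" of reference subgraphs (again purely illustrative).
codebook = [[8.0, 10.0], [10.0, 12.0, 20.0], [12.0, 15.0, 22.0, 30.0]]

def embed(graph, codebook):
    """Aggregate step: the best similarity of any subgraph to each
    codebook entry yields one fixed-length vector per building."""
    subgraphs = [ego_subgraph(graph, r) for r in graph]
    return [max(similarity(s, c) for s in subgraphs) for c in codebook]

vector = embed(rcg, codebook)
print(len(vector))  # one dimension per codebook graph
```

The resulting fixed-length vector is what makes fast retrieval possible: comparing two buildings reduces to comparing two vectors rather than solving a graph-matching problem.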


Why reinvent the wheel: Let's build question answering systems together

2018, Singh, K., Radhakrishna, A.S., Both, A., Shekarpour, S., Lytra, I., Usbeck, R., Vyas, A., Khikmatullaev, A., Punjani, D., Lange, C., Vidal, Maria-Esther, Lehmann, J., Auer, Sören

Modern question answering (QA) systems need to flexibly integrate a number of components specialised to fulfil specific tasks in a QA pipeline. Key QA tasks include Named Entity Recognition and Disambiguation, Relation Extraction, and Query Building. Since a number of different software components exist that implement different strategies for each of these tasks, it is a major challenge to select and combine the most suitable components into a QA system, given the characteristics of a question. We study this optimisation problem and train classifiers, which take features of a question as input and have the goal of optimising the selection of QA components based on those features. We then devise a greedy algorithm to identify the pipelines that include the suitable components and can effectively answer the given question. We implement this model within Frankenstein, a QA framework able to select QA components and compose QA pipelines. We evaluate the effectiveness of the pipelines generated by Frankenstein using the QALD and LC-QuAD benchmarks. These results suggest not only that Frankenstein precisely solves the QA optimisation problem but also that it enables the automatic composition of optimised QA pipelines, which outperform the static baseline QA pipeline. Thanks to this flexible and fully automated pipeline generation process, new QA components can be easily included in Frankenstein, thus improving the performance of the generated pipelines.
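The greedy step described above can be sketched compactly: a per-question classifier assigns each candidate component a predicted score, and the greedy algorithm keeps the best-scoring component for each task. The component names and scores below are invented for illustration and do not reflect Frankenstein's actual classifier outputs.

```python
# Hypothetical sketch of greedy QA-pipeline composition: per task,
# keep the component with the highest predicted performance.

TASKS = ["NED", "RelationExtraction", "QueryBuilding"]

# Mock classifier output: predicted performance of each candidate
# component on the features of one particular question (invented).
predicted = {
    "NED": {"ComponentA": 0.71, "ComponentB": 0.64},
    "RelationExtraction": {"ComponentC": 0.58, "ComponentD": 0.49},
    "QueryBuilding": {"ComponentE": 0.62, "ComponentF": 0.66},
}

def greedy_pipeline(scores, tasks):
    """Pick, per task, the component with the highest predicted score."""
    return {t: max(scores[t], key=scores[t].get) for t in tasks}

pipeline = greedy_pipeline(predicted, TASKS)
print(pipeline)
# {'NED': 'ComponentA', 'RelationExtraction': 'ComponentC',
#  'QueryBuilding': 'ComponentF'}
```

Because the selection is recomputed per question, a component that is weak on average can still be chosen when the question's features favour it, which is the core advantage over a static pipeline.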


Survey vs Scraped Data: Comparing Time Series Properties of Web and Survey Vacancy Data

2019, De Pedraza, P., Visintin, S., Tijdens, K., Kismihók, G.

This paper studies the relationship between a vacancy population obtained from web crawling and vacancies in the economy inferred by a National Statistics Office (NSO) using a traditional method. We compare the time series properties of samples obtained between 2007 and 2014 by Statistics Netherlands and by a web scraping company. We find that the web and NSO vacancy data present similar time series properties, suggesting that both time series are generated by the same underlying phenomenon: the real number of new vacancies in the economy. We conclude that, in our case study, web-sourced data are able to capture aggregate economic activity in the labor market.
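One simple way to check whether two vacancy series are "generated by the same underlying phenomenon" is to correlate them. The sketch below uses invented monthly counts; the actual study compares Statistics Netherlands data with web-scraped vacancies over 2007-2014 using richer time-series diagnostics.

```python
from math import sqrt

# Invented monthly vacancy counts standing in for the NSO series and
# the web-scraped series (the real data are not reproduced here).
nso_vacancies = [210, 220, 190, 170, 160, 180, 200, 215]
web_vacancies = [520, 560, 480, 430, 400, 450, 500, 540]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(nso_vacancies, web_vacancies)
print(round(r, 3))  # close to 1.0 when both track the same cycle
```

A high correlation alone does not establish a common data-generating process, which is why the paper examines further time-series properties; the correlation is just the most direct first check.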


Bias in data-driven artificial intelligence systems - An introductory survey

2020, Ntoutsi, E., Fafalios, P., Gadiraju, U., Iosifidis, V., Nejdl, W., Vidal, Maria-Esther, Ruggieri, S., Turini, F., Papadopoulos, S., Krasanakis, E., Kompatsiaris, I., Kinder-Kurlanda, K., Wagner, C., Karimi, F., Fernandez, M., Alani, H., Berendt, B., Kruegel, T., Heinze, C., Broelemann, K., Kasneci, G., Tiropanis, T., Staab, S.

Artificial Intelligence (AI)-based systems are widely employed nowadays to make decisions that have far-reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their design, training, and deployment to ensure social good while still benefiting from the huge potential of the AI technology. The goal of this survey is to provide a broad multidisciplinary overview of the area of bias in AI systems, focusing on technical challenges and solutions as well as to suggest new research directions towards approaches well-grounded in a legal frame. In this survey, we focus on data-driven AI, as a large part of AI is powered nowadays by (big) data and powerful machine learning algorithms. If not otherwise specified, we use the general term bias to describe problems related to the gathering or processing of data that might result in prejudiced decisions on the basis of demographic features such as race, sex, and so forth. This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Commercial, Legal, and Ethical Issues > Ethical Considerations; Commercial, Legal, and Ethical Issues > Legal Issues.


RADAR-Team stellt Testsystem auf zweitem Projekt-Workshop in Frankfurt vor

2015, Potthoff, Jan, Razum, Matthias, Kraft, Angelina

As part of the "Research Data Repository" (RADAR) project, the current state of the test system, which can be used for the archiving and publication of research data, was presented at the second project workshop on 23 June 2015. In addition, further requirements for the system and general questions of research data management were discussed with the workshop participants.


The quest for research information

2014, Blümel, Ina, Dietze, Stefan, Heller, Lambert, Jäschke, Robert, Mehlberg, Martin

Research information, i.e., data about research projects, organisations, researchers or research outputs such as publications or patents, is spread across the web, usually residing in institutional and personal web pages or in semi-open databases and information systems. While there exists a wealth of unstructured information, structured data is limited and often exposed following proprietary or less-established schemas and interfaces. Therefore, a holistic and consistent view on research information across organisational and national boundaries is not feasible. On the other hand, web crawling and information extraction techniques have matured throughout the last decade, allowing for automated approaches to harvesting, extracting and consolidating research information into a more coherent knowledge graph. In this work, we give an overview of the current state of the art in research information sharing on the web and present initial ideas towards a more holistic approach for bootstrapping research information from available web sources.


When humans and machines collaborate: Cross-lingual Label Editing in Wikidata

2019, Kaffee, L.-A., Endris, K.M., Simperl, E.

The quality and maintainability of a knowledge graph are determined by the process in which it is created. There are different approaches to such processes: extraction or conversion of data available on the web (automated extraction of knowledge, such as DBpedia from Wikipedia), community-created knowledge graphs, often built by a group of experts, and hybrid approaches in which humans maintain the knowledge graph alongside bots. We focus in this work on the hybrid approach of human-edited knowledge graphs supported by automated tools. In particular, we analyse the editing of natural language data, i.e., labels. Labels are the entry point for humans to understand the information, and therefore need to be carefully maintained. We take a step toward understanding the collaborative editing of humans and automated tools across languages in a knowledge graph. We use Wikidata, as it has a large and active community of humans and bots working together, covering over 300 languages. We analyse the different editor groups and how they interact with the different language data to understand the provenance of the current label data.
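The kind of provenance analysis described above amounts to grouping label edits by editor type and language and counting them. The edit records and field names below are invented for illustration; real input would come from Wikidata edit histories.

```python
from collections import Counter

# Invented label-edit records: who made the edit (human or bot) and
# in which language the edited label was.
edits = [
    {"editor": "bot",   "language": "en"},
    {"editor": "human", "language": "en"},
    {"editor": "bot",   "language": "de"},
    {"editor": "bot",   "language": "de"},
    {"editor": "human", "language": "fa"},
]

# Count edits per (editor group, language) pair.
by_group = Counter((e["editor"], e["language"]) for e in edits)

for (editor, lang), count in sorted(by_group.items()):
    print(f"{editor:5s} {lang}: {count} label edits")
```

Aggregations like this make it easy to see, for instance, which languages are maintained mostly by bots and which still depend on human editors.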