Search Results

Now showing 1 - 8 of 8
  • Item
    Why reinvent the wheel: Let's build question answering systems together
    (New York City : Association for Computing Machinery, 2018) Singh, K.; Radhakrishna, A.S.; Both, A.; Shekarpour, S.; Lytra, I.; Usbeck, R.; Vyas, A.; Khikmatullaev, A.; Punjani, D.; Lange, C.; Vidal, Maria-Esther; Lehmann, J.; Auer, Sören
    Modern question answering (QA) systems need to flexibly integrate a number of components specialised to fulfil specific tasks in a QA pipeline. Key QA tasks include Named Entity Recognition and Disambiguation, Relation Extraction, and Query Building. Since a number of different software components exist that implement different strategies for each of these tasks, it is a major challenge to select and combine the most suitable components into a QA system, given the characteristics of a question. We study this optimisation problem and train classifiers that take features of a question as input and optimise the selection of QA components based on those features. We then devise a greedy algorithm to identify the pipelines that include the suitable components and can effectively answer the given question. We implement this model within Frankenstein, a QA framework able to select QA components and compose QA pipelines. We evaluate the effectiveness of the pipelines generated by Frankenstein using the QALD and LC-QuAD benchmarks. These results suggest that Frankenstein not only precisely solves the QA optimisation problem but also enables the automatic composition of optimised QA pipelines, which outperform the static baseline QA pipeline. Thanks to this flexible and fully automated pipeline generation process, new QA components can easily be included in Frankenstein, thus improving the performance of the generated pipelines.
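    The composition approach described above lends itself to a compact sketch: per-task scorers predict how well each registered component will handle a question, and a greedy pass picks the best one per task. All component names, features, and the scorer interface below are hypothetical illustrations, not Frankenstein's actual API.

```python
# Hypothetical sketch of greedy QA pipeline composition; not the
# Frankenstein framework's real code.

QA_TASKS = ["ner_disambiguation", "relation_extraction", "query_building"]

def extract_features(question):
    """Toy question features; the paper derives a richer feature set."""
    words = question.lower().split()
    wh = words[0] if words and words[0] in {"who", "what", "where", "when", "which"} else None
    return {"length": len(words), "wh_word": wh}

class ComponentScorer:
    """Stand-in for a trained per-task classifier that predicts how well
    a component will handle a question with the given features."""
    def __init__(self, bias):
        self.bias = bias  # component name -> learned offset

    def predict(self, features, component):
        # Toy linear score; a real scorer is trained on benchmark runs.
        return self.bias.get(component, 0.0) + 0.01 * features["length"]

def compose_pipeline(question, components, scorers):
    """Greedily pick the best-scoring component for each QA task."""
    features = extract_features(question)
    return [(task, max(components[task],
                       key=lambda c: scorers[task].predict(features, c)))
            for task in QA_TASKS]

components = {task: ["comp_a", "comp_b"] for task in QA_TASKS}
scorers = {task: ComponentScorer({"comp_b": 0.3}) for task in QA_TASKS}
print(compose_pipeline("Who wrote The Hobbit?", components, scorers))
```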
  • Item
    When humans and machines collaborate: Cross-lingual Label Editing in Wikidata
    (New York City : Association for Computing Machinery, 2019) Kaffee, L.-A.; Endris, K.M.; Simperl, E.
    The quality and maintainability of a knowledge graph are determined by the process in which it is created. There are different approaches to such processes: extraction or conversion of data available on the web (automated extraction of knowledge such as DBpedia from Wikipedia), community-created knowledge graphs, often built by a group of experts, and hybrid approaches where humans maintain the knowledge graph alongside bots. In this work, we focus on the hybrid approach of human-edited knowledge graphs supported by automated tools. In particular, we analyse the editing of natural language data, i.e. labels. Labels are the entry point for humans to understand the information, and therefore need to be carefully maintained. We take a step toward understanding the collaborative editing of humans and automated tools across languages in a knowledge graph. We use Wikidata, as it has a large and active community of humans and bots working together, covering over 300 languages. We analyse the different editor groups and how they interact with the different language data to understand the provenance of the current label data.
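    As a rough illustration of this kind of analysis (not the authors' code), the sketch below pulls an item's revision history from the public MediaWiki API on wikidata.org and buckets label edits by language and editor type; the edit-summary format and the bot-name heuristic are assumptions.

```python
# Sketch only: mining label edits from Wikidata revision histories.
import re
from collections import Counter

import requests

API = "https://www.wikidata.org/w/api.php"

def label_edits(item_id, limit=500):
    params = {
        "action": "query", "format": "json", "prop": "revisions",
        "titles": item_id, "rvlimit": limit,
        "rvprop": "user|comment|timestamp",
    }
    pages = requests.get(API, params=params, timeout=30).json()["query"]["pages"]
    for page in pages.values():
        for rev in page.get("revisions", []):
            # Label edits carry auto-generated summaries such as
            # "/* wbsetlabel-set:1|de */ ..." (assumed summary format).
            match = re.search(r"wbsetlabel-[a-z]+:\d+\|([\w-]+)", rev.get("comment", ""))
            if match:
                # Crude heuristic: treat usernames ending in "bot" as bots.
                editor = "bot" if rev.get("user", "").lower().endswith("bot") else "human"
                yield match.group(1), editor

print(Counter(label_edits("Q42")))  # (language, editor type) -> edit count
```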
  • Item
    Estimating the information gap between textual and visual representations
    (New York City : Association for Computing Machinery, 2017) Henning, Christian; Ewerth, Ralph
    Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of surrounding text, or simply being a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations of textual and graphical information, and the question of how they can be described and automatically estimated, have not yet been addressed by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for the demanding task. The system has been evaluated on a challenging test set, and the experimental results demonstrate the feasibility of the approach.
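    A minimal sketch of how such an estimator could look in PyTorch follows; the encoders, layer sizes, and output ranges are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossModalEstimator(nn.Module):
    """Two-head regressor over precomputed image and text features; the
    encoders themselves (e.g. a CNN and a text model) are assumed to
    run upstream and are not shown."""
    def __init__(self, img_dim=2048, txt_dim=300, hidden=512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.cmi_head = nn.Linear(hidden, 1)  # cross-modal mutual information
        self.sc_head = nn.Linear(hidden, 1)   # semantic correlation

    def forward(self, img_feat, txt_feat):
        h = self.fuse(torch.cat([img_feat, txt_feat], dim=-1))
        # Assumed output ranges: CMI in [0, 1], SC in [-1, 1].
        return torch.sigmoid(self.cmi_head(h)), torch.tanh(self.sc_head(h))

model = CrossModalEstimator()
cmi, sc = model(torch.randn(4, 2048), torch.randn(4, 300))  # batch of 4 pairs
```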
  • Item
    Soft Inkjet Circuits: Rapid Multi-Material Fabrication of Soft Circuits using a Commodity Inkjet Printer
    (New York City : Association for Computing Machinery, 2019) Khan, Arshad; Roo, Joan Sol; Kraus, Tobias; Steimle, Jürgen
    Despite the increasing popularity of soft interactive devices, their fabrication remains complex and time-consuming. We contribute a process for rapid do-it-yourself fabrication of soft circuits using a conventional desktop inkjet printer. It supports inkjet printing of circuits that are stretchable, ultrathin, high-resolution, and integrated with a wide variety of materials used for prototyping. We introduce multi-ink functional printing on a desktop printer for realizing multi-material devices, including conductive and isolating inks. We further present DIY techniques to enhance compatibility between inks and substrates and the circuits' elasticity. This enables circuits on a wide set of materials including temporary tattoo paper, textiles, and thermoplastic. Four application cases demonstrate versatile uses for realizing stretchable devices, e-textiles, body-based and re-shapeable interfaces.
  • Item
    On the effects of spam filtering and incremental learning for web-supervised visual concept classification
    (New York City : Association for Computing Machinery, 2016) Springstein, Matthias; Ewerth, Ralph
    Deep neural networks have been successfully applied to the task of visual concept classification. However, they require a large number of training examples for learning. Although pre-trained deep neural networks are available for some domains, they usually have to be fine-tuned for an envisaged target domain. Recently, some approaches have been suggested that are aimed at incrementally (or even endlessly) learning visual concepts based on Web data. Since tags of Web images are often noisy, normally some filtering mechanisms are employed in order to remove "spam" images that are not appropriate for training. In this paper, we investigate several aspects of a web-supervised system that has to be adapted to another target domain: (1) the effect of incremental learning, (2) the effect of spam filtering, and (3) the behavior of particular concept classes with respect to (1) and (2). The experimental results provide some insights under which conditions incremental learning and spam filtering are useful.
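    The interplay of filtering and incremental updates can be summarized in a short schematic loop; all class and method names below are illustrative stand-ins, not the paper's implementation.

```python
class SpamFilter:
    """Stand-in filter scoring how plausible an (image, tag) pair is."""
    def score(self, image, tag):
        # A real filter inspects image content; this fake version just
        # checks the tag against tags predicted by an auxiliary model.
        return 0.9 if tag in image.get("predicted_tags", []) else 0.2

class ConceptClassifier:
    """Stand-in model with an incremental update step."""
    def __init__(self):
        self.training_set = []
    def fine_tune(self, batch):
        # Incremental update on the filtered batch instead of retraining.
        self.training_set.extend(batch)

def incremental_training(model, crawl_batches, spam_filter, threshold=0.5):
    for batch in crawl_batches:  # batches of (image, tag) pairs from the web
        clean = [(img, tag) for img, tag in batch
                 if spam_filter.score(img, tag) >= threshold]  # drop likely spam
        if clean:
            model.fine_tune(clean)
    return model

web_batch = [({"predicted_tags": ["cat"]}, "cat"), ({"predicted_tags": []}, "cat")]
model = incremental_training(ConceptClassifier(), [web_batch], SpamFilter())
print(len(model.training_set))  # 1: the likely-spam pair was filtered out
```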
  • Item
    Hi Doppelgänger: Towards Detecting Manipulation in News Comments
    (New York City : Association for Computing Machinery, 2019) Pennekamp, Jan; Henze, Martin; Hohlfeld, Oliver; Panchenko, Andriy
    Public opinion manipulation is a serious threat to society, potentially influencing elections and the political situation even in established democracies. The prevalence of online media and the opportunity for users to express opinions in comments magnifies the problem. Governments, organizations, and companies can exploit this situation for biasing opinions. Typically, they deploy a large number of pseudonyms to create an impression of a crowd that supports specific opinions. Side channel information (such as IP addresses or identities of browsers) often allows a reliable detection of pseudonyms managed by a single person. However, while spoofing and anonymizing the data that links these accounts is simple, linking them without such information is very challenging. In this paper, we evaluate whether stylometric features allow a detection of such doppelgängers within comment sections on news articles. To this end, we adapt a state-of-the-art doppelgänger detector to work on small texts (such as comments) and apply it to three popular news sites in two languages. Our results reveal that detecting potential doppelgängers based on linguistics is a promising approach even when no reliable side channel information is available. Preliminary results from an application in the wild show indications of doppelgängers in real-world data sets.
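    To make the stylometric linking idea concrete, here is a toy sketch using character n-gram profiles and cosine similarity with scikit-learn; the features and threshold are simplifications for illustration, not the adapted detector from the paper.

```python
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def doppelgaenger_candidates(comments_by_author, threshold=0.8):
    """Flag author pairs whose comment histories look stylistically
    near-identical; character n-grams are a common stylometric feature
    for short texts."""
    authors = list(comments_by_author)
    docs = [" ".join(comments_by_author[a]) for a in authors]
    vectors = TfidfVectorizer(analyzer="char", ngram_range=(2, 4)).fit_transform(docs)
    sims = cosine_similarity(vectors)
    return [(authors[i], authors[j], float(sims[i, j]))
            for i, j in combinations(range(len(authors)), 2)
            if sims[i, j] >= threshold]

comments = {"alice": ["Totally agree!!", "So true!!"],
            "bob": ["I disagree strongly."],
            "alice2": ["Totally agree!!", "So so true!!"]}
print(doppelgaenger_candidates(comments, threshold=0.5))
```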
  • Item
    Semi-supervised identification of rarely appearing persons in video by correcting weak labels
    (New York City : Association for Computing Machinery, 2016) Müller, Eric; Otto, Christian; Ewerth, Ralph
    Some recent approaches for character identification in movies and TV broadcasts are realized in a semi-supervised manner by assigning transcripts and/or subtitles to the speakers. However, the labels obtained in this way achieve an accuracy of only 80-90%, and the number of training examples for the different actors is unevenly distributed. In this paper, we propose a novel approach for person identification in video by correcting and extending the training data with reliable predictions to reduce the number of annotation errors. Furthermore, the intra-class diversity of rarely speaking characters is enhanced. To address the imbalance of training data per person, we suggest two complementary prediction scores. These scores are also used to recognize whether or not a face track belongs to a (supporting) character whose identity does not appear in the transcript or subtitles. Experimental results demonstrate the feasibility of the proposed approach, outperforming the current state of the art.
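    A simplified sketch of the correct-and-extend idea follows: retrain on the weak labels, then overwrite a label only where the classifier confidently disagrees with it. The classifier choice, thresholds, and loop structure are assumptions, not the paper's exact method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def correct_weak_labels(features, weak_labels, rounds=3, confidence=0.9):
    """Iteratively retrain on the (noisy) labels and flip a label only
    where the classifier is confident the weak label is wrong."""
    labels = np.asarray(weak_labels).copy()
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(features, labels)
        proba = clf.predict_proba(features)
        pred = clf.classes_[proba.argmax(axis=1)]
        conf = proba.max(axis=1)
        flip = (conf >= confidence) & (pred != labels)
        if not flip.any():
            break
        labels[flip] = pred[flip]
    return labels

X = np.array([[0.0], [0.1], [0.9], [1.0]])
y_weak = np.array([0, 1, 1, 1])  # suppose the second label is noisy
print(correct_weak_labels(X, y_weak, confidence=0.6))  # labels after correction
```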
  • Item
    IPAL: Breaking up Silos of Protocol-dependent and Domain-specific Industrial Intrusion Detection Systems
    (New York City : Association for Computing Machinery, 2022-10-26) Wolsing, Konrad; Wagner, Eric; Saillard, Antoine; Henze, Martin
    The increasing interconnection of industrial networks exposes them to an ever-growing risk of cyber attacks. To reveal such attacks early and prevent any damage, industrial intrusion detection searches for anomalies in otherwise predictable communication or process behavior. However, current efforts mostly focus on specific domains and protocols, leading to a research landscape broken up into isolated silos. Thus, existing approaches cannot be applied to other industries that would equally benefit from powerful detection. To better understand this issue, we survey 53 detection systems and find no fundamental reason for their narrow focus. Although they are often coupled to specific industrial protocols in practice, many approaches could generalize to new industrial scenarios in theory. To unlock this potential, we propose IPAL, our industrial protocol abstraction layer, to decouple intrusion detection from domain-specific industrial protocols. After proving IPAL's correctness in a reproducibility study of related work, we showcase its unique benefits by studying the generalizability of existing approaches to new datasets and conclude that they are indeed not restricted to specific domains or protocols and can perform outside their restricted silos.
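    The core idea of such an abstraction layer can be sketched in a few lines: per-protocol transcribers map raw traffic onto one common message type, so a detection approach written once runs on any transcribed protocol. The field names below are illustrative and do not reproduce IPAL's actual message format.

```python
from dataclasses import dataclass, field

@dataclass
class AbstractMessage:
    """One protocol-agnostic industrial message (illustrative fields)."""
    timestamp: float
    src: str
    dest: str
    activity: str                              # e.g. "read" or "write"
    data: dict = field(default_factory=dict)   # process values in the message

def transcribe_modbus(pkt):
    """Hypothetical transcriber for an already-parsed Modbus packet;
    function codes 3 and 4 are register reads."""
    return AbstractMessage(
        timestamp=pkt["time"], src=pkt["src"], dest=pkt["dst"],
        activity="read" if pkt["func"] in (3, 4) else "write",
        data={f"reg{pkt['addr'] + i}": v for i, v in enumerate(pkt["values"])},
    )

class ThresholdIDS:
    """Toy detector that never sees protocol specifics, only AbstractMessage."""
    def __init__(self, limits):
        self.limits = limits  # variable name -> maximum plausible value
    def is_anomalous(self, msg):
        return any(v > self.limits.get(k, float("inf")) for k, v in msg.data.items())

pkt = {"time": 1.0, "src": "plc1", "dst": "hmi", "func": 3, "addr": 0, "values": [99]}
ids = ThresholdIDS({"reg0": 50})
print(ids.is_anomalous(transcribe_modbus(pkt)))  # True: reg0 exceeds its limit
```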