Search Results

Now showing 1–3 of 3
  • Item
    Understanding image-text relations and news values for multimodal news analysis
    (Lausanne : Frontiers Media, 2023) Cheema, Gullal S.; Hakimov, Sherzod; Müller-Budack, Eric; Otto, Christian; Bateman, John A.; Ewerth, Ralph
    The analysis of news dissemination is of utmost importance since the credibility of information and the identification of disinformation and misinformation affect society as a whole. Given the large amounts of news data published daily on the Web, the empirical analysis of news with regard to research questions and the detection of problematic news content on the Web require computational methods that work at scale. Today's online news is typically disseminated in a multimodal form, including various presentation modalities such as text, image, audio, and video. Recent developments in multimodal machine learning now make it possible to capture basic “descriptive” relations between modalities, such as correspondences between words and phrases on the one hand and corresponding visual depictions of the verbally expressed information on the other. Although such advances have enabled tremendous progress in tasks like image captioning, text-to-image generation, and visual question answering, in domains such as news dissemination there is a need to go further. In this paper, we introduce a novel framework for the computational analysis of multimodal news. We motivate a set of more complex image-text relations as well as multimodal news values based on real examples of news reports and consider their realization by computational approaches. To this end, we provide (a) an overview of existing literature from semiotics, where detailed proposals have been made for taxonomies covering diverse image-text relations generalisable to any domain; (b) an overview of computational work that derives models of image-text relations from data; and (c) an overview of a particular class of news-centric attributes developed in journalism studies, called news values. The result is a novel framework for multimodal news analysis that closes existing gaps in previous work while maintaining and combining the strengths of those accounts. We assess and discuss the elements of the framework with real-world examples and use cases, setting out research directions at the intersection of multimodal learning, multimodal analytics, and computational social sciences that can benefit from our approach.
  • Item
    The Search as Learning Spaceship: Toward a Comprehensive Model of Psychological and Technological Facets of Search as Learning
    (Lausanne : Frontiers Research Foundation, 2022) von Hoyer, Johannes; Hoppe, Anett; Kammerer, Yvonne; Otto, Christian; Pardi, Georg; Rokicki, Markus; Yu, Ran; Dietze, Stefan; Ewerth, Ralph; Holtz, Peter
    Using a Web search engine is one of today’s most frequent activities. Exploratory search activities that are carried out in order to gain knowledge are conceptualized and denoted as Search as Learning (SAL). In this paper, we introduce a novel framework model that incorporates the perspectives of both psychology and computer science to describe the search as learning process by reviewing recent literature. The five main entities of the model are the learner, the specific learning context that surrounds the learner, the interface that mediates between the learner and the information environment, the information retrieval (IR) backend that manages the processes between the interface and the Web resources, and the set of Web resources themselves, that is, the collective Web knowledge represented in resources of different modalities. First, we provide an overview of the current state of the art with regard to the five main entities of our model, before we outline areas of future research to improve our understanding of search as learning processes.
  • Item
    Semi-supervised identification of rarely appearing persons in video by correcting weak labels
    (New York City : Association for Computing Machinery, 2016) Müller, Eric; Otto, Christian; Ewerth, Ralph
    Some recent approaches for character identification in movies and TV broadcasts are realized in a semi-supervised manner by assigning transcripts and/or subtitles to the speakers. However, the labels obtained in this way achieve an accuracy of only 80–90%, and the number of training examples for the different actors is unevenly distributed. In this paper, we propose a novel approach for person identification in video that corrects and extends the training data with reliable predictions in order to reduce the number of annotation errors. Furthermore, the intra-class diversity of rarely speaking characters is enhanced. To address the imbalance of training data per person, we suggest two complementary prediction scores. These scores are also used to recognize whether or not a face track belongs to a (supporting) character whose identity does not appear in the transcript or subtitles. Experimental results demonstrate the feasibility of the proposed approach, which outperforms the current state of the art.