Estimating the information gap between textual and visual representations

Henning, Christian; Ewerth, Ralph

doi:https://doi.org/10.34657/3496

dc.bibliographicCitation.bookTitle	ICMR '17 Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, Bucharest, Romania, June 06 - 09, 2017, Page 14-22	eng
dc.contributor.author	Henning, Christian
dc.contributor.author	Ewerth, Ralph
dc.date.accessioned	2018-01-30T06:44:02Z
dc.date.available	2019-06-28T13:17:25Z
dc.date.issued	2017
dc.description.abstract	Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of surrounding text, or simply being a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations of textual and graphical information and the question, how they can be described and automatically estimated have not been addressed yet by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for the demanding task. The system has been evaluated on a challenging test set and the experimental results demonstrate the feasibility of the approach.	eng
dc.description.version	publishedVersion	eng
dc.format	application/pdf
dc.identifier.uri	https://doi.org/10.34657/3496
dc.identifier.uri	https://oa.tib.eu/renate/handle/123456789/4432
dc.language.iso	eng	eng
dc.publisher	New York City : Association for Computing Machinery	eng
dc.relation.doi	https://doi.org/10.1145/3078971.3078991
dc.rights.license	This document may be downloaded, read, stored and printed for your own use within the limits of § 53 UrhG but it may not be distributed via the internet or passed on to external parties.	eng
dc.rights.license	Dieses Dokument darf im Rahmen von § 53 UrhG zum eigenen Gebrauch kostenfrei heruntergeladen, gelesen, gespeichert und ausgedruckt, aber nicht im Internet bereitgestellt oder an Außenstehende weitergegeben werden.	ger
dc.subject.ddc	020	eng
dc.subject.other	Text-image relations	eng
dc.subject.other	multimodal embeddings	eng
dc.subject.other	deep learning	eng
dc.title	Estimating the information gap between textual and visual representations	eng
dc.type	ConferenceObject	eng
dc.type	Text	eng
tib.accessRights	openAccess	eng
wgl.contributor	TIB	eng
wgl.subject	Informatik	eng
wgl.type	Konferenzbeitrag	eng

Files

Original bundle

Name: p14-henning.pdf
Size: 5.59 MB
Format: Adobe Portable Document Format

Collections

Informationswissenschaften