Search Results

Estimating the information gap between textual and visual representations

2017, Henning, Christian, Ewerth, Ralph

Photos, drawings, figures, etc. supplement textual information in various kinds of media, for example, in web news or scientific publications. In this respect, the intended effect of an image can be quite different, e.g., providing additional information, focusing on certain details of surrounding text, or simply being a general illustration of a topic. As a consequence, the semantic correlation between information of different modalities can vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise way. The variety of possible interrelations of textual and graphical information and the question, how they can be described and automatically estimated have not been addressed yet by previous work. In this paper, we present several contributions to close this gap. First, we introduce two measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an appropriate deep neural network model for the demanding task. The system has been evaluated on a challenging test set and the experimental results demonstrate the feasibility of the approach.

A PDF Test-Set for Well-Formedness Validation in JHOVE - The Good, the Bad and the Ugly

2017, Lindlar, Michelle, Tunnat, Yvonne, Wilson, Carl

Digital preservation and active software stewardship are both cyclical processes. While digital preservation strategies have to be reevaluated regularly to ensure that they still meet technological and organizational requirements, software needs to be tested with every new release to ensure that it functions correctly. JHOVE is an open source format validation tool which plays a central role in many digital preservation workflows and the PDF module is one of its most important features. Unlike tools such as Adobe PreFlight or veraPDF which check against requirements at profile level, JHOVE’s PDF-module is the only tool that can validate the syntax and structure of PDF files. Despite JHOVE’s widespread and long-standing adoption, the underlying validation rules are not formally or thoroughly tested, leading to bugs going undetected for a long time. Furthermore, there is no ground-truth data set which can be used to understand and test PDF validation at the structural level. The authors present a corpus of light-weight files designed to test the validation criteria of JHOVE’s PDF module against “well-formedness”. We conclude by measuring the code coverage of the test corpus within JHOVE PDF validation and by feeding detected inconsistencies of the PDF-module back into the open source development process.

Service durch Kompetenzbündelung - Das institutionelle Konzept zum Forschungsdatenmanagement der Leibniz Universität Hannover

2017, Meyer, Anneke, Neumann, Janna

Leibniz Universität Hannover has defined the needs-based establishment and expansion of its support services for handling research data as a strategic goal in order to strengthen its standing as a research location. Specialist staff from the Research Department (Dezernat Forschung), Leibniz Universität IT Services (LUIS), and the Technische Informationsbibliothek (TIB) have drafted an institutional concept for this purpose, which has been implemented since December 2016. The starting point of the concept was a survey on the handling of research data at Leibniz Universität Hannover, supplemented by qualitative interviews. The institutional concept comprises the following elements: establishing a university-wide policy on handling research data; advice and training for researchers and service units; establishing and expanding an institutional data repository and developing interfaces to the research information system and the full-text repository; and cross-university cooperation and networking. The four elements are at different stages of implementation. Since 2014, the participating institutions have jointly provided advice and training, using shared documentation systems for quality assurance and mutual information. In this area, experience has been gathered over the past two years and processes have been optimized accordingly. The challenge of the approach at Leibniz Universität lies in maintaining a cross-institutional service offering and developing it further collaboratively. This ensures that competencies are pooled effectively and that no parallel structures form at individual institutions. Through the jointly developed services, researchers are encouraged with one voice and on several levels to handle research data actively and consciously.
This article examines the first experiences in implementing the individual elements of the institutional concept and in the collaboration. It also gives an outlook on the development envisaged for the future.

“When was this picture taken?” – Image date estimation in the wild

2017, Müller, E., Springstein, M., Ewerth, R.

The problem of automatically estimating the creation date of photos has been addressed rarely in the past. In this paper, we introduce a novel dataset, Date Estimation in the Wild, for the task of predicting the acquisition year of images captured in the period from 1930 to 1999. In contrast to previous work, the dataset is neither restricted to color photography nor to specific visual concepts. The dataset consists of more than one million images crawled from Flickr and contains a large number of different motifs. In addition, we propose two baseline approaches for regression and classification, respectively, relying on state-of-the-art deep convolutional neural networks. Experimental results demonstrate that these baselines are already superior to annotations of untrained humans.

Beitragsmodell (arXiv)

2017, Tobschall, Esther

Even after 25 years, the e-print server arXiv remains an important platform for the rapid publication of research results and an essential source of information for its subject areas. arXiv is a central subject repository and is regarded as the prototype of open access publishing. Yet success always has its price: this contribution introduces the information platform arXiv and describes the experience with a business model that aims to achieve sustainable funding through membership fees.

Leben und Werk des Karl Hahn

2017, Mensing, Petra

From 1905 until the end of his life in 1946, the music teacher and botanist Karl Hahn assembled an extensive collection of the flora of Mecklenburg, in particular the mosses in the surroundings of Neukloster and Grabow. In addition to the specimens that still exist, he left various publications in the Archiv der Freunde der Naturgeschichte in Mecklenburg, which covered not only descriptions of the individual finds but also accounts of walks and observations of nature. This contribution compiles, for the first time in one publication, all moss species he designated as „Neu für Mecklenburg“ ("new for Mecklenburg") and offers suggestions for future work.

“Are machines better than humans in image tagging?” - A user study adds to the puzzle

2017, Ewerth, Ralph, Springstein, Matthias, Phan-Vogtmann, Lo An, Schütze, Juliane

“Do machines perform better than humans in visual recognition tasks?” Not so long ago, this question would have been considered somewhat provocative and the answer would have been clear: “No”. In this paper, we present a comparison of human and machine performance with respect to annotation for multimedia retrieval tasks. Going beyond recent crowdsourcing studies in this respect, we also report results of two extensive user studies. In total, 23 participants were asked to annotate more than 1000 images of a benchmark dataset, which is the most comprehensive study in the field so far. Krippendorff’s α is used to measure inter-coder agreement among several coders and the results are compared with the best machine results. The study is preceded by a summary of studies which compared human and machine performance in different visual and auditory recognition tasks. We discuss the results and derive a methodology for comparing machine performance in multimedia annotation tasks with human-level performance. This allows us to formally answer the question of whether a recognition problem can be considered solved. Finally, we answer the initial question.

Survey: Open Science in Higher Education

2017, Heck, Tamara, Blümel, Ina, Heller, Lambert, Mazarakis, Athanasios, Peters, Isabella, Scherp, Ansgar, Weisel, Luzian

Based on a checklist that was developed during a workshop at OER Camp 2016 and presented as a poster at the Science 2.0 conference 2016 [1], we conducted an online survey among university teachers representing a sufficient variety of subjects. The survey was online from February 6 to March 3, 2017. We received 360 responses, of which 210 were complete; see raw data [2]. The poster is presented at the Open Science Conference, March 21-22, 2017, Berlin.

Towards OSGeo best practices for scientific software citation: Integration options for persistent identifiers in OSGeo project repositories

2017, Löwe, Peter Heinz, Neteler, Markus, Goebel, Jan, Tullney, Marco

As a contribution to the ongoing larger effort to establish Open Science as best practice in academia, this article focuses on the Open Source and Open Access tiers of the Open Science triad and on community software projects. The current situation of research software development, and the need to recognize it as a significant contribution to science, is introduced in relation to Open Science. The adoption of the Open Science paradigms occurs at different speeds and on different levels within the various fields of science and crosscutting software communities. This is paralleled by the emergence of an underlying future-safe technical infrastructure based on open standards to enable proper recognition for published articles, data, and software. Currently the number of journal publications about research software remains low in comparison to the amount of research code published in various software repositories on the WWW. Because common standards for the citation of software projects (containers) and versions of software are lacking, the FORCE11 group and the CodeMeta project recommend establishing Persistent Identifiers (PIDs), together with suitable metadata sets, to reliably cite research software. This approach is compared to the best practices implemented by the OSGeo Foundation for geospatial community software projects. For GRASS GIS, an OSGeo project and one of the oldest geospatial open source community projects, the external requirements for DOI-based software citation are compared with the project's software documentation standards. Based on this status assessment, application scenarios are derived for how OSGeo projects can approach DOI-based software citation, both as a standalone option and as a means to foster open access journal publications as part of reproducible Open Science.

14 Years of PID services at the German National Library of Science and Technology (TIB): Connected frameworks, research data and lessons learned from a National Research Library perspective

2017, Kraft, Angelina, Dreyer, Britta, Löwe, Peter, Ziedorn, Frauke

In an ideal research world, any scientific content should be citable and the coherent content, as well as the citation itself, should be persistent. However, today’s scientists do not only produce traditional research papers – they produce comprehensive digital resources and collections. TIB’s mission is to develop a supportive framework for sustainable access to such digital content – focusing on areas of engineering as well as architecture, chemistry, information technology, mathematics and physics. The term digital content comprises all digitally available resources such as audiovisual media, databases, texts, images, spreadsheets, digital lab journals, multimedia, 3D objects, statistics and software code. In executing this mission, TIB provides services for the management of digital content during ongoing research and for finished research. This includes:
• a technical and administrative infrastructure for indexing, cataloguing, DOI registration and licensing for text and digital objects, namely the TIB DOI registration, which has been active since 2005,
• the administration of the ORCID DE consortium, an institutional network fostering the adoption of ORCID across academic institutions in Germany,
• training and consultancy for data management, complemented by a digital repository for the deposition and provision of accessible, traceable and citable research data (RADAR),
• a Research and Development Department where innovative projects focus on the visualization of and sustainable access to digital information, and
• the development of a supportive framework within the German research data community which accompanies the life cycle of scientific knowledge generation and transfer. Its goal is to harmonize (meta)data display and exchange primarily on a national level (LEIBNIZ DATA project).