Search Results

Now showing 1 - 10 of 38

Handreichung Technik und Infrastrukturen [Guide to Technology and Infrastructures]

2023, Eichler, Frederik, Eppelin, Anita, Kampkaspar, Dario, Schrader, Antonia C., Söllner, Konstanze, Vierkant, Paul, Withanage, Dulip, Wrzesinski, Marcel

In this guide, we present a range of technical resources that can support editorial work. We recommend using software and systems that promote the shift towards an open, low-threshold, and sustainable scholarly culture, which first and foremost means using open-source software. Our recommendations have limited reach: service providers, software, and projects may no longer be available at a later date. In addition, the infrastructure institutions in particular are embedded in the federal science system, which exposes them to certain uncertainties.

Information extraction pipelines for knowledge graphs

2023, Jaradeh, Mohamad Yaser, Singh, Kuldeep, Stocker, Markus, Both, Andreas, Auer, Sören

In the last decade, a large number of knowledge graph (KG) completion approaches were proposed. Albeit effective, these efforts are disjoint, and their collective strengths and weaknesses in effective KG completion have not been studied in the literature. We extend Plumber, a framework that brings together the research community’s disjoint efforts on KG completion. We include more components into the architecture of Plumber to comprise 40 reusable components for various KG completion subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable knowledge extraction pipelines and offers overall 432 distinct pipelines. We study the optimization problem of choosing optimal pipelines based on input sentences. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and finds an appropriate pipeline. We study the efficacy of Plumber for extracting the KG triples using standard datasets over three KGs: DBpedia, Wikidata, and Open Research Knowledge Graph. Our results demonstrate the effectiveness of Plumber in dynamically generating KG completion pipelines, outperforming all baselines agnostic of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among integrated components and discuss their limitations.

Understanding image-text relations and news values for multimodal news analysis

2023, Cheema, Gullal S., Hakimov, Sherzod, Müller-Budack, Eric, Otto, Christian, Bateman, John A., Ewerth, Ralph

The analysis of news dissemination is of utmost importance since the credibility of information and the identification of disinformation and misinformation affect society as a whole. Given the large amounts of news data published daily on the Web, the empirical analysis of news with regard to research questions and the detection of problematic news content on the Web require computational methods that work at scale. Today's online news is typically disseminated in a multimodal form, including various presentation modalities such as text, image, audio, and video. Recent developments in multimodal machine learning now make it possible to capture basic “descriptive” relations between modalities, such as correspondences between words and phrases, on the one hand, and corresponding visual depictions of the verbally expressed information on the other. Although such advances have enabled tremendous progress in tasks like image captioning, text-to-image generation and visual question answering, in domains such as news dissemination, there is a need to go further. In this paper, we introduce a novel framework for the computational analysis of multimodal news. We motivate a set of more complex image-text relations as well as multimodal news values based on real examples of news reports and consider their realization by computational approaches. To this end, we provide (a) an overview of existing literature from semiotics where detailed proposals have been made for taxonomies covering diverse image-text relations generalisable to any domain; (b) an overview of computational work that derives models of image-text relations from data; and (c) an overview of a particular class of news-centric attributes developed in journalism studies called news values. The result is a novel framework for multimodal news analysis that closes existing gaps in previous work while maintaining and combining the strengths of those accounts. We assess and discuss the elements of the framework with real-world examples and use cases, setting out research directions at the intersection of multimodal learning, multimodal analytics and computational social sciences that can benefit from our approach.

Workshop on PIDs within NFDI: Report of the Working Group “Persistent Identifiers (PID)” of the Section Common Infrastructures of the NFDI

2023, Arend, Daniel, Bach, Janete, Elger, Kirsten, Göller, Sandra, Hagemann-Wilholt, Stephanie, Krahl, Rolf, Lange, Matthias, Linke, David, Mayer, Desiree, Mutschke, Peter, Reimer, Lorenz, Scheidgen, Markus, Schrader, Antonia C., Selzer, Michael, Wieder, Philipp

To gain an overview of the current state of the discussion on PIDs and to identify use cases for the initiation phase of a PID service within the NFDI basic services, the working group Persistent Identifier of the Section Common Infrastructures of the NFDI hosted an online workshop in January 2023. During the workshop, members of nine different NFDI consortia presented how PIDs are currently applied in their consortia.

An Approach to Evaluate User Interfaces in a Scholarly Knowledge Communication Domain

2023, Obrezkov, Denis, Oelen, Allard, Auer, Sören, Abdelnour-Nocera, José L., Lárusdóttir, Marta, Petrie, Helen, Piccinno, Antonio, Winckler, Marco

The number of research articles produced every day is overwhelming: scholarly knowledge is getting harder to communicate and easier to lose. A possible solution is to represent the information in knowledge graphs: structures representing knowledge in networks of entities, their semantic types, and relationships between them. But this solution has its own drawback: given its very specific task, it requires new methods for designing and evaluating user interfaces. In this paper, we propose an approach for user interface evaluation in the knowledge communication domain. We base our methodology on the well-established Cognitive Walkthrough approach but employ a different set of questions, tailoring the method towards domain-specific needs. We demonstrate our approach on a scholarly knowledge graph implementation called Open Research Knowledge Graph (ORKG).

Ranking facts for explaining answers to elementary science questions

2023, D’Souza, Jennifer, Mulang, Isaiah Onando, Auer, Sören

In multiple-choice exams, students select one answer from among typically four choices and can explain why they made that particular choice. Students are good at understanding natural language questions and based on their domain knowledge can easily infer the question's answer by “connecting the dots” across various pertinent facts. Considering automated reasoning for elementary science question answering, we address the novel task of generating explanations for answers from human-authored facts. For this, we examine the practically scalable framework of feature-rich support vector machines leveraging domain-targeted, hand-crafted features. Explanations are created from a human-annotated set of nearly 5000 candidate facts in the WorldTree corpus. Our aim is to obtain better matches for valid facts of an explanation for the correct answer of a question over the available fact candidates. To this end, our features offer a comprehensive linguistic and semantic unification paradigm. The machine learning problem is the preference ordering of facts, for which we test pointwise regression versus pairwise learning-to-rank. Our contributions, originating from comprehensive evaluations against nine existing systems, are (1) a case study in which two preference ordering approaches are systematically compared, and where the pointwise approach is shown to outperform the pairwise approach, thus adding to the existing survey of observations on this topic; (2) since our system outperforms a highly-effective TF-IDF-based IR technique by 3.5 and 4.9 points on the development and test sets, respectively, it demonstrates some of the further task improvement possibilities (e.g., in terms of an efficient learning algorithm, semantic features) on this task; (3) it is a practically competent approach that can outperform some variants of BERT-based reranking models; and (4) the human-engineered features make it an interpretable machine learning model for the task.

Seriös oder nicht? Die individuelle Prüfung der Qualität von Zeitschriften an der TIB [Reputable or not? Individual quality checks of journals at TIB]

2023, Schmeja, Stefan, Kändler, Ulrike

The question of the quality of open access journals arises at TIB both when advising authors and when providing funding through a publication fund. If a journal is not listed in the Directory of Open Access Journals (DOAJ) and does not clearly appear to be disreputable either, it is assessed individually against a range of criteria. In this contribution, we present the criteria used and describe our experience with the assessment.

NFDI4Chem - A Research Data Network for International Chemistry

2023, Steinbeck, Christoph, Koepler, Oliver, Herres-Pawlis, Sonja, Bach, Felix, Jung, Nicole, Razum, Matthias, Liermann, Johannes C., Neumann, Steffen

Research data provide evidence for the validation of scientific hypotheses in most areas of science. Open access to them is the basis for true peer review of scientific results and publications. Hence, research data are at the heart of the scientific method as a whole. The value of openly sharing research data has by now been recognized by scientists, funders and politicians. Today, new research results are increasingly obtained by drawing on existing data. Many organisations such as the Research Data Alliance (RDA), the goFAIR initiative, and not least IUPAC are supporting and promoting the collection and curation of research data. One of the remaining challenges is to find matching data sets, to understand them and to reuse them for your own purpose. As a consequence, we urgently need better research data management.

Depression, anxiety, and burnout in academia: topic modeling of PubMed abstracts

2023, Lezhnina, Olga

The problem of mental health in academia is increasingly discussed in the literature, and text mining approaches are used to extract meaningful insights from the growing number of scientific publications. In this study, BERTopic, an advanced method of topic modeling, was applied to abstracts of 2,846 PubMed articles on depression, anxiety, and burnout in academia published between 1975 and 2023. BERTopic is a modular technique comprising a text embedding method, a dimensionality reduction procedure, a clustering algorithm, and a weighting scheme for topic representation. A model was selected based on the proportion of outliers, topic interpretability considerations, and topic coherence and topic diversity metrics, and the inevitable subjectivity of the criteria was discussed. The selected model with 27 topics was explored and visualized. The topics evolved differently with time: research papers on students' pandemic-related anxiety and medical residents' burnout peaked in recent years, while publications on psychometric research or internet-related problems have yet to be covered more fully. The study demonstrates the use of BERTopic for analyzing literature on mental health in academia and sheds light on areas in the field to be addressed by further research.

Global visibility of publications through Digital Object Identifiers

2023, Turki, Houcemeddine, Fraumann, Grischa, Hadj Taieb, Mohamed Ali, Ben Aouicha, Mohamed

This brief research report analyzes the availability of Digital Object Identifiers (DOIs) worldwide, highlighting the dominance of large publishing houses and the need for unique persistent identifiers to increase the visibility of publications from developing countries. The study reveals that a considerable number of publications from developing countries are excluded from the global flow of scientific information due to the absence of DOIs, emphasizing the need for alternative publishing models. The authors suggest that the availability of DOIs should receive more attention in scholarly communication and scientometrics, contributing to a necessary debate on DOIs relevant for librarians, publishers, and scientometricians.