Search Results

Now showing 1 - 5 of 5

Improving Zero-Shot Text Classification with Graph-based Knowledge Representations

2022, Hoppe, Fabian, Hartig, Olaf, Seneviratne, Oshani

Insufficient training data is a key challenge for text classification. In particular, long-tail class distributions and newly emerging classes provide no training data at all for specific classes. Such a zero-shot setting must therefore incorporate additional, external knowledge to enable transfer learning by connecting knowledge about previously unseen classes to texts. Recent zero-shot text classifiers rely only on the distributional semantics defined by large language models, based on class names or natural language descriptions. This implicit knowledge contains ambiguities, cannot capture logical relations, and is not an efficient representation of factual knowledge. These drawbacks can be avoided by introducing explicit, external knowledge. Knowledge graphs in particular provide such explicit, unambiguous, complementary, and domain-specific knowledge. Hence, this thesis explores graph-based knowledge as an additional modality for zero-shot text classification. Besides a general investigation of this modality, it examines how including domain-specific knowledge influences the ability to deal with domain shifts.
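The baseline setting the abstract contrasts against, classification driven only by the distributional semantics of class names, can be sketched with an off-the-shelf zero-shot classifier, once with bare class names and once with label texts enriched by facts a knowledge graph might contribute. The model choice, example text, and enriched label strings below are assumptions for illustration; this is not the thesis's method.

```python
# Minimal sketch: zero-shot classification from class names only vs. labels
# enriched with (hypothetical) knowledge-graph facts. Model, text, and label
# wording are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

text = "The patient was prescribed a beta blocker after the ECG showed arrhythmia."

# Naive labels: only the class names, as in typical zero-shot classifiers.
naive_labels = ["cardiology", "oncology", "dermatology"]

# Enriched labels: class names extended with facts a domain KG could provide.
enriched_labels = [
    "cardiology, the study of heart disorders such as arrhythmia treated with beta blockers",
    "oncology, the study of tumours and cancer treated with chemotherapy",
    "dermatology, the study of skin conditions such as eczema",
]

print(classifier(text, candidate_labels=naive_labels)["labels"][0])
print(classifier(text, candidate_labels=enriched_labels)["labels"][0])
```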


Improving Language Model Predictions via Prompts Enriched with Knowledge Graphs

2023, Brate, Ryan, Minh-Dang, Hoang, Hoppe, Fabian, He, Yuan, Meroño-Peñuela, Albert, Sadashivaiah, Vijay, Alam, Mehwish, Buscaldi, Davide, Cochez, Michael, Osborne, Francesco, Reforgiato Recupero, Diego

Despite advances in deep learning and knowledge graphs (KGs), using language models for natural language understanding and question answering remains a challenging task. Pre-trained language models (PLMs) have been shown to leverage contextual information to complete cloze prompts, next-sentence completion, and question answering tasks in various domains. Unlike structured data querying in, e.g., KGs, mapping an input question to data that may or may not be stored by the language model is not a simple task. Recent studies have highlighted the improvements that can be made to the quality of information retrieved from PLMs by adding auxiliary data to otherwise naive prompts. In this paper, we explore the effects on language model performance of enriching prompts with additional contextual information drawn from the Wikidata KG. Specifically, we compare the performance of naive vs. KG-engineered cloze prompts for entity genre classification in the movie domain. Selecting a broad range of commonly available Wikidata properties, we show that enriching cloze-style prompts with Wikidata information can result in significantly higher recall for the investigated BERT and RoBERTa large PLMs. However, it is also apparent that the optimum level of data enrichment differs between models.
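The contrast between naive and KG-engineered cloze prompts can be sketched with a standard fill-mask pipeline. The prompt templates and the Wikidata-derived facts below are illustrative assumptions, not the paper's exact templates or property selection.

```python
# Sketch of naive vs. KG-enriched cloze prompts for movie genre classification.
# Prompt wording and the Wikidata-style facts are assumptions for illustration.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Naive cloze prompt: only the entity name and the masked genre.
naive = "Blade Runner is a [MASK] film."

# KG-engineered prompt: the same cloze enriched with facts that could be
# retrieved from Wikidata properties (director, cast member, based on).
enriched = ("Blade Runner, directed by Ridley Scott, starring Harrison Ford "
            "and based on a novel by Philip K. Dick, is a [MASK] film.")

for prompt in (naive, enriched):
    top = fill_mask(prompt)[0]          # highest-scoring filler token
    print(f"{top['token_str']!r} ({top['score']:.3f})  <-  {prompt}")
```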


Diving into Knowledge Graphs for Patents: Open Challenges and Benefits

2023, Dessi, Danilo, Dessi, Rima, Alam, Mehwish, Trojahn, Cassia, Hertling, Sven, Pesquita, Catia, Aebeloe, Christian, Aras, Hidir, Azzam, Amr, Cano, Juan, Domingue, John, Gottschalk, Simon, Hartig, Olaf, Hose, Katja, Kirrane, Sabrina, Lisena, Pasquale, Osborne, Francesco, Rohde, Philipp, Steels, Luc, Taelman, Ruben, Third, Aisling, Tiddi, Ilaria, Türker, Rima

Textual documents are the means of sharing information and preserving knowledge in a large variety of domains. The patent domain also follows this paradigm, which is becoming difficult to maintain and limits the potential of using advanced AI systems for domain analysis. To overcome this issue, approaches that transform textual representations into Knowledge Graphs (KGs) are becoming increasingly common. In this position paper, we discuss KGs within the patent domain, present their challenges, and envision the benefits of such technologies for this domain. In addition, the paper provides insights into such KGs by reproducing an existing KG creation pipeline and applying it to patents in the computer science domain.


Workshop on PIDs within NFDI: Report of the Working Group “Persistent Identifiers (PID)” of the Section Common Infrastructures of the NFDI

2023, Arend, Daniel, Bach, Janete, Elger, Kirsten, Göller, Sandra, Hagemann-Wilholt, Stephanie, Krahl, Rolf, Lange, Matthias, Linke, David, Mayer, Desiree, Mutschke, Peter, Reimer, Lorenz, Scheidgen, Markus, Schrader, Antonia C., Selzer, Michael, Wieder, Philipp

To gain an overview of the current state of the discussion on PIDs and to identify use cases for the initiation phase of a PID service within the NFDI basic services, the working group Persistent Identifiers (PID) of the Section Common Infrastructures of the NFDI hosted an online workshop in January 2023. In the course of the workshop, members of nine different NFDI consortia presented the current use of PIDs in their consortia.


SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs

2020, Iglesias, Enrique, Jozashoori, Samaneh, Chaves-Fraga, David, Collarana, Diego, Vidal, Maria-Esther

In recent years, the amount of data has increased exponentially, and knowledge graphs have gained attention as data structures to integrate data and knowledge harvested from myriad data sources. However, these data sources are usually characterized by data complexity issues like large volume, high duplication rates, and heterogeneity, which requires data management tools able to address the negative impact of these issues on the knowledge graph creation process. In this paper, we propose the SDM-RDFizer, an interpreter of the RDF Mapping Language (RML), to transform raw data in various formats into an RDF knowledge graph. SDM-RDFizer implements novel algorithms to execute the logical operators between mappings in RML, thus allowing it to scale to complex scenarios where data is not only large but also has a high duplication rate. We empirically evaluate the performance of SDM-RDFizer against diverse testbeds with diverse configurations of data volume, duplicates, and heterogeneity. The observed results indicate that SDM-RDFizer is two orders of magnitude faster than the state of the art, meaning that SDM-RDFizer is an interoperable and scalable solution for knowledge graph creation. SDM-RDFizer is publicly available as a resource through a GitHub repository and a DOI.
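To make concrete what an RML-style transformation produces, the sketch below maps rows of a small tabular source to RDF triples with rdflib and makes the duplicate elimination explicit, mirroring the duplicate-aware behaviour the abstract highlights. This is a hand-written illustration of the general idea, not SDM-RDFizer's API or its algorithms; the column names and vocabulary IRIs are assumptions.

```python
# Hand-written illustration of what an RML-style mapping engine does:
# rows from a raw tabular source become RDF triples, with duplicate triples
# skipped explicitly (rdflib's Graph would also store them only once).
# Column names and IRIs are assumptions; this is not SDM-RDFizer's API.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")

rows = [
    {"id": "p1", "name": "Alice", "affiliation": "TIB"},
    {"id": "p2", "name": "Bob", "affiliation": "TIB"},
    {"id": "p1", "name": "Alice", "affiliation": "TIB"},  # duplicate row
]

graph = Graph()
seen = set()  # explicit duplicate elimination across generated triples

for row in rows:
    subject = EX[f"person/{row['id']}"]
    triples = [
        (subject, RDF.type, EX.Person),
        (subject, EX.name, Literal(row["name"])),
        (subject, EX.affiliation, EX[f"org/{row['affiliation']}"]),
    ]
    for triple in triples:
        if triple not in seen:
            seen.add(triple)
            graph.add(triple)

print(graph.serialize(format="turtle"))
```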