Search Results

  • Item
    Improving Zero-Shot Text Classification with Graph-based Knowledge Representations
    (Aachen, Germany : RWTH Aachen, 2022) Hoppe, Fabian; Hartig, Olaf; Seneviratne, Oshani
    Insufficient training data is a key challenge for text classification. In particular, long-tail class distributions and newly emerging classes provide no training data for specific classes. Such a zero-shot setting must therefore incorporate additional, external knowledge to enable transfer learning by connecting the external knowledge of previously unseen classes to texts. Recent zero-shot text classifiers utilize only distributional semantics defined by large language models and based on class names or natural language descriptions. This implicit knowledge contains ambiguities, cannot capture logical relations, and is not an efficient representation of factual knowledge. These drawbacks can be avoided by introducing explicit, external knowledge. In particular, knowledge graphs provide such explicit, unambiguous, and complementary domain-specific knowledge. Hence, this thesis explores graph-based knowledge as an additional modality for zero-shot text classification. Besides a general investigation of this modality, it explores how including domain-specific knowledge influences the ability to deal with domain shifts.
  • Item
    SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs
    (New York City, NY : Association for Computing Machinery, 2020) Iglesias, Enrique; Jozashoori, Samaneh; Chaves-Fraga, David; Collarana, Diego; Vidal, Maria-Esther
    In recent years, the amount of data has increased exponentially, and knowledge graphs have gained attention as data structures to integrate data and knowledge harvested from myriad data sources. However, these data sources are usually characterized by data complexity issues such as large volume, high duplicate rate, and heterogeneity, so data management tools are required that can address the negative impact of these issues on the knowledge graph creation process. In this paper, we propose the SDM-RDFizer, an interpreter of the RDF Mapping Language (RML), to transform raw data in various formats into an RDF knowledge graph. SDM-RDFizer implements novel algorithms to execute the logical operators between mappings in RML, thus allowing it to scale up to complex scenarios where data is not only broad but also has a high duplication rate. We empirically evaluate the SDM-RDFizer performance against diverse testbeds with diverse configurations of data volume, duplicates, and heterogeneity. The observed results indicate that SDM-RDFizer is two orders of magnitude faster than the state of the art, meaning that SDM-RDFizer is an interoperable and scalable solution for knowledge graph creation. SDM-RDFizer is publicly available as a resource through a GitHub repository and a DOI.
  • Item
    Improving Language Model Predictions via Prompts Enriched with Knowledge Graphs
    (Aachen, Germany : RWTH Aachen, 2023) Brate, Ryan; Minh-Dang, Hoang; Hoppe, Fabian; He, Yuan; Meroño-Peñuela, Albert; Sadashivaiah, Vijay; Alam, Mehwish; Buscaldi, Davide; Cochez, Michael; Osborne, Francesco; Reforgiato Recupero, Diego
    Despite advances in deep learning and knowledge graphs (KGs), using language models for natural language understanding and question answering remains a challenging task. Pre-trained language models (PLMs) have been shown to be able to leverage contextual information to complete cloze prompts and to perform next-sentence completion and question-answering tasks in various domains. Unlike structured data querying in, e.g., KGs, mapping an input question to data that may or may not be stored by the language model is not a simple task. Recent studies have highlighted the improvements that can be made to the quality of information retrieved from PLMs by adding auxiliary data to otherwise naive prompts. In this paper, we explore the effects on language model performance of enriching prompts with additional contextual information drawn from the Wikidata KG. Specifically, we compare the performance of naive vs. KG-engineered cloze prompts for entity genre classification in the movie domain. Selecting a broad range of commonly available Wikidata properties, we show that enrichment of cloze-style prompts with Wikidata information can result in a significantly higher recall for the investigated BERT and RoBERTa large PLMs. However, it is also apparent that the optimum level of data enrichment differs between models.
  • Item
    Diving into Knowledge Graphs for Patents: Open Challenges and Benefits
    (Aachen, Germany : RWTH Aachen, 2023) Dessi, Danilo; Dessi, Rima; Alam, Mehwish; Trojahn, Cassia; Hertling, Sven; Pesquita, Catia; Aebeloe, Christian; Aras, Hidir; Azzam, Amr; Cano, Juan; Domingue, John; Gottschalk, Simon; Hartig, Olaf; Hose, Katja; Kirrane, Sabrina; Lisena, Pasquale; Osborne, Francesco; Rohde, Philipp; Steels, Luc; Taelman, Ruben; Third, Aisling; Tiddi, Ilaria; Türker, Rima
    Textual documents are the means of sharing information and preserving knowledge in a large variety of domains. The patent domain also follows this paradigm, which is becoming difficult to maintain and limits the potential of advanced AI systems for domain analysis. To overcome this issue, approaches that transform textual representations into Knowledge Graphs (KGs) are increasingly common. In this position paper, we discuss KGs within the patent domain, present the associated challenges, and envision the benefits of such technologies for this domain. In addition, this paper provides insights into such KGs by reproducing an existing pipeline to create KGs and applying it to patents in the computer science domain.