Search Results

  • Item
    Baroque AI
    (Zenodo, 2023) Worthington, Simon; Blümel, Ina
    Publication prototype: a computational publishing and AI-assisted writing course unit with students of the Open Knowledge class at Hochschule Hannover, run with the Open Science Lab, TIB. The prototype publication exercise involves creating a fictional ‘exhibition catalogue’ drawing on Wikidata-based cataloguing of seventeenth-century painting deposited by the Bavarian State Painting Collections. The prototype demonstrates how computational publishing can be used to bring together different distributed linked open data (LOD) sources. Additionally, AI tools are used for assisted essay writing. Both are then encapsulated in a multi-format computational publication, allowing for asynchronous collaborative working. Distributed LOD sources include Wikidata/base, Nextcloud, Thoth, Semantic Kompakkt, and TIB AV Portal. The AI tools used for essay writing are OpenAI and Perplexity. Eleven students completed the class unit, which was carried out from March to April 2023. An open access OER guide to running the class and a template publication for use in the class are online on GitHub and designed for OER reuse. Full class information and resources are on Wikiversity. The open source software used is brought together in the ADA Pipeline.
  • Item
    Detecting Cross-Language Plagiarism using Open Knowledge Graphs
    (Aachen, Germany : RWTH Aachen, 2021) Stegmüller, Johannes; Bauer-Marquart, Fabian; Meuschke, Norman; Ruas, Terry; Schubotz, Moritz; Gipp, Bela; Zhang, Chengzhi; Mayr, Philipp; Lu, Wei; Zhang, Yi
    Identifying cross-language plagiarism is challenging, especially for distant language pairs and sense-for-sense translations. We introduce the new multilingual retrieval model Cross-Language Ontology-Based Similarity Analysis (CL-OSA) for this task. CL-OSA represents documents as entity vectors obtained from the open knowledge graph Wikidata. Unlike other methods, CL-OSA requires neither computationally expensive machine translation nor pre-training on comparable or parallel corpora. It reliably disambiguates homonyms and scales to allow its application to Web-scale document collections. We show that CL-OSA outperforms state-of-the-art methods for retrieving candidate documents from five large, topically diverse test corpora that include distant language pairs like Japanese-English. For identifying cross-language plagiarism at the character level, CL-OSA primarily improves the detection of sense-for-sense translations. For these challenging cases, CL-OSA’s performance in terms of the well-established PlagDet score exceeds that of the best competitor by more than a factor of two. The code and data of our study are openly available.
  • Item
    Mathematics in Wikidata
    (Aachen, Germany : RWTH Aachen, 2021) Scharpf, Philipp; Schubotz, Moritz; Gipp, Bela; Kaffee, Lucie-Aimée; Razniewski, Simon; Hogan, Aidan
    Documents from Science, Technology, Engineering, and Mathematics (STEM) disciplines usually contain a significant amount of mathematical formulae alongside text. Some Mathematical Information Retrieval (MathIR) systems, e.g., Mathematical Question Answering (MathQA), exploit knowledge from Wikidata. Therefore, the mathematical information needs to be stored in items. In recent years, there have been efforts to define several properties and to seed formulae, together with their constituent identifiers, into Wikidata. This paper summarizes the current state, challenges, and discussions related to this endeavor. Furthermore, some data mining methods (supervised formula annotation and concept retrieval) and applications (question answering and classification explainability) of the mathematical information are outlined. Finally, we discuss community feedback and issues related to integrating Mathematical Entity Linking (MathEL) into Wikidata and Wikipedia, which was rejected in 33% and 12% of the test cases for Wikidata and Wikipedia, respectively. Our long-term goal is to populate Wikidata so that it can serve a variety of automated math reasoning tasks and AI systems.