Search Results

  • Item
    Temporal Role Annotation for Named Entities
    (Amsterdam [u.a.] : Elsevier, 2018) Koutraki, Maria; Bakhshandegan-Moghaddam, Farshad; Sack, Harald; Fensel, Anna; de Boer, Victor; Pellegrini, Tassilo; Kiesling, Elmar; Haslhofer, Bernhard; Hollink, Laura; Schindler, Alexander
    Natural language understanding tasks are key to extracting structured and semantic information from text. One of the most challenging problems in natural language is ambiguity and resolving such ambiguity based on context, including temporal information. This paper focuses on the task of extracting temporal roles from text, e.g. CEO of an organization or head of a state. A temporal role has a domain, which may resolve to different entities depending on the context and especially on temporal information, e.g. CEO of Microsoft in 2000. We focus on temporal role extraction as a precursor to temporal role disambiguation. We propose a structured prediction approach based on Conditional Random Fields (CRF) to annotate temporal roles in text and rely on a rich feature set, which extracts syntactic and semantic information from text. We perform an extensive evaluation of our approach based on two datasets. In the first dataset, we extract nearly 400k instances from Wikipedia through distant supervision, whereas in the second dataset, a manually curated ground truth consisting of 200 instances is extracted from a sample of The New York Times (NYT) articles. Lastly, the proposed approach is compared against baselines, where significant improvements are shown for both datasets.
  • Item
    Contextual Language Models for Knowledge Graph Completion
    (Aachen, Germany : RWTH Aachen, 2021) Russa, Biswas; Sofronova, Radina; Alam, Mehwish; Sack, Harald; Mehwish, Alam; Ali, Medi; Groth, Paul; Hitzler, Pascal; Lehmann, Jens; Paulheim, Heiko; Rettinger, Achim; Sack, Harald; Sadeghi, Afshin; Tresp, Volker
    Knowledge Graphs (KGs) have become the backbone of various machine learning based applications over the past decade. However, KGs are often incomplete and inconsistent. Several representation learning based approaches have been introduced to complete the missing information in KGs. Besides, Neural Language Models (NLMs) have gained huge momentum in NLP applications. However, exploiting contextual NLMs to tackle the Knowledge Graph Completion (KGC) task is still an open research problem. In this paper, a GPT-2 based KGC model is proposed and evaluated on two benchmark datasets. The initial results obtained from the fine-tuning of the GPT-2 model for triple classification strengthen the importance of using NLMs for KGC. The impact of contextual language models on KGC is also discussed.
  • Item
    Biobank Oversight and Sanctions Under the General Data Protection Regulation
    (Dordrecht ; Heidelberg ; New York ; London : Springer, 2021) Hallinan, Dara; Slokenberga, Santa; Tzortzatou, Olga; Reichel, Jane
    This contribution offers an insight into the function and problems of the oversight and sanctions mechanisms outlined in the General Data Protection Regulation as they relate to the biobanking context. These mechanisms might be considered as meta-mechanisms—mechanisms relating to, but not consisting of, substantive legal principles—functioning in tandem to ensure biobank compliance with data protection principles. Each of the mechanisms outlines, on paper at least, comprehensive and impressive compliance architecture—both expanding on their capacity in relation to Directive 95/46. Accordingly, each mechanism looks likely to have a significant and lasting impact on biobanks and biobanking. Despite this comprehensiveness, however, the mechanisms are not immune from critique. Problems appear regarding the standard of protection provided for research subject rights, regarding the disproportionate impact on legitimate interests tied up with the biobanking process—particularly genomic research interests—and regarding their practical implementability in biobanking.
  • Item
    Modelling Archival Hierarchies in Practice: Key Aspects and Lessons Learned
    (Aachen, Germany : RWTH Aachen, 2021) Vafaie, Mahsa; Bruns, Oleksandra; Pilz, Nastasja; Dessì, Danilo; Sack, Harald; Sumikawa, Yasunobu; Ikejiri, Ryohei; Doucet, Antoine; Pfanzelter, Eva; Hasanuzzaman, Mohammed; Dias, Gaël; Milligan, Ian; Jatowt, Adam
    An increasing number of archival institutions aim to provide public access to historical documents. Ontologies have been designed, developed and utilised to model the archival description of historical documents and to enable interoperability between different information sources. However, due to the heterogeneous nature of archives and archival systems, current ontologies for the representation of archival content do not always cover all existing structural organisation forms equally well. After briefly contextualising the heterogeneity in the hierarchical structure of German archives, this paper describes and evaluates differences between two archival ontologies, ArDO and RiC-O, and their approaches to modelling hierarchy levels and archive dynamics.
  • Item
    Leveraging Literals for Knowledge Graph Embeddings
    (Aachen, Germany : RWTH Aachen, 2021) Gesese, Genet Asefa; Tamma, Valentina; Fernandez, Miriam; Poveda-Villalón, María
    Nowadays, Knowledge Graphs (KGs) have become invaluable for various applications such as named entity recognition, entity linking, and question answering. However, there is a huge computational and storage cost associated with these KG-based applications. Therefore, it becomes necessary to transform the high dimensional KGs into low dimensional vector spaces, i.e., to learn representations for the KGs. Since a KG represents facts both as interrelations between entities and as attributes of entities, the semantics present in both forms should be preserved while transforming the KG into a vector space. Hence, the main focus of this thesis is to deal with the multimodality and multilinguality of literals when utilizing them for the representation learning of KGs. The other task is to extract benchmark datasets with a high level of difficulty for tasks such as link prediction and triple classification. These datasets could be used for evaluating both kinds of KG embeddings, those using literals and those which do not include literals.
  • Item
    DDB-KG: The German Bibliographic Heritage in a Knowledge Graph
    (Aachen, Germany : RWTH Aachen, 2021) Tan, Mary Ann; Tietz, Tabea; Bruns, Oleksandra; Oppenlaender, Jonas; Dessì, Danilo; Sack, Harald; Sumikawa, Yasunobu; Ikejiri, Ryohei; Doucet, Antoine; Pfanzelter, Eva; Hasanuzzaman, Mohammed; Dias, Gaël; Milligan, Ian; Jatowt, Adam
    Under the German government’s initiative “NEUSTART Kultur”, the German Digital Library or Deutsche Digitale Bibliothek (DDB) is undergoing improvements to enhance user experience. As an initial step, emphasis is placed on creating a knowledge graph from the bibliographic record collection of the DDB. This paper discusses the challenges facing the DDB in terms of retrieval and the solutions for addressing them. In particular, limitations of the current data model or ontology in representing bibliographic metadata are analyzed through concrete examples. This study presents the complete ontological mapping from the DDB-Europeana Data Model (DDB-EDM) to FaBiO, and a prototype of the DDB-KG made available as a SPARQL endpoint. The suitability of the target ontology is demonstrated with SPARQL queries formulated from competency questions.
  • Item
    The Concept of Identifiability in ML Models
    (Setúbal : SciTePress - Science and Technology Publications, Lda., 2022) von Maltzan, Stephanie; Bastieri, Denis; Wills, Gary; Kacsuk, Péter; Chang, Victor
    Recent research indicates that the machine learning process can be reversed by adversarial attacks. These attacks can be used to derive personal information from the training. The supposedly anonymising machine learning process represents a process of pseudonymisation and is, therefore, subject to technical and organisational measures. Consequently, the unexamined belief in anonymisation as a guarantor for privacy cannot be easily upheld. It is, therefore, crucial to measure privacy through the lens of adversarial attacks and precisely distinguish what is meant by personal data and non-personal data and above all determine whether ML models represent pseudonyms from the training data.
  • Item
    Steps towards a Dislocation Ontology for Crystalline Materials
    (Aachen, Germany : RWTH Aachen, 2021) Ihsan, Ahmad Zainul; Dessì, Danilo; Alam, Mehwish; Sack, Harald; Sandfeld, Stefan; García-Castro, Raúl; Davies, John; Antoniou, Grigoris; Fortuna, Carolina
    The field of Materials Science is concerned with, e.g., properties and performance of materials. An important class of materials are crystalline materials that usually contain “dislocations”, a line-like defect type. Dislocations decisively determine many important materials properties. Over the past decades, significant effort was put into understanding dislocation behavior across different length scales, both with experimental characterization techniques and with simulations. However, for describing such dislocation structures there is still a lack of a common standard to represent and to connect dislocation domain knowledge across different but related communities. An ontology offers a common foundation to enable knowledge representation and data interoperability, which are important components to establish a “digital twin”. This paper outlines the first steps towards the design of an ontology in the dislocation domain and shows a connection with the already existing ontologies in the materials science and engineering domain.
  • Item
    Designing Intelligent Systems for Online Education: Open Challenges and Future Directions
    (Aachen, Germany : RWTH Aachen, 2021) Dessì, Danilo; Käser, Tanja; Marras, Mirko; Popescu, Elvira; Sack, Harald; Dessì, Danilo; Käser, Tanja; Marras, Mirko; Popescu, Elvira; Sack, Harald
    The design and delivery of platforms for online education is fostering increasingly intense research. Scaling up education online brings new needs related to, for example, hardly manageable classes, overwhelming content alternatives, and academic dishonesty while interacting remotely. However, with the impressive progress of the data mining and machine learning fields, combined with the large amounts of learning-related data and high-performance computing, it has been possible to gain a deeper understanding of the nature of learning and teaching online. Methods at the analytical and algorithmic levels are constantly being developed, and hybrid approaches are receiving increasing attention. Recent methods are analyzing not only the online traces left by students a posteriori, but also the extent to which this data can be turned into actionable insights and models, to support the above needs in a computationally efficient, adaptive and timely way. In this paper, we present relevant open challenges lying at the intersection between the machine learning and educational communities that need to be addressed to further develop the field of intelligent systems for online education. Several areas of research in this field are identified, such as data availability and sharing, time-wise and multi-modal data modelling, generalizability, fairness, explainability, interpretability, privacy, and ethics behind models delivered for supporting education. Practical challenges and recommendations for possible research directions are provided for each of them, paving the way for future advances in this field.
  • Item
    Detecting Cross-Language Plagiarism using Open Knowledge Graphs
    (Aachen, Germany : RWTH Aachen, 2021) Stegmüller, Johannes; Bauer-Marquart, Fabian; Meuschke, Norman; Ruas, Terry; Schubotz, Moritz; Gipp, Bela; Zhang, Chengzhi; Mayr, Philipp; Lu, Wie; Zhang, Yi
    Identifying cross-language plagiarism is challenging, especially for distant language pairs and sense-for-sense translations. We introduce the new multilingual retrieval model Cross-Language Ontology-Based Similarity Analysis (CL-OSA) for this task. CL-OSA represents documents as entity vectors obtained from the open knowledge graph Wikidata. Unlike other methods, CL-OSA requires neither computationally expensive machine translation nor pre-training using comparable or parallel corpora. It reliably disambiguates homonyms and scales to allow its application to Web-scale document collections. We show that CL-OSA outperforms state-of-the-art methods for retrieving candidate documents from five large, topically diverse test corpora that include distant language pairs like Japanese-English. For identifying cross-language plagiarism at the character level, CL-OSA primarily improves the detection of sense-for-sense translations. For these challenging cases, CL-OSA’s performance in terms of the well-established PlagDet score exceeds that of the best competitor by more than a factor of two. The code and data of our study are openly available.