Search Results

Now showing 1 - 10 of 11
  • Item
    Formalizing Gremlin pattern matching traversals in an integrated graph Algebra
    (Aachen, Germany : RWTH Aachen, 2019) Thakkar, Harsh; Auer, Sören; Vidal, Maria-Esther; Samavi, Reza; Consens, Mariano P.; Khatchadourian, Shahan; Nguyen, Vinh; Sheth, Amit; Giménez-García, José M.; Thakkar, Harsh
    Graph data management (also called NoSQL) has revealed beneficial characteristics in terms of flexibility and scalability by differ-ently balancing between query expressivity and schema flexibility. This peculiar advantage has resulted into an unforeseen race of developing new task-specific graph systems, query languages and data models, such as property graphs, key-value, wide column, resource description framework (RDF), etc. Present-day graph query languages are focused towards flex-ible graph pattern matching (aka sub-graph matching), whereas graph computing frameworks aim towards providing fast parallel (distributed) execution of instructions. The consequence of this rapid growth in the variety of graph-based data management systems has resulted in a lack of standardization. Gremlin, a graph traversal language, and machine provide a common platform for supporting any graph computing sys-tem (such as an OLTP graph database or OLAP graph processors). In this extended report, we present a formalization of graph pattern match-ing for Gremlin queries. We also study, discuss and consolidate various existing graph algebra operators into an integrated graph algebra.
  • Item
    The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources
    (Paris : European Language Resources Association, 2020) D'Souza, Jennifer; Hoppe, Anett; Brack, Arthur; Jaradeh, Mohamad Yaser; Auer, Sören; Ewerth, Ralph
    We introduce the STEM (Science, Technology, Engineering, and Medicine) Dataset for Scientific Entity Extraction, Classification, and Resolution, version 1.0 (STEM-ECR v1.0). The STEM-ECR v1.0 dataset has been developed to provide a benchmark for the evaluation of scientific entity extraction, classification, and resolution tasks in a domain-independent fashion. It comprises abstracts in 10 STEM disciplines that were found to be the most prolific ones on a major publishing platform. We describe the creation of such a multidisciplinary corpus and highlight the obtained findings in terms of the following features: 1) a generic conceptual formalism for scientific entities in a multidisciplinary scientific context; 2) the feasibility of the domain-independent human annotation of scientific entities under such a generic formalism; 3) a performance benchmark obtainable for automatic extraction of multidisciplinary scientific entities using BERT-based neural models; 4) a delineated 3-step entity resolution procedure for human annotation of the scientific entities via encyclopedic entity linking and lexicographic word sense disambiguation; and 5) human evaluations of Babelfy returned encyclopedic links and lexicographic senses for our entities. Our findings cumulatively indicate that human annotation and automatic learning of multidisciplinary scientific concepts as well as their semantic disambiguation in a wide-ranging setting as STEM is reasonable.
  • Item
    24th International Conference on Business Information Systems : Preface
    (Hannover : TIB Open Publishing, 2021) Abramowicz, Witold; Auer, Sören; Abramowicz, Witold; Auer, Sören; Lewańska, Elżbieta
  • Item
    Creation of a Knowledge Space by Semantically Linking Data Repository and Knowledge Management System - a Use Case from Production Engineering
    (Laxenburg : IFAC, 2022) Sheveleva, Tatyana; Wawer, Max Leo; Oladazimi, Pooya; Koepler, Oliver; Nürnberger, Florian; Lachmayer, Roland; Auer, Sören; Mozgova, Iryna
    The seamless documentation of research data flows from generation, processing, analysis, publication, and reuse is of utmost importance when dealing with large amounts of data. Semantic linking of process documentation and gathered data creates a knowledge space enabling the discovery of relations between steps of process chains. This paper shows the design of two systems for data deposit and for process documentation using semantic annotations and linking on a use case of a process chain step of the Tailored Forming Technology.
  • Item
    Why reinvent the wheel: Let's build question answering systems together
    (New York City : Association for Computing Machinery, 2018) Singh, K.; Radhakrishna, A.S.; Both, A.; Shekarpour, S.; Lytra, I.; Usbeck, R.; Vyas, A.; Khikmatullaev, A.; Punjani, D.; Lange, C.; Vidal, Maria-Esther; Lehmann, J.; Auer, Sören
    Modern question answering (QA) systems need to flexibly integrate a number of components specialised to fulfil specific tasks in a QA pipeline. Key QA tasks include Named Entity Recognition and Disambiguation, Relation Extraction, and Query Building. Since a number of different software components exist that implement different strategies for each of these tasks, it is a major challenge to select and combine the most suitable components into a QA system, given the characteristics of a question. We study this optimisation problem and train classifiers, which take features of a question as input and have the goal of optimising the selection of QA components based on those features. We then devise a greedy algorithm to identify the pipelines that include the suitable components and can effectively answer the given question. We implement this model within Frankenstein, a QA framework able to select QA components and compose QA pipelines. We evaluate the effectiveness of the pipelines generated by Frankenstein using the QALD and LC-QuAD benchmarks. These results not only suggest that Frankenstein precisely solves the QA optimisation problem but also enables the automatic composition of optimised QA pipelines, which outperform the static Baseline QA pipeline. Thanks to this flexible and fully automated pipeline generation process, new QA components can be easily included in Frankenstein, thus improving the performance of the generated pipelines.
  • Item
    Crowdsourcing Scholarly Discourse Annotations
    (New York, NY : ACM, 2021) Oelen, Allard; Stocker, Markus; Auer, Sören
    The number of scholarly publications grows steadily every year and it becomes harder to find, assess and compare scholarly knowledge effectively. Scholarly knowledge graphs have the potential to address these challenges. However, creating such graphs remains a complex task. We propose a method to crowdsource structured scholarly knowledge from paper authors with a web-based user interface supported by artificial intelligence. The interface enables authors to select key sentences for annotation. It integrates multiple machine learning algorithms to assist authors during the annotation, including class recommendation and key sentence highlighting. We envision that the interface is integrated in paper submission processes for which we define three main task requirements: The task has to be . We evaluated the interface with a user study in which participants were assigned the task to annotate one of their own articles. With the resulting data, we determined whether the participants were successfully able to perform the task. Furthermore, we evaluated the interface’s usability and the participant’s attitude towards the interface with a survey. The results suggest that sentence annotation is a feasible task for researchers and that they do not object to annotate their articles during the submission process.
  • Item
    Metadata analysis of open educational resources
    (New York,NY,United States : Association for Computing Machinery, 2021) Tavakoli, Mohammadreza; Elias, Mirette; Kismihók, Gábor; Auer, Sören; Scheffel, Maren
    Open Educational Resources (OERs) are openly licensed educational materials that are widely used for learning. Nowadays, many online learning repositories provide millions of OERs. Therefore, it is exceedingly difficult for learners to find the most appropriate OER among these resources. Subsequently, the precise OER metadata is critical for providing high-quality services such as search and recommendation. Moreover, metadata facilitates the process of automatic OER quality control as the continuously increasing number of OERs makes manual quality control extremely difficult. This work uses the metadata of 8,887 OERs to perform an exploratory data analysis on OER metadata. Accordingly, this work proposes metadata-based scoring and prediction models to anticipate the quality of OERs. Based on the results, our analysis demonstrated that OER metadata and OER content qualities are closely related, as we could detect high-quality OERs with an accuracy of 94.6%. Our model was also evaluated on 884 educational videos from Youtube to show its applicability on other educational repositories.
  • Item
    Semantic Representation of Physics Research Data
    (Setúbal, Portugal : Science and Technology Publications, Lda, 2020) Say, Aysegul; Fathalla, Said; Vahdati, Sahar; Lehmann, Jens; Auer, Sören; Aveiro, David; Dietz, Jan; Filipe, Joaquim
    Improvements in web technologies and artificial intelligence enable novel, more data-driven research practices for scientists. However, scientific knowledge generated from data-intensive research practices is disseminated with unstructured formats, thus hindering the scholarly communication in various respects. The traditional document-based representation of scholarly information hampers the reusability of research contributions. To address this concern, we developed the Physics Ontology (PhySci) to represent physics-related scholarly data in a machine-interpretable format. PhySci facilitates knowledge exploration, comparison, and organization of such data by representing it as knowledge graphs. It establishes a unique conceptualization to increase the visibility and accessibility to the digital content of physics publications. We present the iterative design principles by outlining a methodology for its development and applying three different evaluation approaches: data-driven and criteria-based evaluation, as well as ontology testing.
  • Item
    EVENTSKG: A 5-Star Dataset of Top-Ranked Events in Eight Computer Science Communities
    (Berlin ; Heidelberg : Springer, 2019) Fathalla, Said; Lange, Christoph; Auer, Sören; Hitzler, Pascal; Fernández, Miriam; Janowicz, Krzysztof; Zaveri, Amrapali; Gray, Alasdair J.G.; Lopez, Vanessa; Haller, Armin; Hammar, Karl
    Metadata of scientific events has become increasingly available on the Web, albeit often as raw data in various formats, disregarding its semantics and interlinking relations. This leads to restricting the usability of this data for, e.g., subsequent analyses and reasoning. Therefore, there is a pressing need to represent this data in a semantic representation, i.e., Linked Data. We present the new release of the EVENTSKG dataset, comprising comprehensive semantic descriptions of scientific events of eight computer science communities. Currently, EVENTSKG is a 5-star dataset containing metadata of 73 top-ranked event series (almost 2,000 events) established over the last five decades. The new release is a Linked Open Dataset adhering to an updated version of the Scientific Events Ontology, a reference ontology for event metadata representation, leading to richer and cleaner data. To facilitate the maintenance of EVENTSKG and to ensure its sustainability, EVENTSKG is coupled with a Java API that enables users to add/update events metadata without going into the details of the representation of the dataset. We shed light on events characteristics by analyzing EVENTSKG data, which provides a flexible means for customization in order to better understand the characteristics of renowned CS events.
  • Item
    SemSur: A Core Ontology for the Semantic Representation of Research Findings
    (Amsterdam [u.a.] : Elsevier, 2018) Fathalla, Said; Vahdati, Sahar; Auer, Sören; Lange, Christoph; Fensel, Anna; de Boer, Victor; Pellegrini, Tassilo; Kiesling, Elmar; Haslhofer, Bernhard; Hollink, Laura; Schindler, Alexander
    The way how research is communicated using text publications has not changed much over the past decades. We have the vision that ultimately researchers will work on a common structured knowledge base comprising comprehensive semantic and machine-comprehensible descriptions of their research, thus making research contributions more transparent and comparable. We present the SemSur ontology for semantically capturing the information commonly found in survey and review articles. SemSur is able to represent scientific results and to publish them in a comprehensive knowledge graph, which provides an efficient overview of a research field, and to compare research findings with related works in a structured way, thus saving researchers a significant amount of time and effort. The new release of SemSur covers more domains, defines better alignment with external ontologies and rules for eliciting implicit knowledge. We discuss possible applications and present an evaluation of our approach with the retrospective, exemplary semantification of a survey. We demonstrate the utility of the SemSur ontology to answer queries about the different research contributions covered by the survey. SemSur is currently used and maintained at OpenResearch.org.