Search Results

Interaction Network Analysis Using Semantic Similarity Based on Translation Embeddings

2019, Manzoor Bajwa, Awais, Collarana, Diego, Vidal, Maria-Esther, Acosta, Maribel, Cudré-Mauroux, Philippe, Maleshkova, Maria, Pellegrini, Tassilo, Sack, Harald, Sure-Vetter, York

Biomedical knowledge graphs such as STITCH, SIDER, and Drugbank provide the basis for the discovery of associations between biomedical entities, e.g., interactions between drugs and targets. Link prediction is a paramount task and represents a building block for supporting knowledge discovery. Although several approaches have been proposed for effectively predicting links, the role of semantics has not been studied in depth. In this work, we tackle the problem of discovering interactions between drugs and targets, and propose SimTransE, a machine learning-based approach that solves this problem effectively. SimTransE relies on translating embeddings to model drug-target interactions and values of similarity across them. Grounded on the vectorial representation of drug-target interactions, SimTransE is able to discover novel drug-target interactions. We empirically study SimTransE using state-of-the-art benchmarks and approaches. Experimental results suggest that SimTransE is competitive with the state of the art, representing, thus, an effective alternative for knowledge discovery in the biomedical domain.
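The abstract does not spell out SimTransE's model, so the following is only a minimal sketch of the generic translating-embeddings idea it builds on, not the authors' implementation. It assumes that a triple (head, relation, tail) is considered plausible when head + relation lies close to tail, and that similarity between embeddings is measured with cosine similarity. All vectors, names, and values are made up for illustration.

```python
import math


def transe_score(head, relation, tail):
    """Euclidean distance ||head + relation - tail||; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 2-dimensional embeddings (illustrative values, not learned).
drug = [1.0, 0.0]
interacts_with = [0.0, 1.0]
target_a = [1.0, 1.0]  # exactly drug + interacts_with
target_b = [3.0, 4.0]  # far from drug + interacts_with

score_a = transe_score(drug, interacts_with, target_a)
score_b = transe_score(drug, interacts_with, target_b)

# Candidate targets ranked from most to least plausible under this score.
ranking = sorted([("target_a", score_a), ("target_b", score_b)], key=lambda p: p[1])
```

In this toy setting `target_a` ranks first because the translated drug vector coincides with it. A real system would learn the embeddings from the knowledge graph rather than fix them by hand.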

Formalizing Gremlin pattern matching traversals in an integrated graph Algebra

2019, Thakkar, Harsh, Auer, Sören, Vidal, Maria-Esther, Samavi, Reza, Consens, Mariano P., Khatchadourian, Shahan, Nguyen, Vinh, Sheth, Amit, Giménez-García, José M., Thakkar, Harsh

Graph data management (also called NoSQL) has revealed beneficial characteristics in terms of flexibility and scalability by differently balancing between query expressivity and schema flexibility. This peculiar advantage has resulted in an unforeseen race of developing new task-specific graph systems, query languages and data models, such as property graphs, key-value, wide column, resource description framework (RDF), etc. Present-day graph query languages are focused towards flexible graph pattern matching (aka sub-graph matching), whereas graph computing frameworks aim towards providing fast parallel (distributed) execution of instructions. This rapid growth in the variety of graph-based data management systems has resulted in a lack of standardization. Gremlin, a graph traversal language and machine, provides a common platform for supporting any graph computing system (such as an OLTP graph database or OLAP graph processors). In this extended report, we present a formalization of graph pattern matching for Gremlin queries. We also study, discuss and consolidate various existing graph algebra operators into an integrated graph algebra.

The Research Core Dataset (KDSF) in the Linked Data context

2019, Walther, Tatiana, Hauschke, Christian, Kasprzik, Anna, Sicilia, Miguel-Angel, Simons, Ed, Clements, Anna, de Castro, Pablo, Bergström, Johan

This paper describes our efforts to implement the Research Core Dataset (“Kerndatensatz Forschung”; KDSF) as an ontology in VIVO. KDSF is used in VIVO to record the required metadata on incoming data and to produce reports as an output. While both processes need an elaborate adaptation of the KDSF specification, this paper focusses on the adaptation of the KDSF basic data model for recording data in VIVO. In this context, the VIVO and KDSF ontologies were compared with respect to domain, syntax, structure, and granularity in order to identify correspondences and mismatches. To produce an alignment, different matching approaches have been applied. Furthermore, we made necessary modifications and extensions on KDSF classes and properties.

Linked Data Supported Content Analysis for Sociology

2019, Tietz, Tabea, Sack, Harald, Acosta, Maribel, Cudré-Mauroux, Philippe, Maleshkova, Maria, Pellegrini, Tassilo, Sack, Harald, Sure-Vetter, York

Philology and hermeneutics, as the analysis and interpretation of natural language text in written historical sources, are the predecessors of modern content analysis and date back to antiquity. In empirical social sciences, especially in sociology, content analysis provides valuable insights into social structures and cultural norms of the present and past. With the ever-growing amount of text on the web to analyze, numerous computer-assisted text analysis techniques and tools have also been developed in sociological research. However, existing methods often go without sufficient standardization. As a consequence, sociological text analysis lacks transparency, reproducibility and data re-usability. The goal of this paper is to show how Linked Data principles and Entity Linking techniques can be used to structure, publish and analyze natural language text for sociological research to tackle these shortcomings. This is achieved on the use case of constitutional text documents of the Netherlands from 1884 to 2016, which represent an important contribution to the European cultural heritage. Finally, the generated data is made available and re-usable as Linked Data not only for sociologists, but also for all other researchers in the digital humanities domain interested in the development of constitutions in the Netherlands.

Temporal Role Annotation for Named Entities

2018, Koutraki, Maria, Bakhshandegan-Moghaddam, Farshad, Sack, Harald, Fensel, Anna, de Boer, Victor, Pellegrini, Tassilo, Kiesling, Elmar, Haslhofer, Bernhard, Hollink, Laura, Schindler, Alexander

Natural language understanding tasks are key to extracting structured and semantic information from text. One of the most challenging problems in natural language is ambiguity, and resolving such ambiguity depends on context, including temporal information. This paper focuses on the task of extracting temporal roles from text, e.g. CEO of an organization or head of a state. A temporal role has a domain, which may resolve to different entities depending on the context and especially on temporal information, e.g. CEO of Microsoft in 2000. We focus on temporal role extraction as a precursor for temporal role disambiguation. We propose a structured prediction approach based on Conditional Random Fields (CRF) to annotate temporal roles in text and rely on a rich feature set, which extracts syntactic and semantic information from text. We perform an extensive evaluation of our approach based on two datasets. In the first dataset, we extract nearly 400k instances from Wikipedia through distant supervision, whereas in the second dataset, a manually curated ground-truth consisting of 200 instances is extracted from a sample of The New York Times (NYT) articles. Finally, the proposed approach is compared against baselines, where significant improvements are shown for both datasets.

Why reinvent the wheel: Let's build question answering systems together

2018, Singh, K., Radhakrishna, A.S., Both, A., Shekarpour, S., Lytra, I., Usbeck, R., Vyas, A., Khikmatullaev, A., Punjani, D., Lange, C., Vidal, Maria-Esther, Lehmann, J., Auer, Sören

Modern question answering (QA) systems need to flexibly integrate a number of components specialised to fulfil specific tasks in a QA pipeline. Key QA tasks include Named Entity Recognition and Disambiguation, Relation Extraction, and Query Building. Since a number of different software components exist that implement different strategies for each of these tasks, it is a major challenge to select and combine the most suitable components into a QA system, given the characteristics of a question. We study this optimisation problem and train classifiers, which take features of a question as input and have the goal of optimising the selection of QA components based on those features. We then devise a greedy algorithm to identify the pipelines that include the suitable components and can effectively answer the given question. We implement this model within Frankenstein, a QA framework able to select QA components and compose QA pipelines. We evaluate the effectiveness of the pipelines generated by Frankenstein using the QALD and LC-QuAD benchmarks. The results suggest that Frankenstein not only precisely solves the QA optimisation problem but also enables the automatic composition of optimised QA pipelines, which outperform the static Baseline QA pipeline. Thanks to this flexible and fully automated pipeline generation process, new QA components can be easily included in Frankenstein, thus improving the performance of the generated pipelines.
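The abstract does not give the details of the greedy algorithm, so the following is only a rough sketch of one way greedy component selection could look, not Frankenstein's actual procedure. It assumes that a classifier has already produced, for a given question, a predicted suitability score per component, and it picks the top-scoring component for each task in pipeline order. The task labels, component names, and scores are hypothetical.

```python
def greedy_pipeline(tasks, predicted_scores):
    """Pick, for each task in order, the component with the highest predicted score.

    `predicted_scores` maps a task name to a dict of {component name: score}.
    """
    pipeline = []
    for task in tasks:
        candidates = predicted_scores[task]
        best_component = max(candidates, key=candidates.get)
        pipeline.append((task, best_component))
    return pipeline


# Hypothetical task labels and classifier outputs for a single question.
tasks = ["entity_disambiguation", "relation_extraction", "query_building"]
predicted_scores = {
    "entity_disambiguation": {"ned_component_a": 0.62, "ned_component_b": 0.81},
    "relation_extraction": {"re_component_a": 0.55},
    "query_building": {"qb_component_a": 0.40, "qb_component_b": 0.35},
}

pipeline = greedy_pipeline(tasks, predicted_scores)
```

Choosing per task keeps the search linear in the number of components, at the cost of ignoring interactions between components that a joint search over whole pipelines could capture.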

Preface

2019, Kaffee, Lucie-Aimee, Endris, Kemele M., Vidal, Maria-Esther, Comerio, Marco, Sadeghi, Mersedeh, Chaves-Fraga, David, Colpaert, Pieter, Kaffee, Lucie Aimée, Endris, Kemele M., Vidal, María-Esther, Comerio, Marco, Sadeghi, Mersedeh, Chaves-Fraga, David, Colpaert, Pieter

This volume presents the proceedings of the 1st International Workshop on Approaches for Making Data Interoperable (AMAR 2019) and the 1st International Workshop on Semantics for Transport (Sem4Tra), held in Karlsruhe, Germany, on September 9, 2019, co-located with SEMANTiCS 2019. Interoperability of data is an important factor in making transportation data accessible; therefore, we present the topics alongside each other in these proceedings.

A Case for Integrated Data Processing in Large-Scale Cyber-Physical Systems

2019, Glebke, René, Henze, Martin, Wehrle, Klaus, Niemietz, Philipp, Trauth, Daniel, Mattfeld, Patrick, Bergs, Thomas, Bui, Tung X.

Large-scale cyber-physical systems such as manufacturing lines generate vast amounts of data to guarantee precise control of their machinery. Visions such as the Industrial Internet of Things aim at making this data available also to computation systems outside the lines to increase productivity and product quality. However, rising amounts and complexities of data and control decisions push existing infrastructure for data transmission, storage, and processing to its limits. In this paper, we exemplarily study a fine blanking line which can produce up to 6.2 Gbit/s worth of data to showcase the extreme requirements found in modern manufacturing. We consequently propose integrated data processing which keeps inherently local and small-scale tasks close to the processes while at the same time centralizing tasks relying on more complex decision procedures and remote data sources. Our approach thus allows for both maintaining control of field-level processes and leveraging the benefits of “big data” applications.

DoMoRe – A recommender system for domain modeling

2018, Agt-Rickauer, Henning, Kutsche, Ralf-Detlef, Sack, Harald, Hammoudi, Slimane, Ferreira Pires, Luis, Selic, Bran

Domain modeling is an important activity in early phases of software projects to achieve a shared understanding of the problem field among project participants. Domain models describe concepts and relations of respective application fields using a modeling language and domain-specific terms. Detailed knowledge of the domain as well as expertise in model-driven development is required for software engineers to create these models. This paper describes DoMoRe, a system for automated modeling recommendations to support the domain modeling process. We describe an approach in which modeling benefits from formalized knowledge sources and information extraction from text. The system incorporates a large network of semantically related terms built from natural language data sets integrated with mediator-based knowledge base querying in a single recommender system to provide context-sensitive suggestions of model elements.

When humans and machines collaborate: Cross-lingual Label Editing in Wikidata

2019, Kaffee, L.-A., Endris, K.M., Simperl, E.

The quality and maintainability of a knowledge graph are determined by the process in which it is created. There are different approaches to such processes: extraction or conversion of available data on the web (automated extraction of knowledge such as DBpedia from Wikipedia), community-created knowledge graphs, often by a group of experts, and hybrid approaches where humans maintain the knowledge graph alongside bots. We focus in this work on the hybrid approach of human-edited knowledge graphs supported by automated tools. In particular, we analyse the editing of natural language data, i.e. labels. Labels are the entry point for humans to understand the information, and therefore need to be carefully maintained. We take a step toward the understanding of collaborative editing of humans and automated tools across languages in a knowledge graph. We use Wikidata as it has a large and active community of humans and bots working together covering over 300 languages. In this work, we analyse the different editor groups and how they interact with the different language data to understand the provenance of the current label data.