Search Results

  • Item
    Call to action for global access to and harmonization of quality information of individual Earth science datasets
    (Paris : CODATA, 2021) Peng, Ge; Downs, Robert R.; Lacagnina, Carlo; Ramapriyan, Hampapuram; Ivánová, Ivana; Moroni, David; Wei, Yaxing; Larnicol, Gilles; Wyborn, Lesley; Goldberg, Mitch; Schulz, Jörg; Bastrakova, Irina; Ganske, Anette; Bastin, Lucy; Khalsa, Siri Jodha S.; Wu, Mingfang; Shie, Chung-Lin; Ritchey, Nancy; Jones, Dave; Habermann, Ted; Lief, Christina; Maggio, Iolanda; Albani, Mirko; Stall, Shelley; Zhou, Lihang; Drévillon, Marie; Champion, Sarah; Hou, C. Sophie; Doblas-Reyes, Francisco; Lehnert, Kerstin; Robinson, Erin; Bugbee, Kaylin
    Knowledge about the quality of data and metadata is important to support informed decisions on the (re)use of individual datasets and is an essential part of the ecosystem that supports open science. Quality assessments reflect the reliability and usability of data. They need to be consistently curated, fully traceable, and adequately documented, as these properties are crucial for sound decision- and policy-making efforts that rely on data. Quality assessments also need to be consistently represented and readily integrated across systems and tools to allow for improved sharing of quality information at the dataset level for individual quality attributes or dimensions. Although the need for assessing the quality of data and associated information is well recognized, methodologies for an evaluation framework and for presenting the resultant quality information to end users may not have been comprehensively addressed within and across disciplines. Global interdisciplinary domain experts have come together to systematically explore the needs, challenges, and impacts of consistently curating and representing quality information through the entire lifecycle of a dataset. This paper describes the findings of that effort, argues for the importance of sharing dataset quality information, calls for community action to develop practical guidelines, and outlines community recommendations for developing such guidelines. Practical guidelines will allow for global access to and harmonization of quality information at the level of individual Earth science datasets, which in turn will support open science.
  • Item
    The relationship between the language of scientific publication and its impact in the field of public and collective health
    ([S.l.] : SciBiolMed. Org, 2021) Dos Santos, Solange Maria; Fraumann, Grischa; Belli, Simone; Mugnaini, Rogerio
    The language of scientific publications is a crucial factor when seeking to reach an international audience, because it affects linguistic accessibility and the geographical reach of research results. English is the language of science, and the fact that it can be understood by most readers represents an undeniable advantage. Moreover, the fact that a large proportion of Ibero-American research has been published in national languages is often cited as one of the reasons for its limited exposure. The purpose of this study was to analyze the relationship between scientific output published in a native language and its degree of exposure and impact in the field of Public and Collective Health. This bibliometric study was carried out based on scientific output data from the most prolific countries that are members of the SciELO (Scientific Electronic Library Online) Network in Public and Collective Health in the 2011-2018 period. The data was collected from the SciELO Citation Index database (SciELO CI), which was integrated into the larger WoS platform in 2014 and was chosen on account of its importance as one of the few regional indexes, one that is still scarcely used in studies of this nature. The data show that Brazilian articles in Portuguese had the greatest citation impact on publications in their own language (48.7%), while Brazilian articles in English had practically the same impact (48.5%) on Portuguese publications, followed by 34.5% on Spanish publications. The impact on the national language is also significant for Mexican and Spanish publications, for which the percentage of citing articles in Spanish is higher for documents cited in the same language than for documents cited in English (1.6 and 1.8 times higher, respectively). The same applies to Portuguese and US-American articles, where 56.6% and 43.9% of the citing articles, respectively, are in the native language. Cuban and Peruvian articles have more than 90% of their citing articles in the national language. In contrast, the USA and Brazil are countries whose publications have a greater citation impact on other languages, especially when published in Spanish. The extent of exposure of publications in a given language varies with the country's scientific output. In the case of Brazilian and US-American publications, including those in the national languages of these countries, the effects on audiences in other languages can be measured by the citation impact. Furthermore, the degree of exposure of these publications suggests that SciELO CI is a useful database for evaluating local scientific output, particularly for publications in the national language.
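    A minimal sketch of the kind of computation behind these figures, assuming the impact measure is the share of citing articles per language for documents cited in a given language; all data below are hypothetical, not the study's SciELO CI counts:

      # Share of citing articles by language, grouped by the language of the
      # cited document. Figures are hypothetical; the study derives its counts
      # from SciELO Citation Index records.
      from collections import Counter

      # (cited_language, citing_language) pairs extracted from citation records
      citations = [
          ("pt", "pt"), ("pt", "en"), ("pt", "pt"), ("pt", "es"),
          ("en", "pt"), ("en", "es"), ("es", "es"), ("es", "en"),
      ]

      def citing_shares(pairs, cited_lang):
          """Percentage of citing articles per language, for documents
          published in cited_lang."""
          citing = [c for d, c in pairs if d == cited_lang]
          counts = Counter(citing)
          return {lang: 100 * n / len(citing) for lang, n in counts.items()}

      print(citing_shares(citations, "pt"))  # {'pt': 50.0, 'en': 25.0, 'es': 25.0}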
  • Item
    Persistent Identification for Conferences
    (Paris : CODATA, 2022) Franken, Julian; Birukou, Aliaksandr; Eckert, Kai; Fahl, Wolfgang; Hauschke, Christian; Lange, Christoph
    Persistent identification of entities plays a major role in the progress of digitization of many fields. In the scholarly publishing realm there are already persistent identifiers (PIDs) for papers (DOI), people (ORCID), organisations (GRID, ROR), and books (ISBN), but there is no generally accepted PID system for scholarly events such as conferences or workshops yet. This article describes the relevant use cases that motivate the introduction of persistent identifiers for conferences. The use cases were mainly derived from interviews, discussions with experts, and prior work. Researchers, conference organizers, and data consumers were identified as the primary stakeholders involved in the typical conference event life cycle. The resulting list of use cases illustrates how PIDs for conference events will improve the current situation for these stakeholders and help with the problems they face today.
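    Existing PIDs already support machine-readable lookup via content negotiation, which is the pattern conference PIDs would extend to scholarly events. A minimal sketch using the public doi.org resolver and a DOI that appears later in this result list (the metadata media type is standard Crossref/DataCite content negotiation):

      # Resolve a DOI to machine-readable bibliographic metadata via HTTP
      # content negotiation; a conference PID would enable the same lookup
      # pattern for scholarly events.
      import requests

      doi = "10.1038/s41598-020-63924-6"  # a DOI from this result list
      resp = requests.get(
          f"https://doi.org/{doi}",
          headers={"Accept": "application/vnd.citationstyles.csl+json"},
          timeout=30,
      )
      resp.raise_for_status()
      meta = resp.json()
      print(meta["title"])
      print(meta.get("container-title"))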
  • Item
    Replication and Refinement of an Algorithm for Automated Drusen Segmentation on Optical Coherence Tomography
    (Berlin : Springer Nature, 2020) Wintergerst, M.W.M.; Gorgi Zadeh, S.; Wiens, V.; Thiele, S.; Schmitz-Valckenberg, S.; Holz, F.G.; Finger, R.P.; Schultz, T.
    Here, we investigate the extent to which re-implementing a previously published algorithm for OCT-based drusen quantification permits replicating the reported accuracy on an independent dataset. We then refined that algorithm to increase its accuracy. Following a systematic literature search, an algorithm was selected based on its reported excellent results, and several steps were added to improve its accuracy. The replicated and refined algorithms were evaluated on an independent dataset with the same metrics as in the original publication. Accuracy of the refined algorithm (overlap ratio 36–52%) was significantly greater than that of the replicated one (overlap ratio 25–39%). In particular, separation of the retinal pigment epithelium and the ellipsoid zone could be improved by the refinement. However, accuracy was still lower than previously reported on different data (overlap ratio 67–76%). This is the first replication study of an algorithm for OCT image analysis. Its results indicate that current standards for algorithm validation do not provide a reliable estimate of algorithm performance on images that differ with respect to patient selection and image quality. In order to contribute to improved reproducibility in this field, we publish both our replication and the refinement, as well as an exemplary dataset.
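    The reported accuracies are overlap ratios between automated and reference segmentations. A minimal sketch, assuming the common intersection-over-union definition (the paper's exact definition may differ):

      # Overlap ratio between two binary segmentation masks, assuming the
      # intersection-over-union definition; the original paper may define
      # the ratio differently.
      import numpy as np

      def overlap_ratio(pred: np.ndarray, ref: np.ndarray) -> float:
          pred, ref = pred.astype(bool), ref.astype(bool)
          union = np.logical_or(pred, ref).sum()
          if union == 0:
              return 1.0  # both masks empty: treat as perfect agreement
          return float(np.logical_and(pred, ref).sum() / union)

      pred = np.zeros((4, 4)); pred[1:3, 1:3] = 1  # toy automated drusen mask
      ref = np.zeros((4, 4)); ref[1:4, 1:4] = 1    # toy reference mask
      print(f"{overlap_ratio(pred, ref):.2f}")     # 0.44 for this toy pair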
  • Item
    Author Correction: Replication and Refinement of an Algorithm for Automated Drusen Segmentation on Optical Coherence Tomography (Scientific Reports, (2020), 10, 1, (7395), 10.1038/s41598-020-63924-6)
    ([London] : Macmillan Publishers Limited, part of Springer Nature, 2021) Wintergerst, Maximilian W. M.; Gorgi Zadeh, Shekoufeh; Wiens, Vitalis; Thiele, Sarah; Schmitz-Valckenberg, Steffen; Holz, Frank G.; Finger, Robert P.; Schultz, Thomas
    Correction to: Scientific Reports https://doi.org/10.1038/s41598-020-63924-6, published online 30 April 2020
  • Item
    The SciQA Scientific Question Answering Benchmark for Scholarly Knowledge
    (London : Nature Publishing Group, 2023) Auer, Sören; Barone, Dante A.C.; Bartz, Cassiano; Cortes, Eduardo G.; Jaradeh, Mohamad Yaser; Karras, Oliver; Koubarakis, Manolis; Mouromtsev, Dmitry; Pliukhin, Dmitrii; Radyush, Daniil; Shilin, Ivan; Stocker, Markus; Tsalapati, Eleni
    Knowledge graphs have gained increasing popularity in the last decade in science and technology. However, knowledge graphs are currently relatively simple to moderately complex semantic structures that are mainly collections of factual statements. Question answering (QA) benchmarks and systems have so far mainly been geared towards encyclopedic knowledge graphs such as DBpedia and Wikidata. We present SciQA, a scientific QA benchmark for scholarly knowledge. The benchmark leverages the Open Research Knowledge Graph (ORKG), which includes almost 170,000 resources describing research contributions of almost 15,000 scholarly articles from 709 research fields. Following a bottom-up methodology, we first manually developed a set of 100 complex questions that can be answered using this knowledge graph. Furthermore, we devised eight question templates with which we automatically generated a further 2,465 questions that can also be answered with the ORKG. The questions cover a range of research fields and question types and are translated into corresponding SPARQL queries over the ORKG. Based on two preliminary evaluations, we show that the resulting SciQA benchmark represents a challenging task for next-generation QA systems. The task is part of the open competitions at the 22nd International Semantic Web Conference 2023, as the Scholarly Question Answering over Linked Data (QALD) Challenge.
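    A minimal sketch of how SciQA-style questions are answered in practice: translating the question into SPARQL and posing it to a SPARQL endpoint over HTTP. The endpoint URL below is a placeholder, not the verified ORKG endpoint, and the query uses only generic RDF vocabulary:

      # Send a SPARQL query to a SPARQL endpoint over HTTP. The endpoint URL
      # is a placeholder; consult the ORKG documentation for the real one and
      # for the graph's own classes and properties.
      import requests

      ENDPOINT = "https://example.org/orkg/sparql"  # placeholder URL
      query = """
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      SELECT ?resource ?label WHERE {
        ?resource rdfs:label ?label .
      } LIMIT 5
      """
      resp = requests.get(
          ENDPOINT,
          params={"query": query},
          headers={"Accept": "application/sparql-results+json"},
          timeout=30,
      )
      resp.raise_for_status()
      for row in resp.json()["results"]["bindings"]:
          print(row["label"]["value"])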
  • Item
    Global Community Guidelines for Documenting, Sharing, and Reusing Quality Information of Individual Digital Datasets
    (Paris : CODATA, 2022) Peng, Ge; Lacagnina, Carlo; Downs, Robert R.; Ganske, Anette; Ramapriyan, Hampapuram K.; Ivánová, Ivana; Wyborn, Lesley; Jones, Dave; Bastin, Lucy; Shie, Chung-lin; Moroni, David F.
    Open-source science builds on open and free resources that include data, metadata, software, and workflows. Informed decisions on whether and how to (re)use digital datasets depend on an understanding of the quality of the underpinning data and relevant information. However, quality information, being difficult to curate and often context specific, is currently not readily available for sharing within and across disciplines. To help address this challenge and promote the creation and (re)use of freely and openly shared information about the quality of individual datasets, members of several groups around the world, collaborating with international domain experts, have undertaken an effort to develop international community guidelines with practical recommendations for the Earth science community. The guidelines were inspired by the guiding principles of being findable, accessible, interoperable, and reusable (FAIR). Use of the FAIR dataset quality information guidelines is intended to help stakeholders, such as scientific data centers, digital data repositories, and producers, publishers, stewards, and managers of data, to: i) capture, describe, and represent quality information of their datasets in a manner that is consistent with the FAIR Guiding Principles; ii) allow for the maximum discovery, trust, sharing, and reuse of their datasets; and iii) enable international access to and integration of dataset quality information. This article describes the processes used to develop the guidelines, which are aligned with the FAIR principles, presents a generic quality assessment workflow, describes the guidelines for preparing and disseminating dataset quality information, and outlines a path forward to improve their disciplinary diversity.
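    A minimal sketch of what a machine-actionable record of dataset quality information in the spirit of these guidelines might look like; the field names and values are illustrative assumptions, not the schema the guidelines prescribe:

      # Illustrative record of quality information for a single dataset.
      # Field names and values are assumptions for this sketch, not the
      # guidelines' prescribed schema.
      import json

      quality_record = {
          "dataset_id": "doi:10.0000/example-dataset",   # placeholder PID
          "quality_dimension": "completeness",
          "assessment_method": "automated rule-based check",
          "result": {"value": 0.97, "unit": "fraction of required fields present"},
          "assessed_by": "Example Data Center",          # hypothetical steward
          "assessment_date": "2022-01-01",
          "provenance": "https://example.org/quality-log/123",  # placeholder
      }
      print(json.dumps(quality_record, indent=2))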