Search Results

  • Item
    Earth system data cubes unravel global multivariate dynamics
    (Göttingen : Copernicus Publ., 2020) Mahecha, Miguel D.; Gans, Fabian; Brandt, Gunnar; Christiansen, Rune; Cornell, Sarah E.; Fomferra, Normann; Kraemer, Guido; Peters, Jonas; Bodesheim, Paul; Camps-Valls, Gustau; Donges, Jonathan F.; Dorigo, Wouter; Estupinan-Suarez, Lina M.; Gutierrez-Velez, Victor H.; Gutwin, Martin; Jung, Martin; Londoño, Maria C.; Miralles, Diego G.; Papastefanou, Phillip; Reichstein, Markus
    Understanding Earth system dynamics in light of ongoing human intervention and dependency remains a major scientific challenge. The unprecedented availability of data streams describing different facets of the Earth now offers fundamentally new avenues to address this quest. However, several practical hurdles, especially the lack of data interoperability, limit the joint potential of these data streams. Today, many initiatives within and beyond the Earth system sciences are exploring new approaches to overcome these hurdles and meet the growing interdisciplinary need for data-intensive research; using data cubes is one promising avenue. Here, we introduce the concept of Earth system data cubes and how to operate on them in a formal way. The idea is that treating multiple data dimensions, such as spatial, temporal, variable, frequency, and other grids alike, allows effective application of user-defined functions to co-interpret Earth observations and/or model-data integration. An implementation of this concept combines analysis-ready data cubes with a suitable analytic interface. In three case studies, we demonstrate how the concept and its implementation facilitate the execution of complex workflows for research across multiple variables, and spatial and temporal scales: (1) summary statistics for ecosystem and climate dynamics; (2) intrinsic dimensionality analysis on multiple timescales; and (3) model-data integration. We discuss the emerging perspectives for investigating global interacting and coupled phenomena in observed or simulated data. In particular, we see many emerging perspectives of this approach for interpreting large-scale model ensembles. The latest developments in machine learning, causal inference, and model-data integration can be seamlessly implemented in the proposed framework, supporting rapid progress in data-intensive research across disciplinary boundaries. © 2020 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
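The abstract's central idea, that every cube dimension is treated alike so a user-defined function can be applied along any of them, can be illustrated with a minimal sketch. This is not the authors' implementation or interface: the toy cube, its axis names, and the helper `reduce_over` are hypothetical, and the values are made up for illustration.

```python
from itertools import product
from statistics import mean

# Hypothetical toy cube with three named axes; the values are invented.
AXES = ("time", "lat", "lon")
SHAPE = (3, 2, 2)


def make_cube():
    """Build a cube as a dict mapping (time, lat, lon) index tuples to values."""
    return {idx: float(sum(idx)) for idx in product(*(range(n) for n in SHAPE))}


def reduce_over(cube, axes, shape, dim, func):
    """Apply `func` to every 1-D slice along `dim`.

    Returns the reduced cube plus its remaining axes and shape. The same
    code path serves any axis, which is the point of treating dimensions alike.
    """
    k = axes.index(dim)
    out_axes = axes[:k] + axes[k + 1:]
    out_shape = shape[:k] + shape[k + 1:]
    out = {}
    for rest in product(*(range(n) for n in out_shape)):
        # Re-insert the running index i at position k to walk along `dim`.
        series = [cube[rest[:k] + (i,) + rest[k:]] for i in range(shape[k])]
        out[rest] = func(series)
    return out, out_axes, out_shape


cube = make_cube()
# Temporal mean per grid cell, then a zonal range per (time, lat) pair:
time_mean, tm_axes, _ = reduce_over(cube, AXES, SHAPE, "time", mean)
lon_range, lr_axes, _ = reduce_over(
    cube, AXES, SHAPE, "lon", lambda s: max(s) - min(s)
)
```

Swapping `"time"` for `"lon"` changes only the dimension name, not the function or the traversal logic; a production system would use chunked, lazily loaded arrays rather than a dictionary.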
  • Item
    Kafka-ML: Connecting the data stream with ML/AI frameworks
    (Amsterdam [u.a.] : Elsevier Science, 2022) Martín, Cristian; Langendoerfer, Peter; Zarrin, Pouya Soltani; Díaz, Manuel; Rubio, Bartolomé
    Machine Learning (ML) and Artificial Intelligence (AI) depend on data sources to train, improve, and make predictions through their algorithms. With the digital revolution and current paradigms like the Internet of Things, this information is turning from static data to continuous data streams. However, most of the ML/AI frameworks used nowadays are not fully prepared for this revolution. In this paper, we propose Kafka-ML, a novel and open-source framework that enables the management of ML/AI pipelines through data streams. Kafka-ML provides an accessible and user-friendly Web user interface where users can easily define ML models, to then train, evaluate, and deploy them for inferences. Kafka-ML itself and the components it deploys are fully managed through containerization technologies, which ensure their portability, easy distribution, and other features such as fault-tolerance and high availability. Finally, a novel approach has been introduced to manage and reuse data streams, which may eliminate the need for data storage or file systems.