Kafka-ML: Connecting the data stream with ML/AI frameworks

Date
2022
Volume
126
Publisher
Amsterdam [u.a.] : Elsevier Science
Abstract

Machine Learning (ML) and Artificial Intelligence (AI) depend on data sources to train, improve, and make predictions through their algorithms. With the digital revolution and current paradigms like the Internet of Things, this information is turning from static data to continuous data streams. However, most of the ML/AI frameworks used nowadays are not fully prepared for this revolution. In this paper, we propose Kafka-ML, a novel and open-source framework that enables the management of ML/AI pipelines through data streams. Kafka-ML provides an accessible and user-friendly Web user interface where users can easily define ML models, to then train, evaluate, and deploy them for inferences. Kafka-ML itself and the components it deploys are fully managed through containerization technologies, which ensure their portability, easy distribution, and other features such as fault-tolerance and high availability. Finally, a novel approach has been introduced to manage and reuse data streams, which may eliminate the need for data storage or file systems.

Keywords
Apache Kafka, Artificial Intelligence, Data streams, Distributed systems, Docker, Kafka-ML, Kubernetes, Machine Learning
Citation
Martín, C., Langendoerfer, P., Zarrin, P. S., Díaz, M., & Rubio, B. (2022). Kafka-ML: Connecting the data stream with ML/AI frameworks. 126. https://doi.org/10.1016/j.future.2021.07.037
License
CC BY 4.0 Unported