Search Results


What the Phage: a scalable workflow for the identification and analysis of phage sequences

2022, Marquet, Mike, Hölzer, Martin, Pletz, Mathias W, Viehweger, Adrian, Makarewicz, Oliwia, Ehricht, Ralf, Brandt, Christian

Phages are among the most abundant and diverse biological entities on earth. Phage prediction from sequence data is a crucial first step to understanding their impact on the environment. A variety of bacteriophage prediction tools have been developed over the years. They differ in algorithmic approach, results, and ease of use. We therefore developed "What the Phage" (WtP), an easy-to-use and parallel multitool approach for phage prediction combined with an annotation and classification downstream strategy, thus supporting the user's decision-making process by summarizing the results of the different prediction tools in charts and tables. WtP is reproducible and scales to thousands of datasets through a workflow manager (Nextflow). WtP is freely available under a GPL-3.0 license (https://github.com/replikation/What_the_Phage).


Kafka-ML: Connecting the data stream with ML/AI frameworks

2022, Martín, Cristian, Langendoerfer, Peter, Zarrin, Pouya Soltani, Díaz, Manuel, Rubio, Bartolomé

Machine Learning (ML) and Artificial Intelligence (AI) depend on data sources to train, improve, and make predictions through their algorithms. With the digital revolution and current paradigms like the Internet of Things, this information is turning from static data to continuous data streams. However, most of the ML/AI frameworks used nowadays are not fully prepared for this revolution. In this paper, we propose Kafka-ML, a novel and open-source framework that enables the management of ML/AI pipelines through data streams. Kafka-ML provides an accessible and user-friendly Web user interface where users can easily define ML models and then train, evaluate, and deploy them for inference. Kafka-ML itself and the components it deploys are fully managed through containerization technologies, which ensure their portability, easy distribution, and other features such as fault tolerance and high availability. Finally, a novel approach has been introduced to manage and reuse data streams, which may eliminate the need for data storage or file systems.