Modeling self-service machine-learning agents for distributed stream processing

Resource type
P. Zehnder and D. Riemer
Book title
2017 IEEE International Conference on Big Data (Big Data)
Many emerging application areas such as the Internet of Things rely on Machine Learning (ML) techniques that operate on continuous streaming data in order to automatically generate added business value. However, integrating ML components into stream processing programs is still a challenging task that requires both technical expertise (e.g., selection of algorithms and parameters) and domain knowledge. Therefore, this paper introduces a novel development methodology and tool support, involving multiple roles and tasks, that aim to enable domain experts to leverage ML algorithms within self-service stream processing pipelines. At its core, we introduce the concept of a reusable Machine Learning Pipeline Agent (MLPA). Such an MLPA can be defined using a purpose-built Domain-Specific Language (DSL) consisting of reusable building blocks. Afterwards, domain experts can use MLPAs through a graphical user interface. Our approach is based on distributed, scalable ML algorithms that are automatically deployed, and it contributes newly developed algorithms such as a scalable version of neural networks operating on data streams. In our evaluation, we show that the developed concepts and algorithms scale in a distributed setting with an increasing number of nodes. Our solution thus bridges the gap between domain experts and technical experts and eases access to advanced streaming ML analytics for non-technical users.
Published by
Philipp Zehnder