Challenges in Live Monitoring of Machine Learning Systems

Patrick Baier; Stanimir Dragiev

Authors

Patrick Baier, Hochschule Karlsruhe - University of Applied Sciences
Stanimir Dragiev, Zalando Payments

Keywords:

reliable machine learning, monitoring, production systems, feature distribution, non-stationarity

Abstract

A machine learning (ML) system involves multiple layers of software and therefore needs monitoring to ensure reliable operation. Unlike traditional software services, the quality of its predictions can only be guaranteed if the data flowing into the system follows a distribution similar to that of the data the ML model was trained on. This places additional requirements on monitoring. In this paper, we outline a scheme for monitoring ML services based on comparing feature distributions between the data used for training and the data seen in live prediction. To showcase this, we introduce payment risk prediction as an application scenario. Its long feedback delays and real-time requirements motivate monitoring and, at the same time, pose specific challenges, which we address. In this context, we discuss trade-offs in the practical implementation of the monitoring scheme and share our best practices.
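To make the idea of comparing feature distributions concrete, the following is a minimal sketch rather than the scheme described in the paper. The abstract does not name a comparison statistic, so the population stability index (PSI) is used here as an assumption. The feature names, the bin count, and the alert threshold of 0.2 (a common rule of thumb) are also hypothetical choices made for illustration.

```python
import math
import random
from bisect import bisect_right


def quantile_edges(train_values, n_bins=10):
    """Interior bin edges taken at evenly spaced quantiles of the training data."""
    ordered = sorted(train_values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def bin_fractions(values, edges, eps=1e-6):
    """Fraction of values per bin, floored at eps so the logarithm stays finite."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]


def psi(train_values, live_values, n_bins=10):
    """Population stability index between training and live samples of one feature."""
    edges = quantile_edges(train_values, n_bins)
    p = bin_fractions(train_values, edges)
    q = bin_fractions(live_values, edges)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def drifted_features(train, live, threshold=0.2):
    """Names of features whose PSI exceeds the (assumed) alert threshold."""
    return [name for name in train if psi(train[name], live[name]) > threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical features; synthetic data stands in for real training and live traffic.
    train = {
        "order_amount": [rng.gauss(50, 10) for _ in range(5000)],
        "basket_size": [rng.gauss(3, 1) for _ in range(5000)],
    }
    live = {
        "order_amount": [rng.gauss(65, 10) for _ in range(5000)],  # shifted mean
        "basket_size": [rng.gauss(3, 1) for _ in range(5000)],     # same distribution
    }
    print(drifted_features(train, live))
```

Because the bin edges are fixed from the training data, the check needs only the live feature values and no labels, which is what makes this kind of comparison usable when feedback on predictions arrives with a long delay.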
