Challenges in Live Monitoring of Machine Learning Systems
Keywords:
reliable machine learning, monitoring, production systems, feature distribution, non-stationarity
Abstract
A machine learning (ML) system involves multiple layers of software and therefore needs monitoring to ensure reliable operation. Unlike traditional software services, the quality of its predictions can only be guaranteed if the data that flows into the system follows a distribution similar to that of the data the ML model was trained on. This imposes additional requirements on monitoring. In this paper, we outline a scheme for monitoring ML services based on comparing feature distributions between the data used for training and the data seen in live prediction. To showcase this, we introduce payment risk prediction as an application scenario. Its long feedback delays and real-time requirements motivate monitoring and at the same time pose specific challenges, which we address. In this context, we discuss trade-offs in the practical implementation of the monitoring scheme and share our best practices.
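The idea of comparing feature distributions between training and live data can be illustrated with a minimal sketch. The abstract does not say which comparison statistic the paper uses, so the two-sample Kolmogorov-Smirnov statistic below is an assumption chosen for illustration; the function names, the feature names `amount` and `age`, and the threshold of 0.2 are all hypothetical.

```python
from bisect import bisect_right


def ks_statistic(train_values, live_values):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples (0.0 = identical empirical distributions, 1.0 = disjoint).
    """
    train_sorted = sorted(train_values)
    live_sorted = sorted(live_values)
    if not train_sorted or not live_sorted:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The gap between two step functions can only change at sample points,
    # so it is enough to evaluate both CDFs at every observed value.
    for x in set(train_sorted) | set(live_sorted):
        cdf_train = bisect_right(train_sorted, x) / len(train_sorted)
        cdf_live = bisect_right(live_sorted, x) / len(live_sorted)
        max_gap = max(max_gap, abs(cdf_train - cdf_live))
    return max_gap


def drifted_features(train, live, threshold=0.2):
    """Compare each feature's training sample with its live sample.

    `train` and `live` map feature names to lists of numeric values.
    The threshold is an illustrative assumption, not a recommended value.
    Returns {feature_name: statistic} for features above the threshold.
    """
    report = {}
    for name, train_values in train.items():
        stat = ks_statistic(train_values, live[name])
        if stat > threshold:
            report[name] = stat
    return report


# Hypothetical feature samples: "amount" has shifted in live traffic, "age" has not.
training_sample = {"amount": [1, 2, 3, 4], "age": [20, 30, 40, 50]}
live_sample = {"amount": [10, 11, 12, 13], "age": [20, 30, 40, 50]}
alerts = drifted_features(training_sample, live_sample)
```

In a setting with long feedback delays, a check of this kind could run on each batch of live requests, since it needs only the input features and not the eventual labels.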
License
Copyright (c) 2021 Patrick Baier, Stanimir Dragiev

This work is licensed under a Creative Commons Attribution 4.0 International License.