MLOps Monitoring

Machine learning models fuel many vital business operations and maintaining their relevance with the most recent data when deployed is essential. If there's a data skew, the model may lose context due to different data distribution during training and deployment stages. One might also encounter situations where a particular characteristic is not available in the deployed data or change in the real-world environment makes the model inapplicable. The user behavior dynamics can also change.

Feedback mechanisms play a substantial role in various aspects of life, including business. The idea behind a feedback loop is plain - create, measure, and use that information to boost the output, essentially forming a cycle of analysis and enhancement. ML models can indispensably profit from this process.

A typical ML cycle unfolds with data collection, pre-processing, model creation, and assessment, finally culminating in deployment. But, an important component often left out is feedback. The principal aim of any model monitoring method is setting up this invaluable feedback loop from the deployment back to model development phase. This enables the ML model to develop over time, allowing it to decide whether an upgrade is needed or persisting with the existing model is adequate. To facilitate this decision-making, the model monitoring framework needs to conserve and disclose various model metrics under two circumstances:

When training data is furnished, the framework computes model metrics on both training and deployed data post-deployment to draw a suitable conclusion.
In scenarios with no available training data, the framework calculates the model metrics from the data available post-deployment only.

Depending on which circumstance applies to determine if production model needs an upgrade or interventions, metrics pointed out in the next segment are generated.

Measurements

The most effective AI model monitoring metrics are segregated into three categories based on their dependency on data and/or machine learning models. A machine learning model performance monitoring framework should ideally integrate one or two metrics from each category. If there are any tradeoffs, one could start with operations metrics, progressing as the model becomes more mature. Checking operational metrics should be done daily, if not real-time, while stability and model performance should be done weekly or even longer based on the business environment.

Three types of metrics could be used:

Stability Metrics: Utilized to detect two kinds of data distribution shift patterns.
Evaluation Metrics: Useful for identifying a conceptual shift in data.
Operational Metrics: Helpful in examining the effectiveness of the deployed model in terms of use.

Conclusion

Overlooking monitoring can lead to loss of "confidence" in the ML system, which might prove to be fatal. Hence, it's integral to include and prepare for it in the complete solution architecture for any ML use case application. Monitoring the MLOps lifecycle, MLOps pipelines, and the MLOps platform has become a requirement for mature ML systems. Developing such a framework is fundamental to ensure the consistency and robustness of the ML system.