Machine learning algorithms are becoming pivotal in business decision-making; however, these models often require adjustment due to a phenomenon called model drift. Numerous curricula, academic articles, and blog posts describe a machine learning (ML) lifecycle spanning data gathering to model deployment, yet fail to address one crucial aspect: model drift.
Over time, the relationship between the dependent and independent variables changes. As it does, the model becomes unstable and its predictions grow steadily less accurate. So how do you cope with this?
Strategies to Combat Model Drift
A practical solution is to refit these models continuously. Past experience can indicate when a model typically starts to drift, allowing you to retrain preemptively and mitigate the risk.
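This kind of preemptive schedule can be sketched as a simple age check. The function name, the 90-day drift horizon, and the safety margin below are illustrative assumptions; in practice the horizon would come from observing when past models degraded.

```python
from datetime import date

# Hypothetical retraining scheduler: if experience shows the model degrades
# roughly `drift_horizon_days` after training, retrain shortly before that.
def needs_retraining(last_trained: date, today: date,
                     drift_horizon_days: int = 90,
                     safety_margin_days: int = 7) -> bool:
    age_days = (today - last_trained).days
    return age_days >= drift_horizon_days - safety_margin_days
```

A job scheduler could call this daily and trigger a retraining pipeline whenever it returns True.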
When data fluctuates over time, weighting the data can be a useful approach. For instance, a financial model that estimates parameters from transactions can weight recent transactions more heavily than older ones, keeping the model reliable and drift in check.
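One common way to implement this is exponential recency weighting, sketched below. The half-life parameter is an assumed tuning knob, not something prescribed here; many libraries (e.g. most scikit-learn estimators) accept such weights via a `sample_weight` argument to `fit`.

```python
import numpy as np

# Exponentially down-weight older transactions: an observation `half_life`
# days old counts half as much as one from today.
def recency_weights(ages_in_days: np.ndarray, half_life: float = 30.0) -> np.ndarray:
    w = 0.5 ** (ages_in_days / half_life)
    return w / w.sum()  # normalize so the weights sum to 1

ages = np.array([0.0, 30.0, 60.0])  # today, one month old, two months old
w = recency_weights(ages)
```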
A more complex strategy is to model the change itself. Preserve the originally developed model and use it as a baseline; new models can then be built to correct the baseline's predictions in light of recent changes in the data.
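A minimal sketch of this idea: keep the baseline frozen and fit a small correction model on the baseline's residuals over recent data. The linear baseline and the least-squares correction here are stand-ins chosen for brevity, not a prescribed method.

```python
import numpy as np

def baseline(x: np.ndarray) -> np.ndarray:
    # Stand-in for the original, frozen model
    return 2.0 * x

def fit_correction(x_recent: np.ndarray, y_recent: np.ndarray):
    # Fit a line through the baseline's residuals on recent data,
    # then return a combined predictor: baseline + correction.
    resid = y_recent - baseline(x_recent)
    slope, intercept = np.polyfit(x_recent, resid, 1)
    return lambda x_new: baseline(x_new) + slope * x_new + intercept

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.5 * x + 1.0                 # the true relationship has drifted
adjusted = fit_correction(x, y)
```

This keeps the trusted baseline intact while letting a lightweight model absorb the drift.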
Understanding the Fragility of ML Systems
ML systems are more fragile than they appear, and they need testing, CI/CD, and monitoring. Routine retraining is the usual answer, but how frequently should it happen?
At times, the problem reveals itself. Passively waiting for an issue isn't ideal, but it is the only option for new models that lack the historical data needed to anticipate problems. When a problem does arise, you can investigate thoroughly and put preventative measures in place.
If the model's data displays cyclical patterns, retraining should be aligned with those seasons. For instance, during festive spending periods, credit lending institutions need specialized models to handle the sudden behavioral changes.
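Operationally, seasonal alignment can be as simple as routing scoring to a season-specific model. The season boundaries and model names below are illustrative assumptions:

```python
# Route scoring to a season-specific model. The November-December window
# and the model keys are hypothetical, chosen to mirror the lending example.
def seasonal_model_key(month: int) -> str:
    if month in (11, 12):          # festive spending period
        return "holiday_model"
    return "baseline_model"
```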
Monitoring for Model Drift
Continuous monitoring, however, is the most effective way to detect model drift. Stability metrics should be checked regularly, at a frequency that depends on the sector and the business. Monitoring can be manual or automated, with alerts raised whenever unexpected anomalies occur.
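One widely used stability metric is the Population Stability Index (PSI), which compares the distribution of a score or feature at training time against its distribution in production. A sketch follows; the 0.2 alert threshold is a common rule of thumb, not a formal standard.

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    # Bin edges come from the training-time (expected) distribution
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))
    actual = np.clip(actual, edges[0], edges[-1])   # keep live values in range
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e = np.clip(e, 1e-6, None)                      # avoid log(0)
    a = np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.0, 1.0, 10_000)
stable = psi(train_scores, rng.normal(0.0, 1.0, 10_000))   # same distribution
shifted = psi(train_scores, rng.normal(1.0, 1.0, 10_000))  # mean has drifted
```

An automated monitor could compute this daily and raise an alert when the value crosses the chosen threshold.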
Categories of Model Drift
Model drift falls into two categories: concept drift and data drift. Concept drift occurs when the statistical properties of the target variable change: the model performs worse because the very thing it is predicting has, in effect, been redefined.
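Because the inputs may look unchanged under concept drift, it often shows up first as a rising error rate. A simple sketch is to track a rolling mean of recent prediction errors and alert when it exceeds an assumed threshold (the window size and threshold below are illustrative):

```python
import numpy as np

# Alert when the mean error over the most recent `window` predictions
# exceeds `threshold`. Both parameters are hypothetical tuning choices.
def rolling_error_alert(errors: np.ndarray, window: int = 100,
                        threshold: float = 0.2) -> bool:
    if len(errors) < window:
        return False                       # not enough history yet
    return float(np.mean(errors[-window:])) > threshold

healthy = np.zeros(200)                                    # predictions match labels
drifting = np.concatenate([np.zeros(200), np.ones(150)])   # recent predictions wrong
```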
The second, and more common, type is data drift, which occurs when the statistical properties of the predictor variables change. Shifts in the underlying variables degrade the model, often through seasonality: a business strategy effective in summer may fail in winter, and airlines that see surging demand during holiday seasons struggle to maintain occupancy in the off season.
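Data drift in a single numeric feature can be flagged with a two-sample Kolmogorov-Smirnov test, comparing the feature's training-time distribution against live data. The significance level below is an assumed choice:

```python
import numpy as np
from scipy import stats

def feature_drifted(train_col: np.ndarray, live_col: np.ndarray,
                    alpha: float = 0.01) -> bool:
    # A small p-value means the two distributions are unlikely to match
    return bool(stats.ks_2samp(train_col, live_col).pvalue < alpha)

train = np.linspace(-3.0, 3.0, 1000)      # training-time feature values
live_ok = np.linspace(-3.0, 3.0, 1000)    # same distribution
live_bad = np.linspace(-1.0, 5.0, 1000)   # distribution shifted upward
```

Running such a check per feature gives an early warning of which inputs are moving away from what the model was trained on.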