Failure Analysis Machine Learning

Deploying Machine Learning Models in the Real World

Data scientists and machine learning specialists commonly develop, train, and evaluate ML models against a fixed dataset using average accuracy metrics. Putting such a model into production, however, immediately raises expectations about its reliability and robustness. Models that appear to work well during development can run into a range of problems in a real-world setting.

The Three Major Shortcomings

Three critical shortcomings are often overlooked during model development and can cause problems in a live environment: performance bias, model malfunctions, and robustness failures.

1. Performance Bias Issues:

First, performance bias issues are regularly overlooked by data scientists during testing. A bias that favors certain groups can go unnoticed under conventional evaluation techniques. Extensive research has examined the causes of bias, which range from flawed data analysis to problematic model architectures. Some possible consequences of bias:

  • Subgroup Repercussions: Specific subgroups may suffer hidden long-term consequences. For example, if a model underperforms for a share of newly acquired users, engagement and customer retention can decline over time.
  • Unexpected Data Subset Discrepancy: A model may underperform on certain subsets of the data for reasons that are difficult to pinpoint. Developers need to be aware of these cases and investigate them.
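
One way to surface this kind of hidden underperformance is to break accuracy down by subgroup instead of reporting a single average. The sketch below is only illustrative: the function name accuracy_by_group, the "existing"/"new" user segments, and the toy labels are assumptions made up for the example, not part of any particular tool.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return accuracy per subgroup as a dict {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy data: five existing users followed by three new users.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["existing"] * 5 + ["new"] * 3

# The single aggregate number looks acceptable...
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# ...while the breakdown shows the model failing mostly on new users.
per_group = accuracy_by_group(y_true, y_pred, groups)
```

In this toy data the overall accuracy is 0.75, yet every error falls on the new-user segment. A real evaluation would need far larger groups before drawing any conclusion from such a gap.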

2. Model Malfunctions Due to Data Pipelines:

Second, organizations that want to deploy machine learning must establish a reliable data pipeline. A distinctive risk of such a pipeline is that changes to upstream data processing can silently degrade model performance and affect future data collection. For instance, a data engineer's change to a calculation may unintentionally shift a feature's distribution, or a pipeline change may cause feature values to be misrepresented.
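
A shift in a feature's distribution can be checked by comparing a reference sample (for example, training data) with recent production values. The sketch below uses the population stability index (PSI) as one possible drift measure. The choice of PSI, the function name, the bin count, and the smoothing constant eps are all assumptions made for illustration, and any alerting threshold would be a decision for the team running the pipeline.

```python
import math
from bisect import bisect_right


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Compare two non-empty samples of one numeric feature; 0 means identical binned distributions."""
    ref_sorted = sorted(reference)
    # Interior bin edges taken at evenly spaced quantiles of the reference sample.
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for value in values:
            counts[bisect_right(edges, value)] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(count / len(values), eps) for count in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Hypothetical scenario: an upstream change adds a constant offset to the feature.
training_values = [float(x) for x in range(1000)]
shifted_values = [x + 300.0 for x in training_values]

psi_unchanged = population_stability_index(training_values, training_values)
psi_shifted = population_stability_index(training_values, shifted_values)
```

Here psi_unchanged is zero because the two samples are identical, while psi_shifted is clearly positive. Running a check like this on each feature after a pipeline change is one way to catch a silent shift before it reaches the model.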

Importantly, most common machine learning platforms cannot verify the “validity” of input data before returning a result. Validity takes several forms, such as numeric values within a suitable range, type conversions that do not affect the model, and completeness of the features.
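
The three forms of validity mentioned above (value ranges, types, and completeness) can be checked before a row ever reaches the model. The following is a minimal sketch under stated assumptions: the schema, the feature names age and income, their bounds, and the function validate_features are all hypothetical and would need to be replaced with the real feature definitions.

```python
import math

# Hypothetical schema: feature name -> (expected type, minimum, maximum).
SCHEMA = {
    "age": (int, 0, 120),
    "income": (float, 0.0, 1e7),
}


def validate_features(row, schema=SCHEMA):
    """Return a list of human-readable problems; an empty list means the row passed."""
    problems = []
    for name, (expected_type, low, high) in schema.items():
        # Completeness: the feature must be present and not None.
        if name not in row or row[name] is None:
            problems.append(f"{name}: missing")
            continue
        value = row[name]
        # Type check; bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
            continue
        if isinstance(value, float) and math.isnan(value):
            problems.append(f"{name}: NaN")
            continue
        # Range check on the numeric value.
        if not (low <= value <= high):
            problems.append(f"{name}: {value} outside [{low}, {high}]")
    return problems


ok = validate_features({"age": 30, "income": 52000.0})
wrong_type = validate_features({"age": "30", "income": 52000.0})
incomplete = validate_features({"age": 30})
```

A serving layer could refuse to score, or fall back to a default, whenever the returned list is not empty. Which of those responses is appropriate depends on the application.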

3. Robustness Failures:

Finally, robustness failures refer to the model's lack of resilience to disruptions. They are a concern because such disruptions can cause model errors, resulting in a poor user experience or exploitation by malicious actors. Robustness issues are also often associated with sudden changes in the output space.
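
One simple way to probe resilience is to perturb each input slightly and count how often the prediction changes. The sketch below assumes that "disruptions" can be modelled as small multiplicative noise on numeric features, which is only one of many possible interpretations. The function prediction_flip_rate and the toy threshold model are hypothetical stand-ins for a real model's predict call.

```python
import random


def prediction_flip_rate(predict, rows, noise_scale=0.01, trials=20, seed=0):
    """Fraction of noisy copies whose prediction differs from the clean prediction."""
    if not rows or trials <= 0:
        raise ValueError("need at least one row and one trial")
    rng = random.Random(seed)
    flips = 0
    total = 0
    for row in rows:
        baseline = predict(row)
        for _ in range(trials):
            # Scale every feature by a random factor within +/- noise_scale.
            noisy = [x * (1 + rng.uniform(-noise_scale, noise_scale)) for x in row]
            flips += int(predict(noisy) != baseline)
            total += 1
    return flips / total


# Hypothetical toy model: predicts 1 when the features sum to more than 10.
def toy_predict(row):
    return int(sum(row) > 10)


stable_rate = prediction_flip_rate(toy_predict, [[1.0, 2.0]])
fragile_rate = prediction_flip_rate(toy_predict, [[5.0, 5.001]])
```

The first row sits far from the decision boundary, so its flip rate is zero. The second sits almost on the boundary, so one percent noise is enough to change the output in a noticeable share of trials. A high flip rate on realistic inputs suggests the model may be easy to disrupt.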


In conclusion, assessing whether a machine learning model is ready for deployment should include a thorough examination of performance bias, model malfunctions, and robustness failures. Unfortunately, these are frequently overlooked in data scientists' practices and tools.
