News
June 3, 2022
7 min read

Giskard's new feature: Automated Machine Learning Testing

The Open Beta of Giskard's AI Test feature: an automated way to test your ML models and ensure performance, robustness, and ethics

Alex Combessie

"Captain's Log, Stardate 2022.5. Our allies, the AI Innovators, are under attack by ML Bugs. These aliens are sabotaging AI development cycles. We are beaming down to give them our secret weapon, the ML Testing Engine!”

This month, the Giskard crew has been cooking something very special…

You have shared your stories of building ML products with us. You have told us about your use cases, your technical stack, and your goals. We have also heard your challenges: friction between ML and business experts, and time-consuming ML development cycles.

Today, we are excited to announce the Open Beta release of our new feature: ML Testing! This feature will help you test your ML models thoroughly and automatically, to ensure performance, robustness and ethics.

You can try it right now on our GitHub repository: gisk.ar/github!

💡 Why ML Testing?

We have met many Heads of Data Science and ML Engineers over the last few months. We realized that deploying a new ML model version involves a lot of repetitive manual actions. Moreover, it is very hard to measure the added value of updating an ML model: the risk of regression, error and bias creates uncertainty when deploying a new version.

To solve this challenge, ML Engineers need testing solutions to integrate ML models in IT production systems in a safe and automated way.

This process is related to Continuous Integration in software engineering. However, applying this to ML models is a hard, unsolved problem:

The many challenges of testing ML models

The promise of ML Testing is simple: Deliver ML products, better & faster.

🔬 How to test ML models

A GIF is worth a thousand words:

Giskard AI Test demo: test your ML model in seconds

Yes, testing ML models should be as fast & simple as that!

⚡️ Performance

As data scientists, you are all used to computing performance metrics on a test set. Unfortunately, these global performance metrics are not enough. Performance can vary a lot depending on data slices. For instance, is your scoring model as performant on new customers as it is on current ones?

✅ Giskard provides a whole set of performance metrics that can be computed automatically on sensitive data slices.
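
To make this concrete, here is a minimal sketch of slice-based performance testing with pandas and scikit-learn. It is not Giskard's own API; the model object and the "customer_type" column are hypothetical placeholders.

```python
# Minimal sketch of slice-based performance testing (not Giskard's API).
# `model`, `X_test`, `y_test` and the "customer_type" column are hypothetical.
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

def slice_performance(model, X_test: pd.DataFrame, y_test: pd.Series, slice_column: str) -> pd.DataFrame:
    """Compute metrics globally and per data slice, so a regression on a
    sub-population (e.g. new customers) is not hidden by the global average."""
    global_preds = model.predict(X_test)
    results = {"GLOBAL": {"accuracy": accuracy_score(y_test, global_preds),
                          "f1": f1_score(y_test, global_preds, average="macro")}}
    for value, idx in X_test.groupby(slice_column).groups.items():
        preds = model.predict(X_test.loc[idx])
        results[f"{slice_column}={value}"] = {"accuracy": accuracy_score(y_test.loc[idx], preds),
                                              "f1": f1_score(y_test.loc[idx], preds, average="macro")}
    return pd.DataFrame(results).T

# Example check: the model should not be much weaker on new customers than overall
# report = slice_performance(model, X_test, y_test, slice_column="customer_type")
# assert report.loc["customer_type=new", "accuracy"] >= report.loc["GLOBAL", "accuracy"] - 0.05
```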

🐢 Robustness

Performance metrics are important but, unfortunately, ground-truth labels are not always available to compute them. This is why we provide the best data & prediction drift tests.

For categorical variables, we use the Population Stability Index (PSI) and the Chi-Square test. For numerical features, we use the Kolmogorov-Smirnov test and the Earth Mover's distance. You can learn more about how we chose these metrics on our tech blog.

✅  Giskard provides pre-configured drift tests with pass/fail thresholds, backed by state-of-the-art ML research.
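
As an illustration of what these drift tests compute, here is a minimal sketch using numpy and scipy. The thresholds and sample data are illustrative assumptions, not Giskard's built-in defaults.

```python
# Minimal sketch of the drift statistics named above, using numpy and scipy.
# Thresholds and sample data are illustrative, not Giskard's built-in defaults.
import numpy as np
from scipy import stats

def psi(ref_counts, new_counts) -> float:
    """Population Stability Index between two categorical frequency vectors."""
    ref = np.clip(np.asarray(ref_counts, dtype=float), 1e-6, None)
    new = np.clip(np.asarray(new_counts, dtype=float), 1e-6, None)
    ref, new = ref / ref.sum(), new / new.sum()
    return float(np.sum((new - ref) * np.log(new / ref)))

# Numerical feature: Kolmogorov-Smirnov test and Earth Mover's (Wasserstein) distance
reference = np.random.normal(0.0, 1.0, 5000)    # e.g. training data
production = np.random.normal(0.3, 1.0, 5000)   # e.g. recent production data
ks_stat, ks_pvalue = stats.ks_2samp(reference, production)
emd = stats.wasserstein_distance(reference, production)
numerical_drift = ks_pvalue < 0.05 and emd > 0.1

# Categorical feature: PSI and Chi-Square test on observed category counts
ref_counts = np.array([500, 300, 200])
prod_counts = np.array([420, 350, 230])
expected = ref_counts / ref_counts.sum() * prod_counts.sum()
chi2_stat, chi2_pvalue = stats.chisquare(prod_counts, f_exp=expected)
categorical_drift = psi(ref_counts, prod_counts) > 0.2 or chi2_pvalue < 0.05

print(f"Numerical drift: {numerical_drift}, categorical drift: {categorical_drift}")
```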

⚖️  Ethics

Technical metrics are important, but you also need to tackle the issue of bias. This is why we provide heuristic and metamorphic tests. These tests generalize metrics built in the FairML research community to measure and fight biases in Machine Learning.

Drawing from the latest research (Ribeiro et al., 2020), we also provide behavioral tests to check the invariance & direction of your model's predictions when faced with data perturbations. This is key in NLP, as you need your model to be invariant to typos or synonyms. It is also very useful for counterfactual fairness metrics: you want your model to behave similarly for protected minorities and for the whole group.

✅  Giskard provides ready-made tests to integrate ethics guidelines and domain knowledge into your ML workflow.
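
For illustration, here is a minimal sketch of such a metamorphic invariance test. The counterfactual gender-swap perturbation, the model object, and the 5% threshold are hypothetical examples, not Giskard's built-in test definitions.

```python
# Minimal sketch of a metamorphic invariance test (in the spirit of Ribeiro et al., 2020).
# The gender-swap perturbation, `credit_model`, and the 5% threshold are hypothetical.
import numpy as np
import pandas as pd

def invariance_test(model, X: pd.DataFrame, perturb, max_changed_ratio: float = 0.05) -> bool:
    """Pass if the share of predictions that change after a perturbation stays
    below a chosen threshold (a simple counterfactual fairness check)."""
    original = model.predict(X)
    perturbed = model.predict(perturb(X.copy()))
    changed_ratio = float(np.mean(original != perturbed))
    return changed_ratio <= max_changed_ratio

# Example perturbation: swap a protected attribute to test counterfactual fairness
def swap_gender(X: pd.DataFrame) -> pd.DataFrame:
    X["gender"] = X["gender"].map({"male": "female", "female": "male"})
    return X

# passed = invariance_test(credit_model, X_test, perturb=swap_gender)
```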

This automated test suite is a great baseline for you to get started quickly. You can also write your own tests, and contribute them to the community!

📍 What's next?

This new feature is available for you right now in Open Beta. We are working very hard to prepare the public release by the end of June.

Many thanks to Gaetan from Continuity for being the first to report a bug on this Beta! Your feedback is essential, so please report any bugs you find and request features on our community: gisk.ar/discord

I would also like to thank all 1,288 of you for following our company on LinkedIn. It has been an amazing journey in the three months since our initial release. You are our heroes!

To stay up to date with our latest news and support our mission, you can:

💌  Subscribe to this newsletter

💻 Get started with Giskard

🌠 Star us on GitHub
