October 24, 2023

Mastering ML Model Evaluation with Giskard: From Validation to CI/CD Integration

Learn how to integrate vulnerability scanning, model validation, and CI/CD pipeline optimization to ensure reliability and security of your AI models. Discover best practices, workflow simplification, and techniques to monitor and maintain model integrity. From basic setup to more advanced uses, this article offers invaluable insights to enhance your model development and deployment process.

Sagar Thacker

Imagine effortlessly ensuring fairness, performance, and reliability in your machine learning models. Have you ever wondered how to guarantee that your AI models perform flawlessly and ethically? Enter Giskard.

Giskard is an open-source tool designed for data scientists and machine learning engineers. It's your key to detecting and addressing potential issues in your models. In this tutorial, we'll show you how to harness the power of Giskard:

  • Discover hands-on techniques to detect potential issues.
  • Learn how to generate test suites using Giskard.
  • Explore the seamless integration of Giskard into your CI/CD pipeline.

How to Install Giskard for ML Model Testing

Setting up a virtual environment

Isolating our project dependencies is essential. Let's start by creating a virtual environment.

  1. Create a Project Directory: Begin by creating a new directory for your project and navigating to it.
  2. Install `pipenv`: We'll use `pipenv` for managing dependencies. Install it with `pip install pipenv`.
  3. Create a Virtual Environment: Now, create a virtual environment with the required Python version (in this case, Python 3.9) by running `pipenv install --python 3.9`.

This command generates a `Pipfile` and `Pipfile.lock` to manage your project's dependencies.

Installing Giskard

With our virtual environment in place, let's proceed to install Giskard and its dependencies.

1. Install Giskard with `pipenv`: Inside your project directory, run `pipenv install giskard`.

This command ensures Giskard is installed within your virtual environment. For more detailed installation instructions, refer to the Giskard documentation.

2. Alternative Installation: If you prefer not to use virtual environments, you can also install Giskard directly with `pip install giskard`.

Data and Model Preparation for Machine Learning Validation

For our analysis, we've selected the Telecom Customer Churn Prediction dataset from Kaggle. Why? Because it's a compact yet diverse dataset, with its blend of numeric and categorical features.

Before we begin to scan our model for potential issues, we'll need to prepare our dataset and model for use with Giskard.

Preparing the dataset

Giskard requires that the dataset be wrapped with `giskard.Dataset`. Some pointers to keep in mind when wrapping the dataset:

  1. Dataset Type: Ensure your dataset is a `pandas.DataFrame`.
  2. Include Ground Truth: Your dataset should contain the actual ground truth variable (the target variable).
  3. Use Raw Data: Giskard is designed to detect model issues, not data issues. So, use raw data to avoid confusing model issues with preprocessing artifacts.

Recommended Preprocessing Steps

While Giskard focuses on model issues, a few preprocessing steps can enhance dataset reliability:

  • Remove Duplicates: Get rid of duplicate entries.
  • Drop Redundant Features: Eliminate unnecessary features. For example, columns like Id might not be needed.
  • Specify Data Types: Specify the type for each column.
  • Split the Dataset: Divide it into training, validation, and test sets.
  • Handle NaN Values: Decide whether to fill them or remove them. [Optional]

We split the dataset so that the model is trained on the training set, scanned for potential issues with Giskard on the validation set, and finally evaluated on the test set.

Here's a step-by-step example using a Customer Churn Prediction dataset:
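A minimal sketch of these steps, assuming the Kaggle CSV uses the column names `customerID`, `TotalCharges`, and `Churn` (assumptions about the file, not confirmed in this article). The helper names, the 60/20/20 split, and the dataset name are illustrative choices; `giskard.Dataset` is called with the `df`, `target`, `name`, and `cat_columns` arguments.

```python
TARGET = "Churn"  # assumed name of the ground-truth column


def prepare_splits(df, seed=42):
    """Clean a raw churn DataFrame and return (train, validation, test)."""
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = df.drop_duplicates()
    df = df.drop(columns=["customerID"], errors="ignore")  # identifier, not a feature
    if "TotalCharges" in df.columns:
        # Assumed to be stored as text in the raw CSV; blanks become NaN.
        df = df.assign(TotalCharges=pd.to_numeric(df["TotalCharges"], errors="coerce"))
    df = df.dropna()

    # 60% train, 20% validation, 20% test, keeping the churn ratio in each split.
    train, rest = train_test_split(df, test_size=0.4, random_state=seed, stratify=df[TARGET])
    valid, test = train_test_split(rest, test_size=0.5, random_state=seed, stratify=rest[TARGET])
    return train, valid, test


def wrap_validation_set(valid):
    """Wrap the raw validation split so Giskard can scan the model against it."""
    import giskard  # imported here so the cleaning helper works without Giskard
    from pandas.api.types import is_numeric_dtype

    features = [c for c in valid.columns if c != TARGET]
    cat_columns = [c for c in features if not is_numeric_dtype(valid[c])]
    return giskard.Dataset(df=valid, target=TARGET, name="churn-validation", cat_columns=cat_columns)
```

With a DataFrame loaded from the downloaded CSV, `train, valid, test = prepare_splits(raw_df)` followed by `wrap_validation_set(valid)` yields the wrapped dataset used in the rest of the tutorial.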

To find out more about wrapping a dataset, check out the Giskard documentation.

Preparing the model for evaluation

Just like the dataset, Giskard requires your model to be wrapped with `giskard.Model`. Giskard is model-agnostic, supporting machine learning models from various frameworks such as TensorFlow, PyTorch, and scikit-learn.

First things first, ensure your model is trained on the training set. A trained model is crucial as Giskard identifies issues based on the model's performance.

For this tutorial, let's use a simple logistic regression model from scikit-learn:

There are two ways to wrap a model:

  1. Prediction Function: Create a function that takes a `pandas.DataFrame` as input and returns a `numpy.ndarray` of prediction probabilities.
  2. Model Object: Define a custom class that inherits from `giskard.Model` and implements the `model_predict` method. This method should take a `pandas.DataFrame` as input and return a `numpy.ndarray` of prediction probabilities.

In both cases, your function or method should encapsulate all data preprocessing steps, such as categorical encoding and numerical scaling. Why this is necessary is explained below.

For this tutorial, we use a scikit-learn `Pipeline` to incorporate the preprocessing steps.

Here's an example of wrapping a model using the prediction function method:
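A minimal sketch, assuming a scikit-learn `Pipeline` that combines scaling, one-hot encoding, and logistic regression. The helper names, the model name, and the choice of transformers are illustrative, not necessarily what the original tutorial used; `giskard.Model` receives the prediction function together with `model_type`, `classification_labels`, and `feature_names`.

```python
def build_pipeline(numeric_cols, categorical_cols):
    """Preprocessing and logistic regression in a single scikit-learn Pipeline."""
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([("preprocess", preprocess), ("clf", LogisticRegression(max_iter=1000))])


def wrap_model(pipeline, feature_names):
    """Wrap a fitted pipeline through a prediction function that accepts raw rows."""
    import giskard  # imported here so the pipeline helper works without Giskard

    def prediction_function(df):
        # The pipeline encodes and scales the raw columns before predicting,
        # so Giskard can pass slices of the raw dataset directly.
        return pipeline.predict_proba(df)

    return giskard.Model(
        model=prediction_function,
        model_type="classification",
        name="churn-logistic-regression",
        # Must follow the column order of predict_proba.
        classification_labels=list(pipeline.classes_),
        feature_names=list(feature_names),
    )
```

Fit the pipeline on the training split first (`build_pipeline(...).fit(X_train, y_train)`), then pass the fitted object to `wrap_model`.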

To summarize: we've prepared a raw validation dataset and trained a model on the training dataset. On the training dataset, we applied some preprocessing techniques to ensure the model trains properly.

We aim to use our wrapped model on the wrapped dataset to spot potential issues. To predict accurately on this dataset, the model must apply the same preprocessing steps.

Behind the scenes, Giskard calls `prediction_function` with the raw data. The function applies the preprocessing steps and returns the predicted probabilities.

Why is this important? Giskard uses the predicted probabilities to perform statistical tests. These tests help identify areas in the dataset where the model may have issues. Giskard then relates these issues back to the corresponding segments of the raw dataset. We'll see this more clearly when we look at the scan results.

To find out more about wrapping a model, check out the Giskard documentation.

Scan your ML model to detect vulnerabilities

Now that we've wrapped our dataset and model with Giskard, it's time to embark on the exciting journey of scanning the model for potential issues.

Giskard simplifies the scanning process with its `scan` function. Here's how you can use it:

Giskard scan results:

[Figure: Scan Results - Performance]

Interpreting the scan results is crucial. Giskard has identified issues related to Performance Bias, Overconfidence, and Underconfidence.

[Figure: Show Details - Performance]

For instance, in the Performance Bias category: when Contract is "One year", the Recall is 100.0% lower than the global Recall, that is, the recall computed over the entire dataset. In this subset, 215 samples have the actual label Yes, but the model predicts No for all of them, so the recall on this slice is 0.
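The arithmetic behind "100.0% lower" can be sketched in plain Python. The function names are illustrative, and the global recall of 0.5 is a made-up value, since the article does not report it; any positive global recall gives the same relative gap when the slice recall is 0.

```python
def recall(y_true, y_pred, positive="Yes"):
    """Share of actual positives that the model also predicts as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_for_positives:
        return float("nan")
    hits = sum(p == positive for p in predictions_for_positives)
    return hits / len(predictions_for_positives)


def relative_gap(slice_value, global_value):
    """Relative difference of a slice metric against the global metric."""
    return (slice_value - global_value) / global_value


# The "One year" slice: 215 actual "Yes" samples, all predicted "No".
slice_recall = recall(["Yes"] * 215, ["No"] * 215)
hypothetical_global_recall = 0.5  # illustrative value, not from the scan
print(slice_recall)                                            # 0.0
print(relative_gap(slice_recall, hypothetical_global_recall))  # -1.0, i.e. 100% lower
```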

[Figure: Show Details - Performance]

In the Overconfidence category, take the first issue as an example. For samples where OnlineBackup is "Yes", the overconfidence rate is 36.2%, compared to a global rate of 26.2%. In other words, within this slice a larger share of the model's wrong predictions are made with high confidence, for example predicting No for a customer whose actual label is Yes.
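To make the idea of an overconfidence rate concrete, here is a simplified sketch for a binary Yes/No model. It counts a wrong prediction as overconfident when the probability of the predicted label exceeds the probability of the true label by more than a threshold. The definition, the 0.5 threshold, and the function name are assumptions for illustration and may differ from Giskard's exact implementation.

```python
def overconfidence_rate(y_true, proba_yes, threshold=0.5):
    """Among wrong predictions, the share whose probability margin exceeds `threshold`."""
    margins = []
    for label, p_yes in zip(y_true, proba_yes):
        predicted = "Yes" if p_yes >= 0.5 else "No"
        if predicted != label:
            p_predicted = max(p_yes, 1 - p_yes)
            # Binary case: the true label is the other class.
            margins.append(p_predicted - (1 - p_predicted))
    if not margins:
        return 0.0
    return sum(m > threshold for m in margins) / len(margins)


# Two wrong predictions: one confident (0.9 for "No"), one hesitant (0.6 for "No").
labels = ["Yes", "Yes", "No", "Yes"]
probabilities_of_yes = [0.1, 0.4, 0.2, 0.9]
print(overconfidence_rate(labels, probabilities_of_yes))  # 0.5
```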

It's beneficial to delve deeper into the scan results to get a better grasp on the model's challenges.

Saving Your Scan Results

You might want to keep a record of your model's check-up. Giskard allows you to save the results in different formats:
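A minimal sketch that runs the scan and writes an HTML report, assuming the Giskard 2.x Python API (`giskard.scan` and the report's `to_html` method). The helper name and default file name are illustrative; other export formats are left out here because their availability depends on the installed release.

```python
def scan_and_save(wrapped_model, wrapped_dataset, report_path="scan_report.html"):
    """Scan a wrapped model against a wrapped dataset and save the report as HTML."""
    import giskard  # imported here so this module loads even without Giskard

    scan_results = giskard.scan(wrapped_model, wrapped_dataset)
    scan_results.to_html(report_path)
    return scan_results
```

In a notebook, displaying the returned object renders the interactive report inline.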

The scan function in Giskard is designed to detect potential issues in machine learning models and datasets. These issues include:

  • Performance Bias: Occurs when a model performs differently on specific data subsets compared to the overall dataset.
  • Unrobustness: The model is sensitive to small changes in input data, leading to unpredictable behavior.
  • Overconfidence: The model assigns high confidence to incorrect predictions, potentially causing erroneous decisions.
  • Underconfidence: The model lacks confidence in its predictions, leading to cautious decision-making.
  • Unethical Behavior: When models exhibit sensitivity to gender, ethnicity, religion, or other factors in their predictions.
  • Data Leakage: Occurs when external information unintentionally influences model creation, leading to inaccurate generalization.
  • Stochasticity: The model produces different results for the same input due to inherent randomness in certain algorithms.
  • Spurious Correlation: When a feature appears correlated with model predictions, but the relationship is coincidental rather than meaningful.

These issues can impact the reliability and fairness of machine learning models, and Giskard helps identify and address them. You can learn more about these vulnerabilities in the Giskard documentation.

The Advantages of Test Suites for ML Model Monitoring

The scan function in Giskard helps you spot issues in your model. But how do you ensure these issues are addressed in subsequent versions of the model? This is where test suites come in.

Imagine you've developed a model, Model A, and used Giskard to scan it, revealing 15 issues. To address these, you can set up test suites for each issue. When you later retrain or adjust Model A, you can run these test suites to check if the identified issues have been resolved.

Understanding Test Suites for Effective Machine Learning Validation

Creating test suites in Giskard is straightforward. Think of a test suite as a collection of tests, each focusing on a specific model issue. This ensures a thorough verification of your model, leaving no stone unturned. Giskard offers a library of pre-made tests to make this process even easier.

Creating Your First Test Suite

Wouldn't it be fantastic if you could assemble a test suite that covers all the essential tests identified during the initial model scan? Well, you can! Here's how:

Adding Custom Tests

But what if you want to include a specific test in your suite? Suppose you need a test to check if the validation set's accuracy exceeds 0.8. Here's how you can do it:
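A minimal sketch covering both steps, assuming the Giskard 2.x API: `generate_test_suite` on the scan report, `giskard.testing.test_accuracy` for the custom check, and `add_test` and `run` on the suite. The helper name and suite name are illustrative.

```python
def build_and_run_suite(scan_results, wrapped_model, wrapped_dataset, min_accuracy=0.8):
    """Turn scan findings into a test suite, add an accuracy test, and run it."""
    from giskard import testing  # imported here so this module loads without Giskard

    # One test per issue detected by the scan.
    suite = scan_results.generate_test_suite("Churn model test suite")

    # Custom test: accuracy on the given dataset must reach the threshold.
    suite.add_test(
        testing.test_accuracy(model=wrapped_model, dataset=wrapped_dataset, threshold=min_accuracy)
    )
    return suite, suite.run()
```

The returned suite can be re-run against a retrained model to check whether the original issues are still present.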

Giskard test suite results:

[Figure: Test Suite Results]

Extracting Test Suite Results

Now, what if you need to access the results from your test suite? Let's dive into the process:

Why would you want to extract these results? Doing so allows you to save them, integrate them into a CI/CD pipeline, or build a dynamic dashboard to visualize your model's performance over time.
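As a sketch of what can be done with extracted results, the helpers below turn them into a Markdown table (for a pull request comment) and a JSON record (for a dashboard). The input shape, an overall flag plus `(name, passed, metric)` tuples, is a hypothetical flattened form; how you obtain it from the object returned by `suite.run()` depends on your Giskard version.

```python
import json


def to_markdown(passed, tests):
    """Render results as a Markdown table. `tests` holds (name, passed, metric) tuples."""
    lines = [
        f"### Test suite {'passed' if passed else 'failed'}",
        "",
        "| Test | Status | Metric |",
        "| --- | --- | --- |",
    ]
    for name, ok, metric in tests:
        lines.append(f"| {name} | {'pass' if ok else 'fail'} | {metric:.3f} |")
    return "\n".join(lines)


def to_json(passed, tests):
    """Serialize the same results so they can be stored and charted over time."""
    records = [{"name": n, "passed": ok, "metric": m} for n, ok, m in tests]
    return json.dumps({"passed": passed, "tests": records})
```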

To delve deeper into test suites and explore the diverse range of tests Giskard offers, check out the Giskard documentation on Tests and Test Suites.

Enhancing AI Model Validation: Integrating Giskard into CI/CD Pipelines

With our test suite ready, the next step is to incorporate it into a CI/CD pipeline. This allows for automated checks and balances every time there's an update to our model.

In this section, we'll walk you through integrating Giskard with GitHub Actions to create a CI/CD pipeline. This ensures that for every pull request, stakeholders and reviewers receive a concise summary of the model's performance.

What's the advantage? This snapshot not only offers a performance overview but also confirms that the model meets established quality benchmarks before progressing to the next stage.

Save the model and dataset

Before we dive into the pipeline setup, let's ensure we have our model and dataset safely stored.
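One simple way to store the trained pipeline is Python's built-in `pickle` module. This is a sketch under that assumption; the article does not state which serialization it uses, and the helper names and paths are illustrative. The raw dataset can be written alongside it, for example as a CSV file.

```python
import pickle
from pathlib import Path


def save_artifact(obj, path):
    """Pickle `obj` to `path`, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    return path


def load_artifact(path):
    """Load a pickled object. Only unpickle files you created yourself."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```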

Building a Python script

Next up, let's craft a Python script that will orchestrate the test suite and record its results. We'll name the file `run_test_suite.py`:
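A minimal sketch of the core of such a script, assuming a Giskard suite object whose `run()` result exposes a `passed` flag. The function name, report file name, and exit-code convention are illustrative; loading the saved model and dataset and rebuilding the suite are left to the caller.

```python
from pathlib import Path


def run_and_report(suite, report_path="test_suite_results.md"):
    """Run the suite, write a one-line summary, and return a CI exit code."""
    result = suite.run()
    status = "passed" if result.passed else "failed"
    Path(report_path).write_text(f"Giskard test suite {status}.\n", encoding="utf-8")
    # 0 lets the workflow continue; 1 marks the CI job as failed.
    return 0 if result.passed else 1
```

At the end of `run_test_suite.py`, calling `sys.exit(run_and_report(suite))` lets the workflow step fail when the suite does, and the written file can be posted as the pull request comment.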

Creating a GitHub workflow YAML file

Turning our attention to the heart of our CI/CD pipeline, we're going to craft a GitHub workflow YAML file that orchestrates the entire process. We'll name this file `ci-cd.yml`.

To get started, follow these simple steps:

Step 1: Create the .github/workflows Directory

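Open your terminal and create the directory and an empty workflow file, for example with `mkdir -p .github/workflows` followed by `touch .github/workflows/ci-cd.yml`.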

Step 2: Add the following code to the `ci-cd.yml` file:

Review Your Directory Structure

Before we proceed, let's take a quick look at the snapshot of your directory structure:

Pushing the files to GitHub

The next step involves pushing our project files to GitHub. Follow these commands to make it happen:

Creating a pull request

Now comes the exciting part! With our files on GitHub, it's time to create a pull request. This action will trigger the GitHub workflow, launching our test suite. The results will then be elegantly presented as a comment on the pull request.

Voila! We've seamlessly integrated Giskard into our CI/CD pipeline, streamlining the process of testing and validating your machine learning models.

With this setup, you'll have the power to ensure the reliability and quality of your models at every turn. And all it takes is a pull request to get the ball rolling!

Snapshots of the triggered workflow and the PR comment:

[Figure: Snapshots - Triggered Workflow]
[Figure: Snapshots - PR comment]

Giskard's Role in Evaluating Machine Learning Models in Real-World Scenarios

Having covered model scanning, test suite creation, and CI/CD pipeline integration with Giskard, let's delve into its practical applications in real-world situations.

Ensuring Model Integrity and Security

Giskard plays a pivotal role in ensuring both the performance and security of your model:

  • Model Scanning: Begin by analyzing your model with Giskard and create a corresponding test suite.
  • Performance Testing: Use the test suite to thoroughly assess the model's capabilities and vulnerabilities.

Continuous Monitoring and Data-Driven Insights

Maintaining consistent model performance and data quality is essential. Here's how Giskard assists:

  • Routine Model Checkups: Even if your model passes all tests, remember to periodically retrain and re-scan it for consistent high performance.
  • Build Dashboards to Visualize Insights: Convert the results from the test suite into visual dashboards. This makes it easier to track and pinpoint issues.


In this tutorial, we've explored Giskard's capabilities to enhance your machine learning endeavors. We've looked at issue detection, test suite creation, and CI/CD pipeline integration.

We encourage you to further explore Giskard and see how it can improve your model validation and testing processes.

If you found this helpful, consider giving us a star on GitHub and becoming part of our Discord community. We appreciate your feedback and hope Giskard becomes an indispensable tool in your quest to create superior ML models.