🧐 Introduction: Testing and evaluating Machine Learning models with Giskard
In today's interconnected and digitized world, the authenticity of banknotes is critical to the stability of financial systems. As technology evolves, so do the techniques used by fraudsters, making counterfeit currency increasingly hard to detect. Machine Learning models can help authenticate banknotes, but only if they are reliable, and tools like Giskard have emerged to test them.
Giskard is an open-source Machine Learning testing library that lets you quickly check a model for issues before you rely on it. It scans the model for vulnerabilities such as performance bias, overconfidence, underconfidence and spurious correlations, and it generates test suites from the results. This is particularly useful for a model like a banknote classifier, where a good overall score can hide weak spots on specific slices of the data.
In this article, we will look at the components of Giskard, including its user-friendly interface, and then walk through its installation and practical use for banknote authentication. Along the way, we will go through the procedure step by step and share code snippets that illustrate how Giskard can be used.
Whether you are a data scientist looking to level up your ML models, a beginner eager to explore the potential of Giskard, or someone concerned with safeguarding your financial interests, this article will give you the information you need to get started.
💶 Binary classification model use case: Banknote Authentication
We will work with the banknote authentication dataset for this article.
The banknote authentication dataset is a collection of samples from real and fake banknotes. The task is to develop a Machine Learning model that accurately classifies a banknote as genuine or fake based on its features. This is a binary classification problem: the model predicts a discrete label, where 0 indicates a fake banknote and 1 indicates a real one. The dataset can be found here.
You will need to have Docker installed.
To get started, run the following command in your terminal to install the Giskard server on your computer. This sets up your Python backend.
Make sure you have a strong internet connection. When that is done, ensure the Docker container is running like this:
NB: This image is taken from the Docker desktop
Once docker-compose has started all the modules, you will be able to open Giskard at http://localhost:19000/
- Log in to Giskard.
- Upload the licence sent to your email so that you can interact with the user interface.
📚 Installing Giskard’s Machine Learning Python library
Connect the external worker
Next, start the ML Worker, the Giskard component that connects your Python environment to the Giskard server you just installed. It executes the model in your working Python environment (notebook, Python IDE, etc.). To start the ML Worker, execute the following command in your terminal:
You will be asked to enter an API key, which can be found in the Settings tab of the Giskard interface:
Copy the API key and paste it into your terminal when prompted.
You should get an output like this:
After that, open the Giskard settings in your browser and check that the external worker is connected.
🏃 Training a classification model and uploading it to Giskard
Write the following code inside a Jupyter notebook
First import libraries
Read the dataset
Declare the type of each column in the dataset (example: category, numeric, text)
Train a classifier
Fit and score your model
Why Random Forest Classifier?
The Random forest classifier is a popular machine learning algorithm that is well-suited for various classification tasks, including the authentication of banknotes. Here are some reasons why the Random Forest classifier may be a good choice for the banknote authentication dataset:
- Ensemble Method: Random forest is an ensemble learning method that combines multiple decision trees to make predictions. Each tree is trained using a randomly selected sample of the data, and the final prediction is obtained by aggregating the predictions of individual trees. This ensemble approach improves the overall accuracy and generalization of the model.
- Robust to Overfitting: Random forest helps mitigate overfitting, which occurs when a model fits the training data too closely and fails to generalize to unseen data. Because each tree sees a random subset of the data and a random subset of the features, aggregating many trees reduces this risk and gives more reliable performance on unseen data.
- Robust to Outliers: Random forest is less affected by outliers than some other classifiers. Because predictions are aggregated over many trees, the impact of an individual outlier is typically reduced, leading to more robust predictions.
- Easy to Use and Interpret: Random forest is relatively easy to implement and tune, making it a popular choice among practitioners, and its feature importance scores make the model's behaviour easier to interpret.
- Feature Importance: Random forest measures feature importance, indicating which features have the most significant impact on the classification task. This information can be valuable for understanding which underlying patterns and characteristics of banknotes contribute to their authenticity.
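To make the ensemble idea concrete, here is a minimal pure-Python sketch of bagging with majority voting. It is not scikit-learn's RandomForestClassifier and not the code used in this article: each "tree" is a one-split stump, the toy rows and the helper names (bootstrap_sample, fit_stump, predict_forest) are invented for illustration, and the bootstrap draw is simply retried until both classes appear so that every stump can be fitted.

```python
import random
from collections import Counter

# Toy rows: (feature_a, feature_b, label). Values are invented for illustration.
TOY_ROWS = [
    (3.0, 2.0, 1), (3.5, 2.5, 1), (4.0, 3.0, 1), (4.5, 3.5, 1),
    (-4.0, -3.0, 0), (-3.5, -2.5, 0), (-3.0, -2.0, 0), (-2.5, -1.5, 0),
]


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, retrying until both classes appear."""
    while True:
        sample = [rng.choice(rows) for _ in rows]
        if {row[-1] for row in sample} == {0, 1}:
            return sample


def class_mean(sample, feature, label):
    """Mean of one feature over the rows of one class."""
    values = [row[feature] for row in sample if row[-1] == label]
    return sum(values) / len(values)


def fit_stump(sample, feature):
    """One-split 'tree': threshold halfway between the two class means."""
    mean_0 = class_mean(sample, feature, 0)
    mean_1 = class_mean(sample, feature, 1)
    threshold = (mean_0 + mean_1) / 2
    high_label = 1 if mean_1 > mean_0 else 0  # class found above the threshold
    return feature, threshold, high_label


def predict_stump(stump, row):
    feature, threshold, high_label = stump
    return high_label if row[feature] > threshold else 1 - high_label


def fit_forest(rows, n_trees, seed=0):
    """Each stump gets its own bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0]) - 1
    return [
        fit_stump(bootstrap_sample(rows, rng), rng.randrange(n_features))
        for _ in range(n_trees)
    ]


def predict_forest(forest, row):
    """Aggregate the individual predictions with a majority vote."""
    votes = [predict_stump(stump, row) for stump in forest]
    return Counter(votes).most_common(1)[0][0]


forest = fit_forest(TOY_ROWS, n_trees=25)
print(predict_forest(forest, (5.0, 4.0)))    # 1
print(predict_forest(forest, (-5.0, -4.0)))  # 0
```

A real random forest grows full decision trees and considers a random subset of features at every split, but the two ingredients described above, random resampling and vote aggregation, are the same.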
Evaluating your Machine Learning model with Giskard
🔍 Scan your ML model to detect issues
Let’s show the scan results:
Using Giskard Machine Learning solution for fraud detection
As we see above, the scan detected four vulnerabilities in the model: Performance bias, Overconfidence, Underconfidence and Spurious Correlation. Let’s discuss each of them and how it can affect banknote authentication.
The performance bias issue in machine learning refers to a situation where a model exhibits low performance on specific data slices or subsets, despite satisfactory performance on the overall dataset. Some factors that can cause performance bias include:
- Data Imbalance: When the dataset contains imbalanced classes or unequal representation of different groups, the model may prioritize the majority class or dominant groups in its learning process.
- Biased Training Data: If the training data used to train the model contains inherent biases or reflects societal prejudices, the model may learn to reinforce these biases, resulting in performance bias.
- Model Complexity and Capacity: Models with high complexity or excessive capacity can overfit the majority class or dominant groups in the training data.
Performance bias vulnerabilities can affect banknote authentication in several ways. For example, if a model is biased towards certain types of banknotes, it may be more likely to authenticate those notes as real, even if they show signs of forgery. This can happen when the training dataset is dominated by those notes.
Additionally, performance bias vulnerabilities can make it more difficult to detect new counterfeits that are designed to exploit the model's biases. The model may be so accustomed to seeing certain types of banknotes that it fails to consider the possibility that a new note of that type is counterfeit. For example, a model trained on a dataset of mostly US banknotes may be more likely to accept a fake US banknote as real than a fake note from another country.
To mitigate the risk of performance bias vulnerabilities, it is essential to select the training data carefully. This data should represent the full range of real and fake banknotes the model is expected to encounter in the real world, which helps ensure that the model is not biased toward any particular type of note.
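The idea of a model that looks fine overall but underperforms on a slice can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Giskard's detector: the records are made-up toy values, and the helper names (accuracy, accuracy_by_slice) and the 0.1 tolerance are assumptions chosen for the example.

```python
# Toy evaluation records; values are invented for illustration.
records = [
    {"variance": 2.1, "label": 1, "prediction": 1},
    {"variance": 1.4, "label": 1, "prediction": 1},
    {"variance": 0.9, "label": 1, "prediction": 1},
    {"variance": 3.0, "label": 1, "prediction": 1},
    {"variance": 0.2, "label": 0, "prediction": 0},
    {"variance": 1.1, "label": 0, "prediction": 0},
    {"variance": -0.3, "label": 0, "prediction": 0},
    {"variance": -1.2, "label": 0, "prediction": 0},
    {"variance": -0.6, "label": 1, "prediction": 0},
    {"variance": -2.0, "label": 1, "prediction": 0},
]


def accuracy(rows):
    """Share of rows where the prediction matches the label."""
    return sum(r["label"] == r["prediction"] for r in rows) / len(rows)


def accuracy_by_slice(rows, slices):
    """Accuracy per named slice; None when a slice selects no rows."""
    report = {}
    for name, condition in slices.items():
        subset = [r for r in rows if condition(r)]
        report[name] = accuracy(subset) if subset else None
    return report


slices = {
    "variance < 0": lambda r: r["variance"] < 0,
    "variance >= 0": lambda r: r["variance"] >= 0,
}

overall = accuracy(records)
per_slice = accuracy_by_slice(records, slices)

# Flag slices that fall clearly below the overall score (0.1 is an arbitrary tolerance).
weak_slices = [
    name for name, score in per_slice.items()
    if score is not None and score < overall - 0.1
]

print(overall)      # 0.8
print(per_slice)    # {'variance < 0': 0.5, 'variance >= 0': 1.0}
print(weak_slices)  # ['variance < 0']
```

Here the overall accuracy of 0.8 hides the fact that the model is right only half the time when the variance is negative.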
The overconfidence issue in machine learning refers to the phenomenon where a machine learning model produces predictions that are incorrect but are assigned high probabilities or confidence scores. This means the model is overly confident in its predictions, even when inaccurate. Some factors that can cause overconfidence include:
- Data Bias: If the training data contains inherent biases or lacks diversity, the model is not exposed to a wide range of scenarios and may remain highly confident on inputs unlike those it has seen.
- Overfitting: Overfitting occurs when a model becomes too complex and adapts too closely to the training data. As a result, the model may not generalize well to unseen data and may exhibit overconfidence in its predictions, even though they are inaccurate.
- Imbalanced Classes: In classification tasks, imbalanced class distributions can lead to overconfident predictions. Suppose the model is trained on a dataset where one class is significantly more prevalent than others. In that case, it may assign high probabilities to predictions of the majority class, even when they are incorrect.
Overconfidence vulnerabilities can affect banknote authentication in several ways. For example, if a model is overconfident in its predictions, it may be more likely to accept fake banknotes as real, and less likely to flag a fake note as suspicious even if it shows some signs of forgery.
Additionally, overconfidence vulnerabilities can make it more difficult to detect new fake notes that are designed to fool the model. The model may be so confident in its predictions that it never considers the possibility that a new note is fake.
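As a rough illustration of the definition above, the following sketch flags predictions that are wrong yet carry a high probability. It is a simplified stand-in, not the criterion Giskard applies internally: the rows are invented, and the function name overconfident_rows and the 0.9 threshold are assumptions made for the example.

```python
# Toy rows: true label plus predicted class probabilities (invented values).
rows = [
    {"id": 1, "label": 0, "probabilities": {0: 0.03, 1: 0.97}},  # wrong, very confident
    {"id": 2, "label": 1, "probabilities": {0: 0.05, 1: 0.95}},  # correct
    {"id": 3, "label": 0, "probabilities": {0: 0.45, 1: 0.55}},  # wrong, not confident
    {"id": 4, "label": 1, "probabilities": {0: 0.92, 1: 0.08}},  # wrong, very confident
]


def overconfident_rows(rows, threshold=0.9):
    """Return ids of rows predicted incorrectly with probability >= threshold."""
    flagged = []
    for row in rows:
        probabilities = row["probabilities"]
        predicted = max(probabilities, key=probabilities.get)
        confidence = probabilities[predicted]
        if predicted != row["label"] and confidence >= threshold:
            flagged.append(row["id"])
    return flagged


print(overconfident_rows(rows))  # [1, 4]
```

Row 3 is also misclassified, but its probability of 0.55 shows the model was unsure, so it is not counted as overconfident.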
The underconfidence issue for classification in machine learning refers to the phenomenon where a machine learning model produces predictions with low confidence, even when the actual label is highly likely. In an underconfident prediction, the probability of the predicted label is very close to the probability of the next most likely label. Some factors that can cause underconfidence include:
- Insufficient Model Training: If the model is not adequately trained on diverse and representative data, it may lack the necessary information to make confident predictions.
- Imbalanced Classes: When there is a scarcity of examples or a significant class imbalance, the model may struggle to estimate probabilities accurately, leading to underconfident predictions.
- Uncertain Data Characteristics: In scenarios where the input data contains inherent noise, ambiguity, or overlapping feature distributions, the model may find it challenging to make confident predictions. Uncertainty in the data can propagate into the model's output, causing underconfidence.
Underconfidence vulnerabilities can also affect banknote authentication. For example, if a model is underconfident in its predictions, it may be more likely to reject real banknotes as fake: when the probabilities of the two classes are nearly equal, a real note can be flagged even if it shows no signs of forgery.
Underconfidence vulnerabilities can also make it more difficult to detect new fake notes designed to fool the model, because the model may be too uncertain in its predictions to flag a new fake note decisively.
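The "top two probabilities are close" definition can be sketched as a simple margin check. Again, this is a conceptual illustration rather than Giskard's implementation: the rows are invented, and the function name underconfident_rows and the 0.1 margin are assumptions.

```python
# Toy rows with predicted class probabilities (invented values).
rows = [
    {"id": 1, "probabilities": {0: 0.48, 1: 0.52}},  # nearly a coin flip
    {"id": 2, "probabilities": {0: 0.10, 1: 0.90}},  # confident
    {"id": 3, "probabilities": {0: 0.53, 1: 0.47}},  # nearly a coin flip
    {"id": 4, "probabilities": {0: 0.30, 1: 0.70}},  # reasonably confident
]


def underconfident_rows(rows, max_margin=0.1):
    """Return ids of rows whose two highest class probabilities differ by at most max_margin."""
    flagged = []
    for row in rows:
        top, runner_up = sorted(row["probabilities"].values(), reverse=True)[:2]
        if top - runner_up <= max_margin:
            flagged.append(row["id"])
    return flagged


print(underconfident_rows(rows))  # [1, 3]
```

For a binary problem like banknote authentication, a small margin simply means that both probabilities sit close to 0.5.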
Spurious correlation refers to a situation in machine learning where a feature and the model prediction appear statistically correlated. However, their relationship is coincidental or caused by external factors rather than a genuine causal or meaningful connection. Some factors that can cause spurious correlation include:
- Confounding Variables: Spurious correlations may arise when confounding variables influence both the predicted variable and the feature being considered. These variables can create an illusion of correlation between the feature and the prediction, even though they are not causally related to each other.
- Data Noise: Spurious correlations can occur due to data noise or anomalies unrelated to the underlying problem. This noise may result from errors in data collection, measurement biases, data preprocessing issues, or other data-specific factors.
- Random Chance: In some cases, spurious correlations can occur purely by chance. When working with large datasets or many features, the likelihood of finding coincidental correlations increases. These correlations are not meaningful but are simply random occurrences that can mislead model predictions.
Spurious correlation vulnerabilities can also affect banknote authentication. For example, if a model learns a spurious correlation between certain features of banknotes and their authenticity, it may be more likely to authenticate a fake note with those features, because it cannot distinguish between a real note with those features and a fake note that has been deliberately designed to have them.
Spurious correlation vulnerabilities can also make it more difficult to detect new fake notes that are designed to exploit the model's spurious correlations. The model may be so focused on the spurious signal that it fails to consider the possibility that a new note is fake. For example, a model trained on a dataset of banknotes scanned in different lighting conditions may learn a spurious correlation between the brightness of a note and its authenticity, even though the brightness depends on the lighting conditions in which the note was scanned rather than on whether the note is genuine.
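One simple way to picture the brightness example is to compare how strongly a feature tracks the model's predictions with how strongly it tracks the true labels. The sketch below uses a plain Pearson correlation for this; that is a deliberately simple measure chosen for illustration, not necessarily the one Giskard uses, and the brightness values, predictions and labels are all invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Invented toy data: scan brightness, model predictions, and true labels.
brightness = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
predictions = [0, 0, 0, 0, 1, 1, 1, 1]  # the model follows brightness
labels = [0, 1, 0, 1, 0, 1, 0, 1]       # authenticity does not

with_predictions = pearson(brightness, predictions)
with_labels = pearson(brightness, labels)

print(round(with_predictions, 2))  # 0.91
print(round(with_labels, 2))       # 0.18
```

A feature that is strongly associated with the predictions but only weakly with the labels is a candidate for a spurious correlation and deserves a closer look.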
You can read more on the key vulnerabilities here.
📊 Generate a test suite for your classification model
⬆ Upload your test suite to the Giskard server
- When the Giskard server is running, it is served on localhost; this is where the uploaded data will be found.
- Next, generate your API token in the Settings tab of the Giskard application.
- Choose your own arguments for the following call: your_project = client.create_project("project_key", "PROJECT_NAME", "DESCRIPTION"). In our case, it is bank_note_authentication = client.create_project("bank_note_authentication", "Bank Note Authentication", "Project to classify if a banknote is real or fake"). Note: "project_key" should be unique and in lowercase.
Your dataset and model will be uploaded to Giskard. In our case, they are available in the Giskard interface at http://localhost:19000/main/projects/439/test-suite/440/overview (the project and test suite numbers in your own URL may differ).
You are all set to try Giskard in action! You can now generate some test suites.
Check for tests that failed:
Edit the parameters to check whether the test suite passes.
Applying data slices
Slices pre-generated by the scan feature:
When the slice to apply is set to none:
As we see from the output, the prediction is 1. One possible explanation is that the model has learned to treat all banknotes as real, which can happen if it was trained on a dataset that contains only real banknotes.
To avoid this, it is important to use a variety of slices when testing models with Giskard. This helps ensure that the model is not simply predicting 1 because it was trained on a dataset of only real banknotes.
When the slice to apply is set to ‘curtosis’ < 0.190 AND ‘curtosis’ >= -3.349e-01:
As we see from the output, we get the prediction 0 when the slice to apply is set to ‘curtosis’ < 0.190 AND ‘curtosis’ >= -3.349e-01. A likely reason is that no real banknotes in the training data fall within this slice, so the model has not learned to associate it with real banknotes and predicts 0, the label for fake banknotes.
When the slice to apply is set to ‘skewness’<1.678 AND ‘skewness’>=0.807:
We might get a prediction of 0 when we apply the slice ‘skewness’ < 1.678 AND ‘skewness’ >= 0.807 in Giskard.
One possibility is that no data points in the training data fall within the specified range of skewness values, either because the model was never trained on data points with those skewness values or because they were filtered out during the training process.
When the slice to apply is set to ‘variance’ <-1.437e-01 AND ‘variance’>= -6.512e-01:
We get a prediction of 0 when the slice to apply is set to ‘variance’ < -1.437e-01 AND ‘variance’ >= -6.512e-01, most likely because this slice does not contain any data points: the range of variance values is so narrow that no data points fall within it.
To fix this, you can either widen the range of variance values in the slice or remove the slice altogether. Widening the range includes more data points in the slice, which gives a more reliable result. If you remove the slice altogether, you will no longer be able to identify vulnerabilities in that region of the model's input space.
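Before interpreting a prediction on a slice, it can help to check how many rows the slice actually contains and which labels they carry. The sketch below does this in plain Python for the ‘curtosis’ slice quoted earlier; it does not use Giskard's slicing API, the rows are invented, and the helper name apply_slice is an assumption.

```python
from collections import Counter

# Invented toy rows; only the slice bounds come from the article.
rows = [
    {"curtosis": -0.50, "label": 1},
    {"curtosis": -0.20, "label": 0},
    {"curtosis": 0.00, "label": 0},
    {"curtosis": 0.15, "label": 0},
    {"curtosis": 0.19, "label": 1},
    {"curtosis": 0.80, "label": 1},
]


def apply_slice(rows, column, low, high):
    """Keep the rows where low <= row[column] < high."""
    return [row for row in rows if low <= row[column] < high]


# 'curtosis' < 0.190 AND 'curtosis' >= -3.349e-01
in_slice = apply_slice(rows, "curtosis", low=-3.349e-01, high=0.190)
label_counts = Counter(row["label"] for row in in_slice)

print(len(in_slice))       # 3
print(dict(label_counts))  # {0: 3}

# A slice whose range matches no rows is empty, so it cannot tell us anything.
empty_slice = apply_slice(rows, "curtosis", low=5.0, high=6.0)
print(len(empty_slice))    # 0
```

In this toy data the slice contains only rows labelled 0, which is the kind of situation described above: the model never sees a real banknote in that range. An empty slice, by contrast, is a signal to widen its bounds.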
In this article, we have seen how to use Giskard for banknote authentication: we trained a classifier, scanned it for vulnerabilities, generated a test suite, and explored how the model behaves on different data slices. I hope you find this helpful. Happy testing!