Tutorials
July 13, 2023
10 min read

Testing Machine Learning Classification models for fraud detection

This article explains how the Giskard open-source ML framework can be used to test ML models, applied here to fraud detection. It explores the components of Giskard: the Python library, its user-friendly interface, and its installation process, along with a practical implementation for banknote authentication. The article provides a step-by-step guide with code snippets and uses the banknote authentication dataset to develop an accurate ML model.

Testing Classification Models for Fraud Detection with Giskard
Happiness Omale

🧐 Introduction: Testing and evaluating Machine Learning models with Giskard

In today's interconnected and digitized world, the authenticity and trustworthiness of banknotes are critical to maintaining the stability of financial systems. As technology evolves, so do the techniques employed by fraudsters, making fraudulent currency increasingly difficult to detect. Tools like Giskard have emerged to address this pressing concern by helping teams validate the ML models used for banknote authentication.

Giskard is an open-source Machine Learning testing framework that allows you to quickly scan your model for errors and vulnerabilities. It slots into your workflow between data preprocessing and classification, so you can validate a model before it reaches production. By surfacing hidden weaknesses such as performance biases and spurious correlations, it helps you improve the reliability and predictive power of your models, which is particularly valuable when working with high-dimensional datasets, where assessing the contribution of individual features becomes crucial for achieving accurate predictions.

In this article, we will examine the components of Giskard, including its user-friendly interface, and walk through its installation and a practical implementation for banknote authentication. Along the way, we will discuss the step-by-step procedure and share code snippets that illustrate how Giskard can be used.

Whether you are a data scientist looking to level up your ML models, a beginner eager to explore the potential of Giskard, or an individual concerned with safeguarding your financial interests, understanding how to utilize Giskard will give you the information you need to get started.

💶 Binary classification model use case: Banknote Authentication

We will work with the banknote authentication dataset for this article.

The banknote authentication dataset is a collection of real and fake banknote samples. The dataset aims to provide a reliable method for authenticating banknotes based on various features. The task is to develop a Machine Learning model that can accurately classify banknotes as genuine or fake based on those features. Banknote authentication is a binary classification problem: the model predicts the discrete labels 0 or 1, where 0 indicates a fake banknote and 1 indicates a real one. The dataset can be found here.

Prerequisites

You will need to have Docker installed.

Install Giskard

To get started, run the following command in your terminal to install the Giskard server on your computer. This sets up your Python backend.
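The exact command was not reproduced on this page; as a hedged reconstruction based on the Giskard documentation from around the time of writing (the CLI has since evolved, so check the current docs), the installation looked roughly like:

```shell
# Install the giskard package together with its server components
# (flag and extras names are from the docs of that period and may differ today)
pip install "giskard[server]" -U

# Start the Giskard server -- this pulls and launches the Docker images
giskard server start
```

Once started, `docker ps` should list the Giskard containers.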

Make sure you have a stable internet connection. When the installation is done, ensure the Docker container is running like this:

Output:

Giskard Docker container

NB: This image is taken from Docker Desktop

Once docker-compose has started all the modules, you will be able to open Giskard at http://localhost:19000/

  • Log in to Giskard
  • Upload the license sent to your email to be able to interact with the user interface.

📚 Installing Giskard’s Machine Learning Python library

Connect the external worker

Next, start the ML Worker, the component of Giskard that connects your Python environment to the Giskard server you just installed. It executes the model in your working Python environment (notebook, Python IDE, etc.). To start the ML Worker, run the following command in your terminal:
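The command itself was stripped from this page; based on the Giskard docs of that period (flag names may differ in current releases, and `YOUR_API_KEY` is a placeholder), it looked roughly like:

```shell
# Start the ML Worker as a daemon (-d) and authenticate with your API key (-k).
# By default it connects to the local Giskard server at http://localhost:19000.
giskard worker start -d -k YOUR_API_KEY
```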

You will be asked to enter an API key, which can be found in the Settings tab of your Giskard instance:

Giskard API Access Token

Copy the API key and paste it into the terminal prompt that requests it.

Copy API key on your terminal

You should get an output like this:

Terminal output

After that, go to the Settings page of Giskard in your browser and ensure that it is connected to the external worker.

ML worker

🏃 Training a classification model and uploading it to Giskard

Write the following code inside a Jupyter notebook

First import libraries

Read the dataset
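The loading step could be sketched as follows. This assumes the UCI repository copy of the dataset; the column names follow the UCI feature description (the original article may instead have loaded a local CSV):

```python
import pandas as pd

# The UCI banknote authentication data file has no header row,
# so we supply the column names ourselves: four wavelet-based
# image features plus the 0/1 class label.
URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "00267/data_banknote_authentication.txt"
)
COLUMNS = ["variance", "skewness", "curtosis", "entropy", "class"]

df = pd.read_csv(URL, header=None, names=COLUMNS)
print(df.shape)
```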

Declare the type of each column in the dataset (example: category, numeric, text)

Train a classifier

Fit and score your model
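The training and scoring steps together might look like the following sketch (split ratio and random seed are assumptions, not values from the original article):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "00267/data_banknote_authentication.txt"
)
df = pd.read_csv(URL, header=None,
                 names=["variance", "skewness", "curtosis", "entropy", "class"])

X = df.drop(columns="class")
y = df["class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train the Random Forest classifier discussed below
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Score on the held-out test split
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Random forests typically reach very high accuracy on this dataset, which is why the article uses this model.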

Why Random Forest Classifier?

The Random Forest classifier is a popular machine learning algorithm that is well-suited to various classification tasks, including banknote authentication. Here are some reasons why it may be a good choice for the banknote authentication dataset:

  1. Ensemble Method: Random forest is an ensemble learning method that combines multiple decision trees to make predictions. Each tree is trained using a randomly selected sample of the data, and the final prediction is obtained by aggregating the predictions of individual trees. This ensemble approach improves the overall accuracy and generalization of the model.
  2. Robust to Overfitting: Random forest helps mitigate overfitting, which occurs when a model learns too much from the training data and fails to generalize well to unseen data. By using random subsets of the data and random subsets of features for each tree, the model reduces the risk of overfitting and provides more reliable performance on unseen data.
  3. Robust to Outliers: Random forest is less affected by outliers compared to some other classifiers: because predictions are aggregated across many trees, the impact of individual outliers is typically reduced, leading to more robust predictions.
  4. Easy to Use and Interpret: Random forest is relatively easy to implement and tune, making it a popular choice among practitioners. It also measures feature importance, allowing for a better understanding and interpretation of the model's behaviour.
  5. Feature Importance: Random forest measures feature importance, indicating which features have the most significant impact on the classification task. This information can be valuable for understanding the underlying patterns and characteristics of banknotes that contribute to their authenticity.
Random Forest classifier

Evaluating your Machine Learning model with Giskard

🔍 Scan your ML model to detect issues

Output:

Scan your ML model with Giskard

Let’s show the scan results:

Output:

Scan results of ML model

Using Giskard Machine Learning solution for fraud detection

As we see above, the scan detected four types of vulnerabilities: performance bias, overconfidence, underconfidence, and spurious correlation. Let’s discuss each of them and how they can affect banknote authentication.

Performance bias

The performance bias issue in machine learning refers to a situation where a model exhibits low performance on specific data slices or subsets, despite satisfactory performance on the overall dataset. Some factors that can cause performance bias include:

  • Data Imbalance: When the dataset contains imbalanced classes or unequal representation of different groups, the model may prioritize the majority class or dominant groups in its learning process. 
  • Biased Training Data: If the training data used to train the model contains inherent biases or reflects societal prejudices, the model may learn to reinforce these biases, resulting in performance bias.
  • Model Complexity and Capacity: Models with high complexity or excessive capacity can overfit the majority class or dominant groups in the training data.

Performance bias vulnerabilities can affect banknote authentication in several ways. For example, if a model is biased towards certain types of banknotes, it may be more likely to authenticate those notes as real, even if they contain signs of forgery, because it was trained on a dataset disproportionately weighted towards those notes.

Additionally, performance bias vulnerabilities can make it more difficult to detect new counterfeits that are designed to exploit the model's biases. The model may be so accustomed to seeing certain types of banknotes that it fails to consider the possibility that a familiar-looking note is counterfeit. For example, a model that is trained on a dataset of mostly US banknotes may be more likely to authenticate a fake US banknote than a fake note from another country.

To mitigate the risk of performance bias vulnerabilities, it is essential to carefully select the training data. The data should represent the full range of real and fake banknotes the model is expected to encounter in the real world, so that the model is not biased toward any particular type of note.

Overconfidence

The overconfidence issue in machine learning refers to the phenomenon where a machine learning model produces predictions that are incorrect but are assigned high probabilities or confidence scores. This means the model is overly confident in its predictions, even when inaccurate. Some factors that can cause overconfidence include:

  • Data Bias: If the training data used to train the model contains inherent biases or lacks diversity, the model may not be exposed to a wide range of scenarios.
  • Overfitting: Overfitting occurs when a model becomes too complex and adapts too closely to the training data. As a result, the model may not generalize well to unseen data and may exhibit overconfidence in its predictions, even though they are inaccurate.
  • Imbalanced Classes: In classification tasks, imbalanced class distributions can lead to overconfident predictions. Suppose the model is trained on a dataset where one class is significantly more prevalent than others. In that case, it may assign high probabilities to predictions of the majority class, even when they are incorrect.

Overconfidence vulnerabilities can affect banknote authentication in several ways. For example, if a model is overconfident in its predictions, it may be more likely to accept fake banknotes as real. The model may be less likely to flag a fake note as suspicious, even if it contains some signs of forgery.

Additionally, overconfidence vulnerabilities can make it more difficult to detect new fake notes that are designed to fool the model. The model may be so confident in its predictions that it fails to consider the possibility that a convincing new note is fake.

Underconfidence

The underconfidence issue for classification in machine learning refers to the phenomenon where a machine learning model produces predictions with low confidence, even when the actual label is highly likely. In underconfident predictions, the probability assigned to the predicted label is very close to that of the next most likely label. Some factors that can cause underconfidence include:

  • Insufficient Model Training: If the model is not adequately trained on diverse and representative data, it may lack the necessary information to make confident predictions. 
  • Imbalanced Classes: When there is a scarcity of examples or a significant class imbalance, the model may struggle to estimate probabilities accurately, leading to underconfident predictions.
  • Uncertain Data Characteristics: In scenarios where the input data contains inherent noise, ambiguity, or overlapping feature distributions, the model may find it challenging to make confident predictions. Uncertainty in the data can propagate into the model's output, causing underconfidence.

Underconfidence vulnerabilities can also affect banknote authentication. For example, if a model is underconfident in its predictions, it may be more likely to reject real banknotes as fake, flagging a real note as suspicious even when it contains no signs of forgery.

Also, underconfidence vulnerabilities can make detecting new fake notes designed to fool the model more difficult: the model may be so uncertain in its predictions that it cannot confidently flag a new note as fake.

Spurious Correlation

Spurious correlation refers to a situation in machine learning where a feature and the model prediction appear statistically correlated. However, their relationship is coincidental or caused by external factors rather than a genuine causal or meaningful connection. Some factors that can cause spurious correlation include:

  • Confounding Variables: Spurious correlations may arise when confounding variables influence both the predicted variable and the feature being considered. These variables can create an illusion of correlation between the feature and the prediction, even though they are not causally related to each other.
  • Data Noise: Spurious correlations can occur due to data noise or anomalies unrelated to the underlying problem. This noise may result from errors in data collection, measurement biases, data preprocessing issues, or other data-specific factors.
  • Random Chance: In some cases, spurious correlations can occur purely by chance. When working with large datasets or many features, the likelihood of finding coincidental correlations increases. These correlations are not meaningful but are simply random occurrences that can mislead model predictions.

Spurious correlation vulnerabilities can also affect banknote authentication. For example, if a model learns a spurious correlation between two features of banknotes, it may be more likely to authenticate a fake note with those features. This is because the model may be unable to distinguish between a real note with those features and a fake note that has been deliberately designed to have those features.

Also, spurious correlation vulnerabilities can make it more difficult to detect new fake notes that are designed to exploit the model's spurious correlations, because the model may rely so heavily on those correlations that it overlooks other signs of forgery. For example, a model trained on a dataset of banknotes scanned under different lighting conditions may learn a spurious correlation between the brightness of a note and its authenticity, since brightness reflects the lighting conditions at scan time rather than the note itself.

You can read more on the key vulnerabilities here.

📊 Generate a test suite for your classification model

Output:

Execute test suite
List of passed and failed tests

⬆ Upload your test suite to the Giskard server

  • Once the Giskard server is running, you’ll have localhost up; this is where the uploaded data will be found.
  • Next, generate your API token in the Settings tab of the Giskard application.
Giskard server
  • You can choose the arguments you want for the following:

```python
your_project = client.create_project("project_key", "PROJECT_NAME", "DESCRIPTION")
```

In our case, it is:

```python
bank_note_authentication = client.create_project(
    "bank_note_authentication",
    "Bank Note Authentication",
    "Project to classify if a banknote is real or fake",
)
```

Note: "project_key" should be unique and in lowercase.

Output:

Your dataset and model will be uploaded to Giskard and available at http://localhost:19000/main/projects/439/test-suite/440/overview. This URL will take you to the Giskard interface.

Banknote project in Giskard server

You are all set to see Giskard in action! You can now run the generated test suite.

Giskard test catalog

Check for tests that failed:

Failed tests

Edit the parameters to check whether the test suite will pass.

Passed tests

Applying data slices

Slices pre-generated by the scan feature:

Data slices generated by the scan

When the slice to apply is set to none:

Data slice set to none

As we see from the output, we get a prediction of 1. The model may have learned that all banknotes are real, because it may have been trained on a dataset that contains only real banknotes.

To avoid this, it is important to use a variety of slices when testing models with Giskard. This helps ensure that the model is not simply predicting 1 because it was trained on a dataset of only real banknotes.

When the slice to apply is set to ‘curtosis’ < 0.190 AND ‘curtosis’ >= -3.349e-01:

Data slice set to curtosis

As we see from the output, we get the prediction 0 when the slice to apply on Giskard is set to ‘curtosis’ < 0.190 AND ‘curtosis’ >= -3.349e-01, because there are no real banknotes in the training data that fall within this slice. The model has therefore not learned to associate this slice with real banknotes and predicts 0, the label for fake banknotes.

When the slice to apply is set to ‘skewness’<1.678 AND ‘skewness’>=0.807:

Data slice set to skewness

We might get a prediction of 0 when we apply the slice ‘skewness’ < 1.678 AND ‘skewness’ >= 0.807 to the model.

One possibility is that no data points in the model fall within the specified range of skewness values. This could be because the model was not trained on any data points with those skewness values or because the data points with those skewness values were filtered out during the training process.

When the slice to apply is set to ‘variance’ <-1.437e-01 AND ‘variance’>= -6.512e-01:

Data slice set to variance

We get a prediction of 0 when the slice to apply on Giskard is set to ‘variance’ < -1.437e-01 AND ‘variance’ >= -6.512e-01 because this slice does not contain any data points: the range of variance values in the slice is so narrow that no data points fall within it.

To fix this, you can either widen the range of values for variance in the slice or remove the slice altogether. Widening the range includes more data points in the slice, giving you a more accurate prediction; removing the slice means you can no longer probe the model's behaviour on that region of the input space.

✅ Conclusion

In this article, we have seen how to use Giskard to test a banknote authentication model. We scanned the model for vulnerabilities, generated a test suite, and uploaded everything to the Giskard server for further testing. I hope you find this helpful. Happy testing!