All Knowledge

The Giskard hub

RAG Benchmarking: Comparing RAGAS, BERTScore, and Giskard for AI Evaluation

Discover the best tools for benchmarking Retrieval-Augmented Generation (RAG) systems. Compare RAGAS, BERTScore, Levenshtein Distance, and Giskard with real-world examples and find the optimal evaluation approach for your AI applications.

View post
Testing LLM Agents through continuous Red Teaming

How to implement LLM as a Judge to test AI Agents? (Part 2)

Testing AI agents effectively requires automated systems that can evaluate responses across several scenarios. In this second part of our tutorial, we'll explore how to automate test execution and implement continuous red teaming for LLM agents. Learn to systematically evaluate agentic AI systems, interpret results, and maintain security through ongoing testing as your AI application evolves.

View post
Implementing LLM as a Judge to test AI agents

How to implement LLM as a Judge to test AI Agents? (Part 1)

Testing AI agents effectively requires automated systems that can evaluate responses across several scenarios. In this first part of our tutorial, we introduce a systematic approach using LLM as a judge to detect hallucinations and security vulnerabilities before deployment. Learn how to generate synthetic test data and implement business annotation processes for exhaustive AI agent testing.

View post
Facial Landmark Detection for L'Oréal

L'Oréal leverages Giskard for advanced Facial Landmark Detection

L'Oréal has partnered with Giskard to enhance its AI models for Facial Landmark Detection. The collaboration focuses on evaluating and comparing various AI models using metrics such as Normalized Mean Error, prediction time, and robustness against image perturbations. It aims to improve the accuracy and reliability of L'Oréal's online services, ensuring superior performance across diverse facial regions and head poses. Co-authors: Alexandre Bouchez (L'Oréal), and Mathieu Martial (Giskard).

View post
Giskard and Grafana for data drift monitoring

Giskard + Grafana for Data Drift Monitoring

Learn how to monitor and visualize data drift using Giskard and Grafana in this guide. Perfect for generating intuitive visual representations, this tutorial takes you through the essential steps of setting up Grafana dashboards and integrating Giskard for effective data drift testing and visualization.

View post
Data Drift Monitoring with Giskard

Data Drift Monitoring with Giskard

Learn how to effectively monitor and manage data drift in machine learning models to maintain accuracy and reliability. This article provides a concise overview of the types of data drift, detection techniques, and strategies for maintaining model performance amidst changing data. It provides data scientists with practical insights into setting up, monitoring, and adjusting models to address data drift, emphasising the importance of ongoing model evaluation and adaptation.

View post
Build and evaluate a Customer Service Chatbot. Image generated by DALL-E

How to find the best Open-Source LLM for your Customer Service Chatbot

Explore how to use open-source Large Language Models (LLMs) to build AI customer service chatbots. We guide you through creating chatbots with LangChain and HuggingFace libraries, and how to evaluate their performance and safety using Giskard's testing framework.

View post

Mastering ML Model Evaluation with Giskard: From Validation to CI/CD Integration

Learn how to integrate vulnerability scanning, model validation, and CI/CD pipeline optimization to ensure reliability and security of your AI models. Discover best practices, workflow simplification, and techniques to monitor and maintain model integrity. From basic setup to more advanced uses, this article offers invaluable insights to enhance your model development and deployment process.

View post

How to address Machine Learning Bias in a pre-trained HuggingFace text classification model?

Machine learning models, despite their potential, often face issues like biases and performance inconsistencies. As these models find real-world applications, ensuring their robustness becomes paramount. This tutorial explores these challenges, using the Ecommerce Text Classification dataset as a case study. Through this, we highlight key measures and tools, such as Giskard, to boost model performance.

View post
Eliminating bias in Machine Learning predictions

Guide to Model Evaluation: Eliminating Bias in Machine Learning Predictions

Explore our tutorial on model fairness to detect hidden biases in machine learning models. Understand the flaws of traditional evaluation metrics with the help of the Giskard library. Our guide, packed with examples and a step-by-step process, shows you how to tackle data sampling bias and master feature engineering for fairness. Learn to create domain-specific tests and debug your ML models, ensuring they are fair and reliable.

View post
SHAP values - based on https://github.com/shap/shap

Opening the Black Box: Using SHAP values to explain and enhance Machine Learning models

SHAP stands for "SHapley Additive exPlanations", and is a unified approach that explains the output of any machine learning model; by delivering cohesive explanations it provides invaluable insight into how predictions are being made and opens up immense possibilities in terms of practical applications. In this tutorial we'll explore how to use SHAP values to explain and improve ML models, delving deeper into specific use cases as we go along.

View post
Testing Classification Models for Fraud Detection with Giskard

Testing Machine Learning Classification models for fraud detection

This article explains how Giskard open-source ML framework can be used for testing ML models and applied to fraud detection. It explores the components of Giskard: the Python library, its user-friendly interface, its installation process, and practical implementation for banknote authentication. The article provides step-by-step guide, code snippets, and leverages the banknote authentication dataset to develop an accurate ML model.

View post
Robot reading a newspaper generated by open-source generative AI model ControlNet and Stable Diffusion

How to evaluate and load a PyTorch model with Giskard?

This tutorial teaches you how to upload a PyTorch model (built from scratch or pre-trained) to Giskard, and identify potential errors and biases.

View post
Picture illustrating gender bias generated by DALL-E2

How to test the fairness of ML models? The 80% rule to measure the disparate impact

This article provides a step-by-step guide to detecting ethical bias in AI models, using a customer churn model as an example, using the LightGBM ML library. We show how to calculate the disparate impact metric with respect to gender and age, and demonstrate how to implement this metric as a fairness test within Giskard's open-source ML testing framework.

View post
Happy green robot generated by open-source generative AI model Stable Diffusion

How to deploy a robust HuggingFace model for sentiment analysis into production?

This tutorial teaches you how to build, test and deploy a Huggingface AI model for sentiment analysis while ensuring its robustness in production.

View post
Metamorphic testing

How to test ML models? #4 🎚 Metamorphic testing

Metamorphic testing are adapted to Machine Learning. This tutorial describes the theory, examples and code to implement it.

View post
Numerical data drift

How to test ML models? #3 📈 Numerical data drift

Testing the drift of numerical feature distribution is essential in AI. Here are the key metrics you can use to detect it.

View post
Cars drifting

How to test ML models #2 🧱 Categorical data drift

Testing drift of categorical feature distribution is essential in AI / ML, requiring specific metrics

View post
Zoom in on the problem

How to test ML models? #1 👉 Introduction

What you need to know before getting started with ML Testing in 3 points

View post
Stay updated with
the Giskard Newsletter