All Knowledge

The Giskard hub

RAG Benchmarking: Comparing RAGAS, BERTScore, and Giskard for AI Evaluation

Discover the best tools for benchmarking Retrieval-Augmented Generation (RAG) systems. Compare RAGAS, BERTScore, Levenshtein Distance, and Giskard with real-world examples and find the optimal evaluation approach for your AI applications.

Christianto Kurniawan (guest)

View post

Testing LLM Agents through continuous Red Teaming

Tutorials

How to implement LLM as a Judge to test AI Agents? (Part 2)

Testing AI agents effectively requires automated systems that can evaluate responses across several scenarios. In this second part of our tutorial, we'll explore how to automate test execution and implement continuous red teaming for LLM agents. Learn to systematically evaluate agentic AI systems, interpret results, and maintain security through ongoing testing as your AI application evolves.

Jean-Marie John-Mathews, Ph.D.

View post

Implementing LLM as a Judge to test AI agents

Tutorials

How to implement LLM as a Judge to test AI Agents? (Part 1)

Testing AI agents effectively requires automated systems that can evaluate responses across several scenarios. In this first part of our tutorial, we introduce a systematic approach using LLM as a judge to detect hallucinations and security vulnerabilities before deployment. Learn how to generate synthetic test data and implement business annotation processes for exhaustive AI agent testing.

Jean-Marie John-Mathews, Ph.D.

View post

Tutorials

L'Oréal leverages Giskard for advanced Facial Landmark Detection

L'Oréal has partnered with Giskard to enhance its AI models for Facial Landmark Detection. The collaboration focuses on evaluating and comparing various AI models using metrics such as Normalized Mean Error, prediction time, and robustness against image perturbations. It aims to improve the accuracy and reliability of L'Oréal's online services, ensuring superior performance across diverse facial regions and head poses. Co-authors: Alexandre Bouchez (L'Oréal), and Mathieu Martial (Giskard).

Rabah Abdul Khalek

View post

Giskard and Grafana for data drift monitoring

Tutorials

Giskard + Grafana for Data Drift Monitoring

Learn how to monitor and visualize data drift using Giskard and Grafana in this guide. Perfect for generating intuitive visual representations, this tutorial takes you through the essential steps of setting up Grafana dashboards and integrating Giskard for effective data drift testing and visualization.

Sagar Thacker

View post

Tutorials

Data Drift Monitoring with Giskard

Learn how to effectively monitor and manage data drift in machine learning models to maintain accuracy and reliability. This article provides a concise overview of the types of data drift, detection techniques, and strategies for maintaining model performance amidst changing data. It provides data scientists with practical insights into setting up, monitoring, and adjusting models to address data drift, emphasising the importance of ongoing model evaluation and adaptation.

Sagar Thacker

View post

Build and evaluate a Customer Service Chatbot. Image generated by DALL-E

Tutorials

How to find the best Open-Source LLM for your Customer Service Chatbot

Explore how to use open-source Large Language Models (LLMs) to build AI customer service chatbots. We guide you through creating chatbots with LangChain and HuggingFace libraries, and how to evaluate their performance and safety using Giskard's testing framework.

Ashna Ahmad

View post

Tutorials

Mastering ML Model Evaluation with Giskard: From Validation to CI/CD Integration

Learn how to integrate vulnerability scanning, model validation, and CI/CD pipeline optimization to ensure reliability and security of your AI models. Discover best practices, workflow simplification, and techniques to monitor and maintain model integrity. From basic setup to more advanced uses, this article offers invaluable insights to enhance your model development and deployment process.

Sagar Thacker

View post

Tutorials

How to address Machine Learning Bias in a pre-trained HuggingFace text classification model?

Machine learning models, despite their potential, often face issues like biases and performance inconsistencies. As these models find real-world applications, ensuring their robustness becomes paramount. This tutorial explores these challenges, using the Ecommerce Text Classification dataset as a case study. Through this, we highlight key measures and tools, such as Giskard, to boost model performance.

Mostafa Ibrahim

View post

Tutorials

Guide to Model Evaluation: Eliminating Bias in Machine Learning Predictions

Explore our tutorial on model fairness to detect hidden biases in machine learning models. Understand the flaws of traditional evaluation metrics with the help of the Giskard library. Our guide, packed with examples and a step-by-step process, shows you how to tackle data sampling bias and master feature engineering for fairness. Learn to create domain-specific tests and debug your ML models, ensuring they are fair and reliable.

Josiah Adesola

View post

SHAP values - based on https://github.com/shap/shap

Tutorials

Opening the Black Box: Using SHAP values to explain and enhance Machine Learning models

SHAP stands for "SHapley Additive exPlanations", and is a unified approach that explains the output of any machine learning model; by delivering cohesive explanations it provides invaluable insight into how predictions are being made and opens up immense possibilities in terms of practical applications. In this tutorial we'll explore how to use SHAP values to explain and improve ML models, delving deeper into specific use cases as we go along.

Mykyta Alekseiev

View post

Testing Classification Models for Fraud Detection with Giskard

Tutorials

Testing Machine Learning Classification models for fraud detection

This article explains how Giskard open-source ML framework can be used for testing ML models and applied to fraud detection. It explores the components of Giskard: the Python library, its user-friendly interface, its installation process, and practical implementation for banknote authentication. The article provides step-by-step guide, code snippets, and leverages the banknote authentication dataset to develop an accurate ML model.

Happiness Omale

View post

Robot reading a newspaper generated by open-source generative AI model ControlNet and Stable Diffusion

Tutorials

How to evaluate and load a PyTorch model with Giskard?

This tutorial teaches you how to upload a PyTorch model (built from scratch or pre-trained) to Giskard, and identify potential errors and biases.

Favour Kelvin

View post

Picture illustrating gender bias generated by DALL-E2

Tutorials

How to test the fairness of ML models? The 80% rule to measure the disparate impact

This article provides a step-by-step guide to detecting ethical bias in AI models, using a customer churn model as an example, using the LightGBM ML library. We show how to calculate the disparate impact metric with respect to gender and age, and demonstrate how to implement this metric as a fairness test within Giskard's open-source ML testing framework.

Rabah Abdul Khalek

View post