Resources

RealPerformance, A Dataset of Language Model Business Compliance Issues

Giskard launches RealPerformance, the first systematic dataset of business performance failures in conversational AI, to close the gap between security-focused testing and business compliance. It is based on real-world testing across banks, insurers, and other industries.

View post

RAG Benchmarking: Comparing RAGAS, BERTScore, and Giskard for AI Evaluation

Discover the best tools for benchmarking Retrieval-Augmented Generation (RAG) systems. Compare RAGAS, BERTScore, Levenshtein Distance, and Giskard with real-world examples and find the optimal evaluation approach for your AI applications.
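To make the trade-offs concrete, here is a small illustrative sketch (not taken from the article) that scores one generated answer against a reference with two of the metric families discussed above: surface string similarity, using Python's standard-library difflib as a stand-in for Levenshtein-style edit distance, and semantic similarity via the bert-score package. The example strings are invented.

```python
# Minimal sketch: comparing a RAG answer to a reference with two metric styles.
# Assumes `pip install bert-score`; difflib ships with the standard library.
from difflib import SequenceMatcher

from bert_score import score  # semantic similarity from contextual embeddings

reference = "Our premium plan includes 24/7 support and a 99.9% uptime SLA."
candidate = "The premium tier comes with round-the-clock support and a 99.9% uptime guarantee."

# Edit-distance-style similarity (surface overlap only; paraphrases score low).
surface_sim = SequenceMatcher(None, reference, candidate).ratio()

# BERTScore F1 (meaning-aware; paraphrases score high despite different wording).
_, _, f1 = score([candidate], [reference], lang="en", verbose=False)

print(f"String similarity: {surface_sim:.2f}")
print(f"BERTScore F1:      {f1.item():.2f}")
```

Surface metrics tend to penalize valid paraphrases, which is precisely the gap that embedding-based and LLM-based evaluators such as RAGAS and Giskard aim to close.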

View post

LLM Observability vs LLM Evaluation: Building Comprehensive Enterprise AI Testing Strategies

Enterprise AI teams often treat observability and evaluation as competing priorities, leading to gaps in either technical monitoring or quality assurance.

View post

Real-Time Guardrails vs Batch LLM Evaluations: A Comprehensive AI Testing Strategy

Enterprise AI teams need both immediate protection and deep quality insights but often treat guardrails and batch evaluations as competing priorities.

View post
Understanding Hallucination and Misinformation in LLMs

A Practical Guide to LLM Hallucinations and Misinformation Detection

Explore how AI systems generate false content and why understanding LLM vulnerabilities is critical for safer, more ethical AI use.

View post
Illustration of AI vulnerabilities and risk mitigation in Large Language Models (LLMs) for secure and responsible deployment.

A Practical Guide on AI Security and LLM Vulnerabilities

Discover the key vulnerabilities in Large Language Models (LLMs) and learn how to mitigate AI risks with clear overviews and practical examples. Stay ahead in safe and responsible AI deployment.

View post
Phare LLM Benchmark - an analysis of hallucination in leading LLMs

Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs

We're sharing the first results from Phare, our multilingual benchmark for evaluating language models. The research reveals that leading LLMs confidently produce factually inaccurate information. Our evaluation of top models from eight AI labs shows they generate authoritative-sounding responses containing completely fabricated details, particularly when handling misinformation.

View post
Testing LLM Agents through continuous Red Teaming

How to implement LLM as a Judge to test AI Agents? (Part 2)

Testing AI agents effectively requires automated systems that can evaluate responses across several scenarios. In this second part of our tutorial, we'll explore how to automate test execution and implement continuous red teaming for LLM agents. Learn to systematically evaluate agentic AI systems, interpret results, and maintain security through ongoing testing as your AI application evolves.

View post
Implementing LLM as a Judge to test AI agents

How to implement LLM as a Judge to test AI Agents? (Part 1)

Testing AI agents effectively requires automated systems that can evaluate responses across several scenarios. In this first part of our tutorial, we introduce a systematic approach using LLM as a judge to detect hallucinations and security vulnerabilities before deployment. Learn how to generate synthetic test data and implement business annotation processes for exhaustive AI agent testing.
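As a rough illustration of the LLM-as-a-judge pattern (a generic sketch under assumed names, not Giskard's implementation), a judge model receives the question, a reference answer, and the agent's answer, and returns a verdict; the judge model and rubric below are placeholders.

```python
# Generic LLM-as-a-judge sketch (assumes `pip install openai` and OPENAI_API_KEY set).
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are a strict evaluator. Given a question, a reference answer,
and an agent answer, reply with exactly PASS or FAIL followed by a one-sentence reason.
FAIL if the agent answer contradicts the reference or invents unsupported facts."""

def judge(question: str, reference: str, agent_answer: str) -> str:
    # Ask the judge model to grade the agent's answer against the reference.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": f"Question: {question}\n"
                                        f"Reference: {reference}\n"
                                        f"Agent answer: {agent_answer}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

print(judge("What is our refund window?", "30 days", "Refunds are accepted within 90 days."))
```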

View post

Secure AI Agents: Exhaustive testing with continuous LLM Red Teaming

Testing AI agents presents significant challenges as vulnerabilities continuously emerge, exposing organizations to reputational and financial risks when systems fail in production. Giskard's LLM Evaluation Hub addresses these challenges through adversarial LLM agents that automate exhaustive testing, annotation tools that integrate domain expertise, and continuous red teaming that adapts to evolving threats.

View post
Increasing trust in foundation language models through multi-lingual security, safety and robustness testing

Giskard announces Phare, a new open & multi-lingual LLM Benchmark

During the Paris AI Summit, Giskard launches Phare, a new open and independent LLM benchmark to evaluate key AI security dimensions, including hallucination, factual accuracy, bias, and potential for harm, across several languages, with Google DeepMind as research partner. This initiative is meant to provide open measurements to assess the trustworthiness of Generative AI models in real applications.

View post
DeepSeek R1 analysis

DeepSeek R1: Complete analysis of capabilities and limitations

In this article, we provide a detailed analysis of DeepSeek R1, comparing its performance against leading AI models like GPT-4o and O1. Our testing reveals both impressive knowledge capabilities and significant concerns, particularly regarding the model's tendency to generate hallucinations. Through concrete examples, we examine how R1 handles politically sensitive topics.

View post
Giskard integrates with LiteLLM to simplify LLM agent testing

[Release notes] Giskard integrates with LiteLLM: Simplifying LLM agent testing across foundation models

Giskard's integration with LiteLLM enables developers to test their LLM agents across multiple foundation models. The integration enhances Giskard's core features - LLM Scan for vulnerability assessment and RAGET for RAG evaluation - by allowing them to work with any supported LLM provider: whether you're using major cloud providers like OpenAI and Anthropic, local deployments through Ollama, or open-source models like Mistral.
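For illustration, switching the model behind Giskard's evaluators becomes a one-line configuration change with this integration. The sketch below follows the pattern described in Giskard's documentation; the exact model identifiers and credentials depend on your LiteLLM setup.

```python
# Sketch: pointing Giskard's evaluators at different providers through LiteLLM.
# Model strings are examples; credentials are read from the usual provider env vars.
import giskard

# Hosted provider (e.g. OpenAI)
giskard.llm.set_llm_model("gpt-4o")

# Or a local model served by Ollama
# giskard.llm.set_llm_model("ollama/llama3")

# Or an open-weight model via the Mistral API
# giskard.llm.set_llm_model("mistral/mistral-large-latest")
```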

View post
EU's AI liability directives

AI Liability in the EU: Business guide to Product (PLD) and AI Liability Directives (AILD)

The EU is establishing an AI liability framework through two key regulations: the Product Liability Directive (PLD), taking effect in 2024, and the proposed AI Liability Directive (AILD). The PLD introduces strict liability for defective AI systems and software, while the AILD addresses negligent use, though its final form remains under debate. This article covers the key points of these regulations and how they will impact businesses.

View post
Giskard-vision: Evaluate Computer Vision tasks

Giskard Vision: Enhance Computer Vision models for image classification, object and landmark detection

Giskard Vision is a new module in our open-source library designed to assess and improve computer vision models. It offers automated detection of performance issues, biases, and ethical concerns in image classification, object detection, and landmark detection tasks. The article provides a step-by-step guide on how to integrate Giskard Vision into existing workflows, enabling data scientists to enhance the reliability and fairness of their computer vision systems.

View post
Giskard integrates with NVIDIA NeMo

Evaluating LLM applications: Giskard Integration with NVIDIA NeMo Guardrails

Giskard has integrated with NVIDIA NeMo Guardrails to enhance the safety and reliability of LLM-based applications. This integration allows developers to better detect vulnerabilities, automate rail generation, and streamline risk mitigation in LLM systems. By combining Giskard with NeMo Guardrails, organizations can address critical challenges in LLM development, including hallucinations, prompt injection, and jailbreaks.

View post
Council of Europe - AI Treaty

Global AI Treaty: EU, UK, US, and Israel sign landmark AI regulation

The Council of Europe has signed the world's first AI treaty, marking a significant step towards global AI governance. This Framework Convention on Artificial Intelligence aligns closely with the EU AI Act, adopting a risk-based approach to protect human rights and foster innovation. The treaty impacts businesses by establishing requirements for trustworthy AI, mandating transparency, and emphasizing risk management and compliance.

View post
Facial Landmark Detection for L'Oréal

L'Oréal leverages Giskard for advanced Facial Landmark Detection

L'Oréal has partnered with Giskard to enhance its AI models for Facial Landmark Detection. The collaboration focuses on evaluating and comparing various AI models using metrics such as Normalized Mean Error, prediction time, and robustness against image perturbations. It aims to improve the accuracy and reliability of L'Oréal's online services, ensuring superior performance across diverse facial regions and head poses. Co-authors: Alexandre Bouchez (L'Oréal) and Mathieu Martial (Giskard).

View post
EU AI Act published in the EU Official Journal

The EU AI Act published in the EU Official Journal: Next steps for AI Regulation

The EU AI Act, published on July 12, 2024, establishes the world's first comprehensive regulatory framework for AI technologies, with a gradual implementation timeline from 2024 to 2027. It adopts a risk-based approach, imposing varying requirements on AI systems based on their risk level.

View post
Giskard + Databricks integration

Partnership announcement: Bringing Giskard LLM evaluation to Databricks

Giskard has integrated with Databricks MLflow to enhance LLM testing and deployment. This collaboration allows AI teams to automatically identify vulnerabilities, generate domain-specific tests, and log comprehensive reports directly into MLflow. The integration aims to streamline the development of secure, reliable, and compliant LLM applications, addressing key risks like prompt injection, hallucinations, and unintended data disclosures.

View post
Differences between MLOps and LLMOps

LLMOps: MLOps for Large Language Models

This article explores LLMOps, detailing its challenges and best practices for managing Large Language Models (LLMs) in production. It compares LLMOps with traditional MLOps, covering hardware needs, performance metrics, and handling non-deterministic outputs. The guide outlines steps for deploying LLMs, including model selection, fine-tuning, and continuous monitoring, while emphasizing quality and security management.

View post
LLM jailbreaking

Defending LLMs against Jailbreaking: Definition, examples and prevention

Jailbreaking refers to maliciously manipulating Large Language Models (LLMs) to bypass their ethical constraints and produce unauthorized outputs. This emerging threat arises from combining the models' high adaptability with inherent vulnerabilities that attackers can exploit through techniques like prompt injection. Mitigating jailbreaking risks requires a holistic approach involving robust security measures, adversarial testing, red teaming, and ongoing vigilance to safeguard the integrity and reliability of AI systems.

View post
Giskard and Grafana for data drift monitoring

Giskard + Grafana for Data Drift Monitoring

Learn how to monitor and visualize data drift using Giskard and Grafana in this guide. Perfect for generating intuitive visual representations, this tutorial takes you through the essential steps of setting up Grafana dashboards and integrating Giskard for effective data drift testing and visualization.

View post
Data poisoning attacks

Data Poisoning attacks on Enterprise LLM applications: AI risks, detection, and prevention

Data poisoning is a real threat to enterprise AI systems like Large Language Models (LLMs), where malicious data tampering can skew outputs and decision-making processes unnoticed. This article explores the mechanics of data poisoning attacks, real-world examples across industries, and best practices to mitigate risks through red teaming and automated evaluation tools.

View post
Giskard LLM scan multi-model

[Release notes] LLM app vulnerability scanner for Mistral, OpenAI, Ollama, and Custom Local LLMs

Releasing an upgraded version of Giskard's LLM scan for comprehensive vulnerability assessments of LLM applications. New features include more accurate detectors through optimized prompts and expanded multi-model compatibility supporting OpenAI, Mistral, Ollama, and custom local LLMs. This article also covers an initial setup guide for evaluating LLM apps.
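To give a feel for the workflow, here is a hedged sketch of scanning a toy question-answering function; it follows Giskard's documented Model/scan pattern, but exact parameters may differ between library versions, and answer_question stands in for your real application.

```python
# Sketch: scanning an LLM app for vulnerabilities with Giskard's LLM scan.
import giskard
import pandas as pd

def my_app(question: str) -> str:
    # Placeholder: replace with your RAG chain or agent call.
    return "Our refund window is 30 days."

def answer_question(df: pd.DataFrame) -> list[str]:
    # Giskard passes a DataFrame of inputs; return one answer per row.
    return [my_app(q) for q in df["question"]]

model = giskard.Model(
    model=answer_question,
    model_type="text_generation",
    name="Support assistant",
    description="Answers customer questions about billing and refunds.",
    feature_names=["question"],
)

report = giskard.scan(model)   # probes for injection, hallucination, leakage, ...
report.to_html("scan_report.html")
```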

View post
Why LLM evaluation is important

Guide to LLM evaluation and its critical impact for businesses

As businesses increasingly integrate LLMs into a wide range of applications, ensuring the reliability of AI systems is key. LLMs can generate biased, inaccurate, or even harmful outputs if not properly evaluated. This article explains the importance of LLM evaluation and how to do it (methods and tools). It also presents Giskard's comprehensive solutions for evaluating LLMs, combining automated testing, customizable test cases, and human-in-the-loop review.

View post
Red Teaming LLM Applications course

New course with DeepLearningAI: Red Teaming LLM Applications

Our new course, created in collaboration with the DeepLearningAI team, provides training on red teaming techniques for Large Language Model (LLM) and chatbot applications. Through hands-on attacks using prompt injections, you'll learn how to identify vulnerabilities and security failures in LLM systems.

View post
Giskard's LLM Red Teaming

LLM Red Teaming: Detect safety & security breaches in your LLM apps

Introducing our LLM Red Teaming service, designed to enhance the safety and security of your LLM applications. Discover how our team of ML Researchers uses red teaming techniques to identify and address LLM vulnerabilities. Our new service focuses on mitigating risks like misinformation and data leaks by developing comprehensive threat models.

View post

Data Drift Monitoring with Giskard

Learn how to effectively monitor and manage data drift in machine learning models to maintain accuracy and reliability. This article provides a concise overview of the types of data drift, detection techniques, and strategies for maintaining model performance amid changing data. It gives data scientists practical insights into setting up, monitoring, and adjusting models to address data drift, emphasizing the importance of ongoing model evaluation and adaptation.

View post
Classification of AI systems under the EU AI Act

EU AI ACT: 8 Takeaways from the Council's Final Approval

The Council of the EU has recently voted unanimously on the final version of the European AI Act. It's a significant step forward in the EU's effort to legislate the first AI law in the world. The Act establishes a regulatory framework for the safe use and development of AI, categorizing AI systems according to their associated risk. In the coming months, the text will enter the last stage of the legislative process, where the European Parliament will have a final vote on the AI Act.

View post
Giskard 2023 retrospective

Giskard's retrospective of 2023 and a glimpse into what's next for 2024!

Our 2023 retrospective covers people, company, customer, and product news, and offers a glimpse into what's next for 2024. Our team keeps growing, with new offices in Paris, new customers, and new product features. Our GitHub repo has nearly reached 2,500 stars, and we were Product of the Day on Product Hunt. All this and more in our 2023 review.

View post
Build and evaluate a Customer Service Chatbot. Image generated by DALL-E

How to find the best Open-Source LLM for your Customer Service Chatbot

Explore how to use open-source Large Language Models (LLMs) to build AI customer service chatbots. We guide you through creating chatbots with LangChain and HuggingFace libraries, and how to evaluate their performance and safety using Giskard's testing framework.

View post

EU AI Act: The EU Strikes a Historic Agreement to Regulate AI

The EU's AI Act establishes rules for AI use and development, focusing on ethical standards and safety. It categorizes AI systems, highlights high-risk uses, and sets compliance requirements. This legislation, a first in global AI governance, signals a shift towards responsible AI innovation in Europe.

View post
Biden’s Executive Order to Regulate AI

Biden's Executive Order: The Push to Regulate AI in the US

One year after the launch of ChatGPT, regulators worldwide are still figuring out how to regulate Generative AI. The EU is going through intense debates on how to finalize the so-called 'EU AI Act' after two years of legislative process. At the same time, only one month ago, the White House surprised everyone with a landmark Executive Order to regulate AI in the US. In this article, I delve into the Executive Order and advance some ideas on how it can impact the whole AI regulatory landscape.

View post
Giskard’s LLM Testing solution is launching on Product Hunt

Our LLM Testing solution is launching on Product Hunt 🚀

We have just launched Giskard v2, extending the testing capabilities of our library and Hub to Large Language Models. Support our launch on Product Hunt and explore our new integrations with Hugging Face, Weights & Biases, MLflow, and Dagshub. A big thank you to our community for helping us reach over 1900 stars on GitHub.

View post

Mastering ML Model Evaluation with Giskard: From Validation to CI/CD Integration

Learn how to integrate vulnerability scanning, model validation, and CI/CD pipeline optimization to ensure reliability and security of your AI models. Discover best practices, workflow simplification, and techniques to monitor and maintain model integrity. From basic setup to more advanced uses, this article offers invaluable insights to enhance your model development and deployment process.

View post

How to address Machine Learning Bias in a pre-trained HuggingFace text classification model?

Machine learning models, despite their potential, often face issues like biases and performance inconsistencies. As these models find real-world applications, ensuring their robustness becomes paramount. This tutorial explores these challenges, using the Ecommerce Text Classification dataset as a case study. Through this, we highlight key measures and tools, such as Giskard, to boost model performance.

View post

Towards AI Regulation: How Countries are Shaping the Future of Artificial Intelligence

In this article, we present the challenges and approaches to AI regulation in major jurisdictions such as the European Union, the United States, China, Canada, and the UK. Explore the growing impact of AI on society and how AI quality tools like Giskard ensure reliable models and compliance.

View post
Eliminating bias in Machine Learning predictions

Guide to Model Evaluation: Eliminating Bias in Machine Learning Predictions

Explore our tutorial on model fairness to detect hidden biases in machine learning models. Understand the flaws of traditional evaluation metrics with the help of the Giskard library. Our guide, packed with examples and a step-by-step process, shows you how to tackle data sampling bias and master feature engineering for fairness. Learn to create domain-specific tests and debug your ML models, ensuring they are fair and reliable.

View post
AI Safety and Security: Insights from Giskard's CPO - Interview with Jean-Marie John-Mathews

AI Safety and Security: A Conversation with Giskard's Co-Founder and CPO

Giskard's Co-Founder and CPO, Jean-Marie John-Mathews, was recently interviewed by Safety Detectives, where he shared insights into the company's mission to advance AI Safety and Quality. In the interview, Jean-Marie explains the strategies, vulnerabilities, and ethical considerations at the forefront of AI technology, as Giskard bridges the gap between AI models and real-world applications.

View post
SHAP values - based on https://github.com/shap/shap

Opening the Black Box: Using SHAP values to explain and enhance Machine Learning models

SHAP stands for "SHapley Additive exPlanations", a unified approach to explaining the output of any machine learning model. By delivering cohesive explanations, it provides invaluable insight into how predictions are made and opens up many practical applications. In this tutorial, we explore how to use SHAP values to explain and improve ML models, delving deeper into specific use cases as we go along.
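As a quick taste of what the tutorial covers, the snippet below is a minimal sketch using the public shap API on a scikit-learn model (not the tutorial's exact code): it computes SHAP values and renders global and local explanation plots.

```python
# Minimal SHAP sketch: explain a gradient boosting model on a toy dataset.
# Assumes `pip install shap scikit-learn`.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = GradientBoostingClassifier().fit(X, y)

explainer = shap.Explainer(model, X)       # dispatches to a suitable explainer (TreeExplainer here)
shap_values = explainer(X.iloc[:200])      # SHAP values for a sample of predictions

shap.plots.beeswarm(shap_values)           # global view: which features drive predictions
shap.plots.waterfall(shap_values[0])       # local view: one individual prediction
```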

View post
Giskard team at DEFCON31

AI Safety at DEFCON 31: Red Teaming for Large Language Models (LLMs)

DEFCON, one of the world's premier hacker conventions, this year saw a unique focus at the AI Village: red teaming of Large Language Models (LLMs). Instead of conventional hacking, participants were challenged to use words to uncover AI vulnerabilities. The Giskard team was fortunate to attend, witnessing firsthand the event's emphasis on understanding and addressing potential AI risks.

View post
OWASP Top 10 for LLM 2023

OWASP Top 10 for LLM 2023: Understanding the Risks of Large Language Models

In this post, we introduce OWASP's first version of the Top 10 for LLM, which identifies critical security risks in modern LLM systems. It covers vulnerabilities like Prompt Injection, Insecure Output Handling, Model Denial of Service, and more. Each vulnerability is explained with examples, prevention tips, attack scenarios, and references. The document serves as a valuable guide for developers and security practitioners to protect LLM-based applications and data from potential attacks.

View post
LLM Scan: Advanced LLM vulnerability detection

1,000 GitHub stars, 3M€, and new LLM scan feature 💫

We've reached an impressive milestone of 1,000 GitHub stars and received strategic funding of 3M€ from the French Public Investment Bank and the European Commission. With this funding, we plan to enhance the Giskard platform, aiding companies in meeting upcoming AI regulations and standards. Moreover, we've upgraded our LLM scan feature to detect even more hidden vulnerabilities.

View post
Testing Classification Models for Fraud Detection with Giskard

Testing Machine Learning Classification models for fraud detection

This article explains how Giskard's open-source ML framework can be used for testing ML models, applied to fraud detection. It explores the components of Giskard: the Python library, its user-friendly interface, its installation process, and practical implementation for banknote authentication. The article provides a step-by-step guide and code snippets, and leverages the banknote authentication dataset to develop an accurate ML model.

View post
Scan your AI model to find vulnerabilities

Giskard’s new beta is out! ⭐ Scan your model to detect hidden vulnerabilities

Giskard's new beta release enables you to quickly scan your AI model and detect vulnerabilities directly in your notebook. The new beta also includes simple one-line installation, automated test suite generation and execution, improved user experience for collaboration on testing dashboards, and a ready-made test catalog.

View post
SafeGPT - The safest way to use ChatGPT and other LLMs

🔥 The safest way to use ChatGPT... and other LLMs

With Giskard’s SafeGPT, you can say goodbye to errors, biases & privacy issues in LLMs. Its features include an easy-to-use browser extension and a monitoring dashboard (for ChatGPT users), and a ready-made, extensible quality assurance platform for debugging any LLM (for LLM developers).

View post
Giskard's turtle slicing some veggies!

Giskard 1.4 is out! What's new in this version? ⭐

With Giskard’s new Slice feature, we introduce the possibility to identify business areas in which your AI models underperform. This will make it easier to debug performance biases or identify spurious correlations. We have also added an export/import feature to share your projects, as well as other minor improvements.

View post
Robot reading a newspaper generated by open-source generative AI model ControlNet and Stable Diffusion

How to evaluate and load a PyTorch model with Giskard?

This tutorial teaches you how to upload a PyTorch model (built from scratch or pre-trained) to Giskard, and identify potential errors and biases.

View post
Giskard at FOSDEM 2023

FOSDEM 2023: Presentation on CI/CD for ML and How to test ML models?

In this talk, we explain why testing ML models is an important and difficult problem. Then we explain, using concrete examples, how Giskard helps ML Engineers deploy their AI systems into production safely by (1) designing fairness & robustness tests and (2) integrating them in a CI/CD pipeline.

View post
Python enjoying cups of Java coffee beans - generated by OpenAI DallE

Giskard is coming to your notebook: Python meets Java via gRPC tunnel

With Giskard’s new External ML Worker feature, we introduce a gRPC tunnel to reverse the client-server communication so that data scientists can re-use an existing Python code environment for model execution by Giskard.

View post
Happy green robot generated by open-source generative AI model Stable Diffusion

How to deploy a robust HuggingFace model for sentiment analysis into production?

This tutorial teaches you how to build, test and deploy a HuggingFace AI model for sentiment analysis while ensuring its robustness in production.

View post
Dollar Planets Generated by OpenAI DALL·E

Why do Citibeats & Altaroad Test AI Models? The Business Value of Test-Driven Data Science

Why do great Data Scientists & ML Engineers love writing tests? Two customer case studies on improving model robustness and ensuring AI Ethics.

View post
Synthwave astronauts polishing the hull of a giant marine turtle in space - Generated by OpenAI DallE

Does User Experience Matter to ML Engineers? Giskard Latest Release

What are the preferences of ML Engineers in terms of UX? A summary of key learnings, and how we implemented them in Giskard's latest release.

View post
Sea Turtle

Why & how we decided to change Giskard's identity

We explain why Giskard changed its value proposition, and how we translated it to a new visual identity

View post
Happy ML Tester

Giskard's new feature: Automated Machine Learning Testing

The Open Beta of Giskard's AI Test feature: an automated way to test your ML models and ensure performance, robustness, and ethics

View post
A billion stars

Who cares about AI Quality? Launching our AI Innovator community

The Giskard team explains the undergoing shift toward AI Quality, and how we launched the first community for AI Quality Innovators

View post
Numerical data drift

How to test ML models? #3 📈 Numerical data drift

Testing the drift of numerical feature distribution is essential in AI. Here are the key metrics you can use to detect it.
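For readers who want to try these metrics directly, here is a small illustrative sketch (not from the article) that checks a numerical feature for drift with two common measures: the two-sample Kolmogorov-Smirnov test and the Population Stability Index. The data and thresholds are invented.

```python
# Sketch: two common numerical drift metrics on a reference vs. current sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)   # training-time distribution
current = rng.normal(loc=0.3, scale=1.2, size=5_000)     # production sample (shifted)

# Kolmogorov-Smirnov: a small p-value suggests the two distributions differ.
ks_stat, p_value = stats.ks_2samp(reference, current)

# Population Stability Index over shared bins; > 0.2 is often treated as major drift.
def psi(ref: np.ndarray, cur: np.ndarray, bins: int = 10) -> float:
    edges = np.histogram_bin_edges(ref, bins=bins)
    ref_pct = np.histogram(ref, bins=edges)[0] / len(ref) + 1e-6
    cur_pct = np.histogram(cur, bins=edges)[0] / len(cur) + 1e-6
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

print(f"KS statistic={ks_stat:.3f}, p-value={p_value:.4f}, PSI={psi(reference, current):.3f}")
```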

View post
Open Ocean

Why & how we decided to make Giskard Open-Source

We explain why the Giskard team decided to go Open-Source, how we launched our first version, and what's next for our Community.

View post
Cars drifting

How to test ML models #2 🧱 Categorical data drift

Testing drift of categorical feature distribution is essential in AI / ML, requiring specific metrics
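As a companion to the numerical sketch above, a categorical feature can be checked with a chi-square test on category frequencies; the counts below are invented for illustration.

```python
# Sketch: chi-square test for categorical feature drift.
import numpy as np
from scipy.stats import chi2_contingency

categories = ["card", "transfer", "cash", "crypto"]
reference_counts = np.array([5200, 3100, 1500, 200])   # training-time frequencies
current_counts = np.array([4100, 3000, 1400, 1500])    # production frequencies ("crypto" surged)

# Build a 2 x k contingency table and test whether the two distributions match.
table = np.vstack([reference_counts, current_counts])
chi2, p_value, dof, _ = chi2_contingency(table)

print(f"chi2={chi2:.1f}, dof={dof}, p-value={p_value:.3g}")
# A very small p-value suggests the categorical distribution has drifted.
```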

View post
Zoom in on the problem

How to test ML models? #1 👉 Introduction

What you need to know before getting started with ML Testing in 3 points

View post
Presentation bias

Where do biases in ML come from? #7 📚 Presentation

We explain presentation bias, a negative effect present in almost all ML systems with User Interfaces (UI)

View post
A shift

Where do biases in ML come from? #6 🐝 Emergent bias

Emergent biases result from the use of AI / ML across unanticipated contexts. They introduce risk when the context shifts.

View post
Happy new year 2022

Wishing y’all a happy & healthy 2022! 🎊

The Giskard team wishes you a happy 2022! Here is a summary of what we accomplished in 2021.

View post
Raised hands

Where do biases in ML come from? #5 🗼 Structural bias

Social, political, economic, and post-colonial asymmetries introduce risk to AI / ML development

View post
Orange picking

Where do biases in ML come from? #4 📊 Selection

Selection bias happens when your data is not representative of the situation to analyze, introducing risk to AI / ML systems

View post
Ruler to measure

Where do biases in ML come from? #3 📏 Measurement

Machine Learning systems are particularly sensitive to measurement bias. Calibrate your AI / ML models to avoid that risk.

View post
Variables crossing

Where do biases in ML come from? #2 ❌ Exclusion

What happens when your AI / ML model is missing important variables? The risks of endogenous and exogenous exclusion bias.

View post
Searching for bias in ML

Where do biases in ML come from? #1 👉 Introduction

Research Literature review: A Survey on Bias and Fairness in Machine Learning

View post
Trust in AI systems

8 reasons why you need Quality Testing for AI

Understand why Quality Assurance for AI is the need of the hour. Gain competitive advantage from your technological investments in ML systems.

View post
Research literature

What does research tell us about the future of AI Quality? 💡

We look into the latest research to understand the future of AI / ML Testing

View post
Quality Monitoring Dashboard

How did the idea of Giskard emerge? #8 👁‍🗨 Monitoring

Monitoring is just a tool: necessary but not sufficient. You need people committed to AI maintenance, processes & tools in case things break down.

View post
Frances Haugen testifying at the US Senate

How did the idea of Giskard emerge? #7 👮‍♀️ Regulation

Biases in AI / ML algorithms are avoidable. Regulation will push companies to invest in mitigation strategies.

View post
Giskard founders: Alex and Jean-Marie

How did the idea of Giskard emerge? #6 👬 A Founders' story

Find out more about the story of Giskard's founders

View post
Ai incident database

How did the idea of Giskard emerge? #5 📉 Reducing risks

Technological innovation such as AI / ML comes with risks. Giskard aims to reduce them.

View post
Five star quality standards

How did the idea of Giskard emerge? #4 ✅ Standards

Giskard supports quality standards for AI / ML models. Now is the time to adopt them!

View post
Recommender System

How did the idea of Giskard emerge? #3 📰 AI in the media

AI used in recommender systems poses a serious issue for the media industry and our society

View post
User interfaces - counting sheeps

How did the idea of Giskard emerge? #2 🐑 User Interfaces

It is difficult to create interfaces to AI models. Even AIs made by tech giants have bugs. With Giskard AI, we want to make it easy to create interfaces for humans to inspect AI models. 🕵️ Do you think interfaces are valuable? If so, what kinds of interfaces do you like?

View post
Running tests

How did the idea of Giskard emerge? #1 🤓 The ML Test Score

The ML Test Score includes verification tests across 4 categories: Features and Data, Model Development, Infrastructure, and Monitoring Tests

View post
Giskard x L'Oréal

L'Oréal evaluates their AI vision models with Giskard to improve customer experience

L'Oréal partnered with Giskard to enhance their facial landmark detection models, crucial for applications like face reconstruction and emotion recognition. Giskard's AI testing platform enabled L'Oréal to evaluate multiple models under diverse conditions, ensuring reliable and inclusive predictions across different user demographics. This collaboration improved the accuracy and robustness of L'Oréal's digital services, while proactively addressing potential biases.

View post
Regulating LLMs: What the EU AI Act Means for Providers of Generative AI Systems white paper

Regulating LLMs: What the EU AI Act Means for Providers of Generative AI Systems

As businesses rapidly adopt Generative AI models like LLMs and foundation models, the EU AI Act introduces a comprehensive regulatory framework to ensure their safe and responsible use. Understanding and complying with these new rules is crucial for organizations deploying AI applications.

View post