All Knowledge

The Giskard hub

RealPerformance, A Dataset of Language Model Business Compliance Issues

Giskard launches RealPerformance to close the gap between the industry's focus on security and the often-overlooked problem of business compliance: the first systematic dataset of business performance failures in conversational AI, based on real-world testing across banks, insurers, and other industries.

View post
AI Safety Research - Phare Benchmark - Bias Evaluation - Self-Coherency

LLMs recognise bias but also reproduce harmful stereotypes: an analysis of bias in leading LLMs

Our Phare benchmark reveals that leading LLMs reproduce stereotypes in stories despite recognising bias when asked directly. Analysis of 17 models shows the generation vs discrimination gap.

View post

LLM Observability vs LLM Evaluation: Building Comprehensive Enterprise AI Testing Strategies

Enterprise AI teams often treat observability and evaluation as competing priorities, leading to gaps in either technical monitoring or quality assurance.

View post

Real-Time Guardrails vs Batch LLM Evaluations: A Comprehensive AI Testing Strategy

Enterprise AI teams need both immediate protection and deep quality insights but often treat guardrails and batch evaluations as competing priorities.

View post
Understanding Hallucination and Misinformation in LLMs

A Practical Guide to LLM Hallucinations and Misinformation Detection

Explore how false content is generated by AI and why it's critical to understand LLM vulnerabilities for safer, more ethical AI use.

View post
Illustration of AI vulnerabilities and risk mitigation in Large Language Models (LLMs) for secure and responsible deployment.

A Practical Guide on AI Security and LLM Vulnerabilities

Discover the key vulnerabilities in Large Language Models (LLMs) and learn how to mitigate AI risks with clear overviews and practical examples. Stay ahead in safe and responsible AI deployment.

View post
Phare LLM Benchmark - an analysis of hallucination in leading LLMs

Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs

We're sharing the first results from Phare, our multilingual benchmark for evaluating language models. The benchmark research reveals leading LLMs confidently produce factually inaccurate information. Our evaluation of top models from eight AI labs shows they generate authoritative-sounding responses containing completely fabricated details, particularly when handling misinformation.

View post
Secure AI Agents: Exhaustive testing with continuous LLM Red Teaming

Secure AI Agents: Exhaustive testing with continuous LLM Red Teaming

Testing AI agents presents significant challenges as vulnerabilities continuously emerge, exposing organizations to reputational and financial risks when systems fail in production. Giskard's LLM Evaluation Hub addresses these challenges through adversarial LLM agents that automate exhaustive testing, annotation tools that integrate domain expertise, and continuous red teaming that adapts to evolving threats.

View post
Increasing trust in foundation language models through multi-lingual security, safety and robustness testing

Giskard announces Phare, a new open & multi-lingual LLM Benchmark

During the Paris AI Summit, Giskard launches Phare, a new open & independent LLM benchmark to evaluate key AI security dimensions including hallucination, factual accuracy, bias, and potential for harm across several languages, with Google DeepMind as research partner. This initiative is meant to provide open measurements to assess the trustworthiness of Generative AI models in real applications.

View post
DeepSeek R1 analysis

DeepSeek R1: Complete analysis of capabilities and limitations

In this article, we provide a detailed analysis of DeepSeek R1, comparing its performance against leading AI models like GPT-4o and O1. Our testing reveals both impressive knowledge capabilities and significant concerns, particularly regarding the model's tendency to generate hallucinations. Through concrete examples, we examine how R1 handles politically sensitive topics.

View post
Giskard integrates with LiteLLM to simplify LLM agent testing

[Release notes] Giskard integrates with LiteLLM: Simplifying LLM agent testing across foundation models

Giskard's integration with LiteLLM enables developers to test their LLM agents across multiple foundation models. The integration enhances Giskard's core features - LLM Scan for vulnerability assessment and RAGET for RAG evaluation - by allowing them to work with any supported LLM provider: whether you're using major cloud providers like OpenAI and Anthropic, local deployments through Ollama, or open-source models like Mistral.
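As an illustration of how this might look in practice, here is a minimal sketch that points Giskard's LLM-based detectors at different LiteLLM-supported backends; the `giskard.llm.set_llm_model` helper and the model identifiers are assumptions based on common usage and may differ between versions.

```python
# Minimal sketch (assumption, not an official example): selecting the LLM
# backend that Giskard's detectors will call through LiteLLM.
import os
import giskard

# Cloud provider (e.g. OpenAI); requires the matching API key.
os.environ["OPENAI_API_KEY"] = "sk-..."            # placeholder key
giskard.llm.set_llm_model("gpt-4o")

# Or a local model served through Ollama (provider-prefixed identifier):
# giskard.llm.set_llm_model("ollama/llama3")

# Or an open-weights model behind the Mistral API:
# giskard.llm.set_llm_model("mistral/mistral-large-latest")
```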

View post
EU's AI liability directives

AI Liability in the EU: Business guide to Product (PLD) and AI Liability Directives (AILD)

The EU is establishing an AI liability framework through two key regulations: the Product Liability Directive (PLD), taking effect in 2024, and the proposed AI Liability Directive (AILD). The PLD introduces strict liability for defective AI systems and software, while the AILD addresses negligent use, though its final form remains under debate. Learn in this article the key points of these regulations and how they will impact businesses.

View post
Giskard-vision: Evaluate Computer Vision tasks

Giskard Vision: Enhance Computer Vision models for image classification, object and landmark detection

Giskard Vision is a new module in our open-source library designed to assess and improve computer vision models. It offers automated detection of performance issues, biases, and ethical concerns in image classification, object detection, and landmark detection tasks. The article provides a step-by-step guide on how to integrate Giskard Vision into existing workflows, enabling data scientists to enhance the reliability and fairness of their computer vision systems.

View post
Giskard integrates with NVIDIA NeMo

Evaluating LLM applications: Giskard Integration with NVIDIA NeMo Guardrails

Giskard has integrated with NVIDIA NeMo Guardrails to enhance the safety and reliability of LLM-based applications. This integration allows developers to better detect vulnerabilities, automate rail generation, and streamline risk mitigation in LLM systems. By combining Giskard with NeMo Guardrails, organizations can address critical challenges in LLM development, including hallucinations, prompt injection and jailbreaks.

View post
Council of Europe - AI Treaty

Global AI Treaty: EU, UK, US, and Israel sign landmark AI regulation

The Council of Europe has signed the world's first AI treaty marking a significant step towards global AI governance. This Framework Convention on Artificial Intelligence aligns closely with the EU AI Act, adopting a risk-based approach to protect human rights and foster innovation. The treaty impacts businesses by establishing requirements for trustworthy AI, mandating transparency, and emphasizing risk management and compliance.

View post
EU AI Act published in the EU Official Journal

The EU AI Act published in the EU Official Journal: Next steps for AI Regulation

The EU AI Act, published on July 12, 2024, establishes the world's first comprehensive regulatory framework for AI technologies, with a gradual implementation timeline from 2024 to 2027. It adopts a risk-based approach, imposing varying requirements on AI systems based on their risk level.

View post
ArGiMi Consortium

Giskard leads GenAI Evaluation in France 2030's ArGiMi Consortium

The ArGiMi consortium, including Giskard, Artefact and Mistral AI, has won a France 2030 project to develop next-generation French LLMs for businesses. Giskard will lead efforts in AI safety, ensuring model quality, conformity, and security. The project will be open-source, fostering collaboration and aiming to make AI more reliable, ethical, and accessible across industries.

View post
Giskard + Databricks integration

Partnership announcement: Bringing Giskard LLM evaluation to Databricks

Giskard has integrated with Databricks MLflow to enhance LLM testing and deployment. This collaboration allows AI teams to automatically identify vulnerabilities, generate domain-specific tests, and log comprehensive reports directly into MLflow. The integration aims to streamline the development of secure, reliable, and compliant LLM applications, addressing key risks like prompt injection, hallucinations, and unintended data disclosures.
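For orientation only, here is a rough sketch of what logging a Giskard evaluation to MLflow could look like, assuming the Giskard evaluator plugin is invoked through MLflow's `mlflow.evaluate` entry point via `evaluators="giskard"`; the model URI, data, and column names are placeholders, and the exact arguments may vary by version.

```python
# Rough sketch (assumes `pip install giskard mlflow`); all names are placeholders.
import mlflow
import pandas as pd

eval_df = pd.DataFrame({"question": ["What is the refund policy?"]})  # placeholder data

with mlflow.start_run(run_name="giskard-llm-evaluation"):
    mlflow.evaluate(
        model="models:/my-llm-app/1",   # placeholder registered-model URI
        model_type="text",
        data=eval_df,
        evaluators="giskard",           # hands the evaluation over to Giskard's scan
    )
```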

View post
Differences between MLOps and LLMOps

LLMOps: MLOps for Large Language Models

This article explores LLMOps, detailing its challenges and best practices for managing Large Language Models (LLMs) in production. It compares LLMOps with traditional MLOps, covering hardware needs, performance metrics, and handling non-deterministic outputs. The guide outlines steps for deploying LLMs, including model selection, fine-tuning, and continuous monitoring, while emphasizing quality and security management.

View post
LLM jailbreaking

Defending LLMs against Jailbreaking: Definition, examples and prevention

Jailbreaking refers to maliciously manipulating Large Language Models (LLMs) to bypass their ethical constraints and produce unauthorized outputs. This emerging threat arises from combining the models' high adaptability with inherent vulnerabilities that attackers can exploit through techniques like prompt injection. Mitigating jailbreaking risks requires a holistic approach involving robust security measures, adversarial testing, red teaming, and ongoing vigilance to safeguard the integrity and reliability of AI systems.

View post
Data poisoning attacks

Data Poisoning attacks on Enterprise LLM applications: AI risks, detection, and prevention

Data poisoning is a real threat to enterprise AI systems like Large Language Models (LLMs), where malicious data tampering can skew outputs and decision-making processes unnoticed. This article explores the mechanics of data poisoning attacks, real-world examples across industries, and best practices to mitigate risks through red teaming and automated evaluation tools.

View post
Giskard LLM scan multi-model

[Release notes] LLM app vulnerability scanner for Mistral, OpenAI, Ollama, and Custom Local LLMs

Releasing an upgraded version of Giskard's LLM scan for comprehensive vulnerability assessments of LLM applications. New features include more accurate detectors through optimized prompts and expanded multi-model compatibility supporting OpenAI, Mistral, Ollama, and custom local LLMs. This article also covers an initial setup guide for evaluating LLM apps.
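To give a feel for the workflow, here is a minimal sketch of wrapping an LLM app and running the scan; the wrapper arguments follow Giskard's `Model` and `scan` API as we understand it, and the prediction function is a placeholder you would replace with your own application.

```python
# Minimal sketch of an LLM scan run; identifiers and prompts are illustrative.
import giskard
import pandas as pd

def my_llm_app(question: str) -> str:
    # Placeholder: call your own chain or API here.
    return "This is where your model's answer would go."

def predict(df: pd.DataFrame):
    return [my_llm_app(q) for q in df["question"]]

model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="Support assistant",
    description="Answers customer questions about product documentation.",
    feature_names=["question"],
)

report = giskard.scan(model)          # probes for injection, hallucination, etc.
report.to_html("scan_report.html")    # shareable vulnerability report
```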

View post
Why LLM evaluation is important

Guide to LLM evaluation and its critical impact for businesses

As businesses increasingly integrate LLMs into their applications, ensuring the reliability of AI systems is key. LLMs can generate biased, inaccurate, or even harmful outputs if not properly evaluated. This article explains the importance of LLM evaluation and how to do it (methods and tools). It also presents Giskard's comprehensive solutions for evaluating LLMs, combining automated testing, customizable test cases, and human-in-the-loop review.

View post
Red Teaming LLM Applications course

New course with DeepLearningAI: Red Teaming LLM Applications

Our new course, created in collaboration with the DeepLearningAI team, provides training on red teaming techniques for Large Language Model (LLM) and chatbot applications. Through hands-on attacks using prompt injections, you'll learn how to identify vulnerabilities and security failures in LLM systems.

View post
Giskard's LLM Red Teaming

LLM Red Teaming: Detect safety & security breaches in your LLM apps

Introducing our LLM Red Teaming service, designed to enhance the safety and security of your LLM applications. Discover how our team of ML Researchers uses red teaming techniques to identify and address LLM vulnerabilities. Our new service focuses on mitigating risks like misinformation and data leaks by developing comprehensive threat models.

View post
Classification of AI systems under the EU AI Act

EU AI ACT: 8 Takeaways from the Council's Final Approval

The Council of the EU has recently voted unanimously on the final version of the European AI Act. It’s a significant step forward in its efforts to legislate the first AI law in the world. The Act establishes a regulatory framework for the safe use and development of AI, categorizing AI systems according to their associated risk. In the coming months, the text will enter the last stage of the legislative process, where the European Parliament will have a final vote on the AI Act.

View post
Giskard 2023 retrospective

Giskard's retrospective of 2023 and a glimpse into what's next for 2024!

Our 2023 retrospective covers people, company, customers, and product news, and offers a glimpse into what's next for 2024. Our team keeps growing, with new offices in Paris, new customers, and new product features. Our GitHub repo has nearly reached 2,500 stars, and we were Product of the Day on Product Hunt. All this and more in our 2023 review.

View post

EU AI Act: The EU Strikes a Historic Agreement to Regulate AI

The EU's AI Act establishes rules for AI use and development, focusing on ethical standards and safety. It categorizes AI systems, highlights high-risk uses, and sets compliance requirements. This legislation, a first in global AI governance, signals a shift towards responsible AI innovation in Europe.

View post
Biden’s Executive Order to Regulate AI

Biden's Executive Order: The Push to Regulate AI in the US

One year after the launch of ChatGPT, regulators worldwide are still figuring out how to regulate Generative AI. The EU is going through intense debates on how to finalize the so-called 'EU AI Act' after two years of legislative process. At the same time, only one month ago, the White House surprised everyone with a landmark Executive Order to regulate AI in the US. In this article, I delve into the Executive Order and advance some ideas on how it can impact the whole AI regulatory landscape.

View post
Giskard’s LLM Testing solution is launching on Product Hunt

Our LLM Testing solution is launching on Product Hunt 🚀

We have just launched Giskard v2, extending the testing capabilities of our library and Hub to Large Language Models. Support our launch on Product Hunt and explore our new integrations with Hugging Face, Weights & Biases, MLFlow, and Dagshub. A big thank you to our community for helping us reach over 1900 stars on GitHub.

View post

Towards AI Regulation: How Countries are Shaping the Future of Artificial Intelligence

In this article, we present the challenges and approaches to AI regulation in major jurisdictions such as the European Union, the United States, China, Canada, and the UK. Explore the growing impact of AI on society and how AI quality tools like Giskard ensure reliable models and compliance.

View post
AI Safety and Security: Insights from Giskard's CPO - Interview with Jean-Marie John-Mathews

AI Safety and Security: A Conversation with Giskard's Co-Founder and CPO

Giskard's Co-Founder and CPO, Jean-Marie John-Mathews, was recently interviewed by Safety Detectives, where he shared insights into the company's mission to advance AI Safety and Quality. In this interview, Jean-Marie explains the strategies, vulnerabilities, and ethical considerations at the forefront of AI technology, as Giskard bridges the gap between AI models and real-world applications.

View post
Giskard team at DEFCON31

AI Safety at DEFCON 31: Red Teaming for Large Language Models (LLMs)

DEFCON, one of the world's premier hacker conventions, this year saw a unique focus at the AI Village: red teaming of Large Language Models (LLMs). Instead of conventional hacking, participants were challenged to use words to uncover AI vulnerabilities. The Giskard team was fortunate to attend, witnessing firsthand the event's emphasis on understanding and addressing potential AI risks.

View post
OWASP Top 10 for LLM 2023

OWASP Top 10 for LLM 2023: Understanding the Risks of Large Language Models

In this post, we introduce OWASP's first version of the Top 10 for LLM, which identifies critical security risks in modern LLM systems. It covers vulnerabilities like Prompt Injection, Insecure Output Handling, Model Denial of Service, and more. Each vulnerability is explained with examples, prevention tips, attack scenarios, and references. The document serves as a valuable guide for developers and security practitioners to protect LLM-based applications and data from potential attacks.

View post
White House pledge targets AI regulation

White House pledge targets AI regulation with Top Tech companies

In a significant move towards AI regulation, President Biden convened a meeting with top tech companies, leading to a White House pledge that emphasizes AI safety and transparency. Companies like Google, Amazon, and OpenAI have committed to pre-release system testing, data transparency, and AI-generated content identification. As tech giants signal their intent, concerns remain regarding the specificity of their commitments.

View post
LLM Scan: Advanced LLM vulnerability detection

1,000 GitHub stars, 3M€, and new LLM scan feature 💫

We've reached an impressive milestone of 1,000 GitHub stars and received strategic funding of 3M€ from the French Public Investment Bank and the European Commission. With this funding, we plan to enhance the Giskard platform, helping companies meet upcoming AI regulations and standards. Moreover, we've upgraded our LLM scan feature to detect even more hidden vulnerabilities.

View post
Clément Delangue representing Hugging Face at the US Congress!

The Open-Source AI Imperative: Key Takeaways from Hugging Face CEO's Testimony to the US Congress

Explore key insights from Clément Delangue's testimony to the US Congress on Open-Science and Open-Source AI. Understand the importance of Open-Source & Open-Science to democratize AI technology and promote ethical AI development that benefits all.

View post
Scan your AI model to find vulnerabilities

Giskard’s new beta is out! ⭐ Scan your model to detect hidden vulnerabilities

Giskard's new beta release enables you to quickly scan your AI model and detect vulnerabilities directly in your notebook. The new beta also includes a simple one-line installation, automated test suite generation and execution, an improved user experience for collaboration on testing dashboards, and a ready-made test catalog.
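As a rough illustration (not taken from the release itself), scanning a tabular classifier in a notebook could look like the sketch below; the dataset, feature names, and target column are invented placeholders, and parameter names may vary between Giskard versions.

```python
# Illustrative sketch: scanning a tabular classifier from a notebook.
import giskard
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("loans.csv")                           # placeholder data
features, target = ["age", "income"], "default"
clf = RandomForestClassifier().fit(df[features], df[target])

wrapped_model = giskard.Model(
    model=lambda d: clf.predict_proba(d[features]),
    model_type="classification",
    classification_labels=[0, 1],
    feature_names=features,
)
wrapped_data = giskard.Dataset(df, target=target)

results = giskard.scan(wrapped_model, wrapped_data)
results.to_html("tabular_scan.html")                    # or display(results) in a notebook
```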

View post
Demystifying the EU AI Act news

The EU AI Act: What can you expect from the upcoming European regulation of AI?

In light of the widespread and rapid adoption of ChatGPT and other Generative AI models, which have brought new risks, the EU Parliament has accelerated its agenda on AI. The vote that took place on May 11, 2023 represents a significant milestone in the path toward the adoption of the first comprehensive AI regulation.

View post
Giskard interview for BFM Business' FocusPME

Exclusive Interview: How to eliminate risks of AI incidents in production

During this exclusive interview for BFM Business, Alex Combessie, our CEO and co-founder, spoke about the potential risks of AI for companies and society. As new AI technologies like ChatGPT emerge, concerns about the dangers of untested models have increased. Alex stresses the importance of Responsible AI, which involves identifying ethical biases and preventing errors. He also discusses the future of EU regulations and their potential impact on businesses.

View post
SafeGPT - The safest way to use ChatGPT and other LLMs

🔥 The safest way to use ChatGPT... and other LLMs

With Giskard’s SafeGPT, you can say goodbye to errors, biases & privacy issues in LLMs. Its features include an easy-to-use browser extension and monitoring dashboard (for ChatGPT users), and a ready-made, extensible quality assurance platform for debugging any LLM (for LLM developers).

View post
Giskard's turtle slicing some veggies!

Giskard 1.4 is out! What's new in this version? ⭐

With Giskard’s new Slice feature, we introduce the ability to identify business areas in which your AI models underperform. This will make it easier to debug performance biases or identify spurious correlations. We have also added an export/import feature to share your projects, as well as other minor improvements.

View post
Gartner Research

Giskard mentioned as a significant vendor in Gartner's Market Guide for AI Trust, Risk and Security Management

AI poses new trust, risk and security management requirements that conventional controls do not address. This Market Guide defines new capabilities that data and analytics leaders must have to ensure model reliability, trustworthiness and security, and presents representative vendors who implement these functions.

View post
Giskard at FOSDEM 2023

FOSDEM 2023: Presentation on CI/CD for ML and How to test ML models?

In this talk, we explain why testing ML models is an important and difficult problem. Then we explain, using concrete examples, how Giskard helps ML Engineers deploy their AI systems into production safely by (1) designing fairness & robustness tests and (2) integrating them in a CI/CD pipeline.
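As a toy example of the kind of robustness test discussed in the talk (not code from the presentation), a pytest check like the following can run in any CI pipeline; the `predict` function is a stand-in for your real model.

```python
# Toy robustness test that can run under pytest in a CI pipeline.
import pytest

def predict(text: str) -> str:
    # Placeholder model: replace with your actual prediction call.
    return "positive" if "great" in text.lower() else "negative"

@pytest.mark.parametrize("text", ["This product is great!", "Terrible experience."])
def test_prediction_is_case_invariant(text):
    # A robust classifier should not flip its label when only the casing changes.
    assert predict(text) == predict(text.upper())
```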

View post
Our first interview on BFM TV Tech & Co

Exclusive interview: our first television appearance on AI risks & security

In this interview, Jean-Marie John-Mathews, co-founder of Giskard, discusses the ethical & security concerns of AI. While AI is not new, recent developments like ChatGPT bring a leap in performance that requires rethinking how AI is built. We discuss the fears and fantasies surrounding AI, and how it can introduce biases and create industrial incidents. Jean-Marie suggests that protecting AI relies on tests and safeguards to ensure responsible AI.

View post
Giskard's co-founders: Andrei Avtomonov (left), Jean-Marie John-Mathews (center), Alex Combessie (right)

Giskard closes its first financing round to expand Enterprise offering

The funding led by Elaia, with participation from Bessemer Venture Partners and notable angel investors, will accelerate the development of an enterprise-ready platform to help companies test, audit & ensure the quality of AI models.

View post
Python enjoying cups of Java coffee beans - generated by OpenAI DallE

Giskard is coming to your notebook: Python meets Java via gRPC tunnel

With Giskard’s new External ML Worker feature, we introduce a gRPC tunnel to reverse the client-server communication so that data scientists can re-use an existing Python code environment for model execution by Giskard.

View post
Dollar Planets Generated by OpenAI DALL·E

Why do Citibeats & Altaroad Test AI Models? The Business Value of Test-Driven Data Science

Why do great Data Scientists & ML Engineers love writing tests? Two customer case studies on improving model robustness and ensuring AI Ethics.

View post
Synthwave astronauts polishing the hull of a giant marine turtle in space - Generated by OpenAI DallE

Does User Experience Matter to ML Engineers? Giskard Latest Release

What are the preferences of ML Engineers in terms of UX? A summary of key learnings, and how we implemented them in Giskard's latest release.

View post
Sea Turtle

Why & how we decided to change Giskard's identity

We explain why Giskard changed its value proposition, and how we translated it to a new visual identity

View post
Happy ML Tester

Giskard's new feature: Automated Machine Learning Testing

The Open Beta of Giskard's AI Test feature: an automated way to test your ML models and ensure performance, robustness, and ethics

View post
A billion stars

Who cares about AI Quality? Launching our AI Innovator community

The Giskard team explains the undergoing shift toward AI Quality, and how we launched the first community for AI Quality Innovators

View post
Presentation bias

Where do biases in ML come from? #7 📚 Presentation

We explain presentation bias, a negative effect present in almost all ML systems with User Interfaces (UI)

View post
A shift

Where do biases in ML come from? #6 🐝 Emergent bias

Emergent biases result from the use of AI / ML across unanticipated contexts. They introduce risk when the context shifts.

View post
Happy new year 2022

Wishing y’all a happy & healthy 2022! 🎊

The Giskard team wishes you a happy 2022! Here is a summary of what we accomplished in 2021.

View post
Raised hands

Where do biases in ML come from? #5 🗼 Structural bias

Social, political, economic, and post-colonial asymmetries introduce risk to AI / ML development

View post
Orange picking

Where do biases in ML come from? #4 📊 Selection

Selection bias happens when your data is not representative of the situation to analyze, introducing risk to AI / ML systems

View post
Ruler to measure

Where do biases in ML come from? #3 📏 Measurement

Machine Learning systems are particularly sensitive to measurement bias. Calibrate your AI / ML models to avoid that risk.

View post
Variables crossing

Where do biases in ML come from? #2 ❌ Exclusion

What happens when your AI / ML model is missing important variables? The risks of endogenous and exogenous exclusion bias.

View post
Searching for bias in ML

Where do biases in ML come from? #1 👉 Introduction

Research Literature review: A Survey on Bias and Fairness in Machine Learning

View post
Trust in AI systems

8 reasons why you need Quality Testing for AI

Understand why Quality Assurance for AI is the need of the hour. Gain competitive advantage from your technological investments in ML systems.

View post
Research literature

What does research tell us about the future of AI Quality? 💡

We look into the latest research to understand the future of AI / ML Testing

View post
Quality Monitoring Dashboard

How did the idea of Giskard emerge? #8 👁‍🗨 Monitoring

Monitoring is just a tool: necessary but not sufficient. You need people committed to AI maintenance, processes & tools in case things break down.

View post
Frances Haugen testifying at the US Senate

How did the idea of Giskard emerge? #7 👮‍♀️ Regulation

Biases in AI / ML algorithms are avoidable. Regulation will push companies to invest in mitigation strategies.

View post
Giskard founders: Alex and Jean-Marie

How did the idea of Giskard emerge? #6 👬 A Founders' story

Find out more about the Giskard founders' story

View post
AI Incident Database

How did the idea of Giskard emerge? #5 📉 Reducing risks

Technological innovation such as AI / ML comes with risks. Giskard aims to reduce them.

View post
Five star quality standards

How did the idea of Giskard emerge? #4 ✅ Standards

Giskard supports quality standards for AI / ML models. Now is the time to adopt them!

View post
Recommender System

How did the idea of Giskard emerge? #3 📰 AI in the media

AI used in recommender systems is posing a serious issue for the media industry and our society

View post
User interfaces - counting sheeps

How did the idea of Giskard emerge? #2 🐑 User Interfaces

It is difficult to create interfaces to AI models; even AIs made by tech giants have bugs. With Giskard AI, we want to make it easy to create interfaces for humans to inspect AI models. 🕵️ Do you think interfaces are valuable? If so, what kinds of interfaces do you like?

View post
Running tests

How did the idea of Giskard emerge? #1 🤓 The ML Test Score

The ML Test Score includes verification tests across 4 categories: Features and Data, Model Development, Infrastructure, and Monitoring Tests

View post