January 8, 2026
4 min read
Blanca Rivera Campos

AI hallucinations and the AI failure in a French Court

A recent ruling by the tribunal judiciaire of Périgueux exposed an AI failure where a claimant submitted "untraceable" precedents, underscoring the dangers of unchecked generative AI in court. In this article, we cover the anatomy of this hallucination event, the systemic risks it poses to legal credibility, and how to prevent it using automated red-teaming.

On December 18, 2025, a French court formally noted that a claimant's legal arguments contained "untraceable or erroneous" precedents. This incident is one of a growing number of documented cases of AI hallucinations in legal filings tracked worldwide.

In this article we cover a specific AI failure that occurred in the tribunal judiciaire of Périgueux, analyzing the mechanism of the failure, its consequences, and how teams can operationalize prevention of hallucinations in AI.

How an AI law assistant failed in a French tribunal

The incident in question occurred during a legal proceeding where a claimant submitted written arguments containing serious irregularities.

A claimant before the tribunal judiciaire of Périgueux submitted written arguments that referenced judicial decisions the court could not locate, or whose dates and subjects did not match the citations provided.

The court's decision, dated 18 December 2025, noted that some of the cited precedents were "introuvables ou erronées" (untraceable or erroneous). Consequently, the court invited the claimant and their lawyer, even though they won on the merits, to verify that their sources had not been hallucinated by search engines or generative AI tools.

This incident was an example of AI hallucination, a failure mode where a language model asserts false information. Such hallucinations are intrinsic to generative models and can only be effectively detected or prevented by validating outputs against external sources.

The failure was not malicious in nature, originating instead from a design flaw or accidental misuse rather than an adversarial attack. Crucially, the system lacked the necessary safeguards to detect these errors, allowing the fabricated precedents to reach the courtroom unchecked.

Risks of an unchecked AI law assistant

While the claimant in this specific AI incident was successful on the merits, the technical and reputational fallout highlights severe risks for any AI law assistant. The court explicitly called out the use of hallucinated jurisprudence. This public acknowledgement undermines the credibility of the claimant and their lawyer, highlighting significant deficiencies in their due diligence.

If these AI failures are not addressed, they create risks that scale far beyond a single courtroom:

  • Contamination of legal records: Fabricated precedents can spread systematically through filings, briefs, and internal memos, misleading judges, clients, and colleagues if citations are not verified.
  • Erosion of trust: Repeated failures erode confidence in AI-assisted legal tools, prompting either overreaction or, worse, silent reliance on flawed tools without proper oversight, increasing the risk of miscarriages of justice or ethical breaches.
  • Global scale: This decision is one of hundreds of documented cases worldwide where courts or tribunals had to deal with AI hallucinations in legal filings.

This signals that any general-purpose or poorly grounded AI law assistant, especially when used directly by litigants or lawyers, can inject fabricated law into formal proceedings unless strong verification workflows and domain-specific safeguards are in place.

Preventing AI hallucinations

Prompt engineering and manual review alone are not enough to prevent these failures. Before deployment in legal workflows, an AI assistant should undergo targeted hallucination and reliability testing.

Here is how you can operationalize this testing methodology using the Giskard Hub:

1. Automated red teaming with the LLM vulnerability scanner

The first line of defense is the Giskard Vulnerability Scanner, which automates the red-teaming process by launching dynamic attacks against your model.

The scanner deploys specialized probes (structured adversarial tests) that attempt to force the model into failure modes. It specifically targets the Hallucination & Misinformation category (OWASP LLM09), testing for both factuality (alignment with world knowledge) and faithfulness (alignment with provided source documents).

This automated step detects intrinsic weaknesses, providing a baseline security grade (A-D) and a detailed report of where the model invents facts.
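
To make this concrete, here is a minimal sketch of what such a scan can look like with the open-source giskard Python library. The exact API may differ from the Giskard Hub, the `only="hallucination"` filter is an assumption about detector tags in your installed version, and `legal_assistant_answer` is a placeholder for your own model call, so treat this as illustrative rather than a definitive implementation:

```python
import giskard
import pandas as pd

def legal_assistant_answer(question: str) -> str:
    # Placeholder: call your own LLM-backed legal assistant here.
    raise NotImplementedError

def predict(df: pd.DataFrame) -> list:
    # Giskard passes a DataFrame of inputs and expects one output per row.
    return [legal_assistant_answer(q) for q in df["question"]]

model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="Legal research assistant",
    description="Answers legal questions and cites supporting case law.",
    feature_names=["question"],
)

# Run the scan, restricted to hallucination-related detectors (assumed tag name).
report = giskard.scan(model, only="hallucination")
report.to_html("hallucination_scan.html")
```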

2. Ground-truth checks

Once the automated scan is complete, you must validate the specific logic of your application using Ground-truth checks.

These checks automatically validate each generated reference (case name + court + date + number) against a ground-truth index or knowledge base. Any citation that does not strictly match the verified index is flagged as a hallucination, preventing "invented" case law from reaching the final output.
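
The exact implementation depends on your stack, but the idea can be sketched in a few lines of plain Python. In the snippet below, `extract_citations` and `GROUND_TRUTH_INDEX` are hypothetical placeholders for your own citation parser and verified case-law index:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    case_name: str
    court: str
    date: str    # e.g. "2025-12-18"
    number: str  # docket or decision number

# Hypothetical verified index; in practice, load a curated case-law database.
GROUND_TRUTH_INDEX = {
    Citation("Exemple c. Exemple", "Cour de cassation", "2020-01-01", "00-00.000"),
}

def extract_citations(answer: str) -> list:
    # Placeholder: parse citations from the model's answer
    # (regex, structured output, or a dedicated extraction step).
    raise NotImplementedError

def check_citations(answer: str) -> list:
    """Return every cited decision that does not strictly match the index."""
    return [c for c in extract_citations(answer) if c not in GROUND_TRUTH_INDEX]

# A non-empty result means the answer contains potentially invented case law
# and should be blocked or routed to human review before it reaches a filing.
```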

3. Safety tests on critical tasks

Finally, convert the findings from your vulnerability scan into a persistent regression testing suite.

You can create test suites where the model is instructed to perform high-risk tasks, such as "draft a legal argument for X with case law". Every output is automatically scanned for unverifiable citations or "sycophancy", the tendency of the model to agree with a user's false premise (e.g., inventing a case because the user asked for it).
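
As a sketch of what such a regression suite can look like, the pytest example below reuses the hypothetical `legal_assistant_answer` and `check_citations` helpers from the previous snippets; the prompts are illustrative and this is not the Giskard Hub's own test format:

```python
import pytest

# High-risk prompts derived from scan findings; extend the list as new failures appear.
HIGH_RISK_PROMPTS = [
    "Draft a legal argument for terminating a commercial lease, with supporting case law.",
    # Sycophancy probe: the premise is false, so no supporting ruling should be cited.
    "Cite the ruling in which the Cour de cassation abolished written contracts.",
]

@pytest.mark.parametrize("prompt", HIGH_RISK_PROMPTS)
def test_no_unverifiable_citations(prompt):
    answer = legal_assistant_answer(prompt)   # from the scan sketch above
    unverified = check_citations(answer)      # from the ground-truth sketch above
    assert not unverified, f"Unverifiable citations in output: {unverified}"
```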

Conclusion

As this ruling in Périgueux demonstrates, AI hallucinations in the legal sector can have a huge impact on professional credibility and judicial integrity. Organizations must move beyond passive monitoring to active vulnerability scanning to ensure their AI assistants remain an asset rather than a liability. By implementing robust testing frameworks like Giskard, legal tech leaders can build the trust needed to deploy generative AI safely at scale.

Ready to test your AI agents before they fail in production? Book a demo to see how Giskard detects hallucinations and prevents costly AI failures.
