David Berenstein

News

RealPerformance: A Dataset of Language Model Business Compliance Issues

Giskard launches RealPerformance, the first systematic dataset of business performance failures in conversational AI, built from real-world testing across banks, insurers, and other industries. It addresses the gap between the industry's focus on security and its neglect of business compliance issues.

David Berenstein
News

LLMs recognise bias but also reproduce harmful stereotypes: an analysis of bias in leading LLMs

Our Phare benchmark reveals that leading LLMs reproduce stereotypes in the stories they generate, despite recognising bias when asked about it directly. An analysis of 17 models exposes this generation-versus-discrimination gap.

David Berenstein
Blog

LLM Observability vs LLM Evaluation: Building Comprehensive Enterprise AI Testing Strategies

Enterprise AI teams often treat observability and evaluation as competing priorities, leading to gaps in either technical monitoring or quality assurance.

David Berenstein
Blog

Real-Time Guardrails vs Batch LLM Evaluations: A Comprehensive AI Testing Strategy

Enterprise AI teams need both immediate protection and deep quality insights but often treat guardrails and batch evaluations as competing priorities.

David Berenstein
Blog

A Practical Guide to LLM Hallucinations and Misinformation Detection

Explore how AI systems generate false content and why understanding LLM vulnerabilities is critical for safer, more ethical AI use.

David Berenstein
Blog

A Practical Guide on AI Security and LLM Vulnerabilities

Discover the key vulnerabilities in Large Language Models (LLMs) and learn how to mitigate AI risks with clear overviews and practical examples. Stay ahead in safe and responsible AI deployment.

David Berenstein