David Berenstein

Blog

Risk assessment for LLMs and AI agents: OWASP, MITRE Atlas, and NIST AI RMF explained

There are three major frameworks for assessing the risks associated with LLMs and AI agents: OWASP, MITRE Atlas, and NIST AI RMF. Each takes its own approach to risk and security, examining it from a different angle and with a different level of granularity and organisational scope. This blog will help you understand them.

Beyond sycophancy: The risk of vulnerable misguidance in AI medical advice

Healthcare professionals in Hyderabad have noticed a disturbing trend in self-treatment: two of their patients relied on generic AI chatbot advice for medical interventions and suffered serious consequences. These recent cases demonstrate vulnerable misguidance, a subtle risk in deployed agents in which the model causes harm by encouraging harmful behaviour.

OWASP Top 10 for LLM 2025: Understanding the Risks of Large Language Models

The landscape of large language model security has evolved significantly since the release of OWASP’s Top 10 for LLM Applications in 2023, which we covered in our blog at the time. The 2025 edition reflects a major update to our understanding of how Gen AI systems are being deployed in production environments. The update does not come as a surprise: organisations like MITRE also continuously update their risk framework, Atlas. The new edition draws on lessons from enterprise deployments and direct feedback from a global community of developers, security professionals, and data scientists working in AI security.

Anthropic claims Claude Code was used for the first autonomous AI cyber espionage campaign

Anthropic has reported that Claude Code was used to orchestrate a cyber espionage campaign, with the AI independently executing 80–90% of the tactical operations. In this article, we analyze the mechanics of this attack and explain how organizations can leverage continuous red teaming to defend against these threats.

Understanding single-turn, multi-turn, and dynamic agentic attacks in AI red teaming

AI red teaming has evolved from simple prompt injection into three distinct attack categories: single-turn attacks that test immediate defenses, multi-turn attacks that build context across conversations, and dynamic agentic attacks that autonomously adapt strategies in real time. This article breaks down all three attack categories and explains how to implement red teaming to protect production AI systems.

Are AI browsers safe? A security and vulnerability analysis of OpenAI Atlas

OpenAI's Atlas browser is powered by ChatGPT, but its design choices expose unsuspecting users to numerous risks. Users are drawn in by the marketing promise of fast, helpful, and reliable AI, while reports of exploited vulnerabilities have flooded the news just days after the beta release.

RealPerformance: A Dataset of Language Model Business Compliance Issues

Giskard launches RealPerformance to close the gap between security-focused evaluation and business compliance: the first systematic dataset of business performance failures in conversational AI, based on real-world testing across banks, insurers, and other industries.

LLMs recognise bias but also reproduce harmful stereotypes: an analysis of bias in leading LLMs

Our Phare benchmark reveals that leading LLMs reproduce stereotypes in stories despite recognising bias when asked directly. An analysis of 17 models demonstrates this gap between generation and discrimination.

LLM Observability vs LLM Evaluation: Building Comprehensive Enterprise AI Testing Strategies

Enterprise AI teams often treat observability and evaluation as competing priorities, leading to gaps in either technical monitoring or quality assurance.

Real-Time Guardrails vs Batch LLM Evaluations: A Comprehensive AI Testing Strategy

Enterprise AI teams need both immediate protection and deep quality insights but often treat guardrails and batch evaluations as competing priorities.

A Practical Guide to LLM Hallucinations and Misinformation Detection

Explore how AI generates false content and why understanding LLM vulnerabilities is critical for safer, more ethical AI use.

A Practical Guide on AI Security and LLM Vulnerabilities

Discover the key vulnerabilities in Large Language Models (LLMs) and learn how to mitigate AI risks with clear overviews and practical examples. Stay ahead in safe and responsible AI deployment.
