February 4, 2026
8 min read
Weixuan Xiao

OpenClaw security vulnerabilities include data leakage and prompt injection risks

This article explores the critical security failures of the OpenClaw agentic AI, which allowed sensitive data to leak across user sessions and IM channels. It examines how architectural weaknesses in the Control UI and session management created direct paths for prompt injection and unauthorized tool use. Finally, it outlines the essential hardening steps and systematic red-teaming strategies required to transform a vulnerable "fun bot" into a secure enterprise assistant.
OpenClaw security issues include data leakage & prompt injection

OpenClaw (formerly known as Clawbot or Moltbot) has rapidly gained popularity as a powerful open-source agentic AI. It empowers users to interact with a personal assistant via Instant Messaging (IM) apps like Discord, Telegram, and WhatsApp, granting the agent near-total control over a host machine to execute complex tasks.

While its self-autonomous capabilities demonstrate how quickly AI agents can become indispensable with the right skills, this high level of permission introduces significant security risks. This article unpacks the OpenClaw incident, explains why these AI failures were inevitable under default configurations, and provides a roadmap for preventing similar AI data breaches through systematic testing and monitoring.

What happened with OpenClaw?

In January 2026, researchers at Giskard exploited a deployment of OpenClaw, revealing multiple OpenClaw vulnerabilities. The investigation confirmed that once an AI agent is exposed to public chat apps and equipped with powerful tools, misconfigurations become a direct path to data exfiltration and account takeover.

The OpenClaw architecture

Let’s start with the basic architecture and components of an OpenClaw instance. The core of OpenClaw is an AI Gateway that connects your IM appssuch as WhatsApp, Telegram, Slack, and DiscordLLM-powered agents capable of taking real actions on your systems. It is composed of several components that work together to route messages, maintain context, and execute tools safely:

  • Nodes are remote execution hosts (typically macOS machines) that can be paired with the gateway so the agent can run commands, send notifications, or control the browser on the node (Mac with the widest supports as it’s the first-level civil in OpenClaw).
  • Skills are modular capability packs stored as SKILL.md files that teach the agent how to perform higher‑level tasks by composing tools, external binaries, or API calls; skills can be bundled with OpenClaw, installed from ClawHub, or written by users and placed in the workspace.
  • There are a lot of out‑of‑the‑box tools, which are the low‑level primitives the agent invokes to actually do work: there are filesystem tools (read, write, edit, apply_patch), runtime tools (exec, process), messaging tools (to send messages on WhatsApp, Telegram, Slack, etc.), session‑management tools (sessions_list, sessions_history, sessions_send), browser‑automation tools (browser), and gateway/admin tools (nodes, cron, gateway, canvas, image).
OpenClaw architecture

All of the configurations and the sessions can be viewed and modified in a Control UI, where it is exposed by OpenClaw gateway, on a specific port (18789 by default).

Security vulnerabilities in the OpenClaw AI assistant

The failure did not stem from the model "hallucinating" or misclassifying data. Instead, it emerged from how the OpenClaw gateway handled sessions, tools, and permissions across direct messages, groups, and the Control UI.

Key weaknesses included:

  • Exposed control UI: Access tokens appeared in query parameters, making them easy to leak via logs, browser history, or non-HTTPS traffic.
  • Shared global context: Direct messages were often configured to share a single context, meaning secrets loaded for one user became visible to others.
  • Lack of sandboxing: Group chats ran powerful tools without proper isolation, allowing the bot to read environment variables, API keys, and configurations, and even modify its own routing to join other groups.
  • Injection paths: External content (such as emails, scraped web pages, or "skills") provided paths for adversarial prompts to drive tool calls and exfiltrate data.

The result: a single malicious email, chat message, or web page could trick the assistant into leaking credentials, internal files, or cross-session conversation histories in a matter of minutes.

OpenClaw Control UI: A traditional attack surface

The Control UI is OpenClaw's central management surface and a traditional cyber attack vector. It allows operators to configure agents, channels, tools, and permissions, which means anyone with Control UI access can widen the gateway's attack surface or exfiltrate secrets.

  • Token leakage: By default, access tokens often appear in query parameters, making them easy to harvest from browser history or server logs.
  • Insecure traffic: When exposed over HTTP without device identity checks, these tokens are highly vulnerable.
  • Metadata exposure: Even over HTTPS, tokens can appear in TLS metadata if Encrypted SNI is not enabled.

OpenClaw's security docs recommend only to use HTTP with Tailscale Serve (which keeps the UI on loopback while Tailscale handles access) or enforcing password-based authentication with short-lived pairing codes rather than static tokens in URLs. The risk is amplified because misconfigurations visible in the Control UI (such as overly broad tool allowlists, or disabled device auth) directly enable the session-isolation and prompt-injection failures that define the OpenClaw incident.

Data leakage in direct messages and group chats

Since OpenClaw is primarily designed to serve as a personal AI assistant, direct messages (DMs) are the most intuitive way for a single person to interact with the bot. The agent is capable of receiving messages from the same individual across multiple IM platforms, such as Discord, Telegram, and WhatsApp.

DMs are intended as one-to-one chats where the agent typically shares a persistent "main" session. This allows the agent to remember your preferences and conversation history regardless of which device or channel you use.

However, OpenClaw’s default design for continuity creates a significant security risk: the convenience of a shared "main" session becomes dangerous when multiple different users send direct messages to the same bot. Because the bot recognizes all incoming DMs as belonging to a single entity, it fails to maintain proper boundaries between different people.

The risk of DM scoping

The issue lies in how OpenClaw security handles session.dmScope:

  • main (default): All DMs share one long‑lived “main” session. Great for a single user; catastrophic when several people can DM the bot.
  • per-peer: Isolates DMs by sender id across channels so each person gets one private session, even if they talk via multiple apps.
  • per-channel-peer: Isolates by (channel, sender), recommended for multi‑user inboxes where several people DM the same bot account.
  • per-account-channel-peer: Isolates by (account, channel, sender) when you run multiple bot accounts on the same platform.

In practice, the OpenClaw failure demonstrated that simply having these options is not enough.

In the affected deployment, DMs used the default main scope. This meant:

  • Environment variables and API keys loaded into the “private” DM session were available to anyone who could message the bot.
  • Files or notes stored in the shared workspace for one DM could be retrieved from another, despite the illusion of “one‑to‑one” chat. For example:

in session A from Telegram, a file is generated and saved:

and in an isolated session B from Discord, the file content can be read and displayed:

Even when teams tried “isolated” scopes like peer, unsafe workspace configuration and broad tool access meant local workspace files could still leak to any user mapped into that session.

Group messages in OpenClaw: where things get truly dangerous

Channels (or groups) are multi‑user rooms like Slack channels, Telegram groups, or Discord servers, where the bot can be mentioned or invited and will respond according to access policies and mention rules. Each channel or group gets its own isolated session, which holds the conversation history, context window, and any short‑term state for that room.

If DMs were a continuity‑vs‑privacy trade‑off, group chats were the perfect storm. OpenClaw uses non‑main session keys for groups, which keeps them separate from the main DM context but by default, it does not limit what tools can do inside those groups.

Without a Docker-based isolation boundary, tools invoked in a group can:

  • Reading env and API keys: Tools invoked from a group could access environment variables, configuration files or local filesystems, exposing credentials and internal secrets to anyone in the room.
  • Reconfiguring the agent: Mis‑scoped tools could alter routing or configuration so the bot joined additional groups or expanded its access beyond what admins intended.
  • Cross‑workspace leakage: Workspaces were not strictly segregated per group; a file saved in one “isolated” session could be surfaced in another.

OpenClaw’s own security docs acknowledge there is no “perfectly secure” setup and recommend sandboxing non‑main sessions with per‑session containers, no workspace access by default and strict tool allowlists. In the vulnerable deployment, these recommendations were not consistently applied.

Group security is heavily dictated by who is permitted to interact with the bot, as open group policies or public deployments dramatically increase exposure. Any member or newly added attacker can send crafted prompts that trigger a chain of tool calls, data reads, and configuration changesall without explicit admin approval.

Without a strict allowlist for specific tool categories (such as group:fs or group:runtime), the agent can even act as a Trojan Horse, following instructions to enter servers with wider permissions and bypassing existing Discord security settings. Much like over-privileged integrations in Model Context Protocol (MCP) environments, a group session with broad tool access effectively becomes a remote control surface for both your infrastructure and your privacy.

Prompt injection: the Trojan horse inside “private” AI agents

One of the most dangerous lessons is that openclaw prompt injection does not require a direct malicious user; it can be triggered by external content. Even if an agent is used only by trusted employees, it ingests data from web searches, emails, and third-party "skills"

Any of these can carry adversarial instructions embedded in otherwise benign‑looking content. Just as MCP web/doc connectors can be abused to trigger sensitive tools, OpenClaw’s connectors do not block the injected prompts to:

  • Ask the agent to summarize an email, then quietly exfiltrate the last five messages to an attacker address.
  • Load a configuration file and selectively leak environment variables one by one.
  • Change routing so the bot joins an attacker‑controlled group and streams internal updates there.

The OpenClaw case also mirrors known MCP attack patterns: over‑privileged tools, plaintext secrets, and no systematic testing for indirect prompt injection.

Lessons learnt from “fun bot” to enterprise data breaches

Some of the OpenClaw incidents mentioned above are caused by the design choices of the system - it aims at trading convenience and AI abilities against the privacy. But it can also reflect what kind of common mistakes that a general AI agent deployment can make.

A vulnerable AI agent deployment can lead to:

  • Credential theft: API keys, OAuth tokens and environment variables read by tools or visible in logs and URLs.
  • Cross‑user privacy breaches: Conversations, uploaded files and notes from one user or group resurfaced in another “isolated” session.
  • Configuration tampering: Agents joining new channels, changing tool policies or enabling additional capabilities without explicit admin approval.

These failures mirror documented AI incidents where unsecured agents leaked emails, internal documents or fabricated legal precedents that reached court filings.

At organizational scale, similar weaknesses could cause:

  • Data confidentiality breaches across CRM systems, legal archives, support conversations and source code repositories.
  • Business and competitive damage if pricing models, customer lists or roadmaps are exfiltrated and weaponized by competitors or scammers.
  • Regulatory and reputational fallout when AI‑mediated leaks or hallucinations undermine customer trust or attract regulator scrutiny.

The lesson is the same across MCP agents, legal assistants and OpenClaw: once AI sits in the middle of critical workflows, traditional perimeter security and one‑off audits are no longer enough.

{{cta}}

How to prevent the AI vulnerabilities in OpenClaw

Preventing another OpenClaw incident requires both hardening OpenClaw itself and continuously testing AI agents with adversarial probes.

1. Harden your OpenClaw deployment

Start by applying OpenClaw’s own security guidance aggressively.

Control UI and gateway

  • Put the Control UI behind private network and enable recent version of TLS; never expose it directly to the internet.
  • Avoid tokens in query parameters; prefer headers or short‑lived pairing codes and rotate credentials regularly.

Session and workspace isolation

  • Use per-peer or per-account-channel-peer for DMs whenever more than one person can talk to the bot.
  • Treat main as a single‑user mode only and keep secrets out of shared contexts.
  • For groups and channels, enable sandboxing with mode: "non-main", scope: "session" and workspaceAccess: "none" by default.

Tool least privilege

  • Maintain strict tool allowlists for sandboxed sessions; allow only messaging and session‑management tools in untrusted rooms.
  • Deny filesystem, runtime, gateway and admin‑level tools by default for any session that can receive untrusted input.

Group and DM policies

  • Avoid dmPolicy="open" and open group policies for agents that can touch sensitive systems.
  • Use pairing codes, allowlists or manual approval flows to limit who can interact with high‑privilege bots.

These principles mirror MCP configuration best practices: least privilege, secret management, and treating every connector as a potential remote code execution path.

2. Treat all content as untrusted

Assume every document or email contains a payload. Use read-only "reader" agents to process content before passing summaries to tool-enabled assistants. Inject secrets via environment variables rather than prompts.

3. Continuously red team your agents with Giskard

Even a well‑configured OpenClaw instance can hide subtle vulnerabilities that only appear in real conversations. Giskard addresses this with automated AI red teaming:

  • Connect your agent: Wrap your OpenClaw‑based assistant (or MCP host) so Giskard can converse with it as an external user.
  • Launch adversarial probes: Specialized attacking agents perform multi‑turn conversations, combining prompt injection, tool‑abuse and cross‑session leakage attempts tailored to your use case.
  • Targeted probes for agents:
    • Agentic tool extraction to map available tools and parameters.
    • Excessive‑agency probes that try to trigger unauthorized tool use.
    • Prompt injection and Chain‑of‑Thought forgery attacks to bypass naive guardrails.
    • Internal information exposure tests for plaintext API keys and env vars.
    • Cross‑session leakage probes that simulate one user retrieving another’s data.
  • Review and remediate: Giskard produces detailed attack traces categorized by risk (including OWASP LLM risks), making it straightforward for teams to fix misconfigurations, harden prompts and tighten tool scopes.

By embedding these scans, you get continuous validation that configuration changes, new skills or updated models have not reopened OpenClaw‑style holes.

From viral agents to trustworthy AI systems

OpenClaw proves that agentic AI can be incredibly useful, and that, without security‑first design and systematic testing, it can fail catastrophically.

Teams deploying AI agents like OpenClaw, i.e. with strong tools through MCP or skills, now face a clear choice:

  • Ship fast and hope no one sends the wrong email, prompt or plugin to trigger and exploit the vulnerabilities.
  • Or invest in robust configuration, sandboxing, monitoring, online guardrails and continuous AI red teaming before agents hit production.

We invite you to explore the 50+ most common adversarial probes in AI security.

For teams looking to take the next step on how to detect security issues in your public chatbots, you can reach out to the Giskard team.

Continuously secure LLM agents, preventing hallucinations and security issues.
Book a Demo

You will also like

Model Context Protocol: Understanding MCP security risks and prevention methods

Model Context Protocol: Understanding MCP security risks and prevention methods

MCP servers pose significant MCP security risks due to their ability to execute commands and perform API calls. Even "official" Model Context Protocol setups face MCP vulnerability and tool poisoning. In this article, you'll learn the primary MCP security issues, from injection attacks to supply chain risks, and how to mitigate these MCP cyber security threats for your AI agents.

View post
Agentic tool extraction: Multi-turn attack that exposes the agent's internal functions

Agentic tool extraction: Multi-turn attack that exposes the agent's internal functions

Agentic Tool Extraction (ATE) is a multi-turn reconnaissance attack to extract complete tool schemas, function names, parameters, types, and return values. ATE exploits conversation context, using seemingly benign questions that bypass standard filters to build a technical blueprint of the agent's capabilities. In this article, we demonstrate how attackers weaponize extracted schemas to craft precise exploits and explain how conversation-level defenses can detect progressive extraction patterns before tool signatures are fully exposed.

View post
Cross Session Leak: when your AI assistant becomes a data breach

Cross Session Leak: when your AI assistant becomes a data breach

Cross Session Leak is a data exfiltration vulnerability where sensitive information from one user's session bleeds into another user's session, bypassing authentication controls in multi-tenant AI systems. This occurs when AI architectures fail to properly isolate session data through misconfigured caches, shared memory, or improperly scoped context. This article explores how cross session leak attacks work, examines a healthcare scenario, and provides technical strategies to detect and prevent these vulnerabilities.

View post
Get AI security insights in your inbox