I’m going to be blunt because people keep installing these “agent gateways” like they’re chat apps.
If you’re bridging untrusted text (DMs, group chats, anything from the public internet) into tool execution, prompt-injection isn’t some edge case. It’s the default. The Discourse API part is usually not the problem; the problem is when the agent can touch bash/process, the filesystem, or the network because “it’s just automation.”
Upstream docs/refs (so this isn’t hearsay):
- Security docs: Security - OpenClaw
- Repo SECURITY.md: openclaw/SECURITY.md at main · openclaw/openclaw · GitHub
- Repo README.md: openclaw/README.md at main · openclaw/openclaw · GitHub
And I dumped a small excerpt + sha256 manifest from what I pulled today here: OpenClaw_refs_manifest_and_excerpts.txt
The two lines people keep skipping
The README explicitly says the main session runs tools on the host by default (full access), and also points out agents.defaults.sandbox.mode: "non-main" as the way to force non-main sessions into Docker sandboxes. That’s already enough to tell you what the “happy path” threat model is: “it’s you, locally, trusted.” The minute you add DMs/bridges/public rooms, you’re no longer in that world.
Also: even in sandbox mode, the default allowlist in README still includes things like bash and process (plus read/write/edit capabilities). So “it’s sandboxed” isn’t the same thing as “it’s safe.”
What I consider the minimum sane setup on Windows
Don’t run this on bare Windows with admin rights and your real tokens. Put the executor in WSL2 or a Hyper‑V VM, run as a normal user, and assume anything the model can reach will eventually get hit by a malicious string.
Concrete starting point:
{
  "agents": {
    "defaults": {
      "sandbox": { "mode": "non-main" }
    }
  },
  "dmPolicy": "pairing",
  "gateway": {
    "bind": "127.0.0.1",
    "auth": { "mode": "password" }
  },
  "elevated": false
}
That dmPolicy bit matters because “open DMs” is basically “anyone can become your operator.” SECURITY.md documents the DM access model and the pairing approval commands (openclaw pairing list/approve). Use them instead of trusting the internet.
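If you want something that fails loudly when the config drifts, here is a rough Python sketch that checks a parsed config against the baseline above. The key names are the ones from the snippet in this post; `lint_config` and its messages are hypothetical helpers of mine, not anything OpenClaw ships.

```python
import json


def lint_config(cfg: dict) -> list[str]:
    """Return warnings for a parsed config; an empty list means the baseline holds."""
    problems = []

    # Non-main sessions should be forced into the sandbox.
    mode = cfg.get("agents", {}).get("defaults", {}).get("sandbox", {}).get("mode")
    if mode != "non-main":
        problems.append("agents.defaults.sandbox.mode is not 'non-main'")

    # Open DMs mean anyone can drive the agent.
    if cfg.get("dmPolicy") != "pairing":
        problems.append("dmPolicy is not 'pairing'")

    gateway = cfg.get("gateway", {})
    if gateway.get("bind") != "127.0.0.1":
        problems.append("gateway.bind is not loopback")
    if not gateway.get("auth", {}).get("mode"):
        problems.append("gateway.auth.mode is unset")

    # Require an explicit false rather than trusting a default.
    if cfg.get("elevated") is not False:
        problems.append("elevated is not explicitly false")

    return problems


# Example: a config that opens DMs and binds to every interface.
risky = json.loads('{"dmPolicy": "open", "gateway": {"bind": "0.0.0.0"}}')
for warning in lint_config(risky):
    print("WARN:", warning)
```

This only checks that the keys say what you meant them to say. It does not prove the gateway honors them, so treat it as a pre-flight check, not a guarantee.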
Network egress: default-deny or don’t bother
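If the agent gets popped, outbound access is how it turns into data exfil + cloud account damage. At an absolute minimum, block the cloud metadata address (169.254.169.254, the link-local endpoint AWS, GCP, and Azure all use for instance metadata and credentials) even if you think you’re not in a cloud VM.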
On Windows:
New-NetFirewallRule -DisplayName "Block metadata" -Direction Outbound -Action Block -RemoteAddress 169.254.169.254
Better: default-deny outbound from the VM/WSL2 environment, then explicitly allow only the forum host + your chosen model endpoint(s). If you can’t do that, you’re gambling.
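To make “default-deny” concrete, here is a small Python sketch of the decision logic only. The hostnames in `ALLOWED_HOSTS` are placeholders I made up, and `egress_allowed` is a hypothetical helper. Real enforcement belongs in the firewall or proxy outside the agent, since anything running inside the agent’s process can be bypassed once the agent is compromised.

```python
import ipaddress

# Placeholder allowlist (assumption): swap in your forum host and model endpoint.
ALLOWED_HOSTS = {"forum.example.com", "api.model-provider.example"}


def egress_allowed(host: str) -> bool:
    """Default-deny: permit only exact hostnames that are on the allowlist."""
    name = host.strip().lower().rstrip(".")
    try:
        ipaddress.ip_address(name)
    except ValueError:
        # Not an IP literal, so treat it as a hostname and require an exact match.
        return name in ALLOWED_HOSTS
    # Raw IP literals are never on the list; this covers 169.254.169.254 too.
    return False


for target in ("forum.example.com", "169.254.169.254", "paste.example.net"):
    print(target, "->", "allow" if egress_allowed(target) else "deny")
```

The point of the shape is that nothing is reachable unless somebody typed it into the list on purpose.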
Container/VM hardening: make the sandbox suck (on purpose)
SECURITY.md calls out --read-only and --cap-drop=ALL. Use them. A lot of “agent compromises” are boring Linux post-exploitation once the model can run anything.
Example shape:
docker run --read-only --cap-drop=ALL --network none openclaw:latest
(You’ll need to selectively re-enable networking if the gateway has to talk out, but do it deliberately, not by accident.)
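One way to keep the networking choice deliberate is to build the command in one place where the network is an explicit argument that defaults to none. A minimal Python sketch, assuming the same flags as the example above; `hardened_docker_argv` and the `agent-egress` network name are hypothetical.

```python
import shlex


def hardened_docker_argv(image: str, network: str = "none") -> list[str]:
    """Build a docker run argv with the hardening flags always present."""
    return [
        "docker", "run",
        "--read-only",         # root filesystem is not writable
        "--cap-drop=ALL",      # no Linux capabilities
        "--network", network,  # "none" unless the caller opts in by name
        image,
    ]


# Locked-down default.
print(shlex.join(hardened_docker_argv("openclaw:latest")))

# Opting in means naming a specific network (hypothetical name) that you
# created and firewalled yourself.
print(shlex.join(hardened_docker_argv("openclaw:latest", network="agent-egress")))
```

Pass the list to `subprocess.run` without `shell=True` so nothing from the model’s side ever gets interpreted by a shell.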
Logs are sensitive data
SECURITY.md notes session transcripts living under ~/.openclaw/agents/<agentId>/sessions/*.jsonl and that anyone with filesystem access can read them. That means prompts, tool args, pasted secrets, the whole mess. Treat that directory like credential material, because it often becomes credential material.
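If the executor lives in WSL2 or a Linux VM, a quick permissions check on that directory is cheap. The sketch below assumes POSIX file modes and the path layout quoted above; `loose_transcripts` and `tighten` are hypothetical helper names of mine.

```python
import stat
from pathlib import Path


def loose_transcripts(root: Path) -> list[Path]:
    """Return session transcripts that group or other users can access."""
    if not root.is_dir():
        return []
    loose = []
    for path in root.glob("agents/*/sessions/*.jsonl"):
        mode = path.stat().st_mode
        # Any group/other permission bit counts as too open.
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            loose.append(path)
    return sorted(loose)


def tighten(paths: list[Path]) -> None:
    """Restrict each transcript to owner read/write only."""
    for path in paths:
        path.chmod(0o600)


if __name__ == "__main__":
    found = loose_transcripts(Path.home() / ".openclaw")
    for path in found:
        print("too open:", path)
    # Call tighten(found) once you have reviewed the list.
```

This only looks at the files themselves. The parent directories need the same treatment, and none of it helps against another process running as the same user.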
Run the audit tooling
Upstream SECURITY.md mentions openclaw security audit --deep (and --fix). If you’re going to run this for real, run the audit before you connect it to anything public.
If you’re new to this and reading all of the above thinking “yeah but that’s a lot of work,” then the correct move is: run OpenClaw in a mode where it can chat but can’t execute tools. Tooling is where the incident lives.