Hermes Agent: The Self-Improving Challenger to OpenClaw

Last month I wrote about the claw ecosystem — OpenClaw, NanoClaw, PicoClaw, ZeroClaw, and the managed variants. The post covered the security crisis, the alternatives, and which framework to pick depending on your priorities.

Since then, a new contender has entered the space. Not another Claw fork or rewrite — a fundamentally different approach to what an AI agent should be.

Hermes Agent by Nous Research launched on February 25, 2026. In under two months, it has accumulated 82,000+ GitHub stars, 11,000 forks, and contributions from 320+ developers. The latest release, v0.9.0, shipped on April 13 with mobile support, a local web dashboard, and the deepest security hardening pass yet.

Where the Claw ecosystem optimized for ecosystem breadth — more channels, more skills, more integrations — Hermes optimized for something else entirely: an agent that gets better the more you use it.

The Learning Loop

This is what makes Hermes fundamentally different from every framework I covered in the Claw post.

When you complete a complex task with Hermes, the agent doesn’t just forget what it did. It autonomously creates a structured skill document describing the approach, the tools used, and the outcome. Next time a similar task appears, the agent references that skill for faster, more accurate execution. Over time, these skills compound — the agent builds a library of proven solutions specific to your projects and workflows.

The learning system has three layers:

Persistent memory. Two markdown files, MEMORY.md (environment info, lessons learned, system state) and USER.md (your preferences, work style, decisions), are loaded at session start. An SQLite database with FTS5 full-text search enables recall across weeks of sessions. The memory snapshot is frozen at session start, so the prompt prefix stays identical from turn to turn and the provider's prefix cache keeps hitting. Nous Research claims this reduces token costs by 80–90% compared to loading full context every turn.
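To make the recall layer concrete, here's a minimal sketch of FTS5-backed session search using Python's built-in sqlite3 module. The table layout and the open_memory / remember / recall helpers are my own illustrative assumptions, not Hermes's actual schema, and the sketch assumes your Python's SQLite build includes FTS5 (most do).

```python
import sqlite3

# Illustrative schema (an assumption, not Hermes's real one): one row per
# past message, with only the text column indexed for full-text search.
def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages "
        "USING fts5(session_id UNINDEXED, role UNINDEXED, content)"
    )
    return conn


def remember(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one message to the searchable archive."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def recall(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (session_id, content) pairs, best BM25 match first."""
    # FTS5's bm25() gives lower (more negative) scores to better matches,
    # so ascending order puts the most relevant rows first.
    return conn.execute(
        "SELECT session_id, content FROM messages "
        "WHERE messages MATCH ? ORDER BY bm25(messages) LIMIT ?",
        (query, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = open_memory()
    remember(conn, "2026-03-02", "assistant", "Fixed the nginx reload by renewing certificates first")
    remember(conn, "2026-03-09", "user", "Prefer uv over pip for Python projects")
    print(recall(conn, "nginx"))
```

Note that this archive is separate from the frozen MEMORY.md/USER.md snapshot: searching it mid-session returns results as tool output without touching the cached prompt prefix.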

Autonomous skill creation. After solving a complex task, Hermes generates a skill document following the agentskills.io open standard. These skills are portable — they work across any framework that supports the standard, including OpenClaw.
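As I understand the agentskills.io format, a skill is a SKILL.md file that opens with YAML frontmatter carrying at least a name and a description, followed by free-form Markdown instructions. Here's a rough sketch of how an agent might serialize a solved task into that shape. The SkillDraft fields, the naming rule, and the section headings are my own assumptions, not Hermes's actual template.

```python
import json
import re
from dataclasses import dataclass, field

# Assumed naming rule: lowercase words joined by single hyphens, max 64 chars.
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


@dataclass
class SkillDraft:
    """Hypothetical record of a solved task, captured after the agent finishes."""
    name: str
    description: str
    steps: list[str]
    tools: list[str] = field(default_factory=list)
    outcome: str = ""


def render_skill(d: SkillDraft) -> str:
    """Serialize a draft as a SKILL.md-style document (frontmatter + Markdown body)."""
    if len(d.name) > 64 or not NAME_RE.fullmatch(d.name):
        raise ValueError(f"invalid skill name: {d.name!r}")
    lines = [
        "---",
        f"name: {d.name}",
        # A JSON string is also a valid YAML double-quoted scalar, so colons
        # or quotes in the description cannot break the frontmatter.
        f"description: {json.dumps(d.description)}",
        "---",
        "",
        "## Steps",
        *[f"{i}. {step}" for i, step in enumerate(d.steps, start=1)],
    ]
    if d.tools:
        lines += ["", "## Tools used", *[f"- {tool}" for tool in d.tools]]
    if d.outcome:
        lines += ["", "## Outcome", d.outcome]
    return "\n".join(lines) + "\n"
```

The interesting part of the real system is deciding what goes into steps and outcome; the serialization itself is deliberately boring so that any compatible framework can read it.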

User modeling. Honcho dialectic modeling builds a deepening profile of who you are across sessions. The agent learns your coding style, your preferred tools, your decision patterns, and adapts its behavior accordingly.

No Claw framework does this. OpenClaw skills are static files you write and maintain. NanoClaw uses per-session CLAUDE.md files. ZeroClaw has SQLite memory with FTS5 search but no autonomous skill creation. Hermes is the first framework where the agent actively improves itself through use.

Bottom line: The learning loop is Hermes’s core bet. If it works as advertised, an agent that’s been running for months should meaningfully outperform one you just installed. Whether the skill quality holds up at scale remains to be seen — the project is only two months old.

Architecture

Language: Python (93%) | License: MIT | Stars: 82k+ | Created by: Nous Research

Hermes is a Python application that runs as a persistent process on your server. It exposes a terminal UI with multiline editing, slash-command autocomplete, and streaming tool output. A gateway process handles messaging platform connections, routing incoming messages to the agent.

Model flexibility

No vendor lock-in. Hermes supports 200+ models through Nous Portal, OpenRouter, OpenAI, Anthropic, Xiaomi MiMo, z.ai/GLM, Kimi/Moonshot, MiniMax, Hugging Face, Ollama, vLLM, SGLang, or any custom endpoint. Switch models with the hermes model command, with no code changes. Automatic failover chains handle provider errors by falling back to the next configured provider.
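Failover chains are conceptually simple: try providers in order and move on when one fails. A minimal sketch follows; the ProviderError type and the (name, callable) chain format are hypothetical stand-ins, not Hermes's configuration format or client API.

```python
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error a provider client raises on outages, rate limits, or timeouts."""


def complete_with_failover(
    prompt: str,
    chain: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (provider_name, call) in order; return (provider_name, reply)."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Only ProviderError triggers a fallback here, so a bug in your own code still surfaces immediately instead of being silently retried against every provider.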

Messaging platforms

Telegram, Discord, Slack, WhatsApp, Signal, Email, iMessage (via BlueBubbles), WeChat, WeCom, Feishu/Lark, and CLI are among the 16 platforms served from a single gateway process. Not quite OpenClaw's 50+, but the list covers the platforms most people actually use.

Execution model

40+ built-in tools covering terminal access, file operations, web browsing, browser automation, vision, image generation, text-to-speech, and multi-model reasoning.

Hermes spawns isolated subagents with their own conversations and terminals, and uses Python RPC scripts for zero-context-cost pipelines: a script calls tools directly, so intermediate results never pass through the model's context. Natural-language cron scheduling runs reports, backups, and briefings unattended through the gateway.

Deployment

Runs on a $5 VPS, a GPU cluster, or serverless infrastructure that hibernates when idle. The v0.9.0 release added Android/Termux support and a local web dashboard for managing settings, sessions, skills, and the gateway from a browser.

Security: Seven Layers Deep

In the Claw post, I argued that security was the defining differentiator in the agent ecosystem. OpenClaw had 9 CVEs, 135,000+ exposed instances, and a supply chain attack that compromised 1 in 5 ClawHub packages. NanoClaw bet on container isolation. ZeroClaw bet on Rust and allowlists.

Hermes takes a defense-in-depth approach with seven security layers. Here’s how each one works.

1. Command approval

Three configurable modes:

Manual (default): Always prompts before executing dangerous patterns — recursive deletes, privilege escalation, system config overwrites, SQL drops, pipe-to-interpreter chains (curl | sh), and process kills.
Smart: An auxiliary LLM auto-approves low-risk commands, auto-denies dangerous ones, and escalates uncertain cases to the user.
Off (YOLO mode): Disables all safety checks. Activated via the --yolo flag, the /yolo slash command, or an environment variable.

Unanswered approval prompts default to denial after 60 seconds — fail-closed by design.
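Here's how the three modes and the fail-closed timeout could fit together. This is a sketch under my own assumptions: the regexes are a small illustrative subset rather than Hermes's real pattern list, and ask_user and classify are hypothetical callbacks standing in for the approval prompt and the auxiliary LLM. I've also assumed Smart mode only consults the LLM for commands that match a dangerous pattern.

```python
import re
from enum import Enum
from typing import Callable, Optional

# Illustrative patterns only; Hermes's real list is longer and may differ.
DANGEROUS = [
    re.compile(r"\brm\s+-[a-zA-Z]*[rR]"),                            # recursive delete
    re.compile(r"\bsudo\b|\bsu\s"),                                  # privilege escalation
    re.compile(r">\s*/etc/"),                                        # system config overwrite
    re.compile(r"\bDROP\s+(TABLE|DATABASE)\b", re.IGNORECASE),       # SQL drops
    re.compile(r"\b(curl|wget)\b[^|]*\|\s*((ba|z)?sh|python3?)\b"),  # pipe-to-interpreter
    re.compile(r"\b(kill|pkill|killall)\b"),                         # process kills
]
APPROVAL_TIMEOUT_S = 60.0


class Mode(Enum):
    MANUAL = "manual"
    SMART = "smart"
    OFF = "off"


def is_dangerous(cmd: str) -> bool:
    return any(p.search(cmd) for p in DANGEROUS)


def approve(
    cmd: str,
    mode: Mode,
    ask_user: Callable[[str, float], Optional[bool]],
    classify: Optional[Callable[[str], str]] = None,
) -> bool:
    """Decide whether cmd may run.

    ask_user returns True/False, or None if nobody answered before the timeout.
    classify (SMART mode) stands in for the auxiliary LLM: "allow", "deny", or "ask".
    """
    if mode is Mode.OFF or not is_dangerous(cmd):
        return True
    if mode is Mode.SMART and classify is not None:
        verdict = classify(cmd)
        if verdict == "allow":
            return True
        if verdict == "deny":
            return False
    # MANUAL mode, or SMART mode escalating an uncertain case.
    # Fail closed: a timeout (None) counts as a denial.
    return ask_user(cmd, APPROVAL_TIMEOUT_S) is True
```

The last line is the whole fail-closed design: only an explicit True from a human lets a flagged command through.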

2. Tirith pre-execution scanner

Before any command runs, Tirith scans for prompt injection, credential exfiltration, SSH backdoor patterns, homograph URL spoofing, and pipe-to-interpreter attacks. Auto-installs with SHA-256 verification. If Tirith is unavailable, execution proceeds by default (tirith_fail_open: true) — a configurable trade-off between availability and security.
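The tirith_fail_open trade-off comes down to a single branch in whatever wraps the scanner call. A sketch, where scan is a hypothetical stand-in for invoking Tirith (I'm not reproducing its real interface) and ScannerUnavailable is an assumed error type:

```python
import logging
from typing import Callable

log = logging.getLogger("precheck")


class ScannerUnavailable(Exception):
    """Hypothetical: scanner binary missing, crashed, or timed out."""


def precheck(cmd: str, scan: Callable[[str], bool], fail_open: bool) -> bool:
    """Return True if cmd may run. `scan` returns True when the command looks clean."""
    try:
        return scan(cmd)
    except ScannerUnavailable as exc:
        # The availability-vs-security trade-off lives entirely in this branch.
        log.warning("scanner unavailable (%s); %s", exc,
                    "allowing unscanned command" if fail_open else "blocking command")
        return fail_open
```

With fail_open=True, a crashed scanner turns every command into an unscanned one, which is why at minimum the degradation should be logged loudly.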

3. Container isolation

Six sandbox backends:

Backend	Isolation	Cmd Approval	Use Case
Local	None	Yes	Development
Docker	Container	Skipped	Production
SSH	Remote machine	Yes	Separate server
Singularity	Container	Skipped	HPC clusters
Modal	Cloud sandbox	Skipped	Scalable compute
Daytona	Cloud workspace	Skipped	Persistent dev envs

When running in Docker, containers launch with --cap-drop ALL, --security-opt no-new-privileges, process limits (--pids-limit 256), and noexec,nosuid tmpfs mounts. Command approval is skipped inside containers because the container itself provides the security boundary.
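For reference, the hardening flags above map onto a docker run invocation roughly like the one this sketch builds. The image name, mount paths, and especially the --user flag are my assumptions: the text says Hermes runs containers as root by default, so --user goes beyond its defaults and is included to show the non-root alternative.

```python
import shlex
from typing import List


def hardened_docker_args(image: str, workdir: str, user: str = "1000:1000") -> List[str]:
    """Build a `docker run` argv with the hardening flags described above."""
    return [
        "docker", "run", "--rm",
        "--cap-drop", "ALL",                      # drop every Linux capability
        "--security-opt", "no-new-privileges",    # block setuid-style privilege gains
        "--pids-limit", "256",                    # cap process count (fork bombs)
        "--tmpfs", "/tmp:rw,noexec,nosuid",       # scratch space that can't run binaries
        # Assumption: NOT part of the defaults described in this post.
        # Running as a non-root UID addresses the root-by-default gap.
        "--user", user,
        "-v", f"{workdir}:/workspace",
        "-w", "/workspace",
        image,
    ]


if __name__ == "__main__":
    print(shlex.join(hardened_docker_args("python:3.12-slim", "/srv/project")))
```

Passing an argv list to subprocess.run, rather than a shell string, also keeps the workdir path from being interpreted by a shell.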

4. Credential management

Environment variables containing KEY, TOKEN, SECRET, PASSWORD, CREDENTIAL, PASSWD, or AUTH are blocked by default in execute_code and terminal tools. MCP subprocesses receive only safe variables (PATH, HOME, USER, LANG, TERM, SHELL, TMPDIR, XDG_*) plus explicitly configured overrides. GitHub PATs, OpenAI keys, and bearer tokens are redacted from output.
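The two filtering strategies described above are different in kind: a denylist by name for tool environments and an allowlist for MCP subprocesses. A minimal sketch of both, plus output redaction; the function names and the token regexes are my own illustrative assumptions, and real token formats vary and change over time.

```python
import re
from typing import Dict, Mapping, Optional

SENSITIVE_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL", "PASSWD", "AUTH")
MCP_SAFE = {"PATH", "HOME", "USER", "LANG", "TERM", "SHELL", "TMPDIR"}

# Illustrative redaction patterns, not Hermes's actual list.
REDACT_PATTERNS = [
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}"),        # classic GitHub tokens
    re.compile(r"\bgithub_pat_[A-Za-z0-9_]{22,}"),      # fine-grained GitHub PATs
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),             # OpenAI-style API keys
    re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+"),  # "Bearer <token>"
]


def env_for_tools(env: Mapping[str, str]) -> Dict[str, str]:
    """Denylist: drop any variable whose name contains a sensitive marker."""
    return {k: v for k, v in env.items()
            if not any(m in k.upper() for m in SENSITIVE_MARKERS)}


def env_for_mcp(env: Mapping[str, str],
                overrides: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Allowlist: only known-safe variables plus XDG_*, then explicit overrides."""
    safe = {k: v for k, v in env.items() if k in MCP_SAFE or k.startswith("XDG_")}
    safe.update(overrides or {})
    return safe


def redact(text: str) -> str:
    """Replace anything that looks like a credential with a placeholder."""
    for pattern in REDACT_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Substring matching is deliberately broad: AUTHOR_NAME would be dropped too, because it contains AUTH. Over-blocking a harmless variable is a much cheaper failure than leaking a key.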

5. Context file injection protection

Hermes scans AGENTS.md, .cursorrules, and SOUL.md files for prompt injection, hidden instructions, credential theft patterns, and invisible Unicode characters before loading them into context.
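The invisible-character check is the easiest part of this layer to picture. Here's a minimal sketch, assuming a simple design of my own: flag every character in Unicode's Cf ("format") category plus a short, illustrative list of injection phrases. Hermes's actual heuristics are surely richer.

```python
import unicodedata
from typing import List

# Illustrative phrase list; a real scanner would use far richer heuristics.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "ignore all previous instructions",
    "disregard the above",
    "do not tell the user",
)


def scan_context_file(text: str) -> List[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    for offset, ch in enumerate(text):
        # Category Cf covers zero-width spaces and joiners, bidi overrides,
        # and the invisible Unicode tag characters used to smuggle text.
        if unicodedata.category(ch) == "Cf":
            findings.append(f"invisible U+{ord(ch):04X} at offset {offset}")
    lowered = text.lower()
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            findings.append(f"suspicious phrase: {phrase!r}")
    return findings
```

A check this blunt also flags legitimate content, such as a leading byte-order mark or emoji joined with zero-width joiners, so a production scanner needs an allowance for those cases.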

6. SSRF protection

Private networks (RFC 1918), loopback, link-local, CGNAT, cloud metadata hostnames, and reserved ranges are blocked. DNS failures are treated as blocked — fail-closed.
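Python's standard ipaddress module covers most of these ranges directly. The sketch below is my own approximation of the rules listed above, not Hermes's code; the metadata hostname set is illustrative, and the resolver is injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, List

CGNAT = ipaddress.ip_network("100.64.0.0/10")
# Illustrative set of cloud metadata hostnames; extend for your providers.
METADATA_HOSTS = {"metadata.google.internal", "metadata", "instance-data"}


def system_resolve(host: str) -> List[str]:
    """All addresses the OS resolver returns for host (raises OSError on failure)."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def ip_is_blocked(addr: str) -> bool:
    ip = ipaddress.ip_address(addr.split("%")[0])  # strip any IPv6 zone suffix
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # ::ffff:10.0.0.1 must be judged as 10.0.0.1
    return (
        ip.is_private          # RFC 1918 and similar ranges
        or ip.is_loopback
        or ip.is_link_local    # includes 169.254.169.254, the usual metadata IP
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
        or (ip.version == 4 and ip in CGNAT)  # ipaddress doesn't count CGNAT as private
    )


def host_is_allowed(host: str, resolve: Callable[[str], List[str]] = system_resolve) -> bool:
    if host.lower().rstrip(".") in METADATA_HOSTS:
        return False
    try:
        addrs = resolve(host)
    except OSError:       # socket.gaierror is a subclass of OSError
        return False      # fail closed: unresolvable means blocked
    # Block if ANY resolved address is internal; an empty answer is blocked too.
    return bool(addrs) and not any(ip_is_blocked(a) for a in addrs)
```

One caveat a check like this can't handle alone is DNS rebinding: the HTTP client must connect to the address that was vetted, not re-resolve the hostname afterwards.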

7. DM pairing authentication

New messaging connections require an 8-character pairing code, drawn from a 32-character unambiguous alphabet and valid for one hour. Code requests are rate-limited to one per 10 minutes with at most 3 pending codes, and 5 failed attempts trigger a 1-hour lockout. Pairing files are stored with 0600 permissions.
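These limits compose into a small state machine per user. Here's an in-memory sketch under my own assumptions: the exact 32-character alphabet, per-user rate limiting, and the PairingStore class are all illustrative rather than Hermes's implementation. The clock is injectable so expiry and lockout can be tested without waiting an hour.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Assumed alphabet: A-Z and 2-9 minus the look-alikes I and O = 32 symbols.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
CODE_LEN = 8
CODE_TTL_S = 3600        # codes expire after 1 hour
ISSUE_INTERVAL_S = 600   # at most one new code per 10 minutes
MAX_PENDING = 3
MAX_FAILURES = 5
LOCKOUT_S = 3600


@dataclass
class _UserState:
    pending: Dict[str, float] = field(default_factory=dict)  # code -> expiry time
    last_issued: float = float("-inf")
    failures: int = 0
    locked_until: float = float("-inf")


class PairingStore:
    """Hypothetical in-memory store; a real one would persist to a 0600 file."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._users: Dict[str, _UserState] = {}

    def _state(self, user: str) -> Tuple[_UserState, float]:
        st = self._users.setdefault(user, _UserState())
        now = self._clock()
        # Drop expired codes before every decision.
        st.pending = {c: exp for c, exp in st.pending.items() if exp > now}
        return st, now

    def issue(self, user: str) -> Optional[str]:
        """Return a fresh code, or None if locked out, rate-limited, or at the pending cap."""
        st, now = self._state(user)
        if now < st.locked_until or now - st.last_issued < ISSUE_INTERVAL_S:
            return None
        if len(st.pending) >= MAX_PENDING:
            return None
        code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LEN))
        st.pending[code] = now + CODE_TTL_S
        st.last_issued = now
        return code

    def verify(self, user: str, attempt: str) -> bool:
        """Consume a matching code; count failures and lock out after MAX_FAILURES."""
        st, now = self._state(user)
        if now < st.locked_until:
            return False
        candidate = attempt.strip().upper().encode()
        for code in list(st.pending):
            if secrets.compare_digest(code.encode(), candidate):
                del st.pending[code]   # codes are single-use
                st.failures = 0
                return True
        st.failures += 1
        if st.failures >= MAX_FAILURES:
            st.locked_until = now + LOCKOUT_S
            st.failures = 0
        return False
```

The arithmetic is what makes the scheme work: 32^8 is about 1.1 trillion possible codes, and five guesses per hour leaves brute force with effectively zero chance before a code expires.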

Bottom line: Hermes ships with more security layers than any Claw framework except ZeroClaw. The Tirith scanner and context file injection protection address threat vectors that no Claw framework handles. The weak spots are Docker running as root by default and tirith_fail_open: true as the default — both configurable, but insecure defaults remain insecure defaults.

Hermes vs OpenClaw

	OpenClaw	Hermes Agent
Language	TypeScript	Python
GitHub Stars	345k+	82k+
License	MIT	MIT
Core Philosophy	Gateway — routing, permissions, channels	Learning loop — skills that improve over time
Skill Ecosystem	ClawHub (5,000+ static skills)	Auto-generated + agentskills.io
Memory Model	Manual (developer-maintained files)	Autonomous (FTS5 + user modeling)
LLM Providers	15+	200+ (via OpenRouter + direct)
Messaging Channels	50+	16
Sandbox	Docker (documented escapes)	6 backends + container hardening
Pre-exec Scanning	None	Tirith (injection, exfiltration, backdoors)
Supply Chain Risk	High (36% of audited skills contained prompt injection)	Low (no marketplace, conservative vetting)
Self-Improvement	None	Autonomous skill creation + refinement

Security Head-to-Head

Readers of the Claw post asked me to compare security models more directly. Here’s how Hermes stacks up against OpenClaw on the dimensions that matter most.

Default network exposure. OpenClaw binds to 0.0.0.0:18789 by default — the single decision responsible for 135,000+ exposed instances. Hermes binds to localhost by default. The gateway requires explicit DM pairing with rate limiting and lockout. Advantage: Hermes.

Supply chain risk. In an audit of OpenClaw's ClawHub, 36% of skills contained prompt injection, and the ClawHavoc campaign alone accounted for over 1,184 malicious packages. Hermes has no centralized skill marketplace. Skills are generated locally by the agent or manually installed. The agentskills.io standard is emerging but hasn't had a comparable supply chain incident. Advantage: Hermes.

Pre-execution scanning. OpenClaw has no built-in command scanning — dangerous commands run if the user (or the LLM) approves them. Hermes’s Tirith scanner inspects every command for injection patterns, credential exfiltration, and backdoor signatures before execution. This is a capability no Claw framework offers. Advantage: Hermes.

Container isolation. OpenClaw's Docker sandbox has documented escape vulnerabilities (CVE-2026-24763). Hermes Docker containers run with --cap-drop ALL, --security-opt no-new-privileges, process limits, and noexec tmpfs mounts. However, a community security review found that Hermes containers run as root by default with no USER directive, and that the DAC_OVERRIDE capability (which lets root bypass file permission checks) is retained despite the blanket drop, increasing the blast radius of any in-container compromise. Neither approach is airtight: NanoClaw's per-session container isolation and ZeroClaw's Rust memory safety remain stronger primitives.

Credential management. OpenClaw historically stored credentials in plaintext configuration files. Hermes filters environment variables by pattern, redacts tokens from output, and mounts credential files read-only in containers. ZeroClaw still leads here with encrypted-at-rest secrets, but Hermes is a significant step up from OpenClaw.

Context file protection. Hermes scans project configuration files (.cursorrules, AGENTS.md, SOUL.md) for prompt injection before loading them into context. No Claw framework does this. This matters because prompt injection via project files is an increasingly common attack vector in autonomous agents.

Bottom line: Hermes is meaningfully more secure than OpenClaw out of the box. It’s not as hardened as ZeroClaw (Rust memory safety, encrypted secrets, workspace scoping) or as isolated as NanoClaw (container-per-session), but it addresses threat vectors — pre-execution scanning, context file injection, credential filtering — that no Claw framework handles.

Where Hermes Falls Short

No framework is a free lunch. Here’s where Hermes has real weaknesses.

Smaller ecosystem. 82k stars is impressive for two months, but OpenClaw’s 345k stars and 5,000+ ClawHub skills represent a significantly larger community. If you need a pre-built integration for a niche messaging platform (IRC, Nostr, Twitch, Zalo), OpenClaw probably has it. Hermes probably doesn’t.

Messaging coverage gap. 16 platforms vs OpenClaw’s 50+. Hermes covers the mainstream channels well, but if your use case is multi-channel presence across every conceivable platform, OpenClaw is still the only option.

Docker root-by-default. The container runs as root with no USER directive. Combined with the retained DAC_OVERRIDE capability, this weakens the container security boundary. It's configurable, but the default should be non-root.

Tirith fail-open default. If the Tirith scanner is unavailable, commands execute without scanning. For an unattended gateway agent, this means a Tirith crash silently degrades the security posture. The default should be fail-closed.

Python dependency chain. Hermes is 93% Python. While this makes it accessible to contributors (Python is the lingua franca of ML/AI), it inherits Python’s dependency management challenges and lacks the memory safety guarantees of ZeroClaw (Rust) or the minimal dependency chain of PicoClaw (Go single binary).

Learning loop is unproven at scale. The autonomous skill creation is the core value proposition, but the project is two months old. How well do auto-generated skills hold up after six months of use? Do they accumulate noise? Does skill quality degrade as the library grows? These are open questions.

execute_code sandbox bypass. The execute_code tool includes terminal among the tools its scripts can call, so commands issued from inside a script may bypass the command approval that guards direct terminal calls. It's a known gap, and it undermines an otherwise thoughtful security model.
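To see why exposing terminal inside execute_code matters, consider where the approval check lives. This is a generic illustration of the bug class, not Hermes's code, and every name in it is hypothetical: if approval is enforced only in the agent's top-level tool loop, any second path to the same tool skips it, while moving the check into the tool itself closes that path.

```python
from typing import Callable, List, Optional

Approver = Callable[[str], bool]


class Terminal:
    """Hypothetical terminal tool. With approver=None it trusts its caller to have
    checked approval already, which is the design that opens the gap."""

    def __init__(self, approver: Optional[Approver] = None):
        self.approver = approver
        self.executed: List[str] = []

    def run(self, cmd: str) -> str:
        if self.approver is not None and not self.approver(cmd):
            raise PermissionError(f"denied: {cmd}")
        self.executed.append(cmd)  # stand-in for actually spawning a shell
        return "ok"


def agent_tool_call(tool: Terminal, cmd: str, approver: Approver) -> str:
    """Top-level path: the agent loop checks approval, then dispatches."""
    if not approver(cmd):
        raise PermissionError(f"denied: {cmd}")
    return tool.run(cmd)


def script_rpc_call(tool: Terminal, cmd: str) -> str:
    """Script path: code running under execute_code calls the tool over RPC,
    never entering the agent loop."""
    return tool.run(cmd)
```

The general lesson is to enforce approval at the innermost layer that actually executes commands, so every caller inherits it.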

Updated Decision Framework

My Claw ecosystem post ended with a decision framework. Here’s the updated version with Hermes in the picture.

You want an agent that improves over time → Hermes Agent. The learning loop, persistent memory, and autonomous skill creation are unique in this space. Use the Docker or Modal backend for production isolation.

You want maximum integrations and ecosystem → OpenClaw, but invest heavily in hardening. The security situation hasn’t materially changed since last month.

You want something you can fully audit → NanoClaw. 500 lines of TypeScript, container isolation, Anthropic-only.

You want to run agents on constrained hardware → PicoClaw. Still the only option for $10 RISC-V boards.

Security is your primary concern → ZeroClaw. Rust memory safety and deny-by-default allowlists remain the strongest security primitive.

You don’t want to manage infrastructure → MaxClaw ($19/mo) or KimiClaw (browser-based).

You want both breadth and learning → Run both. Hermes and OpenClaw both support the agentskills.io standard. You can use OpenClaw for its gateway routing and channel breadth while using Hermes for its learning loop. The ecosystem is converging on shared abstractions that make this increasingly practical.

Final Thoughts

The Claw ecosystem proved that people want autonomous AI agents. The security crisis proved that “move fast” without “secure by default” has real consequences.

Hermes represents a philosophical shift. The Claw frameworks all compete on the same axes — more channels, more skills, more integrations, better security. Hermes competes on a different axis entirely: an agent that gets better the more you use it.

This raises an interesting question that none of these frameworks have fully answered: who owns the learned knowledge? As agents accumulate understanding of your codebase, your workflows, and your decision patterns, that learned context becomes genuinely valuable. It’s not in the model weights — it’s in the skills, the memory files, the user profiles that Hermes builds over time. Today it lives on your server. But as managed hosting options emerge for Hermes (MiniMax already has a partnership), the question of data ownership and portability becomes critical.

The agentskills.io standard is a step toward portability — skills created by Hermes can theoretically run in any compatible framework. But the memory and user modeling data doesn’t have an equivalent standard yet.

The agent landscape is moving from “how many integrations can we support” to “how well can the agent learn.” Hermes is the first framework to make that bet explicitly. Whether it pays off depends on how well the learning loop scales — and that’s a question only time and real-world usage can answer.

References