52,000 GitHub stars in two months. Seven releases in four weeks. An issue tracker north of 7,400 and climbing by the dozen daily.

Hermes Agent is from Nous Research — the people behind the Hermes series of fine-tuned LLMs. That matters. This isn't some random agent framework from a dev tools startup. It's built by a lab that trains the actual models. And when you dig into the repo, you can see they're playing a different game to everyone else in this space.

Hermes Agent homepage — dark, moody, very on-brand for an AI research lab

What it actually does

The pitch is "an agent that grows with you." Sounds like marketing, but the mechanism is real.

When Hermes completes a task successfully — and you don't correct it or redo the work — it extracts a reusable skill from that interaction. Not a chat log or a raw transcript. A structured SKILL.md document with activation conditions, the approach that worked, and supporting files. Next time it sees a similar task, it pulls the relevant skill instead of reasoning from scratch.
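The article doesn't reproduce an actual skill file, so here's an illustrative sketch of what a SKILL.md along these lines might look like — the frontmatter fields and layout are assumptions, not the project's documented schema:

```markdown
---
name: deploy-static-site
description: Build and deploy a static site to the user's VPS
activation: user asks to deploy, publish, or update the blog/site
---

## Approach
1. Run the site generator (`hugo` in this environment) and confirm a clean build.
2. rsync the output directory to the VPS path recorded in MEMORY.md.
3. Purge the CDN cache only if the user asked for an immediate refresh.

## Supporting files
- deploy.sh — the exact rsync invocation that worked last time
```

The point of the format is that the activation conditions make skills retrievable by task similarity, while the approach section replaces from-scratch reasoning with a known-good procedure.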

It maintains persistent memory across sessions too. A MEMORY.md for environment context and lessons learned. A USER.md for your preferences and decision history. SQLite with full-text search across every past conversation. And a user modelling system (via Honcho) that builds a deepening profile of who you are and how you work.
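The full-text-search piece is the most conventional part of that stack. A minimal sketch of what SQLite FTS over past conversations looks like — the table and column names here are hypothetical, not Hermes Agent's actual schema:

```python
import sqlite3

# Hypothetical schema: one FTS5 row per conversation turn.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session, role, content)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("s1", "user", "set up the nginx reverse proxy for the blog"),
        ("s1", "assistant", "Done: proxy_pass to port 8080, TLS via certbot."),
        ("s2", "user", "remind me how we configured TLS last time"),
    ],
)
# MATCH finds every past turn mentioning TLS, ranked by relevance,
# regardless of which session it came from.
rows = conn.execute(
    "SELECT session, content FROM messages WHERE messages MATCH ? ORDER BY rank",
    ("TLS",),
).fetchall()
```

Because FTS5 ships with Python's bundled SQLite, this kind of cross-session recall needs no external search service — which fits the $5-VPS deployment story.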

Whether this loop actually compounds into something useful depends on volume. The community consensus seems to be that benefits kick in after 20-30 tasks in a domain. Before that, you're basically running a normal agent with extra overhead.

Not a coding assistant

This is where most people's mental model breaks. Hermes Agent isn't competing with Claude Code or Cursor. Different animal.

The headline feature is the messaging gateway. One long-running process that connects to Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Email, SMS, DingTalk, WeChat, Home Assistant, and more. Fifteen-plus platform adapters from a single AIAgent class. You can have the same agent running your home automation, answering Telegram messages, and scheduling cron jobs — all from one service on a $5 VPS.

There's a CLI too, and an ACP adapter for VS Code, Zed, and JetBrains. But the gateway is the differentiator. If you want an always-on AI assistant that lives where your team already communicates rather than inside a terminal, this is the play.

The docs are comprehensive and well-structured — Docusaurus with dark theme

Some smart architecture choices

One design decision stood out when I read through the codebase. Every entry point — CLI, gateway, ACP adapter — shares the same AIAgent class. Platform differences live in the entry point, not the core. Clean separation, and it means the agent behaves consistently whether you're talking to it over Telegram or typing in a terminal.
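The shape of that separation is worth seeing concretely. A sketch of the pattern — class and method names here are illustrative, not Hermes Agent's real API:

```python
from dataclasses import dataclass, field

@dataclass
class AIAgent:
    """Platform-agnostic core: it never knows where a message came from."""
    history: list = field(default_factory=list)

    def respond(self, text: str) -> str:
        self.history.append(text)
        return f"ack: {text}"  # stand-in for the real LLM loop

class TelegramAdapter:
    """Entry point owns the platform quirks (update payloads, etc.)."""
    def __init__(self, agent: AIAgent):
        self.agent = agent

    def on_update(self, update: dict) -> str:
        return self.agent.respond(update["message"]["text"])

class CLIAdapter:
    def __init__(self, agent: AIAgent):
        self.agent = agent

    def on_line(self, line: str) -> str:
        return self.agent.respond(line.strip())

agent = AIAgent()
a = TelegramAdapter(agent).on_update({"message": {"text": "status?"}})
b = CLIAdapter(agent).on_line("status?\n")
```

Both adapters produce identical core behaviour from the same agent instance, which is exactly why a fix to the agent loop lands everywhere at once.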

The tool system is extensive. 48 tools across 40 toolsets: terminal execution across six backends (local, Docker, SSH, Daytona, Singularity, Modal), browser automation with an anti-detection browser, web scraping, code execution, image generation, text-to-speech, vision.
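Supporting six terminal backends behind one tool usually comes down to wrapping the same command differently per backend. A toy sketch of that dispatch, using three of the backends named above — the wrapping details are my assumptions, not the project's implementation:

```python
def wrap_command(backend: str, cmd: list[str], target: str = "") -> list[str]:
    """Translate one logical command into the argv for a given backend."""
    if backend == "local":
        return cmd
    if backend == "docker":
        # target = container name; run inside an existing container
        return ["docker", "exec", target] + cmd
    if backend == "ssh":
        # target = user@host; remote shell gets the command as one string
        return ["ssh", target, " ".join(cmd)]
    raise ValueError(f"unknown backend: {backend}")

local = wrap_command("local", ["ls", "-la"])
docker = wrap_command("docker", ["ls", "-la"], target="sandbox")
```

The agent core only ever sees "run this command"; which machine or sandbox it lands on is a configuration detail.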

Subagent delegation is well thought through. Children get fresh conversations, their own iteration budgets, restricted toolsets, and they're blocked from recursive delegation, memory writes, and cross-platform messaging. Max three concurrent. Opinionated in a way that prevents the obvious footguns.
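Those guardrails are easy to sketch. The following is illustrative only — the function, tool names, and budget numbers are stand-ins, not the real delegation API:

```python
import threading

MAX_CONCURRENT = 3
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)
# Tools a child is never allowed to inherit: no recursive delegation,
# no memory writes, no cross-platform messaging.
BLOCKED_TOOLS = {"delegate", "memory_write", "send_message"}

def spawn_subagent(task: str, tools: set[str], max_iterations: int = 10) -> dict:
    if not _slots.acquire(blocking=False):
        raise RuntimeError("subagent limit reached (max 3 concurrent)")
    # In a real implementation the slot is released when the child finishes.
    return {
        "task": task,
        "conversation": [],              # fresh history, no parent context
        "tools": tools - BLOCKED_TOOLS,  # restricted toolset
        "budget": max_iterations,        # child's own iteration budget
    }

child = spawn_subagent("summarise logs", {"terminal", "delegate", "memory_write"})
```

The interesting choice is that the restrictions are structural, not prompt-based: a child physically lacks the tools it shouldn't use, rather than being asked nicely not to use them.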

There's a context compression system that protects the head and tail of conversations while summarising the middle turns. And a memory fencing mechanism that wraps recalled context in XML tags telling the model "this is background data, not new user input" — a subtle but important distinction that stops the model treating old memory as fresh instructions.
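Both mechanisms can be sketched in a few lines. The summariser and tag names below are stand-ins (the real system calls an LLM where the placeholder string is), but the shape matches the description above:

```python
def compress(turns: list[str], keep_head: int = 2, keep_tail: int = 2) -> list[str]:
    """Protect the first and last turns; collapse only the middle."""
    if len(turns) <= keep_head + keep_tail:
        return turns
    middle = turns[keep_head:len(turns) - keep_tail]
    summary = f"[summary of {len(middle)} earlier turns]"  # real code summarises via LLM
    return turns[:keep_head] + [summary] + turns[-keep_tail:]

def fence_memory(recalled: str) -> str:
    # Wrap recalled context so the model treats it as background data,
    # not as a fresh instruction from the user.
    return f"<recalled_memory>\n{recalled}\n</recalled_memory>"

turns = [f"turn {i}" for i in range(8)]
compressed = compress(turns)
fenced = fence_memory("User prefers rsync over scp.")
```

Protecting the head preserves the system prompt and task framing; protecting the tail preserves the immediate working context — the middle is where compression is cheapest.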

These are decisions from people who spend a lot of time thinking about how LLMs actually behave at runtime.

The research flywheel

This is the bit that makes Hermes different from everything else on this list.

The repo includes RL training environments integrated with Atropos and Tinker, with tool call parsers for DeepSeek, GLM, Kimi, Llama, Mistral, Qwen, and their own Hermes format. Nous Research is using this agent to generate training data for future models. The agent gets better at tasks, those task completions feed back into model training, better models make a better agent.
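For context on what a tool call parser does here: the Hermes format, as Nous has published it, wraps a JSON payload in tags, so parsing reduces to extraction plus JSON decoding. A sketch — treat the exact tag and payload shape as an approximation of the published format:

```python
import json
import re

# Hermes-style tool calls: JSON inside <tool_call>…</tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(completion: str) -> list[dict]:
    """Extract every tool call a model emitted in its completion."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(completion)]

completion = (
    "Let me check the weather.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Berlin"}}\n</tool_call>'
)
calls = parse_tool_calls(completion)
```

Each model family (DeepSeek, GLM, Kimi, and so on) emits calls in its own syntax, which is why the repo needs one parser per family to normalise them into a common structure.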

That's not a product feature. That's a flywheel. And it's something no pure-product company can replicate with an open-source agent, because they don't publish their training pipelines.

How it stacks up

The agent framework space has exploded in the last year. Most frameworks fall into one of two camps: coding agents that live in your IDE, and multi-platform agents that live across messaging apps. Hermes sits firmly in the second camp, but with the learning loop as its wedge.

| | Hermes Agent | OpenClaw | NanoClaw | Claude Code | Cline | Aider |
|---|---|---|---|---|---|---|
| Stars | 52k | 354k | 27k | 112k | 60k | 43k |
| Language | Python | TypeScript | TypeScript | Proprietary | TypeScript | Python |
| License | MIT | MIT | MIT | Proprietary | Apache 2.0 | Apache 2.0 |
| Focus | Multi-platform agent | Multi-platform agent | Multi-platform agent | Coding agent | Coding agent | Coding agent |
| LLM support | 200+ via OpenRouter | 200+ via OpenRouter | Claude primarily | Claude only | Multi-provider | Multi-provider |
| Self-improving | Native learning loop | ClawHub skills | CLAUDE.md files | CLAUDE.md files | MCP tools | No |
| Messaging | 15+ platforms | 50+ platforms | 5 platforms | Terminal only | VS Code only | Terminal only |
| IDE integration | Basic (ACP) | Experimental | None | Excellent | Excellent | Watch mode |
| Self-hostable | Yes | Yes | Yes | No | Yes | Yes |
| Age | ~2 months | ~5 months | ~2.5 months | ~14 months | ~21 months | ~35 months |

OpenClaw is the gorilla. 354k stars, 50+ messaging adapters, a massive community skills marketplace (13,700+ skills on ClawHub). But its security track record is rough — 138 CVEs documented in 63 days, 42,000+ instances found exposed on the public internet, and malicious skills turning up in the supply chain. If you need the biggest ecosystem and broadest platform support, OpenClaw is it. If you care about security, maybe hold off.

NanoClaw went the opposite direction — deliberately minimal, container-isolated, security-first. About 15 source files total. It runs each agent in its own container, which solves the RCE problem that plagues OpenClaw. The trade-off is it's Claude-only, has no plugin marketplace, and the ecosystem is tiny. For people who want a locked-down Telegram bot and nothing else, it's a solid pick.

Claude Code and Cline are coding tools, full stop. Best-in-class IDE integration, but no messaging gateway, no always-on service, no multi-platform support. They're solving a completely different problem. Comparing Hermes to Claude Code is like comparing a Swiss Army knife to a scalpel — different tools for different jobs.

Aider deserves a mention as the most mature terminal coding agent (35 months old, rock solid), but it's coding-only with zero general agent capabilities.

The honest takeaway from this chart: Hermes doesn't win on any single row. It doesn't have OpenClaw's platform breadth, NanoClaw's security posture, Claude Code's IDE integration, or Aider's maturity. What it has is the only native learning loop in the bunch, and the only research lab behind it that's feeding agent behaviour back into model training. That's a bet on trajectory rather than current state.

The cracks

Seven releases in four weeks is a pace that should make anyone nervous.

The v0.8.0 release introduced a gateway crash on every incoming message — a request_overrides None reference that broke all messaging platforms. Multiple people reported it the same day. That's what happens when you ship 209 merged PRs in a single release.

The security situation is worse. On the day I looked, multiple security issues had been filed: missing security headers, plaintext SQLite storage, ReDoS risk, session fixation, no rate limiting on the API server. And the big one — unauthenticated remote code execution via the SMS webhook because Twilio signature validation wasn't implemented. For a tool that can run terminal commands on your machine, that's not a minor oversight.
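For scale: the missing check is not exotic. Twilio documents its webhook signature scheme — HMAC-SHA1 over the request URL plus the alphabetically sorted POST parameters, base64-encoded, compared against the X-Twilio-Signature header — and it fits in a few lines of stdlib Python. A sketch of the validation that the issue says was absent (function name is mine):

```python
import base64
import hashlib
import hmac

def valid_twilio_signature(auth_token: str, url: str,
                           params: dict[str, str], signature: str) -> bool:
    # Twilio's documented scheme: URL + params concatenated in
    # alphabetical order of parameter name, HMAC-SHA1 with the auth token.
    payload = url + "".join(k + params[k] for k in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode()
    # Constant-time comparison against the X-Twilio-Signature header value.
    return hmac.compare_digest(expected, signature)
```

Without this check, anyone who discovers the webhook URL can submit a forged "SMS" and have the agent act on it — which, for an agent with terminal access, is remote code execution.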

The codebase shows the strain too. run_agent.py is 9,200 lines. cli.py is 8,500. gateway/run.py is 7,500. These are enormous single files. The docstrings acknowledge it — noting these modules "were previously embedded in a 3,600-line run_agent.py" — so there's been refactoring, but the core files are still massive.

86% of all commits come from one person (teknium1). That's a concentration risk for a project with 52K stars and people running it as an always-on service.

Who should pay attention

If you want an AI coding assistant, use Claude Code or Cline. Hermes isn't that.

If you want an always-on AI agent that connects to your messaging platforms, learns from its own completions, runs on your own infrastructure, and works with whatever LLM provider you prefer — Hermes is the most ambitious attempt at that in open source right now.

Self-hosting enthusiasts will love it. Researchers who want to study agent behaviour with their own models will love it. People who want their AI assistant in Telegram rather than a terminal will love it.

But you're adopting a two-month-old project that's iterating at breakneck speed, has real security gaps, and depends heavily on a single maintainer. The trajectory is impressive. Whether the foundations can hold up under this growth is the open question.

If Nous Research can slow down just enough to shore up the security model and break apart those massive files, this could end up being the definitive open-source agent framework. Right now it's a rocket that's still being welded together mid-flight. Thrilling to watch. Maybe wait a version or two before you point it at anything sensitive.