Most codebases grow by adding things. More abstractions, more dependencies, more config files. NanoClaw does the opposite — it replaces a 500,000-line AI assistant framework with roughly 8,000 lines of TypeScript and six production dependencies.
That's not a flex about line count. It's the result of six architectural bets that are genuinely interesting, and honestly, these are patterns I wish I'd seen more teams adopt over the years.
I pulled the entire codebase apart to understand how it works. Here's what I found — and why these patterns matter way beyond a personal AI assistant.
The credential proxy
This is the one that properly impressed me.
NanoClaw runs AI agents inside Linux containers. Those agents need API keys to talk to Anthropic. The obvious approach is to pass the key as an environment variable. That's what most of us do, right? Stick it in an env var, maybe pull it from a secrets manager at startup, move on.
NanoClaw doesn't do that. Instead, every container gets a placeholder API key and a base URL pointing at a localhost proxy on port 3001. The proxy sits on the host, intercepts every outbound API request, strips the placeholder, and injects the real credential before forwarding upstream.
```mermaid
sequenceDiagram
    participant Agent as Agent (Container)
    participant Proxy as Credential Proxy (Host :3001)
    participant API as Anthropic API
    Agent->>Proxy: POST /v1/messages<br/>x-api-key: "placeholder"
    Proxy->>Proxy: Strip placeholder<br/>Inject real API key
    Proxy->>API: POST /v1/messages<br/>x-api-key: "sk-ant-..."
    API-->>Proxy: Response
    Proxy-->>Agent: Response
```
The agent literally cannot leak the real key — it never has it. Even if a prompt injection attack tricks the agent into dumping its environment variables, all you get is ANTHROPIC_API_KEY=placeholder. There's nothing to exfiltrate.
This pattern now has a proper name — the Phantom Token Pattern. It's being adopted as a sidecar proxy in Kubernetes for production AI agent deployments. NanoClaw arrived at the same design independently, which tells you something about how natural this solution is once you frame the problem correctly.
The proxy also handles two auth modes (API key and OAuth) and binds differently per platform — 127.0.0.1 on macOS/WSL2, the docker0 bridge IP on Linux. Secrets are loaded into a plain object, never into process.env, so they can't leak to child processes.
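To make the mechanics concrete, here's a minimal sketch of that interception in Node terms. The function names, the secrets object, and the upstream host are my own illustrative choices, not NanoClaw's actual source; the real proxy also handles OAuth and per-platform binding.

```typescript
import http from "node:http";
import https from "node:https";

// Secrets live in a plain object, never in process.env, so child
// processes spawned by the host can't inherit them. (Key is fake.)
const secrets = { anthropicApiKey: "sk-ant-example" };

// The one rule the proxy enforces: swap the placeholder for the real key.
// The incoming Host header is dropped so Node derives it from the
// upstream target instead of "localhost:3001".
export function injectCredential(
  headers: http.IncomingHttpHeaders,
  realKey: string,
): http.IncomingHttpHeaders {
  const rest: http.IncomingHttpHeaders = { ...headers };
  delete rest.host;
  return { ...rest, "x-api-key": realKey };
}

// Plain HTTP in from the container, TLS out to the real API.
export function createProxy(upstreamHost = "api.anthropic.com"): http.Server {
  return http.createServer((req, res) => {
    const upstream = https.request(
      {
        host: upstreamHost,
        path: req.url,
        method: req.method,
        headers: injectCredential(req.headers, secrets.anthropicApiKey),
      },
      (upstreamRes) => {
        res.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
        upstreamRes.pipe(res);
      },
    );
    req.pipe(upstream);
  });
}
```

Starting it would look like `createProxy().listen(3001, "127.0.0.1")` on macOS/WSL2, with the docker0 bridge IP substituted on Linux.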
Container isolation as authorization
At Vend and Xero I spent a lot of time thinking about authorization. Role-based access, permission checks, middleware that decides what each user can do. It works, but it's a constant source of bugs — one missed check and you've got an escalation vulnerability.
NanoClaw takes a completely different approach. Instead of checking what an agent is allowed to do, it controls what the agent can see.
```mermaid
graph TB
    subgraph "Non-main group container"
        A["/workspace/group/ — Writable"]
        B["/workspace/global/ — Read-only"]
        C["/workspace/ipc/ — Writable"]
    end
    subgraph "Main group container"
        D["/workspace/project/ — Read-only"]
        E["/workspace/project/.env — Shadowed to /dev/null"]
        F["/workspace/group/ — Writable"]
        G["/workspace/ipc/ — Writable"]
    end
    subgraph "Host filesystem"
        H["Project root"]
        I[".env (real secrets)"]
        J["Group folders"]
        K["IPC directories"]
    end
    H --> D
    I -.->|blocked| E
    J --> A
    J --> F
    K --> C
    K --> G
```
Each container gets a specific set of mounts. A non-main group container can only see its own folder — it physically cannot access the project root, other groups' data, or host files. The main group gets read-only access to the project, but .env is shadow-mounted to /dev/null. Even with the full project mounted, secrets are invisible.
The agent inside the container runs with bypassPermissions — it can use Bash, write files, do whatever it wants. But "whatever it wants" is constrained by what the OS lets it see. No application-level permission checks needed.
Additional mounts are validated against a blocklist (.ssh, .gnupg, .aws, credentials, etc.) that lives outside the project root, making it tamper-proof from inside any container. Symlink resolution prevents path traversal attacks like creating /tmp/innocent -> ~/.ssh.
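A sketch of how that mount construction and blocklist check might look. The host paths, the blocklist contents, and both function names are illustrative; a real implementation would also resolve symlinks (fs.realpathSync) before checking, which is what defeats the /tmp/innocent trick.

```typescript
import path from "node:path";

// Illustrative blocklist; the real one lives outside the project root
// so no container can edit it.
const BLOCKLIST = [".ssh", ".gnupg", ".aws", "credentials"];

// Reject any extra mount whose resolved path touches a blocklisted name.
// (Sketch only: real code resolves symlinks first via fs.realpathSync.)
export function validateExtraMount(hostPath: string): void {
  const resolved = path.resolve(hostPath);
  for (const banned of BLOCKLIST) {
    if (resolved.split(path.sep).includes(banned)) {
      throw new Error(`refusing to mount ${hostPath}: matches blocklist`);
    }
  }
}

// Build the per-group mount list. The topology IS the security model:
// what a container can do follows from what these lines let it see.
export function buildMounts(group: string, isMain: boolean): string[] {
  const mounts = [
    `/data/groups/${group}:/workspace/group:rw`,
    `/data/ipc/${group}:/workspace/ipc:rw`,
  ];
  if (isMain) {
    mounts.push(`/data/project:/workspace/project:ro`);
    // Shadow-mount the secrets file so it reads as empty inside.
    mounts.push(`/dev/null:/workspace/project/.env:ro`);
  } else {
    mounts.push(`/data/global:/workspace/global:ro`);
  }
  return mounts;
}
```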
The security model is the filesystem topology itself.
The two-cursor system
Message processing in NanoClaw uses two independent cursors — and the interaction between them is the cleverest bit of the whole system.
Cursor 1 is a global watermark. The main polling loop reads new messages from SQLite every 2 seconds. When it sees messages, it advances the watermark immediately and persists it. This prevents the same batch from being processed twice if the loop runs again before the agent finishes.
Cursor 2 is per-group. When a group's messages are about to be processed, the per-group cursor advances optimistically before the agent starts. If the agent fails:
- If nothing was sent to the user yet — roll back the cursor, retry later
- If a response was already sent — keep the cursor advanced, because rolling back would cause duplicate messages
```mermaid
flowchart TD
    A[New messages arrive] --> B[Cursor 1: Advance global watermark]
    B --> C[Persist to SQLite]
    C --> D[Cursor 2: Advance per-group cursor optimistically]
    D --> E[Run agent in container]
    E --> F{Agent succeeded?}
    F -->|Yes| G[Keep cursor position]
    F -->|No| H{Output already sent to user?}
    H -->|Yes| I[Keep cursor — avoid duplicates]
    H -->|No| J[Roll back cursor — safe to retry]
```
This gives you at-most-once delivery to the user and at-least-once processing for the agent. That's the right tradeoff for a messaging system — users should never see duplicate messages, but retrying agent work is fine.
I've built billing platforms where we had to think about exactly these kinds of delivery guarantees. Getting this wrong in a payments context means double-charging customers. NanoClaw's approach is clean — two cursors, one rule: once you've spoken to the user, you can't unsay it.
File-based IPC
No Redis. No RabbitMQ. No gRPC. NanoClaw's inter-process communication is JSON files on the filesystem.
Containers write outbound messages to /workspace/ipc/messages/. The host writes follow-up messages to /workspace/ipc/input/. Both sides use the same pattern — write to a .tmp file, then rename it into place. rename is atomic on POSIX filesystems. The file either exists with complete content or doesn't exist at all.
```mermaid
graph LR
    subgraph Container
        A[Agent writes response] --> B["Write to .tmp file"]
        B --> C["Atomic rename to .json"]
    end
    subgraph Host
        D["IPC Watcher 1s poll"] --> E["Read .json files"]
        E --> F{Source group?}
        F -->|Main group| G["Send to any target"]
        F -->|Other group| H["Send only to self"]
    end
    C --> D
```
Authorization is enforced by directory identity. Each group has its own IPC directory (/data/ipc/{group}/). The host knows which group wrote a file based on which directory it appeared in. A container can't spoof its identity because its mount configuration fixes which directories it can write to.
The main group can send messages to any other group. Non-main groups can only send to themselves. This isn't enforced by application logic checking tokens — it's enforced by the host reading from a directory that only one container can write to.
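That routing rule reduces to a one-line predicate over directory identity. A sketch, with the group names and function name assumed for illustration:

```typescript
// The sender is whichever group's directory the file appeared in; the
// host trusts the directory, never the file's contents.
// Main may address any group; everyone else may only address themselves.
export function mayDeliver(
  sourceGroup: string,
  targetGroup: string,
  mainGroup = "main",
): boolean {
  return sourceGroup === mainGroup || sourceGroup === targetGroup;
}
```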
This is the same atomic write pattern used in crash-safe database implementations. The key constraint — temp file and target must be on the same filesystem — is guaranteed because both live inside the same container mount.
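The write-then-rename trick itself is tiny. A sketch in Node terms (the helper name is mine): temp file and target share a directory, hence a filesystem, which is the precondition for the rename being atomic.

```typescript
import { writeFileSync, renameSync } from "node:fs";
import path from "node:path";

// A reader polling this directory either sees the complete .json file
// or nothing at all; it can never observe a half-written message.
export function atomicWriteJson(
  dir: string,
  name: string,
  payload: unknown,
): string {
  const tmp = path.join(dir, `${name}.tmp`);
  const target = path.join(dir, `${name}.json`);
  writeFileSync(tmp, JSON.stringify(payload));
  renameSync(tmp, target); // rename(2): atomic on POSIX filesystems
  return target;
}
```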
Polling over events
The main loop polls SQLite every 2 seconds. The IPC watcher polls the filesystem every 1 second. The task scheduler polls every 60 seconds.
In a system designed for one user, this eliminates entire categories of race conditions. No WebSocket connection management, no event ordering guarantees to maintain, no callback hell. Just: check for new stuff, process it, sleep, repeat.
I've been on teams that reached for event-driven architectures way too early. At Xero we shipped multiple times a day — but the internal tooling that supported that? Half of it would've been simpler with a poll loop. When you know your scale ceiling, polling is a feature, not a limitation.
Recompilation over plugins
Each container recompiles TypeScript on startup. The agent-runner source is mounted from a per-group directory, so each group can have customised agent behaviour. The compiled output goes to /tmp/dist which is then made read-only — the agent can't modify its own runner.
No plugin registry. No abstraction layer. No dependency injection. You want different behaviour per group? Change the source files. The container picks it up on next startup.
This is slower than a plugin system, sure. Container startup pays a compilation tax. But it eliminates an entire layer of indirection — and that layer is usually where the bugs live.
What ties it all together
These aren't random choices. They're all expressions of the same principle: know your constraints, and use them to delete complexity.
NanoClaw is a single-user system. It knows this. So it polls instead of pushing, uses the filesystem instead of a message queue, and isolates with containers instead of writing authorization middleware.
The interesting question isn't whether these patterns work for NanoClaw — they obviously do. The question is: where do these same constraints exist in your stack? Because they're more common than you think.
That's what Part 2 is about.