The moment this project went from "fun weekend hack" to something I actually use every day was when I got the MCP server working. Claude Code on my laptop sends a prompt to the orchestrator sitting under my desk, which boots a VM, runs Claude Code inside it with full permissions, and streams the results back. Claude delegating work to Claude.

It's a weird feeling watching it happen. You're in a conversation with Claude, it decides a task needs isolation, calls the MCP tool, and a few seconds later you can see a fresh VM spinning up in the dashboard. Like having an intern who can clone themselves.

Part 1 covered why I built this. Part 2 was the guts of it — rootfs, networking, the guest agent. This last post is about the interfaces, the streaming pipeline, and what I'd change if this needed to work for more than just me.

The MCP server

The orchestrator exposes an MCP server with eight tools. The main one is run_task — give it a prompt, optional config (RAM, vCPUs, timeout, max turns), and it blocks until the task completes. Returns the task ID, status, exit code, result files, cost, and the output truncated to 4000 characters.
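For a sense of the shape, a tool call from the client side looks roughly like this — the argument names beyond the prompt are my paraphrase of the config options described above, not the server's actual schema:

```json
{
    "name": "run_task",
    "arguments": {
        "prompt": "Add unit tests for the parser and run them",
        "memory_mb": 2048,
        "vcpus": 2,
        "timeout_minutes": 15,
        "max_turns": 30
    }
}
```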

Two transport modes. Stdio for when Claude Code runs on the same machine:

{
    "mcpServers": {
        "orchestrator": {
            "command": "sudo",
            "args": ["/opt/firecracker/bin/orchestrator", "mcp"]
        }
    }
}

And Streamable HTTP for network access — Claude Code on any machine on the LAN can use it:

{
    "mcpServers": {
        "orchestrator": {
            "type": "http",
            "url": "http://192.168.50.44:8081/mcp"
        }
    }
}

The other tools are for poking around: get_task_status, list_vms, exec_in_vm (run a command in a still-running VM), read_vm_file, destroy_vm, list_task_files, and get_task_file. That last one is smart about content types — text files come back as plain text, images come back as base64 MCP image content so Claude can actually see screenshots the VM took.

if isImageMime(mimeType) {
    encoded := base64.StdEncoding.EncodeToString(data)
    return mcplib.NewToolResultImage("Screenshot from task "+taskID, encoded, mimeType), nil
}

The migration that broke everything

This bit is worth telling because it might save someone else some debugging time.

I originally built the MCP server with mcp-go v0.45.0 using SSE (Server-Sent Events) transport. Worked great. Then Claude Code updated to expect the newer Streamable HTTP transport, and everything fell over.

The failure mode was confusing. Claude Code would try to connect, attempt OAuth discovery against the /sse endpoint, get a 404 (my server doesn't do OAuth), and fail with:

Error: HTTP 404: Invalid OAuth error response: SyntaxError: JSON Parse error: Unable to parse JSON string

Nothing in my code changed. The client just started speaking a different protocol.

The fix was small once I understood it:

// Before — SSE transport
func (s *Server) ServeSSE(addr string) error {
    sseServer := server.NewSSEServer(s.mcpServer,
        server.WithBaseURL("http://"+addr),
    )
    return sseServer.Start(addr)
}

// After — Streamable HTTP transport
func (s *Server) ServeHTTP(addr string) error {
    httpServer := server.NewStreamableHTTPServer(s.mcpServer,
        server.WithEndpointPath("/mcp"),
        server.WithStateLess(true),
    )
    return httpServer.Start(addr)
}

Bumped mcp-go from v0.45.0 to v0.46.0, swapped the server constructor, changed the endpoint from /sse to /mcp, updated the client config. Done. But diagnosing "OAuth error on a server that doesn't do OAuth" — that bit took a while.

Output streaming

When Claude Code runs inside a VM, its output needs to get from stdout inside the guest all the way to a browser tab on my laptop. The path:

flowchart LR
    A["Claude Code stdout"] --> B["Guest agent\nvsock frame"]
    B --> C["Host vsock client\nExecStream"]
    C --> D["Task runner\nOnEvent callback"]
    D --> E["Stream Hub\nring buffer + fan-out"]
    E --> F["WebSocket\nto browser"]

The stream hub (internal/stream/hub.go) is a per-task pub/sub system. Each task gets a stream with a 1000-event ring buffer. When a WebSocket client connects, it gets all the buffered history first, then live events as they arrive.

Fan-out is non-blocking:

for ch := range s.subscribers {
    select {
    case ch <- event:
    default:
        // Subscriber is slow, drop the event
    }
}

A slow WebSocket client can't block the task runner. If the browser can't keep up, it misses events. In practice this never happens because the bottleneck is always Claude thinking, not the network.
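A minimal sketch of the replay-then-live pattern — simplified from what hub.go does, with names and the event type invented for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// Stream keeps the last N events in a ring buffer and fans them
// out to subscribers without ever blocking the publisher.
type Stream struct {
	mu          sync.Mutex
	buffer      []string
	max         int
	subscribers map[chan string]struct{}
}

func NewStream(max int) *Stream {
	return &Stream{max: max, subscribers: make(map[chan string]struct{})}
}

// Publish appends to the ring buffer and fans out to live subscribers.
func (s *Stream) Publish(event string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.buffer = append(s.buffer, event)
	if len(s.buffer) > s.max {
		s.buffer = s.buffer[1:] // oldest event falls off the ring
	}
	for ch := range s.subscribers {
		select {
		case ch <- event:
		default: // slow subscriber: drop rather than block the task runner
		}
	}
}

// Subscribe returns a channel pre-loaded with the buffered history,
// so a late-joining WebSocket client sees everything so far.
func (s *Stream) Subscribe() chan string {
	s.mu.Lock()
	defer s.mu.Unlock()
	ch := make(chan string, s.max+64) // headroom for live events
	for _, e := range s.buffer {
		ch <- e // replay history first, then live events arrive behind it
	}
	s.subscribers[ch] = struct{}{}
	return ch
}

func main() {
	s := NewStream(3)
	s.Publish("a")
	s.Publish("b")
	s.Publish("c")
	s.Publish("d") // "a" has fallen off the ring by now
	ch := s.Subscribe()
	fmt.Println(<-ch, <-ch, <-ch) // b c d
}
```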

The web dashboard

The React frontend is compiled to static files and embedded into the Go binary:

//go:embed all:web-dist
var webDistEmbed embed.FS

Single binary deployment. No nginx, no separate frontend server, no CORS headaches in production. The API server falls through to index.html for unknown paths, which gives you SPA client-side routing.

The most interesting page is the task detail view. Claude Code's --output-format stream-json spits out one JSON object per line — thinking blocks, text responses, tool calls, tool results, cost summaries. The dashboard parses these into coloured blocks:

  • Purple for thinking (Claude's internal reasoning)
  • Blue for text responses
  • Orange for tool calls (shows the tool name and input)
  • Grey for tool results (truncated to 2000 chars — some of these are enormous)
  • Green for the final result with cost

A useWebSocket hook connects when the task is running and disconnects when it's done. Green pulsing dot for live streaming. Auto-scroll to the bottom as events arrive. Image files in the results get inline previews pointing at the API's file download endpoint — so when Claude takes a screenshot inside the VM, you see it immediately.

Dark theme. Orange accents. Obviously.

What productionising looks like

This runs on one box with no auth. It's a home lab project. But the gap between "works for me" and "works for a small team" isn't as big as it looks.

Persistence is the most obvious one. The task store is an in-memory Go map. Orchestrator restarts? All task history gone. VM metadata already persists to disk and gets recovered on startup — tasks should too. SQLite or bbolt, a few hours of work. I just haven't needed it because I don't restart the process very often.

Task queue with backpressure. Right now tasks fire as goroutines with no concurrency limit. Submit 20 tasks on a 30GB machine where each VM wants 2GB and the last few fail because there's no memory left. A buffered channel or semaphore would fix this. You could get fancier with priority queues — quick code generation tasks ahead of long research tasks — but even a simple concurrency cap would be enough.
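The buffered-channel version is a few lines. A sketch with fake tasks — the cap of 4 is illustrative, in practice it'd be derived from host RAM divided by per-VM RAM:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWithCap runs n fake tasks with at most limit in flight at once,
// returning the peak observed concurrency.
func runWithCap(n, limit int) int64 {
	sem := make(chan struct{}, limit)
	var running, peak int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // blocks once limit tasks are in flight
			defer func() { <-sem }() // release the slot when done
			cur := atomic.AddInt64(&running, 1)
			for { // record the high-water mark
				p := atomic.LoadInt64(&peak)
				if cur <= p || atomic.CompareAndSwapInt64(&peak, p, cur) {
					break
				}
			}
			// ... boot the VM and run the task here ...
			atomic.AddInt64(&running, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	peak := runWithCap(20, 4)
	fmt.Println(peak <= 4) // true: never more than 4 tasks at once
}
```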

Authentication. The REST API and MCP server accept requests from anyone who can reach the port. For a team: API keys at minimum, mTLS if you're serious about it. The MCP spec supports auth flows now — that'd be the right way to do it for the MCP endpoint.

The OnEvent callback race. This one's a latent bug. The task runner's OnEvent callback is stored on the runner struct, not passed per-task:

s.taskRunner.OnEvent = func(id string, event agent.StreamEvent) {
    taskStream.Publish(event)
}
s.taskRunner.Run(context.Background(), t)

Two simultaneous tasks overwrite each other's callbacks. It works today because MCP tasks block (one at a time) and the API handler sets up the stream before the goroutine runs. But it's the kind of thing that works until it doesn't. Fix is trivial — pass the callback into Run() as a parameter.

Graceful shutdown. There's no signal handler. Ctrl-C kills the process, running VMs become orphans. They keep running as Firecracker processes — the recoverState() function on next startup finds them and starts tracking them again — but their tasks are lost. A proper signal handler would stop accepting new tasks, wait for running ones to finish with a timeout, then tear everything down cleanly.

For real multi-user you'd want result storage on S3 or R2 instead of local disk. A web auth layer. Per-user credential vaults so different people's Claude tokens don't mix. Usage tracking and cost attribution.

What I wouldn't change: the single-binary deployment, vsock for host-guest communication, ephemeral VMs as the isolation model, the embedded frontend. Those are the right calls regardless of scale. The architecture is sound — it's the operational bits around it that need work.

Most of these are a weekend each. The project is about 3,200 lines of Go and 860 of TypeScript. It's not a big codebase. Adding persistence, auth, and a task queue would maybe take it to 4,500 lines. Still fits in your head.

For now, it sits under my desk and boots VMs when I ask it to. Claude delegating to Claude, in complete isolation, on hardware I own. That's enough.