Anthropic just dropped Project Glasswing — a big collaborative cybersecurity initiative with a shiny new model called Claude Mythos Preview that can find zero-day vulnerabilities at scale. Twelve major tech companies involved. $100M in credits. Found a 27-year-old flaw in OpenBSD. Impressive stuff.

But let's be real about what's happening here. Anthropic trained a model so capable at breaking into systems that they decided it was too dangerous to release publicly. So they wrapped the release in a collaborative security initiative. The security work is genuinely valuable. But it's also a smart way to keep control of something they know is too powerful to let loose.

The part that actually matters, though, is who benefits. Glasswing is for the big players. The companies with security teams, budgets, and the kind of infrastructure that gets invited to sit at the table with AWS, Microsoft, and Palo Alto Networks. What about the rest of us? The startups, the small SaaS shops, the indie developers running production systems on a shoestring?

The internet is a dark forest. That's not a metaphor anymore — it's becoming the literal reality. Bots, scrapers, automated exploit chains, credential stuffing, AI-generated phishing. A server goes up and within hours it's being scanned, fingerprinted, and probed by systems that don't sleep. Visibility equals vulnerability. And AI is making the attackers faster, cheaper, and more autonomous every month.

The ISC2 put it plainly — both offence and defence now operate at speeds beyond human intervention. The threats aren't people sitting at keyboards anymore. They're autonomous systems running campaigns end-to-end.

So what do we do about it?

Offensive security — but not the kind you're thinking

When I say offensive security, I don't mean red-teaming or penetration testing. I mean giving your systems the ability to fight back.

Picture an LLM that sits across your centralised logs — network traffic, database queries, user interactions, access patterns — and builds an understanding of what normal looks like for your system over weeks and months. Not just pattern matching against known signatures. Actually understanding the shape of healthy behaviour.

When something breaks the pattern, it doesn't just alert. It acts.

Disable a compromised account. Kill a service that's behaving strangely. Block a database connection that shouldn't exist. Create an incident with full context for a human to review. The response is proportional and immediate — not waiting for someone to check their phone at 3am.

The architecture is pretty straightforward:

graph TD
    A[Application Logs] --> D[Secure Isolated Log Store]
    B[Network Traffic] --> D
    C[Database Queries] --> D
    D --> F[Baseline Health Model]
    E[User Activity] --> D
    F -->|Anomaly Detected| G[LLM Analysis]
    G -->|Analyse & Plan| H{Threat Assessment}
    H -->|Low| I[Alert & Log]
    H -->|Medium| J[Restrict & Escalate]
    H -->|High| K[Disable & Isolate]
    I --> L[Human Review]
    J --> L
    K --> L
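
The three branches at the bottom of the diagram reduce to a fixed action map. A minimal sketch in Python, where the action names are placeholders rather than a real API:

```python
from dataclasses import dataclass

# Hypothetical action names; the real lever set depends on your stack.
ACTIONS = {
    "low": ["alert", "log"],
    "medium": ["restrict_account", "escalate"],
    "high": ["disable_account", "isolate_service"],
}

@dataclass
class Assessment:
    level: str   # "low" | "medium" | "high"
    reason: str  # context for the human reviewing the incident

def respond(assessment: Assessment) -> list[str]:
    """Map a threat assessment to a pre-approved action set.

    Every path ends with an incident for human review, matching the
    diagram: the system acts first, then hands full context to a person.
    """
    actions = list(ACTIONS[assessment.level])
    actions.append("open_incident")  # human review on every branch
    return actions
```

The point of the fixed map is that the LLM chooses a tier, not arbitrary actions: its blast radius is bounded by what the map allows.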

The key is that the logging and analysis layer has to be isolated and secured separately from the systems it's watching. If an attacker can compromise the thing that's watching them, the whole model falls apart.

In practice that means separate infrastructure with its own auth boundary. Ingestion is write-only — your application services push logs in but can never read or modify what's already there. Append-only, immutable. The analysis layer gets scoped service accounts that can read logs, fire alerts, and pull specific emergency levers through a narrow API. Nothing else. If a compromised service tries to reach the log store directly, it hits a wall.
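
As a toy illustration of that shape (in a real deployment the boundary lives in separate infrastructure and IAM, not in-process objects): application code receives only an append capability, the analysis layer holds the readable object, and hash chaining approximates immutability.

```python
import hashlib
import json
import time

class AppendOnlyLog:
    """Toy append-only, hash-chained log store.

    Services hold only the bound `append` method (a write-only
    capability); the analysis layer holds the object itself and can
    read and verify. Chaining each record's hash to the previous one
    means tampering with history breaks verification.
    """

    def __init__(self):
        self._entries = []          # list of (record, digest)
        self._last_hash = "genesis"

    def writer(self):
        """Hand application services this, and nothing else."""
        return self.append

    def append(self, event: dict) -> None:
        record = {"ts": time.time(), "event": event, "prev": self._last_hash}
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self._entries.append((record, digest))
        self._last_hash = digest

    def verify(self) -> bool:
        """Recompute the chain; any mutation of past records shows up."""
        prev = "genesis"
        for record, digest in self._entries:
            recomputed = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode()
            ).hexdigest()
            if record["prev"] != prev or recomputed != digest:
                return False
            prev = digest
        return True
```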

None of this is exotic. Centralised logging, immutable storage, scoped IAM — the building blocks exist. The hard part is wiring an LLM into that loop with the right constraints. Enough access to act, not enough to make things worse.

Adaptive, not rule-based

Traditional security tooling runs on signatures and static rules. Known bad patterns, blocklists, threshold alerts. That worked when threats were mostly human-paced. It doesn't work when you're up against autonomous systems that adapt faster than you can write rules.

The alternative is a system that learns what normal looks like for your environment — not a generic baseline, but the actual shape of healthy behaviour in your specific infrastructure. Traffic patterns, query frequencies, access timing, user behaviour. Weeks of observation before it starts making decisions.

When something breaks the pattern, the response is proportional. A sudden spike in unusual API calls might trigger deeper correlation — the system widens its search, pulls in more signals, lowers its threshold for flagging related activity. Repeated failed auth attempts from new IPs tighten access controls automatically. A database connection that shouldn't exist gets killed.

This isn't a static ruleset you configure once and hope it covers everything. It's a system that develops behavioural intuition from running in your environment, responding to your traffic. The difference matters — static rules are brittle against novel attacks, while adaptive systems can catch anomalies they've never seen before.

The baseline isn't magic. It's watching five things:

  • Rate — how many events per time window. A user who averages 50 API calls per hour suddenly making 500 is a signal.
  • Composition — what's in those events. The same user always hitting /api/users and /api/orders suddenly hammering /api/admin/export.
  • Cardinality — how many unique values. One IP hitting 3 endpoints is normal. One IP cycling through 200 endpoints in an hour isn't.
  • Latency — how fast things happen. Legitimate users pause, think, navigate. Bots don't.
  • Novelty — things the system has never seen. A new endpoint, a new parameter, a user agent string that doesn't match anything in the training window.
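
The five signals above are cheap to compute per window. A sketch, assuming events arrive as simple dicts whose field names are invented for illustration:

```python
from collections import Counter

def window_features(events, known_endpoints=frozenset()):
    """Compute the five baseline signals for one time window.

    events: dicts like {"user": ..., "endpoint": ..., "ts": ...};
    the field names are made up for this sketch.
    """
    ts = sorted(e["ts"] for e in events)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    seen = {e["endpoint"] for e in events}
    return {
        "rate": len(events),                                    # events per window
        "composition": Counter(e["endpoint"] for e in events),  # what was hit
        "cardinality": len(seen),                               # unique endpoints
        "median_gap": sorted(gaps)[len(gaps) // 2] if gaps else None,  # bots don't pause
        "novel": seen - set(known_endpoints),                   # never seen in baseline
    }
```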

Three layers of detection stack on top of each other. Layer one is simple thresholds — hard caps that trigger immediately. Layer two is statistical deviation — standard deviations from the learned baseline. Layer three is correlation — looking across multiple signals simultaneously. A spike in rate alone might be fine. A spike in rate plus unusual composition plus new source IP? That's a pattern.
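
Stacked together, the three layers might look like the sketch below; the hard cap, the 3-sigma cut-off, and the signal counts are illustrative rather than tuned values:

```python
def detect(features, baseline):
    """Three stacked layers: hard caps, statistical deviation, correlation."""
    signals = []

    # Layer 1: hard caps that trigger immediately.
    if features["rate"] > baseline["rate_cap"]:
        signals.append("rate_cap")

    # Layer 2: deviation from the learned baseline (3-sigma here).
    mean, std = baseline["rate_mean"], baseline["rate_std"]
    if std and abs(features["rate"] - mean) > 3 * std:
        signals.append("rate_deviation")
    if features["novel_endpoints"]:
        signals.append("novelty")
    if features["new_source_ip"]:
        signals.append("new_ip")

    # Layer 3: correlation. One signal alone may be fine; several
    # co-occurring is a pattern worth escalating.
    if len(signals) >= 3:
        return "high", signals
    if len(signals) == 2:
        return "medium", signals
    return ("low", signals) if signals else ("none", signals)
```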

Learning to recognise yourself

A pure anomaly detector would go nuts during deploys. New code paths, changed response times, config reloads — all of it looks unusual. Same with cron jobs. Your 3am batch job that hits the database hard every night would trigger alerts every night.

Tolerance patterns solve this. The system learns to recognise you.

Mark a deploy event, and the system creates a tolerance window — elevated thresholds for the next 30 minutes. Register a recurring cron job, and the system expects that exact spike at that exact time. These aren't exceptions you configure manually. They're patterns the system learns from watching.

After a few weeks, it knows when your weekly cache warm-up runs, when your daily reports generate, when deploys happen. It stops bothering you about the things you do on purpose.
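
Tolerance windows can be sketched in a few lines; the 30-minute deploy window and the multipliers are illustrative defaults, not recommendations:

```python
import time

class ToleranceWindows:
    """Sketch: elevated thresholds around known-benign events."""

    def __init__(self):
        self._windows = []  # (start, end, multiplier) from deploy marks
        self._cron = []     # (hour_utc, duration_s, multiplier)

    def mark_deploy(self, now, duration_s=1800, multiplier=5.0):
        # Deploy announced: tolerate 5x the usual noise for 30 minutes.
        self._windows.append((now, now + duration_s, multiplier))

    def register_cron(self, hour_utc, duration_s=600, multiplier=10.0):
        # Recurring job: expect this exact spike at this exact time.
        self._cron.append((hour_utc, duration_s, multiplier))

    def multiplier(self, now):
        """Threshold multiplier to apply at timestamp `now`."""
        m = 1.0
        for start, end, mult in self._windows:
            if start <= now <= end:
                m = max(m, mult)
        hour = time.gmtime(now).tm_hour
        sec_into_hour = now % 3600
        for h, dur, mult in self._cron:
            if hour == h and sec_into_hour <= dur:
                m = max(m, mult)
        return m
```

Detection then compares each feature against its threshold times this multiplier, so the 3am batch job has to exceed ten times its usual footprint before anyone gets paged.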

The system gets cheaper over time

Calling an LLM for every anomaly would be expensive. The trick is building immune memory.

When the LLM analyses an anomaly and decides it's benign — say, a deploy spike or a legitimate traffic surge — that verdict gets stored. Next time the same pattern appears, the system recognises it. No LLM call needed.

This is how your security bill drops over the first few weeks. Early on, everything is novel. The LLM gets called constantly. A month in, most anomalies match patterns it's already seen. The LLM only gets called for genuinely new situations.

The more your system runs, the smarter it gets and the less it costs.
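
Immune memory is essentially a verdict cache keyed by a coarse fingerprint of the anomaly, so the same kind of spike skips the expensive call. A sketch, with the bucketing scheme entirely illustrative:

```python
import hashlib

class ImmuneMemory:
    """Cache LLM verdicts keyed by a coarse fingerprint of the anomaly."""

    def __init__(self, llm_analyse):
        self._llm = llm_analyse  # the expensive call, injected
        self._verdicts = {}
        self.llm_calls = 0

    @staticmethod
    def fingerprint(anomaly: dict) -> str:
        # Coarse key: signal names plus a rough magnitude bucket, not raw
        # values, so "the same kind of spike" hits the cache.
        key = (tuple(sorted(anomaly["signals"])), anomaly["rate"] // 100)
        return hashlib.sha256(repr(key).encode()).hexdigest()

    def analyse(self, anomaly: dict) -> str:
        fp = self.fingerprint(anomaly)
        if fp in self._verdicts:
            return self._verdicts[fp]  # immune memory: no LLM call
        self.llm_calls += 1
        verdict = self._llm(anomaly)   # e.g. "benign" / "malicious"
        self._verdicts[fp] = verdict
        return verdict
```

One caveat worth designing for: cached "benign" verdicts should expire or be re-sampled occasionally, or an attacker who mimics a known-benign pattern gets a free pass.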

Setup without a PhD

The hardest part of any security tool is configuration. Getting thresholds right. Understanding your traffic patterns before you can tell the tool what's normal.

darkforest init flips this. Point it at a log sample — a day's worth of traffic, a week if you've got it — and Claude reads it. Not just parsing, actually understanding the shape of your system. It figures out what your endpoints are, what normal request rates look like, what user agents show up, where your traffic comes from geographically.

Then it writes your config file for you.

You review it, tweak anything that looks wrong, and you're running. No spreadsheets. No guesswork about what "normal" means for your specific stack. The LLM that's going to watch your logs already understands them.
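
For a sense of what a generated config might contain, here is a hypothetical example; the tool and its format are still being designed, so every field name and number below is illustrative:

```python
# Hypothetical output of `darkforest init`. Illustrative only: the real
# format does not exist yet, and these numbers are invented.
GENERATED_CONFIG = {
    "endpoints": {
        "/api/users":  {"rate_mean": 120, "rate_std": 25},
        "/api/orders": {"rate_mean": 80,  "rate_std": 15},
    },
    "known_user_agents": ["Mozilla/5.0", "MyMobileApp/2.1"],
    "geo_baseline": ["GB", "DE", "US"],
    "tolerance": {
        "deploys": {"window_minutes": 30, "multiplier": 5.0},
        "cron": [{"name": "nightly-batch", "hour_utc": 3, "duration_minutes": 10}],
    },
    "actions": {"low": "alert", "medium": "restrict", "high": "disable"},
}
```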

This has to be open

Glasswing is cool. Open-source frameworks like CAI are making progress — but mostly on the offensive side, using LLMs for penetration testing and vulnerability research. On the defensive side, the tooling barely exists. There's no open-source equivalent for the kind of adaptive monitoring and response I'm describing here.

Most of the building blocks already exist. Centralised logging is a solved problem. Open standards for security event formats are maturing. Smaller open models are more than capable of pattern analysis on local infrastructure. What's missing is the glue — a framework that takes logs in, builds a baseline, detects anomalies, and can actually respond. Something a small team can deploy without a six-figure security budget.

And it can't be proprietary or locked behind enterprise contracts.

The dark forest doesn't care how big your company is. The bots scanning your infrastructure don't check your headcount before they attack. If the threats are going to be this accessible, the defences need to be too.

I'm building this. An open-source security agent — adaptive, autonomous, acts when something breaks the pattern. Small enough for a startup to run on their own infrastructure. Centralised logging, open LLMs, scoped response actions. The pieces are all there. I'm wiring them together now.

For v0.1, one real action working end-to-end: detect anomalous authentication patterns, call the LLM for analysis, and disable the compromised account via your identity provider's API. Not just alerting — actually responding while you're asleep. That's the proof of concept that matches the headline.
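
Sketched end-to-end, that v0.1 loop might look like the following. The detection trigger, the LLM client, and the identity-provider endpoint are all placeholder assumptions; in particular, idp.example.com and its disable route are invented for illustration:

```python
def v01_loop(auth_events, llm_analyse, idp_base="https://idp.example.com"):
    """Detect anomalous auth patterns, ask the LLM, disable if compromised.

    auth_events: dicts like {"user": ..., "success": bool, "ip": ...}
    llm_analyse: callable returning "compromised" or "benign"
    idp_base:    placeholder identity-provider URL (invented)
    """
    failures = {}
    for e in auth_events:
        if not e["success"]:
            failures.setdefault(e["user"], []).append(e["ip"])

    actions = []
    for user, ips in failures.items():
        # Crude trigger for the sketch: many failures from many distinct IPs.
        if len(ips) >= 10 and len(set(ips)) >= 5:
            if llm_analyse({"user": user, "failed_ips": ips}) == "compromised":
                # The real version would POST to the IdP here, e.g.
                # urllib.request.urlopen(urllib.request.Request(
                #     f"{idp_base}/users/{user}/disable", method="POST"))
                actions.append(("disable", user))
    return actions
```

The crude threshold is just the cheap first filter; the LLM verdict is what gates the destructive action, and every disable still lands in an incident for human review.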

I'm actively working on this and looking for early testers. If you want alpha access when it's ready, or just want to follow along, drop your email below. I'll reach out when there's something to try.