Swarm Engineering: Simulating a Company with LLMs

The oddest thing about swarm engineering is how it shifts your role. Instead of writing every line, you break a problem into pieces, spawn specialized agents for those pieces, and review their work. It’s like coordinating a tiny team — except the team members are API calls with different prompts.

And yes, in many cases the infrastructure can be surprisingly cheap.

What swarm engineering means in practice

Swarm engineering is an async, multi-agent pattern:

An orchestrator takes a high-level task
It decomposes the work into smaller subtasks
It spawns specialized subagents with role-specific prompts
Subagents run concurrently and report back
The orchestrator merges results, resolves dependencies, and continues

Think of it as applying the actor model to text-based agents: you get parallelism and separation of concerns without provisioning servers for each role.

Why it can feel like running a tiny company

You can give agents precise roles instead of the generic “write code” prompt. For example:

backend engineer: API design and schema work
frontend specialist: UI patterns and accessibility
devops: CI/CD and deployment scripts
code reviewer: adversarial checks and edge cases
doc writer: user guides and examples
QA tester: test cases and regression checks

Each agent runs with its own prompt and context. When you start a feature, the orchestrator might spawn a few agents to work in parallel. They finish at different times; the orchestrator resolves ordering and handles conflicts when agents touch related files.

That delegation-and-review loop starts to look a lot like lightweight engineering management.

Agents that act beyond the repo

With protocols that let agents call external tools, they can do more than edit files.

For example, an agent could query a Postgres schema, hit a staging API, post results to Slack, open a ticket, or make commits and open PRs.

That capability makes the “simulated company” metaphor less fanciful. You can set up agents that monitor CI, update docs when APIs change, or run nightly checks and file issues.

I haven’t wired up an agent to handle on-call for me, but it’s technically possible: page an agent, let it read logs, suggest (or apply) a fix, deploy, and post a summary.

Costs and feasibility

Two years ago this would have been prohibitively expensive. Now there are free or cheap inference options and that changes the math.

Free and low-cost endpoints and subscriptions mean you can run multiple agents for small, frequent tasks without a huge bill. Your mileage will vary depending on models, token needs, and rate limits.

For a medium refactor I run a planning call and a handful of worker calls; that can be a few hundred thousand tokens. Depending on provider and context length, that might cost a couple of dollars, not dozens.

How much control you have

Where this differs from a simple chat interface is the control over agent behavior. You can:

define agent types with focused system prompts
inject role-specific context (“you’re a Rust engineer, avoid unwrap”)
restrict tool access per agent
require verification steps, like running tests before finishing
configure review workflows and retry policies

I keep a NixOS architect agent described in a short markdown file. It follows my conventions and refuses to write inline configs. Writing that file took minutes and pays off any time I refactor.

When it helps — and when it doesn’t

Swarm engineering isn’t a silver bullet. It works best when tasks can be broken into clear subtasks, when the codebase is reasonably structured, and when you can state acceptance criteria. It struggles with tightly coupled changes, tasks that need a single continuous vision, or when the request itself is vague.

Use it to scale routine work — refactors, tests, migrations, docs. Keep the architectural decisions under your control.

Where this pattern could lead

The orchestrator-plus-specialist pattern applies beyond code. You could build swarms for content workflows, customer support triage, or data analysis pipelines — researcher, writer, editor agents working together.

Tools are maturing faster than our collective imagination for how to use them. The practical constraint now is designing good workflows and clear acceptance criteria.

If this sounds far-fetched, it’s not. My daily setup uses open-source tools, cheap inference endpoints, and a few markdown files describing agents. It makes some hard jobs simpler, and it feels oddly useful.