Glimpses of the Future: Speed & Swarms
If you experiment with new tools and technologies, every so often you’ll catch a glimpse of the future. Most of the time, tinkering is just that — fiddly, half-working experiments. But occasionally, something clicks, and you can see the shift coming.
In the last two months, I’ve experienced this twice while coding with AI. Over the next year, I expect AI-assisted coding to get much faster and more concurrent.
Speed Changes How You Code
Last month, I embarked on an AI-assisted code safari. I tried different applications (Claude Code, Codex, Cursor, Cline, Amp, etc.) and different models (Opus, GPT-5, Qwen Coder, Kimi K2, etc.), trying to get a better lay of the land. I find it useful to take these macro views occasionally, time-boxing them explicitly, to build a mental model of the domain and to prevent me from getting rabbit-holed by tool selection during project work.
The takeaway from this safari was that we are undervaluing speed.
We talk constantly about model accuracy, their ability to reliably solve significant PRs, and their ability to solve bugs or dig themselves out of holes. Coupled with this conversation is the related discussion about what we do while an agent churns on a task. We sip coffee, catch up on our favorite shows, or make breakfast for our family, all while the agent chugs away. Others spin up more agents and attack multiple tasks at once, across a grid of terminal windows. Still others go full async, handing off GitHub issues to OpenAI’s Codex, which works in the cloud by itself… often for hours.
Using the largest, slowest model is a good idea when tackling a particularly sticky problem or when you’re planning your initial approach, but a good chunk of coding can be handled by smaller, cheaper, faster models.
How much faster? Let’s take the extreme: Qwen 3 Coder 480B runs at 2,000 tokens/second on Cerebras. That’s 30 times faster than Claude Sonnet 4.5 and 45 times faster than Claude Opus 4.1. At that rate, Qwen 3 Coder takes 4 seconds to write 1,000 lines of JavaScript; Sonnet needs 2 minutes.
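The arithmetic behind those figures is simple. A quick sketch, assuming roughly 8 tokens per line of JavaScript and a ~65 tokens/second decode speed for Sonnet (both illustrative assumptions, not benchmarks):

```python
# Back-of-the-envelope generation times at different decode speeds.
# TOKENS_PER_LINE and the Sonnet speed are rough assumptions.

TOKENS_PER_LINE = 8  # assumed average for JavaScript


def generation_time(lines: int, tokens_per_second: float) -> float:
    """Seconds to emit `lines` lines of code at a given decode speed."""
    return lines * TOKENS_PER_LINE / tokens_per_second


qwen_on_cerebras = generation_time(1_000, 2_000)  # ~4 seconds
sonnet = generation_time(1_000, 65)               # ~2 minutes
print(f"Qwen on Cerebras: {qwen_on_cerebras:.0f}s, Sonnet: {sonnet / 60:.1f} min")
```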
No one is arguing Qwen 3 Coder 480B is a more capable model than Sonnet 4.5 (except maybe Qwen and Cerebras… 🤔). But at this speed, your workflow radically changes. I found myself chunking problems into smaller steps, chatting in near real-time with the model as code just appeared and was tested. There was no time for leaning back or sipping coffee. My hands never left the keyboard.
At 30x speed, you experiment more. When the agent is slow, hesitation creeps in: waiting a couple of minutes on a long shot often isn’t worth it, so you try fewer random things. But with Qwen 3, I found myself firing away with little hesitation, rolling back failures, and trying again.
After Qwen 3, Claude feels like molasses. I still use it for big chunks of work, where I’m fine letting it churn for a bit, but for scripting and frontend work it’s hard to give up Qwen’s (or Kimi K2’s) speed. For tweaking UI (editing HTML and CSS), speed coupled with a hot-reloader is incredible.
I recommend everyone give Qwen 3 Coder a try, especially the free tier hosted on Cerebras and harnessed with Cline, if only to see how your behavior adjusts with immediate feedback.
Swarms Speed Up Slow Models (But Thrive with Conventions)
To mitigate slow models, many fire up more terminal windows.
Peter Steinberger recently wrote about his usual setup, which illustrates this well:
I’ve completely moved to codex cli as daily driver. I run between 3-8 in parallel in a 3x3 terminal grid, most of them in the same folder, some experiments go in separate folders. I experimented with worktrees, PRs but always revert back to this setup as it gets stuff done the fastest.
The main challenge with multi-agent coding is handling Git conflicts. Peter relies on atomic commits, while others go further. Chris Van Pelt at Weights & Biases built catnip, which uses containers to manage parallel agents. Tools like claude-flow and claude-swarm use context management tactics like RAG, tool loadout, and context quarantining to orchestrate “teams” of specialist agents.
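One common tactic for keeping parallel agents out of each other’s way is git worktrees, which Peter mentions experimenting with: each agent gets its own checkout and branch, so edits never collide in the working tree. A minimal sketch (the demo repo and branch names are illustrative; in practice you’d run the `git worktree add` lines from your real project root):

```shell
# Demo: one git worktree (and branch) per agent, so parallel edits land
# in separate checkouts. Uses a throwaway repo for illustration.
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q project && cd project
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# One isolated checkout per agent:
git worktree add ../agent-models -b agent/models
git worktree add ../agent-views  -b agent/views

git worktree list  # main checkout plus the two agent worktrees
```

Each agent’s branch can then be merged back (or opened as a PR) once its task is done.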
Reading the previous list, we can see the appeal of Peter’s simple approach: nailing down atomic commit behaviors lets him drop into any project and start working. The swarm framework approach requires setup, which can be worth it for major projects.
However, what I’m excited about is when we can build swarm frameworks for common environments. This reduces swarm setup time to near zero, while yielding significantly more effective agents. It’s the agentic coding equivalent of “convention over configuration”, allowing us to pre-fill context for a swarm of agents.
This pattern — using conventions to standardize how agents collaborate — naturally aligns with frameworks that already prize convention over configuration. Which brings us to Ruby on Rails.
Obie Fernandez recently released a swarm framework for Rails, claude-on-rails. It’s a preconfigured claude-swarm setup, coupled with an MCP server loaded with documentation matching your project’s dependencies.
It works extraordinarily well.
Like our experiments with the speedy Qwen 3, claude-on-rails changes how you prompt. Since the swarm is preloaded with Rails-specific agents and documentation, you can provide much less detail when prompting. There’s little need to specify implementation details or approaches. It just cracks on, assuming Rails conventions, and delivers an incredibly high batting average.
To handle the dreaded Git conflicts, claude-on-rails takes advantage of the standard Rails directory structure and isolates agents to specific folders.
Here’s a sample of how claude-on-rails defines the roles in its swarm:
```yaml
architect:
  description: "Rails architect coordinating full-stack development for DspyRunner"
  directory: .
  model: opus
  connections: [models, controllers, views, stimulus, jobs, tests, devops]
  prompt_file: .claude-on-rails/prompts/architect.md
  vibe: true
models:
  description: "ActiveRecord models, migrations, and database optimization specialist"
  directory: ./app/models
  model: sonnet
  allowed_tools: [Read, Edit, Write, Bash, Grep, Glob, LS]
  prompt_file: .claude-on-rails/prompts/models.md
views:
  description: "Rails views, layouts, partials, and asset pipeline specialist"
  directory: ./app/views
  model: sonnet
  connections: [stimulus]
  allowed_tools: [Read, Edit, Write, Bash, Grep, Glob, LS]
  prompt_file: .claude-on-rails/prompts/views.md
```
The claude-swarm config lets you define each role’s tool loadout, model, working directory, connections to other roles, and custom prompt. Defining a swarm is a significant amount of work, but the conventions of Rails let claude-on-rails work effectively out of the box. And since there are multiple instances of Claude running, you have less time for coffee or cooking.
And installing claude-on-rails is simple: add it to your Gemfile, run `bundle`, and set it up with `rails generate claude_on_rails:swarm`.
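The install steps look roughly like this (the exact Gemfile entry is an assumption; check the claude-on-rails README for the canonical gem name):

```shell
# Setup sketch for an existing Rails project. The gem name below is
# assumed from the project name; only the generator command is taken
# directly from the claude-on-rails instructions.
echo 'gem "claude-on-rails", group: :development' >> Gemfile
bundle install
rails generate claude_on_rails:swarm
```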
In the past I’ve worried that LLM-powered coding agents will lock in certain frameworks and tools. The amount of Python content in each model’s pre-training data and post-training tuning appeared to be an insurmountable advantage. How could a new web framework compete with React when every coding agent knows the React APIs by heart?
But with significant harnesses, like claude-on-rails, the playing field can get pretty even. I hope we see similar swarm projects for other frameworks and platforms, like Django, Next.js, or iOS.
The conversation around AI-assisted coding has focused on accuracy benchmarks. But speed — and what speed enables — will soon take center stage. Being able to chat without waiting or spin up multi-agent swarms will unlock a new era of coding with AI. One with a more natural cadence, where code arrives almost as fast as thought.