Most development teams have now integrated AI assistants into their workflows in some form. What's becoming interesting is the next step: running multiple agents in parallel rather than one at a time.

The concept is straightforward: instead of one agent working on one task, you have several agents working on separate tasks simultaneously, each in its own isolated environment. There are trade-offs (higher token costs, more to review, tooling that's still maturing), but none of them feels insurmountable. Given how quickly this space is evolving, I'd expect parallel-agent workflows to become fairly common very soon.

Understanding the bottleneck with synchronous workflows

Most AI coding workflows today are synchronous. Whether you're using Cursor, Copilot, or Claude Code, the pattern is the same: one task at a time, one conversation, wait for it to finish, then move on. Meanwhile, five bugs sit in your backlog, each independent, each waiting its turn.

AI coding tools have moved well beyond autocomplete. Today's agents can reason about entire codebases, write tests, and run debugging workflows autonomously. The constraint in the workflow is the single-threaded interaction model, not the AI itself.

Working with multiple agents

This is where multi-agent tools come in. Instead of working with one AI assistant, you're managing a team of them.

The way these tools work is fairly straightforward. They spawn multiple coding instances, each in an isolated git worktree, essentially a copy of your codebase that shares version history but allows independent changes. So one agent can be fixing a date formatting bug while another works on authentication, and a third prototypes a new feature. All of this happens simultaneously, with no risk of them stepping on each other's code.
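The worktree mechanism these tools lean on is plain git, so you can see the isolation for yourself. A minimal sketch, using a throwaway repo (the paths and branch names are illustrative, not anything Conductor produces):

```shell
# Throwaway demo repo (paths and branch names are illustrative)
rm -rf /tmp/worktree-demo && mkdir -p /tmp/worktree-demo
cd /tmp/worktree-demo
git init -q main-repo && cd main-repo
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "initial commit"

# Each worktree is a separate working directory on its own branch,
# sharing the same .git object store and history as the main checkout.
git worktree add -q ../fix-date-bug -b fix/date-formatting
git worktree add -q ../auth-work -b feature/authentication

# Both checkouts now exist side by side
git worktree list
```

Because each agent operates in its own directory on its own branch, two agents can edit the same file without conflict; the merge happens later, when you review and integrate their branches.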

The mental shift takes some adjustment. When you're working with a single AI assistant, you're in a familiar flow state, focused on one or two tasks, thinking about implementation details. With parallel agents, that changes. The focus shifts to breaking work down effectively and communicating it clearly.

If I'm honest, the first time I tried this, I felt like I was trying to cook a roast dinner, take a phone call, and place an online order all at once. Chaotic and not particularly productive.

But that's the point. If you could do all those things at once, and do them well, you'd get a lot done. Once the shift clicks, it feels less like pair programming and more like delegation. You set direction, check outputs, and keep things moving. You think differently about timing and dependencies: what can start now, what needs to wait, and what can run alongside something else.

The skill isn't coding faster. It's knowing what can happen simultaneously.

The current landscape

Several tools are competing to define this space. I've spent time with Conductor from Melty Labs, though there are alternatives.

Conductor is pretty polished. It's a Mac app from a YC S24 team that previously built Melty, an open-source AI code editor. Their origin story is relatable: "At one point we tried cloning our repo into three directories and running Claude in each of them, but it felt like driving a Subaru with a jet engine strapped on."

The UI is clean and relatively straightforward. You point it at a GitHub repository, create workspaces, and watch the agents work. Notable features include:

  • Checkpoints (automatic snapshots that let you roll back)
  • Spotlight testing (sync changes back to your main repo for testing)
  • Multi-model mode (run Claude and Codex on the same prompt in different tabs to compare approaches)

It's free; you pay your underlying API costs.

Alternatives exist for different needs. Claude Squad offers an open-source, terminal-native approach using tmux and git worktrees. Crystal provides similar functionality under an MIT licence, useful for teams that need full source transparency for proprietary codebases. Anthropic's own Claude Code Agent Teams is an experimental feature signalling where the vendor ecosystem is heading. The space is moving fast, but Conductor currently offers the most polished experience for teams ready to experiment.

The trust issue

I need to address a concern that came up repeatedly in my research: security.

Tools like Conductor are closed-source binaries that clone your entire repo, spawn multiple AI instances with read/write access, manage your git worktrees, and authenticate to your GitHub account. For developers on proprietary codebases, that's a lot of trust to extend.

The concern became a point of discussion on Hacker News. One commenter didn't mince words: "Full read-write access required to all your GitHub account's repos. Not just code. Settings, deploy keys. The works... You are INSANE to authorise this app on anything other than throwaway code."

To their credit, the Conductor team responded quickly, shipping fine-grained GitHub App permissions and local git/gh auth alternatives within days. Their docs now explicitly address data practices: chat history stays local; they don't access conversation content, though they do collect analytics via PostHog for workspace creation, model selection, and errors.

Whether you're comfortable with that level of data collection depends on your security posture. For teams requiring full transparency, Crystal provides equivalent functionality with complete source visibility.

What this actually looks like

I wanted to test these claims myself, so I created a React Native app with four intentional bugs scattered across different files:

  • A state management bug in the main component
  • A UI logic error in a child component
  • A broken calculation in a stats display
  • A date formatting issue in a utility function

With a traditional AI assistant, I'd fix them one at a time. Context switch, wait, review, repeat. Four cycles.

With Conductor, in 10 minutes (max), I created 4 workspaces, assigned each agent a bug, watched them work simultaneously, reviewed 4 diffs, and merged. The bugs weren't complex, but the workflow shift was genuinely eye-opening.

It reminded me of the first time I set up a CI pipeline to automatically build and deploy a mobile app. The individual steps don't run faster; it's the orchestration that changes everything.

Are we understating exploration?

One underappreciated benefit of this approach is the opportunity for exploration. When you want to try two different approaches to a refactor, spin up two agents. Let them race. Review both diffs. Pick the winner, or merge the best ideas from each. Conductor's multi-model mode takes this further by running multiple models on the same prompt and comparing their outputs.

The overall transition to working this way may come more naturally than expected. Most of us already context-switch throughout the day, so managing multiple agent workstreams isn't a completely foreign way of working. That said, for those who do their best work in deep focus and aren't used to jumping between tasks, the adjustment may be harder. And for junior developers still building mental models of a codebase, the overhead of orchestration may outweigh the benefits initially.

The catches (there are always some)

I've been positive so far, but there are some downsides. These tools are young, and parallel workflows come with real costs that the marketing doesn't mention.

Setup friction is real. My first Conductor session involved ensuring I was using only a token scoped to a specific repository. Git worktrees don't include untracked files like your .env or node_modules, so every new workspace needs to be bootstrapped.
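A small per-workspace bootstrap script takes some of the edge off. A sketch, with a simulated main checkout standing in for a real one (all paths are illustrative, and the npm step is commented out since it depends on your project):

```shell
# Sketch of bootstrapping a fresh worktree: git worktrees omit
# untracked files, so .env (and dependency installs) must be set
# up by hand. Simulated paths stand in for a real checkout here.
MAIN_CHECKOUT=/tmp/demo-main
WORKTREE=/tmp/demo-worktree
mkdir -p "$MAIN_CHECKOUT" "$WORKTREE"
echo "API_KEY=placeholder" > "$MAIN_CHECKOUT/.env"

# The actual bootstrap steps:
cp "$MAIN_CHECKOUT/.env" "$WORKTREE/.env"   # secrets aren't tracked by git
cd "$WORKTREE"
# npm install                               # node_modules is per-worktree too
cat .env
```

Running something like this once per new workspace is a minor tax, but forgetting it is a common source of "the agent says the app won't start" confusion.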

Token costs multiply. Conductor doesn't charge for usage; you pay Anthropic or OpenAI directly through your existing API keys. But running four agents simultaneously means four times the token consumption. It's easy to spin up workspaces without thinking too hard about what that means for your bill at the end of the month. Worth keeping an eye on, particularly during early experimentation when you're still finding the right workflow.
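A rough back-of-envelope shows how quickly this compounds. Every figure below is an assumption for illustration, not real Anthropic or OpenAI pricing:

```shell
# Back-of-envelope cost check. All numbers are assumptions:
# tokens per task, tasks per day, and blended $/1M tokens will
# vary wildly by model and workload.
AGENTS=4
TOKENS_PER_TASK=200000       # assumed input+output tokens per task
TASKS_PER_DAY=5
PRICE_PER_MILLION=5          # assumed blended $ per 1M tokens

awk -v a=$AGENTS -v t=$TOKENS_PER_TASK -v n=$TASKS_PER_DAY -v p=$PRICE_PER_MILLION \
  'BEGIN { daily = a * t * n / 1e6 * p;
           printf "Daily: $%.2f  Monthly (22 workdays): $%.2f\n", daily, daily * 22 }'
```

With these made-up numbers, a single agent would cost a quarter as much; the parallelism is the multiplier, so it pays to check your provider's usage dashboard early rather than at the end of the month.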

The human bottleneck doesn't disappear. You're still the quality gate. Now you're just reviewing more things. Four parallel agents potentially mean four times as many bugs to catch, and AI-generated code tends to have more subtle issues than human-written code. Without some guardrails, the architecture and quality can quickly get out of hand and be time-consuming to pull back.

Context doesn't persist between sessions. When you start a new workspace or conversation, the agent has no memory of previous work, your coding conventions, past decisions, or codebase quirks. You can mitigate this with CLAUDE.md files or project documentation that the agent can reference, but it's an overhead worth factoring in. Unlike a human teammate who absorbs context over time, you're always onboarding a (very) capable but amnesiac collaborator.
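Seeding that context file is cheap. An illustrative sketch of what a CLAUDE.md might capture (the conventions and decisions below are invented for the example, not recommendations):

```shell
# Illustrative CLAUDE.md: a project context file the agent can read
# at startup, standing in for the memory it lacks between sessions.
# Everything in the heredoc is example content, not a real project.
mkdir -p /tmp/demo-claude
cat > /tmp/demo-claude/CLAUDE.md <<'EOF'
# Project context

## Conventions
- TypeScript strict mode; no `any`
- Dates are handled in src/utils/date.ts, always UTC internally

## Past decisions
- State lives in React context, not Redux

## Quirks
- The stats screen recalculates on every render; this is intentional
EOF
```

The file pays off most with parallel agents, since every new workspace starts cold and each one reads the same onboarding notes.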

The leadership angle

For engineering managers, tech leads, and CTOs, this shift has implications beyond individual productivity.

The direction of travel seems clear. Single-agent workflows may soon feel limiting as these tools mature and multi-agent patterns become better understood. Conductor is an early mover, but this space is evolving quickly.

One thing worth considering is the impact on team dynamics. Most teams have a senior engineer who becomes a bottleneck for reviews, not through any fault of their own, but because they're the person everyone needs input from. In theory, parallel agents could handle initial passes or triage straightforward issues, freeing that person to focus on architectural decisions. In practice, you might just end up with more code landing in the review queue faster. Whether this helps or creates new bottlenecks depends on how well the team adapts its review process to match the increased throughput.

The future of AI-assisted development probably isn't about waiting for models to get smarter. It's about learning to work differently, understanding which tasks can run in parallel, breaking problems down effectively, and knowing when the coordination overhead isn't worth it.

If there's one takeaway, it's this: the goal isn't to code faster. It's to think differently about what can happen at the same time.

Follow the SaaS Show with Andreas & Sjimi