The Craft of AI

Multi-Player Agents

The compounding complexity of taking an agent from a single user to a whole team — and beyond.

By Luke Lin · June 3, 2026 · 8 min read

We've been building a multi-player coding agent called ModaStack over the last month. The vision is an autonomous engineering team, infused with our product and engineering sense, that gets dispatched from Slack or Linear and ships products like Bao Hua without a person babysitting each step.

It was easy building this for myself. I've been adding skills on top of gstack and my own coding harness in Claude to optimize my own setup. Zach, my cofounder, has a similar stack but uses it differently. Each local harness has learned its owner's tastes.

Now we want to dispatch a shared Moda Agent from Slack that draws on the same context and skill library but can still cater to each of our tastes. But what happens when I have a personal version of /plan-pm-review that Zach doesn't like? Or if a bad coding principle sneaks in and gets committed to memory, like a virus waiting to spread?

We also have to decide whether this agent is called only by us, or whether it can also run scheduled tasks on a cron or be triggered by alerts and new tickets.

What started as a fun single-player agent optimization adventure has ballooned into a complex negotiation across context, permissions, and more.

This is the compounding complexity of multi-player agents. As your agent's impact scales across your team, the complexity of managing it increases, and it compounds even more as you make your agents more autonomous.

The compounding complexity of multi-player agents: complexity rises as an agent moves from single-player to multi-player to autonomous multi-player to orchestrator, and compounds further with each step toward autonomy.

Building on the harness

Tomasz Tunguz recently described software's new center of gravity as the harness, the seven components that turn a raw model into a reliable agent: context and memory, tools and action, orchestration and loop, state and persistence, sandbox and compute, observability and governance, and cost and workflow (Software After AI). His harness assumes one user.

Point that harness at a team and the same components take on compounding complexity. Five pillars carry most of it:

Identity and access. Who the agent acts as, and what each person is allowed to make it do. One shared service account versus an agent that inherits each user's own permissions.
Memory and context. This starts with logging multi-scope memory, tagging each write with user_id, agent_id, session_id, and org_id/app_id. Then come the design decisions about which memories get committed for an individual versus the whole team.
Guardrails and human judgment. The limits on what the agent can do alone versus what needs a person to sign off, and which are enforced with agentic versus deterministic gates.
Orchestration and skills. How the work gets bounded and divided, where skill files and subagent identities draw the lines for what each agent specializes in.
Observability and cost. How you see what the agent actually did, through traces and evals, and how you keep the bill from running away as team use grows.
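
To make the memory pillar concrete, here is a minimal sketch of a scope-tagged memory write and a scoped read. The MemoryWrite and MemoryStore names, the visibility field, and the in-memory list are all assumptions for illustration, not how ModaStack or any particular memory library works. The only parts taken from the list above are the scope tags themselves (the sketch uses org_id and leaves out app_id).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class MemoryWrite:
    """One memory entry tagged with every scope it belongs to (hypothetical schema)."""
    content: str
    user_id: str
    agent_id: str
    session_id: str
    org_id: str
    # Assumed field: "user" = private to the author, "team" = shared with the org.
    visibility: str = "user"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class MemoryStore:
    """In-memory stand-in for a real store; it only illustrates scoped reads."""

    def __init__(self) -> None:
        self._entries: list[MemoryWrite] = []

    def write(self, entry: MemoryWrite) -> None:
        if entry.visibility not in ("user", "team"):
            raise ValueError(f"unknown visibility: {entry.visibility}")
        self._entries.append(entry)

    def recall(self, user_id: str, org_id: str) -> list[str]:
        """Return the org's team memories plus the caller's private ones."""
        return [
            e.content
            for e in self._entries
            if e.org_id == org_id
            and (e.visibility == "team" or e.user_id == user_id)
        ]


store = MemoryStore()
store.write(MemoryWrite("Prefers terse PR descriptions", "luke", "moda", "s1", "moda-labs"))
store.write(
    MemoryWrite("Run the linter before opening a PR", "zach", "moda", "s2", "moda-labs",
                visibility="team")
)

luke_view = store.recall("luke", "moda-labs")  # private memory + team memory
zach_view = store.recall("zach", "moda-labs")  # team memory only
```

Tagging every write up front keeps the individual-versus-team decision reversible: promoting a memory to the team becomes a change to one field rather than a migration.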

Tier 1: the single-player agent

What it is. An agent bound to one person, acting with that person's access. It's how most people already use AI at work via Claude, Gemini, and ChatGPT. In plain terms, it's a personal assistant that holds your keys and runs your errands.

A sample flow. You connect Claude Code or ChatGPT to your own repo, email, and calendar through MCP or a CLI. You ask it to do something. It acts as you, with your access, while you watch and keep what's good.

Why do this? Raw speed for one person. You shape the agent entirely around your own taste, with nobody else to account for.

Setup and management. You wire it to your accounts and write your own context and skills. There's no one else to coordinate, so upkeep is whatever keeps your own setup current.

Where the pillars get harder. They mostly don't, which is the point.

Identity and access: just you, your own credentials, nothing to negotiate.
Memory and context: private to you, and your harness handles most of it.
Guardrails: your own judgment, since you see every step.
Orchestration and skills: your harness handles the loop, so the real work is drawing your own agent boundaries and skills.
Observability and cost: you watch it directly, and the bill is your own.

Tier 2: the multi-player agent

What it is. One agent the whole team reaches from a shared channel, usually running on a service account. This is the largest single step on the ladder, because every pillar switches on at once. Plainly, a shared team inbox that does real work.

A sample flow. A teammate asks something in Slack. The agent loads the team's shared context and memory, does the work, and opens a pull request. A human reviews and merges, and the decision gets written back to memory for next time.

Why do this? A shared agent levels up your entire team in a standardized way. Your team might have a few AI-pilled rockstars with decked-out harnesses while the rest use raw Claude Code and generate slop. A shared agent gives everyone the same baseline, which helps the team work better together.

Setup and management. Provision the shared identity, write the team AGENTS.md, stand up shared memory, set the guardrails, and name an owner. Day to day, people use the agent to get their work done more easily, and the agent owner monitors performance to prevent regressions and cost issues.

Where the pillars get harder.

Identity and access: this is most likely a service account, but it gets difficult when someone needs the agent to bring in sensitive data that only they can access.
Memory and context: careful design is required to manage memories at an individual versus team level and to manage context bloat for a shared agent.
Guardrails: these are good practice in single-player and a must-have in multi-player. The team has to agree on what the agent can and can't do, and where human-in-the-loop is required.
Orchestration and skills: a solid harness can manage the subagent abstraction, but the team has to align on the skills and agent definitions. Skills here should have a peer review process with accountability baked in.
Observability and cost: you need an agent owner who monitors traces and evals to keep the agent performing as expected, especially with multiple people making structural changes over time.
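
One way to picture the skills pillar above is a library where personal overrides win for their owner and team skills need a second person to sign off. This is a sketch under assumptions: SkillLibrary, publish_team, and the "reviewer must differ from the author" rule are hypothetical choices for illustration, and the skill bodies are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skill:
    name: str
    body: str
    author: str
    approved_by: Optional[str] = None  # reviewer who signed off, if any


class SkillLibrary:
    """Hypothetical skill store with a team layer and per-user overrides."""

    def __init__(self) -> None:
        self._team: dict[str, Skill] = {}
        self._personal: dict[tuple[str, str], Skill] = {}

    def add_personal(self, user: str, skill: Skill) -> None:
        # Personal skills affect only their owner, so no review is required.
        self._personal[(user, skill.name)] = skill

    def publish_team(self, skill: Skill, reviewer: str) -> None:
        # Team skills affect everyone: require a reviewer other than the author.
        if reviewer == skill.author:
            raise PermissionError("team skills need a reviewer other than the author")
        skill.approved_by = reviewer  # record who is accountable for the approval
        self._team[skill.name] = skill

    def resolve(self, user: str, name: str) -> Skill:
        # A personal override wins; otherwise fall back to the team version.
        personal = self._personal.get((user, name))
        return personal if personal is not None else self._team[name]


lib = SkillLibrary()
lib.publish_team(Skill("plan-pm-review", "team checklist", author="zach"), reviewer="luke")
lib.add_personal("luke", Skill("plan-pm-review", "luke's checklist", author="luke"))

for_luke = lib.resolve("luke", "plan-pm-review")  # luke's personal override
for_zach = lib.resolve("zach", "plan-pm-review")  # shared team version
```

Under this policy a personal /plan-pm-review never leaks to teammates, and anything that does reach the whole team carries the name of the person who approved it.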

Tier 3: the autonomous multi-player agent

What it is. A shared agent that starts itself, on a schedule or a trigger, often with nobody watching. Think of a night-shift worker who begins tasks when an alarm rings, with no manager on site.

A sample flow. A 3am alert fires. The agent pulls logs inside fixed scopes, attempts a bounded fix, and pages a human only when it isn't sure, leaving a full trace behind.

Why do this? The operational gains are obvious: a team of agents can respond to issues while you're away from the computer, or run scheduled jobs that set a consistent operational rhythm, like daily sales reports and weekly user cohort reports.

Setup and management. Define the triggers or schedule, the scopes, and the escalation paths. Decide where a human still has to sign off, and add durable checkpoints so a failed run recovers instead of restarting. Because no one watches in real time, this is the tier worth simulating against fake scenarios before it goes live.
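
As a rough illustration of the durable checkpoints mentioned above, the sketch below records each finished step to a JSON file so that a rerun skips completed work. The function name, the step names, and the file layout are assumptions made for this example. A production version would also want atomic writes and idempotent steps.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_with_checkpoints(
    steps: list[tuple[str, Callable[[], None]]],
    checkpoint_path: Path,
) -> list[str]:
    """Run named steps in order, skipping any already recorded as done.

    Returns the names of the steps executed during this call.
    """
    done: list[str] = []
    if checkpoint_path.exists():
        done = json.loads(checkpoint_path.read_text())

    executed: list[str] = []
    for name, step in steps:
        if name in done:
            continue  # finished in an earlier run; do not repeat it
        step()  # may raise; the checkpoint still reflects earlier progress
        done.append(name)
        checkpoint_path.write_text(json.dumps(done))  # persist after each step
        executed.append(name)
    return executed


# Hypothetical steps: the fix crashes on its first attempt only.
attempts = {"apply_fix": 0}


def pull_logs() -> None:
    pass


def apply_fix() -> None:
    attempts["apply_fix"] += 1
    if attempts["apply_fix"] == 1:
        raise RuntimeError("simulated crash")


steps = [("pull_logs", pull_logs), ("apply_fix", apply_fix)]
checkpoint = Path(tempfile.mkdtemp()) / "run.json"

try:
    run_with_checkpoints(steps, checkpoint)
except RuntimeError:
    pass  # the first run dies partway through

resumed = run_with_checkpoints(steps, checkpoint)  # picks up at apply_fix
```

The same harness doubles as a simulation rig: swap the real steps for fakes that fail on purpose and check that the run resumes where you expect.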

Where the pillars get harder.

Identity and access: because the agent does work owned by individuals, it needs to understand the identities of the work's owners. When an alert goes off at 3am and the agent needs to escalate, who does it page?
Memory and context: without someone driving the agent, the system has to give it enough context to do its job without human help, which means proper MCP and CLI integrations scoped per job.
Guardrails: these become far more important and can't live in things like skill files, which agents sometimes ignore. Because the workflow is automated, they have to be hard, enforceable gates.
Orchestration and skills: clear logic on retries and human-in-the-loop escalation is required here. Skill trees get more complicated as you factor in automated troubleshooting scenarios.
Observability and cost: traces become the main way people understand what the agent has done, and unwatched loops need a hard ceiling on spend.
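
The difference between a guardrail in a skill file and a hard gate is that the gate runs as ordinary code before any action executes. Here is a minimal sketch of that idea combined with a spend ceiling. The Gate class, the action names, and the dollar figures are all hypothetical, and a real system would likely enforce this at the tool-call layer rather than in the agent's own process.

```python
from dataclasses import dataclass


class GateDenied(Exception):
    """Raised when a proposed action fails a deterministic check."""


@dataclass
class Gate:
    allowed_actions: frozenset[str]  # actions the agent may take alone
    needs_human: frozenset[str]      # actions that always page a person
    budget_usd: float                # hard ceiling for this run
    spent_usd: float = 0.0

    def check(self, action: str, est_cost_usd: float) -> str:
        """Return "run" or "escalate"; raise GateDenied for anything else."""
        if self.spent_usd + est_cost_usd > self.budget_usd:
            raise GateDenied(f"spend ceiling of ${self.budget_usd:.2f} reached")
        if action in self.needs_human:
            return "escalate"  # a person signs off; nothing is spent yet
        if action not in self.allowed_actions:
            raise GateDenied(f"action not allowlisted: {action}")
        self.spent_usd += est_cost_usd
        return "run"


gate = Gate(
    allowed_actions=frozenset({"read_logs", "restart_service"}),
    needs_human=frozenset({"rollback_deploy"}),
    budget_usd=5.0,
)

first = gate.check("read_logs", 0.5)         # allowlisted and within budget
second = gate.check("rollback_deploy", 0.5)  # routed to a human
```

Nothing in check depends on the model's judgment, so the agent cannot talk its way past it; an unlisted action or an over-budget request raises GateDenied every time.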

Tier 4: the orchestrator

What it is. An agent that coordinates other agents and real people toward a goal instead of doing the work itself. Picture a project manager running a launch across staff and tools at once. Currently, it's still more research than product.

A sample flow. Someone gives it a target. The agent breaks the work into a graph of tasks, hands pieces to sub-agents and humans, tracks progress, re-plans when something stalls, and escalates the calls it shouldn't make alone.
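
The "graph of tasks" in that flow can be sketched with Python's standard graphlib module. The task names and the owner labels below are made up for illustration, and the sketch covers only decomposition and hand-off order, not progress tracking or re-planning.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each task maps to the set of tasks it depends on.
plan = {
    "write_spec": set(),
    "build_api": {"write_spec"},
    "build_ui": {"write_spec"},
    "review_launch": {"build_api", "build_ui"},
}

# Hypothetical assignment of tasks to sub-agents or people.
owners = {
    "write_spec": "human:pm",
    "build_api": "agent:backend",
    "build_ui": "agent:frontend",
    "review_launch": "human:lead",
}


def dispatch_in_waves(plan, owners):
    """Yield lists of (task, owner) pairs whose dependencies are all done."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises CycleError if the plan contains a cycle
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        yield [(task, owners[task]) for task in ready]
        sorter.done(*ready)  # mark the wave complete to unlock dependents


waves = list(dispatch_in_waves(plan, owners))
```

Each wave holds tasks that can run in parallel, so build_api and build_ui land in the same wave and review_launch waits for both.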

Why do this? This turns agents from executors into planners and strategists that can help teams run an entire project or sub-function.

Setup and management. This one assumes everything below it already works, since it delegates to reliable lower-tier agents. You define how goals get decomposed, the pool of workers it can assign to, and the oversight surface the lead actually watches. Most of the management is human oversight of the plan, not of any single step.

Where the pillars get harder. Every pillar now stretches across many actors.

Identity and access: a chain of delegations, each worker getting the least access its slice needs, and managing which agents and people can reach which things.
Memory and context: work and context are shared across agents and people, which demands a sophisticated system that handles the “invisible context” behind human actions.
Guardrails: they govern a whole plan rather than a single step, with different guardrails for agents versus humans.
Orchestration and skills: the goal can branch into countless combinations of skills, agents, and humans, and the orchestrator has to pick ones that deliver quality.
Observability and cost: every piece of observability from the earlier tiers, amplified for every member of the team, plus observability for the orchestrator and its reasoning. Cost controls are a must to prevent expensive recursive loops.
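
The delegation chain in the identity pillar above can be modeled as grants that only ever narrow. The sketch below is one hypothetical shape for that idea: the Grant class, the scope strings, and the holder labels are assumptions, not a real permissions API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    holder: str
    scopes: frozenset[str]
    parent: Optional["Grant"] = None

    def delegate(self, holder: str, scopes: set[str]) -> "Grant":
        """Issue a narrower grant; a child can never exceed its parent."""
        extra = set(scopes) - self.scopes
        if extra:
            raise PermissionError(f"cannot delegate scopes not held: {sorted(extra)}")
        return Grant(holder, frozenset(scopes), parent=self)

    def chain(self) -> list[str]:
        """Holders from the root of the delegation down to this grant."""
        node, holders = self, []
        while node is not None:
            holders.append(node.holder)
            node = node.parent
        return holders[::-1]


root = Grant("human:lead", frozenset({"repo:read", "repo:write", "deploy"}))
orchestrator = root.delegate("agent:orchestrator", {"repo:read", "repo:write"})
worker = orchestrator.delegate("agent:docs", {"repo:read"})

lineage = worker.chain()  # who handed access to whom, root first
```

Because each grant keeps a pointer to its parent, a trace can answer both "what could this worker touch?" and "who gave it that access?", which is most of what the observability pillar needs from identity.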

Where does that put us now?

The market is off to the races with tier 1 single-player agents and is starting to design enterprise agentic workflows, which bring in tiers 2 and 3.

In some cases, the line between a fully customized single-player agent and a communal multi-player agent is blurry. For the most AI-pilled of us, why use a multi-player agent and risk losing our customizations?

The tier 3 autonomous agent is where the obvious value lives, but as we learned here, it requires significant scaffolding to be effective and governed enough to be safely deployed. And it will need an agent steward to keep it healthy, just like key datasets need a data steward.

The tier 4 orchestrator feels like a faraway dream, but given the pace of AI development, I wouldn't be surprised if we started seeing early iterations in 2027.

The teams that win this won't be the ones with the smartest agent so much as the ones who know how to design elegant systems around complexity at each step.

We started ModaStack as a fun single-player project and are now negotiating context, permissions, and trust across a team. That's the tax of going multi-player, and it compounds with every step toward autonomy. It's the kind of agentic infrastructure we build at Moda Labs.

Luke Lin

Co-founder & CEO, Moda Labs

Originally published on The Craft of AI.
