Everyone Has a Decent Agent Orchestration Setup, Right?
If you are a developer, you probably have your own way of running LLMs by now. I do too. In the past, when something felt frustrating, I searched for a tool. These days, my first thought is different: "Could I just build this myself?"
I think that shift is possible because the models have gotten good enough.

Image: NASA black hole accretion disk visualization. Source: NASA Science - Black Hole Week 2023
Introduction
When you spend enough time doing agentic coding, or vibe coding if you prefer the lighter phrase, you end up with your own style. Some people keep Claude Code attached to a terminal. Some work mostly inside Cursor or Copilot. Some split work across ChatGPT, Codex, and Claude by role. The prompts differ, the way context is fed differs, and the amount of trust given to the model differs.
I was the same. At first, one agent was enough. Give it work, inspect the result, correct it, continue. But as projects grew, the limits started to show. Backend, frontend, docs, deployment, data migrations, and tests began to tangle together. One agent could not hold every relevant piece of context forever, and manually routing everything as a human became too much work.
Eventually the thought became simple.
I needed orchestration.
Orchestration Sounds Bigger Than What I Wanted
I do not mean orchestration as a grand distributed-systems project. What I wanted was much smaller.
- The backend agent keeps working in the backend worktree.
- The frontend agent keeps working in the frontend worktree.
- The main agent looks at the whole direction and sends work to the right part.
- Each agent keeps its own session and file context.
- A human does not have to copy and paste every message between them.
In other words, I did not want one genius agent. I wanted several ordinary agents, each sitting at its own desk, connected by a thin collaboration layer. There are already good tools. The problem was that none of them fit the way I was already working.
I Tried the Good Tools First
Before building anything, I looked at existing services. Smart people have already built useful systems, and the answer could have been there.
OpenHands
OpenHands, formerly known as OpenDevin, is an open-source platform for software development agents. Its direction is clear from the paper and documentation: agents should be able to act like human developers by writing code, using a command line, browsing the web, and working inside sandboxed environments. It is a big-picture project that includes agent implementations, safe execution, and evaluation benchmarks.
The strength was obvious. It treats the agent's execution environment as part of the product. Instead of manually opening terminals and feeding context, you get a workspace where the agent can operate.
But it was not quite the shape I wanted. I already had my local development environment: repo-specific worktrees, tmux sessions, agents attached to those sessions, and the shell scripts I knew well. Moving into an OpenHands-style integrated environment meant giving up some of that rhythm. That does not make the tool bad. My problem was not "create a new workshop." It was "connect the workshops already running."
Paperclip
I also tried Paperclip. Honestly, it was impressive and clearly well made. The interface and flow felt polished, and there were many moments where I thought, "The people who built this really thought it through." Tools like that are hard to build.
In my environment, though, the model and context became the bottleneck. With a weaker model or a local model, complex development context did not stay stable enough. The first few turns would go well, and then the agent would lose the core requirement, reopen decisions that were already settled, or trust its memory more than the actual repository state.
Agent orchestration collapses quickly when the model loses context, no matter how good the UI is. Coding is less about producing a plausible plan and more about holding onto the files that just changed, the test that failed, and the constraints of the existing design. That was the part that did not fit my workflow.
Claude Teamwork
I also tried the Claude Teamwork style of setup: split several Claude sessions by role, with a main session assigning work to sub-sessions. The idea is good. For small problems, it can work well. Role separation means one agent does not have to carry everything, and the work can feel parallel.
The problem was tokens.
The main agent holds the broad context, explains work to each sub-agent, each sub-agent reports back in detail, and the main agent reads that report before deciding the next step. The cost grows quickly. If each agent is an independent session, the same project description, design background, and file context get repeated over and over. Collaboration worked, but every exchange started to look like a long meeting transcript.
Using tokens is not the problem. Good results require context, and context costs tokens. But it felt wasteful to pay large-prompt prices for every wake-up, status check, and routing message between agents. I needed a thinner message layer.
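Here is a back-of-the-envelope Python model that makes the cost difference concrete. Every constant is a hypothetical placeholder, not a measurement from my setup. The model counts only collaboration traffic, meaning what gets re-sent between agents. It does not count the work each agent does inside its own session.

```python
# Back-of-the-envelope token model. Every constant is a hypothetical
# placeholder, not a measurement.
SHARED_CONTEXT = 8_000  # project description, design background, file context
REPORT = 1_500          # detailed report a sub-agent sends back
THIN_MESSAGE = 150      # short subject + body sent through a message layer


def full_context_cost(sub_agents: int, rounds: int) -> int:
    """Main agent re-sends shared context to each sub-agent every round,
    then reads each sub-agent's detailed report."""
    return rounds * sub_agents * (SHARED_CONTEXT + REPORT)


def thin_message_cost(sub_agents: int, rounds: int) -> int:
    """Each sub-agent keeps its own session context, so a round only moves
    one short message out and one short reply back."""
    return rounds * sub_agents * (2 * THIN_MESSAGE)


if __name__ == "__main__":
    for rounds in (1, 5, 20):
        print(f"{rounds:>2} rounds: full={full_context_cost(3, rounds):,} "
              f"thin={thin_message_cost(3, rounds):,}")
```

With these placeholder numbers, three sub-agents over five rounds come to 142,500 tokens of relay traffic for the full-context approach and 4,500 for thin messages. The exact ratio means nothing. The shape is the point: full-context relaying pays for the shared context every round, while thin messages pay only for the message itself.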
Was I the Problem?
At this point, the obvious question appears.
Was I using the tools wrong? Was my workflow too unusual? Was I just bad at adapting?
Maybe partly. Every developer has a different working style, and what feels natural to me may look strange to someone else. But the same friction kept appearing.
- I already had persistent local agent sessions.
- Each agent already had its own worktree and terminal context.
- What I needed was inter-session messaging, not a huge new IDE.
- Treating every collaboration step as an LLM prompt burned too many tokens.
- Copying messages by hand hit a limit quickly.
So the shape became clearer: I did not need a platform that created and managed agents from scratch. I needed a layer that connected and woke the agents already alive.
So, Should I Burn Some Tokens and Build It?
My environment was already fairly settled. I was working in something close to a "one repo, one agent" model, using git worktrees. Each agent could read files, run tests, and keep some continuity inside its own session. The advantage was persistence: the agent did not start in a fresh, empty room every time. Its files and terminal history were still on the desk.
So what was missing?
I did not need to create new agents. They already existed. They did not need to join Slack like humans. They just needed a way to send short messages to each other. And when a message arrived, the sleeping session needed to wake up.
That was the first RelayRoom idea.
"Store messages on the server, and keep wake delivery thin."
Do not copy the whole conversation into a giant prompt every time. Send a subject and body to the right part. The receiving agent checks its inbox. If it needs more context, it opens the thread. When the work is done, it closes the thread. Humans can watch the dashboard. Agents can use MCP tools to send and receive messages.
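Below is a minimal Python sketch of that message model, in which an in-memory object stands in for the server. The names RelayStore, send, inbox, open_thread, and close_thread are hypothetical and are not RelayRoom's actual API. What matters is the shape:

- The inbox lists unread subjects.
- Opening a thread marks it read.
- A closed thread drops out of every inbox.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    id: int
    thread_id: int
    sender: str
    recipient: str
    subject: str
    body: str
    read: bool = False


@dataclass
class Thread:
    id: int
    subject: str
    closed: bool = False
    messages: list[Message] = field(default_factory=list)


class RelayStore:
    """Hypothetical in-memory stand-in for the server-side message store."""

    def __init__(self) -> None:
        self._thread_ids = itertools.count(1)
        self._message_ids = itertools.count(1)
        self.threads: dict[int, Thread] = {}

    def send(self, sender: str, recipient: str, subject: str, body: str,
             thread_id: Optional[int] = None) -> Message:
        """Start a new thread, or reply in an existing open one."""
        if thread_id is None:
            thread = Thread(id=next(self._thread_ids), subject=subject)
            self.threads[thread.id] = thread
        else:
            thread = self.threads[thread_id]
            if thread.closed:
                raise ValueError(f"thread {thread_id} is closed")
        msg = Message(next(self._message_ids), thread.id, sender, recipient,
                      subject, body)
        thread.messages.append(msg)
        return msg

    def inbox(self, agent: str) -> list[Message]:
        """Unread messages for an agent, skipping closed threads."""
        return [m for t in self.threads.values() if not t.closed
                for m in t.messages if m.recipient == agent and not m.read]

    def open_thread(self, agent: str, thread_id: int) -> list[Message]:
        """Return the full thread and mark the agent's messages in it as read."""
        thread = self.threads[thread_id]
        for m in thread.messages:
            if m.recipient == agent:
                m.read = True
        return list(thread.messages)

    def close_thread(self, thread_id: int) -> None:
        """A closed thread drops out of every inbox, so it stops waking anyone."""
        self.threads[thread_id].closed = True
```

Closed threads never appear in an inbox. As a result, a pager that wakes only agents with a non-empty inbox automatically stops waking anyone about a finished conversation.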
Claude Strikes Back
The timing mattered too. I was using a Claude Max plan, and around mid-June 2026 there was discussion about plan and usage-policy changes. Headless usage such as claude -p, and how automated usage would be counted, became something I had to think about. For an individual developer, tokens and usage limits are not just numbers. They decide whether you can keep experimenting.
The answer is not "use no tokens." If you only try to save tokens while doing AI-assisted coding, quality suffers. But not everything has to be solved with a long conversation on a frontier model. State storage, message routing, read tracking, and wake delivery can be handled by the product layer. The LLM should focus on judgment and coding. The system should handle collaboration wiring.
As the line goes, we will find a way. We always have.
This time, though, I did not go into a black hole. I went into tmux.
tmux
tmux is a terminal multiplexer. It lets you keep multiple sessions, windows, and panes alive inside a terminal. Sessions survive SSH disconnects and can be reattached later. Developers use it for long-running work, multiple servers, and persistent workspaces.
The key for me was tmux send-keys.
tmux send-keys can send keystrokes to a specific tmux pane. An external process can type text into a session and press Enter without a human touching the keyboard. That means an already-open agent session can receive a short nudge like: "A message arrived. Check your inbox."
The core idea is simple.
```shell
tmux send-keys -t relayroom:web \
  'relayroom inbox --unread' Enter
```

The real product needs session discovery, duplicate wake prevention, message budgets, idle detection, and security. But the core is that one line. You can wake an existing session. You are not starting a new agent; you are tapping a paused teammate on the shoulder.
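Here is a hedged Python sketch of a wake call with the simplest form of duplicate wake prevention. It remembers which (pane, message) pairs it has already nudged and enforces a per-pane cooldown. The Pager class, the nudge text, and the 30-second cooldown are assumptions for illustration.

Two design details:

- The nudge goes out with tmux's `-l` flag, so tmux treats it as literal text rather than key names. Enter is then sent as a separate key.
- The command runner is injectable, so the logic can be tested without a live tmux server.

```python
from __future__ import annotations

import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], object]


def run_tmux(argv: Sequence[str]) -> None:
    """Run a tmux command; raises CalledProcessError if tmux fails."""
    subprocess.run(list(argv), check=True)


class Pager:
    """Hypothetical wake pager: nudges a tmux pane at most once per message,
    and at most once per cooldown window per pane."""

    def __init__(self, runner: Runner = run_tmux, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.runner = runner
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._woken: set[tuple[str, int]] = set()
        self._last_wake: dict[str, float] = {}

    def wake(self, target: str, message_id: int,
             text: str = "A message arrived. Check your inbox.") -> bool:
        """Return True if a nudge was sent, False if it was suppressed."""
        if (target, message_id) in self._woken:
            return False  # already nudged for this message
        now = self.clock()
        last = self._last_wake.get(target)
        if last is not None and now - last < self.cooldown_s:
            return False  # pane was nudged recently; suppress this one
        # -l sends the text literally so tmux never parses words as key names.
        self.runner(["tmux", "send-keys", "-l", "-t", target, text])
        # Enter is sent separately, as a key name.
        self.runner(["tmux", "send-keys", "-t", target, "Enter"])
        self._woken.add((target, message_id))
        self._last_wake[target] = now
        return True
```

The cooldown is a trade-off. A message that arrives during the cooldown does not get its own nudge. A real pager would need to retry later, or rely on the woken agent reading its whole unread inbox instead of only the message that triggered the wake.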
The First Test
The first test was primitive. I opened a tmux session and used send-keys from outside to push a sentence into it. It worked. Enter worked. The agent received the input.
That was the moment I thought: this might actually work.
Of course, "input reaches the pane" and "this is a product" are far apart. Where should messages be stored? Which agent should receive them? How do we avoid waking the same session repeatedly for the same message? What should a human see in the dashboard? How should networking work in a self-hosted setup?
But the first link held. The agent was already alive locally, and tmux could wake it. Everything after that was a messaging-system problem.
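The "which agent should receive them" question needs session discovery. One possible approach is to ask tmux for every pane and map window names to pane IDs. `tmux list-panes -a -F`, along with the `#{session_name}`, `#{window_name}`, and `#{pane_id}` format variables, is standard tmux. The convention that each part runs in its own window of a session named relayroom, with the window named after the part, is only an assumption for this sketch.

```python
from __future__ import annotations

import subprocess

# Standard tmux format variables, joined with a separator unlikely to appear in names.
PANE_FORMAT = "#{session_name}|#{window_name}|#{pane_id}"


def list_panes() -> str:
    """Ask the running tmux server for every pane (requires tmux)."""
    result = subprocess.run(
        ["tmux", "list-panes", "-a", "-F", PANE_FORMAT],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def discover(output: str, session: str = "relayroom") -> dict[str, str]:
    """Map part name (window name) -> pane ID within one session.

    Assumes the hypothetical convention that each part runs in its own window
    named after the part; the first pane listed for a window wins.
    """
    targets: dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split("|")
        if len(fields) != 3:
            continue  # skip anything that does not match the format
        sess, window, pane_id = fields
        if sess == session and window not in targets:
            targets[window] = pane_id
    return targets
```

A pane ID such as `%1` is a valid target for `tmux send-keys -t`, so the discovered IDs can be used directly when waking a session.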
So I Decided to Build It
That is how RelayRoom started. I was not trying to build a massive agent runtime. In fact, the goal was the opposite. I wanted to connect agents that were already running well on their own.
The product I wanted looked like this:
- An agent sends a message to another part through MCP tools.
- The server stores the message, and the recipient sees it in an inbox.
- When a new message arrives, a pager wakes the tmux session.
- When a conversation is done, the thread is closed so it stops waking anyone.
- A human can see who is online, which threads are open, and where tokens are being spent.
- It should work in self-hosted environments.
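On the agent side, MCP describes a tool with a name, a description, and a JSON Schema `inputSchema`. A `tools/call` result carries a `content` array and an `isError` flag. Below is a hedged sketch of a send tool written without any MCP SDK. The tool name `send_message`, its fields, and the in-memory OUTBOX are all hypothetical, and a real server would hand the message to the store and the pager instead.

```python
from __future__ import annotations

from typing import Any

# Tool definition in the shape MCP's tools/list returns: name, description, inputSchema.
SEND_MESSAGE_TOOL: dict[str, Any] = {
    "name": "send_message",  # hypothetical tool name
    "description": "Send a short message to another part's inbox.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "Recipient part, e.g. backend"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
    },
}

# Hypothetical stand-in for the server-side store.
OUTBOX: list[dict[str, str]] = []


def call_tool(params: dict[str, Any]) -> dict[str, Any]:
    """Handle the params of a tools/call request and return a tool result.

    Error reporting is simplified: this sketch raises, and a real server
    would translate that into whatever error response its framework uses.
    """
    if params.get("name") != SEND_MESSAGE_TOOL["name"]:
        raise ValueError(f"unknown tool: {params.get('name')!r}")
    args = params.get("arguments") or {}
    missing = [k for k in SEND_MESSAGE_TOOL["inputSchema"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    OUTBOX.append({"to": args["to"], "subject": args["subject"], "body": args["body"]})
    # Keep the reply tiny: the agent only needs confirmation, not the full thread.
    return {
        "content": [{"type": "text", "text": f"Queued for {args['to']}."}],
        "isError": False,
    }
```

The small reply matters for the token argument. Each send costs the calling agent a few tokens of confirmation rather than a summary of the conversation.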
I do not know how elegant the result will look. It may be rough in places. But the problem is real because it came from a real workflow.
The painful parts have already started, and more will come. A reverse proxy buffered SSE and swallowed wake notifications. A Host allowlist rejected a valid custom domain with 403. A compose file did not update when the image did. An arm64 Docker build was painfully slow under QEMU. Those stories are the posts that follow this one.
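The SSE story deserves its own post, but the general pattern is worth showing here. Server-Sent Events are plain-text frames separated by blank lines. A proxy that buffers the response holds those frames back until its buffer fills, and a wake notification stuck in that buffer effectively disappears.

Below is a minimal sketch of the framing, plus response headers that commonly discourage buffering. `X-Accel-Buffering: no` is honored by nginx, while other proxies need their own settings. This assumes nginx-style proxying and is not necessarily the fix RelayRoom ended up with.

```python
from __future__ import annotations

from typing import Optional

# Response headers for an SSE stream. X-Accel-Buffering is an nginx-specific
# header that disables proxy buffering for this response; other proxies
# ignore it and need their own configuration.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",
}

# Lines starting with ":" are comments that clients ignore; sending one
# periodically can help keep idle connections open through intermediaries.
SSE_HEARTBEAT = b": ping\n\n"


def sse_event(event: str, data: str, event_id: Optional[str] = None) -> bytes:
    """Frame one SSE event. Multi-line data becomes several data: lines,
    and a blank line terminates the event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    lines.extend(f"data: {chunk}" for chunk in data.split("\n"))
    return ("\n".join(lines) + "\n\n").encode("utf-8")
```

Flushing after every event matters as much as the headers. If the application holds the bytes, no proxy setting will help. On the nginx side, `proxy_buffering off;` in the location block is the configuration-level alternative to the response header.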
This is being written after launch, so there may still be bugs in the wild. Please be kind. This is not a finished grand theory. It is a record of trying to build an agent collaboration layer that fits the way I actually work.
And I suspect many developers are thinking about something similar.
Everyone has a decent agent orchestration setup, right?
References
- OpenHands, OpenHands: An Open Platform for AI Software Developers as Generalist Agents
- Model Context Protocol, Tools specification
- tmux, send-keys manual
- NASA Science, Black Hole Week 2023