Why I Banned the Word "Agent" From My Architecture Reviews

Three months ago I told my team to stop using "agent" in design docs. Here is the four-box taxonomy that made our reviews coherent again — and the three things sold as agents that aren't.

Three months ago, in a quarterly architecture review, an engineer on my team pitched what she called a "customer success agent." After ten minutes I asked her to draw the architecture. It was a SQL query, a single chat completion call, and a Slack webhook. The next person up pitched a "research agent" that turned out to be three sequential LLM calls behind a button. By the third "agent" of the morning, I called it: nobody on this team gets to use the word "agent" in an architecture review again until they can tell me, in one sentence, what their thing does that a workflow can't.

The word had stopped helping us think. It was making everyone build the same thing and call it different things, or build different things and call them all the same thing. Either way, the diagrams stopped explaining the architecture, and architecture reviews are not the place where ambiguity earns its keep.

I've since come to believe this isn't a problem with one team. It's the dominant failure mode of how the industry is currently shopping for, building, and naming AI software. And the catalog that runs this site happens to contain the proof.

A category that contains everything contains nothing

Earlier this week we audited the primary-category tags on every product in the TopReviewed catalog. The category called "AI Agents & Assistants" contained, among other things: Claude (a chat product), ChatGPT (a chat product), Microsoft Copilot (also a chat product), Pydantic AI (a Python library), Rasa (a conversational platform that predates the modern LLM era by half a decade), Relevance AI (a no-code workflow builder), Synthflow (a voice-call platform), and Vapi (a developer API for building voice agents).

The tagging was generated by an AI. The AI was doing its best with the available signal. The signal it had was the same signal everyone else has: marketing copy. And in marketing copy, "agent" is now applied with the rigor of a sticker gun.

I don't fault the model that did the tagging. The category is unfit for its purpose. A useful taxonomy carries information; this one was load-bearing only on vibes. A library, a chatbot, a 2017 conversational platform, and a voice-API SDK do not belong in the same architectural conversation. They have nothing in common except the word.

The honest four-box taxonomy

What helped my team start having useful conversations again was forcing every proposal into one of four boxes before we discussed anything else. The boxes are not novel — most engineering leaders I respect carry some version of this in their head — but writing them on a whiteboard before each review changed how we evaluated tradeoffs.

A chatbot is a product where the human is the loop. Every step requires user input. State lives in a transcript. The model is the product. Examples: ChatGPT, Claude.ai, Microsoft Copilot in its consumer mode. The right question for a chatbot is not "is the model good" but "is the conversation worth my user's time."

A copilot is a product that lives inside another product and offers in-context assistance. The user is in charge of the surrounding work; the copilot supplies suggestions, drafts, or completions inside a defined surface. Cursor is a copilot for code. Grammarly is a copilot for writing. The right question is not "what can it do" but "does it stay out of the way when I don't need it."

A workflow is a deterministic sequence of steps in which one or more steps may use a model. The structure is fixed; the model fills in a slot. Zapier with an OpenAI step is a workflow. n8n with a Claude step is a workflow. So is the "research agent" my engineer pitched: three LLM calls in sequence, each with a fixed prompt template, each writing to a fixed output format. The right question for a workflow is not "is this AI" but "would a human, given infinite patience and the same instructions, produce roughly the same output every time." If yes, you have a workflow. Workflows are excellent. They are also not agents.

An agent is a system that takes a goal rather than a task, decides for itself what intermediate steps to take, uses tools to act on the world, maintains state across those steps, and possesses some mechanism for noticing when it's failing and changing course. The structure is not fixed. Two runs of the same agent against the same goal can take meaningfully different paths. The right question for an agent is not "what does it do" but "what is the policy that decides what it does, and how do I tell when that policy is wrong."

This is not a continuum. The four boxes are categorical. A workflow with a clever LLM step is still a workflow. A chatbot wrapped in a button is still a chatbot. The boxes have edges, and the edges are the most useful thing about them.

Three things sold as agents that aren't

Once the boxes were on the whiteboard, the same three patterns kept showing up in pitches, and getting renamed.

The wrapped chatbot. A system prompt, a model, and a Slack or Teams integration. The pitch is "an agent that handles X for our team." The reality is a chat interface with one fewer click between the user and the model. There is no goal-decomposition, no tool use beyond reading and writing messages, no state beyond the conversation. It can be a useful product. It is not an agent. The architectural cost of building it is one engineer for two weeks; the architectural cost of operating it as if it were an agent — with eval pipelines, observability, rollback plans, drift monitoring — is six engineers for a year. Pricing the work as if it's an agent will sink the project.

The deterministic workflow with one model in the middle. A scheduled job extracts data from a system, sends it to a model with a fixed prompt, parses the structured output, and writes the result somewhere. Nothing about the path through the system depends on what the model returned. If the model returned garbage, the workflow either errors or ships garbage. There is no decision policy, just an execution graph. This pattern is not bad — most production AI is some version of this and most should be — but calling it an agent obscures what's actually load-bearing in the design, which is the parsing logic and the failure handling, neither of which is the AI part.

The frontend integration. A chat UI bolted to a database, a knowledge base, or a SaaS platform. The model is invoked once per user message. It can call retrieval, but the retrieval is one shot per turn and the conversation is the loop. The architecture diagram has the same shape as a 2018 chatbot with better intent classification. The only thing that has changed is the quality of the responses, which is a real change but not an architectural one. Marketing this as an agent invites buyer expectations the system cannot meet, which is how AI products earn their reputations for over-promising.

What to ask in technical evaluation

When my team now reviews an internal AI proposal — or evaluates a vendor pitch claiming to sell us an agent — three questions cut through the marketing layer faster than anything else.

Where does state live, and who owns it? A chatbot's state is the transcript. A workflow's state is the database row the job is processing. An agent's state is something the agent itself manipulates as part of its operation: a scratchpad, a memory store, a graph of partially-completed subgoals. If the answer to "where does state live" is "in the prompt," you have a chatbot. If the answer is "in a row in our pipeline table," you have a workflow. If the answer requires a diagram, you might have an agent. (You also might have a poorly-specified workflow. The diagram alone is not the proof.)

Who decides when it's done? In a chatbot, the user decides — they close the tab. In a workflow, the schema decides — the last step ran, the row was written. In an agent, the system itself has to decide, which means it has to have a notion of "done" that doesn't reduce to "the script finished." Most things sold as agents fail this question. They terminate when their token budget runs out, when a fixed step count is reached, or when the LLM returns a string containing the word "complete." None of those is a definition of done. They are mechanisms for not running forever, which is necessary but not sufficient.

What does failure look like, and how do you detect it? Workflows fail loudly: a step throws, the job retries, eventually a human looks at it. Chatbots fail visibly: the user reads the wrong answer and complains, or doesn't, but at least the user is in the loop. Agents fail in the most expensive way: silently, partially, with confidence. They keep going. They make a sequence of locally-plausible decisions that compound into globally-wrong outcomes. The eval and observability cost of operating real agents is the dominant cost of running them in production, and any vendor or internal proposal that does not foreground this is selling you a chatbot in agent clothing.

The build-vs-buy implication

The four-box taxonomy also clarifies the most common architecture mistake I see, which is buying agent infrastructure for a chatbot problem.

If your real need is a chatbot — a contained surface where users ask things and get answers — you do not need an agent framework, an agent observability platform, or an agent-first vendor. You need a model API, a retrieval layer, and a frontend. The companies in this space charging six figures for "agent infrastructure" know this. They also know that "buy our agent platform" is an easier procurement conversation in 2026 than "buy our chat completion wrapper," and so they do not correct the misclassification at sales time.

If your real need is a workflow, you should build the workflow with whichever orchestration tool your team already uses — n8n, Zapier, Airflow, Temporal, a cron job and a Python script — and add a model call as a step. Workflow problems are nearly always better solved by workflow tools. The fact that LangChain and its descendants have spent three years convincing engineering teams otherwise is one of the more expensive linguistic accidents in recent infrastructure history.

If your real need is genuinely an agent — a system that has to decide for itself what to do, with state, tools, and a recovery loop — then you need to invest in the parts of agent operation that are actually hard, which are evals, observability, and human override paths. Almost no vendor sells you those things; they sell you the orchestration layer, which is the easy part. The unsexy truth is that an agent that works in production is 20% framework and 80% the discipline of measuring it.

What I now ask my team to write instead

The rule on my team is: in any architecture document or review, the system gets named for what it is, not for what it would be most exciting to call it. A chatbot is a chatbot. A workflow is a workflow. A copilot is a copilot. The word "agent" is reserved for systems where we can defend, in writing, why we need a non-deterministic policy and how we plan to measure when the policy is wrong.

It turns out that most of what we wanted to build did not need to be agents, and was easier and cheaper to ship after we admitted that. The two systems we kept calling agents — both involved long-horizon decision-making against external systems we don't control — got more rigorous design work because we stopped letting the word do the work for us.

The strongest architectures in 2026 do not need to call themselves agents to be impressive. The reverse is increasingly true: when a vendor leads with the word, I now treat it as a tell that the technical content is thin, the way "blockchain" was a tell in 2018 and "machine learning" was in 2015. Sometimes the system underneath is genuinely interesting. More often the word is doing all the lifting.

The closing observation

Engineering vocabulary collapses into marketing vocabulary in cycles. Every five years a new word arrives that is precise on day one, useful on day 100, and meaningless on day 1000. We are at day 1000 of "agent." The word will get worse before it gets better, because the gap between what most products called agents actually do and what the word promises is too large to close from either side.

The fix is not to find a better word. The fix is to not let the word drive the architecture. Name your systems for their structure. Reserve the spicy nouns for the cases where they earn their place in the diagram. The team that does this consistently builds clearer systems, ships them faster, and spends less time explaining to executives why the agent that was supposed to handle customer success last quarter is still in pilot. Mostly because it was never an agent. It was a chatbot, and once we called it one, the path to shipping it got noticeably shorter.

Why I Banned the Word "Agent" From My Architecture Reviews

A category that contains everything contains nothing

The honest four-box taxonomy

Three things sold as agents that aren't

What to ask in technical evaluation

The build-vs-buy implication

What I now ask my team to write instead

The closing observation

Discussion

Author

Recent Posts

When the Panel Splits 4 Points: Stripe, Datadog, Figma & Perplexity

The Free-Tier Premium: Why Our Highest-Scored AI Tools Cost the Least

Why Hidden-Pricing Software Hits an 8.15 Ceiling on Our Review Panel

More from the Blog