Chapter 4: Orchestration (the Agent’s Execution Flow)
You give an Agent a task: “Help me research competing products and write an analysis report.”
What does it do first: search for information, or draft an outline? Once it has search results, does it write the report straight away, or organize them into a table first? And when the report is written, does it hand it over immediately, or review it first?
These decisions are what orchestration manages. Orchestration is the Agent’s skeleton: it determines the order and logic of the Agent’s work.
4.1 Starting with the Simplest Loop
When I sketch a flow, the first question I ask is: what happens if this step fails?
Think → Act → Observe
The simplest Agent execution loop has only three steps:
- Think: analyze the current situation and decide what to do next.
- Act: call a tool and execute an operation.
- Observe: check the result and judge whether the goal has been reached.
This is the ReAct loop discussed in Chapter 2. In each round of the loop, the Agent re-evaluates the situation and then decides the next step.
For example, you ask the Agent to “look up the revenue in Tesla’s latest earnings report.”
First round: Think → I need to search for Tesla’s earnings report. Act → Call the search tool. Observe → Found the earnings PDF link.
Second round: Think → I need to read this PDF. Act → Call the file reading tool. Observe → Got the earnings content.
Third round: Think → I need to extract revenue data from the earnings report. Act → Analyze the text, find the revenue numbers. Observe → Task complete, output the result.
Three rounds of the loop, and the task is done.
A Key Design: Step Limit
When does the loop end? Under normal circumstances, the Agent stops itself after judging the task is complete. But there’s a dangerous scenario: the Agent falls into an infinite loop — repeatedly searching, repeatedly analyzing, repeatedly modifying, never delivering results.
You must give the Agent a step limit. Experience suggests that staying within about 10 steps is the safe zone; beyond that, both the probability of error and the cost rise sharply.
How to implement: in LangGraph, the step limit is set through the recursion_limit parameter.
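A minimal sketch of a step-limited loop, written in plain Python rather than LangGraph so that it runs without dependencies. The names `run_agent`, `fake_step`, and `StepLimitExceeded` are hypothetical and only illustrate the idea. In LangGraph itself the limit is passed per run in the config, as the trailing comment shows; there it counts graph super-steps, and exceeding it raises an error instead of returning a partial result.

```python
from typing import Callable


class StepLimitExceeded(RuntimeError):
    """Raised when the loop uses up its step budget without finishing."""


def run_agent(step: Callable[[dict], dict], state: dict, max_steps: int = 10) -> dict:
    """Run think/act/observe rounds until state['done'] is set or the budget runs out."""
    for _ in range(max_steps):
        state = step(state)
        if state.get("done"):
            return state
    raise StepLimitExceeded(f"no result after {max_steps} steps")


# Hypothetical step function standing in for one model + tool round.
def fake_step(state: dict) -> dict:
    rounds = state.get("rounds", 0) + 1
    return {**state, "rounds": rounds, "done": rounds >= 3}


result = run_agent(fake_step, {"task": "Tesla revenue"})

# LangGraph equivalent (assumes `app` is a compiled graph, not defined here):
#   app.invoke(inputs, config={"recursion_limit": 10})
```

Raising an error rather than silently stopping makes a runaway loop visible to the caller, who can then decide whether to retry, escalate to a human, or return a partial answer.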
Limitations of a Single Agent
The single-Agent pattern suits simple tasks. On complex tasks it runs into three problems:
Information overload. One Agent has to search, analyze, and write, so its context window fills up quickly. It’s like one person juggling three jobs at once and doing none of them well.
Error accumulation. If each step succeeds 95% of the time and steps fail independently, the chance that all 10 steps succeed is 0.95^10 ≈ 60%. The more steps, the worse the cumulative error.
Single capability. One Agent rarely excels at searching, analyzing, writing, and coding all at once, just as one person is rarely a good programmer, a good designer, and a good product manager at the same time.
So, complex tasks require more advanced orchestration patterns.
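The error-accumulation figure above can be checked with a few lines of arithmetic. This sketch assumes that steps succeed or fail independently and share the same success rate, which is a simplification; `overall_success` is a hypothetical helper name.

```python
def overall_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


for n in (1, 5, 10, 20):
    print(f"{n:>2} steps: {overall_success(0.95, n):.1%}")
```

At 95% per step, 10 steps give roughly 60% overall, and 20 steps drop to roughly 36%, which is why long single-Agent chains degrade so quickly.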
4.2 Workflow vs Autonomous Agent: First Figure Out Which You Need
Anthropic draws an important distinction in “Building Effective Agents”: a Workflow and an Agent are two different things. Many teams conflate them and end up building unnecessarily complex systems.
Workflow
A Workflow uses pre-defined code paths to orchestrate model calls.
The developer designs the process in advance: first call the model to do A, then call the model to do B based on the result, then pass B’s result to C. The entire process is fixed and predictable.
It’s like an assembly line: what each step does and how it does it are all pre-designed.
Advantages: Controllable, predictable, easy to debug. You know what will happen at each step.
Disadvantages: Not flexible. If it encounters a situation the designer didn’t anticipate, the Workflow won’t automatically adjust.
Agent (Autonomous Agent)
An Agent lets the model dynamically direct its own process.
The model itself decides what to do next, which tools to use, and when to stop. You give it a goal, and it uses its own judgment to complete it.
Advantages: Flexible. Can handle situations the designer didn’t anticipate.
Disadvantages: Unpredictable. You don’t know which path it will take, or whether it will make mistakes.
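The difference between the two can be sketched side by side. Everything here is a hypothetical stand-in: `search`, `analyze`, and `write` would be model or tool calls in a real system, and `choose_next` replaces the model's own decision with a fixed rule so that the example is runnable.

```python
from typing import Callable


# Hypothetical stand-ins for model-backed steps.
def search(task: str) -> str:
    return f"notes on {task}"


def analyze(notes: str) -> str:
    return f"analysis of {notes}"


def write(analysis: str) -> str:
    return f"report based on {analysis}"


def run_workflow(task: str) -> str:
    """Workflow: the developer fixes the path in code (search -> analyze -> write)."""
    return write(analyze(search(task)))


TOOLS: dict[str, Callable[[str], str]] = {"search": search, "analyze": analyze, "write": write}


def choose_next(history: list[str]) -> str:
    """Stand-in for the model's decision; a real Agent would ask the LLM here."""
    order = ["search", "analyze", "write"]
    return order[len(history)] if len(history) < len(order) else "stop"


def run_agent(task: str, max_steps: int = 10) -> str:
    """Agent: the path is decided at run time, one step at a time."""
    history: list[str] = []
    current = task
    for _ in range(max_steps):
        action = choose_next(history)
        if action == "stop":
            break
        current = TOOLS[action](current)
        history.append(action)
    return current
```

In the Workflow the call order lives in the code; in the Agent it lives in whatever `choose_next` returns, which is exactly where the flexibility and the unpredictability both come from.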
How to Choose
Anthropic’s advice: start with the simplest approach that works.
- If the model alone can complete the task, don’t add a Harness.
- If one tool is enough, don’t add ten.
- If a fixed Workflow meets the requirements, don’t build an Agent.
Add complexity step by step, and only when the simple solution truly can’t meet the requirements.
This isn’t laziness; it’s engineering wisdom. The simpler the system, the more reliable it is and the easier it is to maintain and debug.
4.3 Directed Graphs: Turning a Flow into Executable Code
Why Do We Need Directed Graphs
Say you have designed an Agent flow: search → analyze → write → review. The flow is clear in your head, but how do you turn it into executable code?
LangGraph’s answer: use a directed graph.
Nodes are operations (search, analyze, write); edges are transition conditions (search complete → go to analyze, analyze complete → go to write).
This modeling approach has several benefits:
Visualizable. You can draw the Agent’s complete execution flow and take it in at a glance.
Debuggable. When something goes wrong, you can tell which node failed or which edge’s condition was evaluated incorrectly.
Composable. Complex Agents can be composed from multiple simple subgraphs.
Supports loops. Directed graphs naturally support loops — the Agent can repeatedly execute a certain node until the condition is met.
Conditional Branching
Agents aren’t linear. They need to make branching decisions based on intermediate results.
For example: useful information found → move on to analysis. Irrelevant information found → change the keywords and search again. Search failed → ask the user for more information.
LangGraph implements this branching with conditional edges. A node can have several outgoing edges, each tied to a condition; once the node has run, the result determines which edge is followed.
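The branching example above can be sketched as a tiny graph runner in plain Python. This is not LangGraph's API (there, conditional edges are registered with `add_conditional_edges` on a graph builder); the names `NODES`, `ROUTERS`, `route_after_search`, and `run` are hypothetical, and the search node is rigged to fail once so that the loop-back edge is exercised.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]


def search(state: State) -> State:
    # Hypothetical search: the first query is irrelevant, the retry succeeds.
    attempts = state.get("attempts", 0) + 1
    quality = "useful" if attempts >= 2 else "irrelevant"
    return {**state, "attempts": attempts, "quality": quality}


def analyze(state: State) -> State:
    return {**state, "analysis": "done"}


def ask_user(state: State) -> State:
    return {**state, "needs_user": True}


def route_after_search(state: State) -> str:
    """Conditional edge: pick the next node from the search outcome."""
    if state["quality"] == "useful":
        return "analyze"
    if state["attempts"] < 3:
        return "search"    # loop back and retry with new keywords
    return "ask_user"      # give up and ask for more information


NODES: dict[str, Node] = {"search": search, "analyze": analyze, "ask_user": ask_user}
ROUTERS: dict[str, Callable[[State], str]] = {
    "search": route_after_search,
    "analyze": lambda s: "END",
    "ask_user": lambda s: "END",
}


def run(start: str, state: State, max_steps: int = 10) -> State:
    """Execute nodes and follow edges until END or the step limit."""
    node = start
    for _ in range(max_steps):
        state = NODES[node](state)
        node = ROUTERS[node](state)
        if node == "END":
            return state
    raise RuntimeError("step limit reached")


final = run("search", {"query": "competitor pricing"})
```

Keeping the routing decision in a separate function, apart from the node that produces the result, is what makes a wrong branch easy to locate when debugging.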
Parallel Execution
Some operations can run at the same time.
For example, the Agent needs information from three different sources. The three searches can run in parallel instead of waiting on each other in series.
LangGraph supports parallel execution: multiple nodes within the same super-step can run concurrently. This can significantly shorten the Agent’s execution time.
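To illustrate the three-source example, here is a sketch using Python's standard `asyncio` rather than LangGraph. `search_source` is a hypothetical function in which `asyncio.sleep` stands in for network latency, and the source names are made up.

```python
import asyncio
import time


async def search_source(name: str, delay: float = 0.1) -> str:
    """Hypothetical search call; the sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"results from {name}"


async def search_all(sources: list[str]) -> list[str]:
    # gather starts all searches at once and returns results in input order.
    return await asyncio.gather(*(search_source(s) for s in sources))


start = time.perf_counter()
results = asyncio.run(search_all(["news", "filings", "reviews"]))
elapsed = time.perf_counter() - start
print(f"{len(results)} searches in {elapsed:.2f}s")
```

Run serially, the three calls would take about the sum of their delays; run concurrently, the total is close to the slowest single call. The gain only applies when the searches do not depend on each other's results.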
4.4 Human in the Loop: Which Steps Need Human Intervention
Why Do We Need Humans
Agents aren’t omnipotent. In the following scenarios, human intervention is needed:
High-risk operations. Deleting files, sending emails, modifying databases: a mistake in any of these can be irreversible. Having a person confirm before execution is a necessary safety measure.
Ambiguous decisions. The Agent sees two possible interpretations and isn’t sure which one the user wants. For example, the user says “help me process that file,” and the Agent can’t tell which file is meant.
Quality checks. After the Agent finishes a report, a person reviews it before it is published.
Intervention Modes
Approval mode. The Agent pauses at key nodes and continues only after a human approves. Suitable for high-risk operations.
Correction mode. The Agent finishes and shows its results; the human can then modify them or ask for a redo. Suitable for content creation.
Guidance mode. The Agent proactively asks when it is uncertain, and the human provides direction. Suitable for exploratory tasks.
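Approval mode, the first of the modes above, can be sketched as a gate in front of tool execution. The action names in `HIGH_RISK` and the `execute` function are hypothetical; in a real system `approve` would prompt a person (for example through a UI or a chat message) instead of being a plain callback.

```python
from typing import Callable

# Hypothetical list of actions that must never run without sign-off.
HIGH_RISK = {"delete_file", "send_email", "update_database"}


def execute(action: str, run: Callable[[], str], approve: Callable[[str], bool]) -> str:
    """Run low-risk actions directly; pause high-risk ones for a human decision."""
    if action in HIGH_RISK and not approve(action):
        return f"{action}: rejected by reviewer"
    return run()


# Usage: a reviewer who rejects everything blocks only the risky action.
reject_all = lambda action: False
print(execute("search_web", lambda: "search done", reject_all))
print(execute("delete_file", lambda: "file deleted", reject_all))
```

Because the gate checks the action name before calling `run`, a rejected action has no side effects at all, which is the property that matters for irreversible operations.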
4.5 Multi-Agent Collaboration: When Do You Need Multiple Agents
Why Do We Need Multiple Agents
When a single Agent handles a complex task, it runs into three problems:
The context window overflows. One Agent has to search, analyze, write, and review; all of that information is crammed into one context, which quickly hits the limit.
Role confusion. One Agent plays search expert, analyst, writer, and reviewer at once, and is not specialized enough for any of those roles.
Error amplification. A mistake made early on carries into every later step, and nothing cross-checks it.
Multi-Agent collaboration addresses these problems by splitting the task among specialized Agents, each doing its own job and coordinating with the others.
Common Collaboration Patterns
Pattern 1: Supervisor-Worker
A Supervisor Agent breaks the task down and assigns the work; multiple Worker Agents carry out the individual subtasks.
This is the most common pattern. It suits tasks that can be cleanly divided.
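A minimal sketch of the Supervisor-Worker shape, with all names hypothetical. `plan` stands in for the Supervisor's model call that splits the task, and each entry in `WORKERS` stands in for a separate Agent with its own context.

```python
from typing import Callable

Worker = Callable[[str], str]

# Hypothetical workers; in a real system each would be its own Agent.
WORKERS: dict[str, Worker] = {
    "search": lambda task: f"[search] sources for {task}",
    "analyze": lambda task: f"[analyze] findings for {task}",
    "write": lambda task: f"[write] draft for {task}",
}


def plan(task: str) -> list[tuple[str, str]]:
    """Stand-in for the Supervisor's decision: (worker, subtask) pairs."""
    return [(name, task) for name in ("search", "analyze", "write")]


def supervise(task: str) -> list[str]:
    """Dispatch each subtask to its worker and collect the results in order."""
    return [WORKERS[name](subtask) for name, subtask in plan(task)]


outputs = supervise("competitor report")
```

The Supervisor only sees the plan and the returned results, not each Worker's intermediate context, which is how this pattern keeps any single context window from filling up.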
Pattern 2: Pipeline
Agents process the work in sequence: the first finishes and passes its result to the second, like a factory assembly line.
It suits tasks with a clear sequential order.
Pattern 3: Verifier-Critic
One Agent generates and another checks. A Generator writes code and a Verifier runs the tests; a Writer drafts an article and a Critic checks its facts and logic.
This pattern raises output quality.
The Cost of Multiple Agents
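The generate-then-check loop can be sketched as follows. `generate_with_review`, `generate`, and `verify` are hypothetical names; the toy generator returns a buggy function first and a fixed one after feedback, and the toy verifier runs the candidate with `exec`, which is only acceptable here because the code string is hard-coded and trusted.

```python
from typing import Callable, Optional


def generate_with_review(
    generate: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Ask for a draft, pass it to the verifier, and retry with its feedback."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        draft = generate(feedback)
        feedback = verify(draft)  # None means the draft passed
        if feedback is None:
            return draft
    raise RuntimeError(f"draft still failing after {max_rounds} rounds: {feedback}")


# Hypothetical generator: buggy on the first try, fixed once it gets feedback.
def generate(feedback: Optional[str]) -> str:
    return "def add(a, b): return a + b" if feedback else "def add(a, b): return a - b"


# Hypothetical verifier: runs the candidate and tests it (trusted input only).
def verify(code: str) -> Optional[str]:
    scope: dict = {}
    exec(code, scope)
    return None if scope["add"](2, 3) == 5 else "add(2, 3) should be 5"


accepted = generate_with_review(generate, verify)
```

The round limit plays the same role as the step limit in a single-Agent loop: without it, a generator that never satisfies the verifier would retry forever.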
Multiple Agents aren’t free. They come with several costs:
Increased token consumption. Each Agent needs its own context, so a multi-Agent system typically consumes 10-15x the tokens of a single Agent.
Coordination overhead. Agents have to pass information to one another, synchronize state, and resolve conflicts.
Error amplification. Research has found that systems of independent Agents can amplify errors by a factor of 17.
How to optimize: use small models for the inner loops. In a multi-Agent system, many Agents only perform simple operations (searching, organizing, format conversion). These Agents can run on small 8B-parameter models, such as DeepSeek V4-Flash ($0.14 per million tokens), which are fast and cheap. Only the core decision-making Agents need a large model.
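A rough sketch of role-based model routing and its effect on cost. The model names, the role names, and the large-model price are hypothetical placeholders chosen for illustration; only the $0.14 per million tokens figure comes from the text above.

```python
# Hypothetical model tiers; prices are USD per million tokens.
MODELS = {
    "small": {"name": "small-model", "usd_per_mtok": 0.14},
    "large": {"name": "large-model", "usd_per_mtok": 3.00},  # assumed price
}

# Roles that only do simple operations and can run on the small tier.
SIMPLE_ROLES = {"search", "organize", "format"}


def pick_model(role: str) -> str:
    """Route simple roles to the small tier, everything else to the large tier."""
    return "small" if role in SIMPLE_ROLES else "large"


def estimate_cost(usage: dict[str, int]) -> float:
    """usage maps role -> tokens consumed; returns the total cost in USD."""
    total = 0.0
    for role, tokens in usage.items():
        price = MODELS[pick_model(role)]["usd_per_mtok"]
        total += tokens / 1_000_000 * price
    return total


usage = {"search": 4_000_000, "format": 1_000_000, "decide": 500_000}
print(f"routed cost: ${estimate_cost(usage):.2f}")
```

Under these assumed prices, the high-volume simple roles contribute only a small share of the bill, so most of the spend goes to the few calls that need a large model.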
Chapter Summary
Orchestration is the Agent’s execution flow. Core points:
- The single Agent’s basic loop: Think → Act → Observe. Set a step limit to avoid infinite loops.
- Workflow vs Agent: if a Workflow is enough, don’t build an Agent. Simplest first.
- Directed graph modeling: use nodes and edges to define the execution flow, with support for loops, branches, and parallelism.
- Human in the loop: have humans step in at key nodes, not at every step.
- Multi-Agent collaboration: each pattern has its own suitable scenarios. Multiple Agents have costs; using small models for the inner loops keeps them down.
The next chapter covers the second subsystem of the Harness: Tools, or how Agents “use” external capabilities.