How Orchestration Works
When you send a message, Thallus doesn't just call a single AI model. It runs an orchestration pipeline that plans, executes, evaluates, and synthesizes a response using specialized agents. This page explains how that pipeline works.
The orchestration pipeline
Every query flows through the same high-level pipeline, though the specifics vary by chat mode:
- Query — Your message arrives at the API
- Mode Detection — If set to Auto, an LLM classifies the query to choose Ask, Research, or Investigate
- Planning — A planner creates an execution plan with steps and dependencies
- Execution — Agents run the plan steps, potentially in parallel
- Evaluation — Results are checked for completeness; replanning may occur
- Synthesis — A final response is composed from all agent results and streamed to you
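The stages above can be sketched as a single driver loop. Everything below is an illustrative stand-in, not Thallus's actual API; the function names, plan shape, and mode heuristic are assumptions made for the sketch.

```python
def detect_mode(query: str) -> str:
    # Stand-in for the LLM mode classifier (Auto -> Ask/Research/Investigate).
    return "research" if "compare" in query.lower() else "ask"

def make_plan(query: str) -> list[dict]:
    # A plan is a list of steps; "deps" names step ids that must finish first.
    return [
        {"id": "s1", "task": "search documents", "deps": []},
        {"id": "s2", "task": "query tables", "deps": []},
        {"id": "s3", "task": "combine findings", "deps": ["s1", "s2"]},
    ]

def run_pipeline(query: str) -> dict:
    mode = detect_mode(query)
    plan = make_plan(query)
    results: dict[str, str] = {}
    while len(results) < len(plan):
        # Execute every step whose dependencies are already satisfied (a batch).
        ready = [s for s in plan
                 if s["id"] not in results
                 and all(d in results for d in s["deps"])]
        for step in ready:  # in the real system these run in parallel
            results[step["id"]] = f"result of {step['task']}"
        # Evaluation would run here: complete, replan, or ask the user.
    return {"mode": mode, "answer": "; ".join(results.values())}
```

Note how the two independent steps (`s1`, `s2`) form the first batch and the dependent step (`s3`) waits for both.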
Planning
The planner is an LLM that receives your query along with rich context about your environment:
- Available agents — Which agents are registered and accessible to your organization
- Board context — Table schemas, document catalog, cross-source links, and join maps from your conversation's board
- Memory context — Your personal memories (preferences, facts, patterns)
- Conversation history — Recent messages for follow-up context
- Prior execution — Results from earlier queries in the same conversation
From this, the planner produces an execution plan: a structured set of steps with explicit dependencies between them.
Steps with no dependencies on each other can run simultaneously. Steps that depend on earlier results wait until those complete.
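Dependency tracking determines which steps can run together. As a sketch (the step shape `{"id", "deps"}` is an assumption for illustration), grouping a plan into parallel batches might look like:

```python
def execution_batches(plan: list[dict]) -> list[list[str]]:
    # Group step ids into batches: each batch contains every step whose
    # dependencies were satisfied by earlier batches.
    done: set[str] = set()
    batches: list[list[str]] = []
    while len(done) < len(plan):
        ready = [s["id"] for s in plan
                 if s["id"] not in done and all(d in done for d in s["deps"])]
        if not ready:
            raise ValueError("plan has a dependency cycle")
        batches.append(ready)
        done.update(ready)
    return batches
```

For example, two independent searches followed by a join would yield `[["search_a", "search_b"], ["join"]]`: the searches run simultaneously, the join waits for both.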
Execution
The executor walks the plan graph, running all steps whose dependencies have been met. Independent steps run concurrently, so a plan with two independent searches completes in the time of the slower one, not the sum of both.
Each step dispatches to the assigned agent for execution. The agent receives the step's task description plus context including:
- Results from previously completed steps
- Board state (schemas, datasets, findings)
- Document catalog for search agents
- Data previews for query agents
Agents work through their assigned task — calling tools, processing results, and iterating until they have a complete answer.
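The latency benefit of running a batch concurrently can be sketched with `asyncio`; the agent names and delays here are made up for illustration:

```python
import asyncio
import time

async def run_agent(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for tool calls and LLM work
    return f"{name}: done"

async def run_batch() -> list[str]:
    start = time.monotonic()
    # Two independent steps run concurrently...
    results = await asyncio.gather(
        run_agent("search", 0.2),
        run_agent("query", 0.1),
    )
    # ...so the batch takes roughly 0.2s (the slower step), not 0.3s (the sum).
    assert time.monotonic() - start < 0.3
    return list(results)
```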
Evaluation and replanning
After each batch of steps completes, an evaluator examines the results and makes one of three decisions:
- Complete — The success criteria are met with actual answers. Move to synthesis.
- Replan — Results are incomplete or a query returned no data. The evaluator identifies the specific issue (e.g., a wrong date filter or a missing data source), and new steps are appended to the plan. Multiple replans are allowed before the system synthesizes whatever it has.
- Ask user — The system needs clarification to proceed. It presents you with a question — multiple choice, yes/no, or free text — explaining what it found and why it needs your input.
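The three-way decision can be sketched as follows. In Thallus the evaluator is an LLM; the checks and the result shape (`{"answer", "rows", "needs_clarification"}`) below are illustrative stand-ins, not the real logic:

```python
def evaluate(results: list[dict]) -> str:
    # Decide what happens after a batch of steps completes.
    if any(r.get("needs_clarification") for r in results):
        return "ask_user"   # present a question before proceeding
    if any(r.get("answer") is None or r.get("rows") == 0 for r in results):
        return "replan"     # e.g. a query returned no data
    return "complete"       # success criteria met; move to synthesis
```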
The Ask path
Ask mode sends your query directly to a single agent for a fast response. If the answer needs more depth, it automatically escalates to the full Research pipeline.
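The escalation can be sketched like this; `quick_agent` and `research_pipeline` are hypothetical stand-ins for the single-agent call and the full pipeline, and the word-count heuristic is purely illustrative:

```python
def quick_agent(query: str) -> tuple[str, bool]:
    # Stand-in heuristic: long, multi-part questions need more depth.
    needs_depth = len(query.split()) > 12
    return f"quick answer to: {query}", needs_depth

def research_pipeline(query: str) -> str:
    return f"researched answer to: {query}"

def ask(query: str) -> str:
    answer, needs_depth = quick_agent(query)
    # Escalate automatically when the fast path isn't enough.
    return research_pipeline(query) if needs_depth else answer
```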
The Research path
Research mode runs the complete pipeline:
- Context loading — Your data environment is prepared for planning
- Planning — The planner creates an execution plan based on full context and memory
- Execution loop — Ready steps run in parallel batches
- Evaluation — After each batch, the evaluator decides to complete, replan, or ask the user
- Synthesis — All agent results and citations are collected and streamed as a final response
- Snapshot — Execution state is saved for follow-up queries in the same conversation
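The snapshot step can be sketched as serializing the plan and results under a conversation id; the shape and JSON storage format here are assumptions for illustration:

```python
import json

def save_snapshot(conversation_id: str, plan: list[dict], results: dict) -> str:
    # Persist execution state so follow-up queries can reuse prior results.
    return json.dumps({"conversation": conversation_id,
                       "plan": plan,
                       "results": results})

def load_snapshot(raw: str) -> dict:
    return json.loads(raw)
```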
The Investigate path
Investigate mode uses a reactive approach — exploring your question from multiple angles, updating its understanding as new evidence comes in, and checking in with you at key decision points. You can choose to continue, focus on a specific angle, or summarize what's been found so far.
This makes Investigate mode better for open-ended questions where you don't know in advance what steps are needed.
Why plans differ
Two identical questions can produce different plans. This is by design — the planner considers:
- Board state — Which data sources and documents are connected affects what steps are possible
- Memory — Your stored preferences and facts shape how the planner approaches your question (see Memory System)
- Conversation history — Follow-up questions incorporate prior results and context
- Prior execution — In the same conversation, Thallus knows what datasets were already produced and what was already synthesized
This context-awareness is what makes Thallus's responses tailored to your specific environment rather than generic.
For details on what you see while the pipeline is running, see Progress Tracking. For how sources are tracked through the pipeline, see Citations & Sources.