Citations & Sources

Every Thallus response includes citations that trace each claim back to its source. This page explains how citations work, what types exist, and how they're ranked.

What citations look like

After Thallus responds, you'll see a sources panel below the answer. Each citation shows a numbered badge, source type icon, title, and relevance score.

📖 Sources (3)

  1. 📄 Q4 Sales Report 2025.pdf · Page 14 · 92%
  2. 🗂 Sales by product category · 12 rows · Production DB · 88%
  3. 🌐 Industry Benchmarks 2025 · marketresearch.com · 64%

Clicking a citation expands it to show the source text, SQL query, or other details depending on the citation type.

Citation types

Thallus can cite five types of sources:

  • Document (📄) — chunk text from your uploaded files, page number, and document ID
  • Database query (🗂) — the SQL query executed, column names, result data, row count, and connection name
  • Web (🌐) — the URL of the source page
  • Email (📧) — the email thread ID
  • Agent (🤖) — results from a prior agent in the same execution
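The fields above suggest a common record shape with a type-specific payload. Here is a minimal sketch in Python; the class and field names are assumptions for illustration, not Thallus's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    """Illustrative citation record; names are assumed, not Thallus's schema."""
    source_type: str   # "document", "database", "web", "email", or "agent"
    title: str
    relevance: float   # 0.0-1.0, assigned by the producing agent
    details: dict = field(default_factory=dict)  # type-specific payload

# A document citation carries its chunk location for traceability.
doc_cite = Citation(
    source_type="document",
    title="Q4 Sales Report 2025.pdf",
    relevance=0.92,
    details={"document_id": "doc_123", "page_number": 14},
)
```

Keeping the type-specific fields in one `details` payload lets every citation type share a single list and sort order.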

Relevance scoring

Each citation carries a relevance score from 0.0 to 1.0, set by the agent that produced it and refined during evaluation:

  • 0.9–1.0 — directly supports a key claim in the response
  • 0.7–0.9 — relevant context or supporting detail
  • 0.3–0.6 — tangentially related
  • Below threshold — not relevant (filtered out)

Agents assign initial scores based on how directly the source material answers the query. The evaluator may re-score citations during the evaluation phase if it discovers inconsistencies between data sources and document claims.
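The bands above amount to a simple mapping from score to meaning. A sketch, assuming a cutoff of 0.3 (the actual filtering threshold isn't stated on this page):

```python
RELEVANCE_THRESHOLD = 0.3  # assumed cutoff; the real threshold isn't documented here

def describe_relevance(score: float) -> str:
    """Map a 0.0-1.0 relevance score to the bands described above."""
    if score >= 0.9:
        return "directly supports a key claim"
    if score >= 0.7:
        return "relevant context or supporting detail"
    if score >= RELEVANCE_THRESHOLD:
        return "tangentially related"
    return "filtered out"
```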

How citations are ranked

The final citation list you see is produced by:

  1. Collection — All citations from every agent in the execution are gathered
  2. Deduplication — Citations pointing to the same source are merged. When duplicates are found, the highest relevance score is kept and the longest chunk text is preserved
  3. Sorting — Citations are sorted by relevance score, highest first
  4. Limiting — Citations are capped to keep the response focused and relevant

Deduplication identifies matching citations using source-specific identifiers, so the same source isn't listed twice.
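The four steps above can be sketched as one pipeline. This is a minimal illustration, not Thallus's implementation; the dictionary keys (`source_type`, `source_id`, `relevance`, `chunk_text`) and the default limit are assumptions:

```python
def rank_citations(citations, limit=10):
    # Step 1 (collection) is assumed done: `citations` holds every agent's output.
    # Step 2: deduplicate on a source-specific key; keep the highest relevance
    # score and the longest chunk text among duplicates.
    merged = {}
    for c in citations:
        key = (c["source_type"], c["source_id"])
        if key not in merged:
            merged[key] = dict(c)
        else:
            kept = merged[key]
            kept["relevance"] = max(kept["relevance"], c["relevance"])
            if len(c.get("chunk_text", "")) > len(kept.get("chunk_text", "")):
                kept["chunk_text"] = c["chunk_text"]
    # Step 3: sort by relevance, highest first.  Step 4: cap the list.
    ranked = sorted(merged.values(), key=lambda c: c["relevance"], reverse=True)
    return ranked[:limit]

citations = [
    {"source_type": "document", "source_id": "doc_1", "relevance": 0.7,
     "chunk_text": "short"},
    {"source_type": "document", "source_id": "doc_1", "relevance": 0.92,
     "chunk_text": "a longer chunk"},
    {"source_type": "web", "source_id": "https://example.com", "relevance": 0.64},
]
top = rank_citations(citations)
```

Here the two `doc_1` citations merge into one entry that keeps the 0.92 score and the longer chunk, leaving two ranked citations overall.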

Two-stage search and citations

When agents search your documents, citations come from a two-stage retrieval process:

  1. Synopsis discovery — Document synopses (structured summaries) are searched to identify the most relevant documents using combined ranking across query variants.
  2. Chunk retrieval — Within the matched documents, individual text chunks are retrieved using a similarity threshold optimized for each query. Only these matched chunks become citations.

This means citations point to specific, relevant passages within your documents rather than entire files. The page number, chunk text, and document title are all captured for traceability.
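Under stated assumptions (a toy token-overlap similarity standing in for real embedding search, and made-up parameter names), the two stages might look like:

```python
def score(query: str, text: str) -> float:
    """Toy similarity: fraction of query tokens that appear in the text.
    A real system would use embedding similarity; this is for illustration only."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0

def two_stage_search(query, synopses, chunks_by_doc, top_docs=3, sim_threshold=0.5):
    # Stage 1 (synopsis discovery): rank document summaries, keep the best docs.
    ranked_docs = sorted(synopses, key=lambda d: score(query, synopses[d]), reverse=True)
    # Stage 2 (chunk retrieval): within those docs, keep chunks above the threshold.
    hits = []
    for doc in ranked_docs[:top_docs]:
        for chunk in chunks_by_doc.get(doc, []):
            s = score(query, chunk["text"])
            if s >= sim_threshold:
                hits.append({**chunk, "doc": doc, "score": s})
    return sorted(hits, key=lambda c: c["score"], reverse=True)

synopses = {"sales": "quarterly sales report revenue",
            "hr": "hiring policy handbook"}
chunks_by_doc = {
    "sales": [{"text": "revenue grew in the quarter", "page": 3},
              {"text": "appendix of footnotes", "page": 9}],
    "hr": [{"text": "vacation policy", "page": 1}],
}
hits = two_stage_search("quarterly revenue", synopses, chunks_by_doc,
                        top_docs=1, sim_threshold=0.5)
```

Only chunks from the synopsis-matched document that clear the threshold survive, which is why each citation points at a specific passage rather than a whole file.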

Tracing a citation to its source

Each citation type provides enough information to locate the original source:

  • Document — document_id + page_number identify the exact location. Click to view the chunk text.
  • Database query — The full SQL query is stored, along with the connection name and result data. You can see exactly what was queried and what it returned.
  • Web — The URL links directly to the source page.
  • Email — The thread_id identifies the email conversation.
  • Agent — References the agent name and tool used, linking back to a prior step in the execution plan.
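As an illustration of the identifiers listed above, a hypothetical helper could turn each citation type into a human-readable locator. Every field name here is an assumption for the sketch:

```python
def locate(citation: dict) -> str:
    """Build a human-readable pointer back to a citation's origin.
    Field names are illustrative, not Thallus's actual schema."""
    t, d = citation["source_type"], citation["details"]
    if t == "document":
        return f"{d['document_id']}, page {d['page_number']}"
    if t == "database":
        return f"{d['connection']} ran: {d['sql']}"
    if t == "web":
        return d["url"]
    if t == "email":
        return f"email thread {d['thread_id']}"
    return f"agent {d['agent_name']} via {d['tool']}"
```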

For more on how citations flow through the orchestration pipeline, see How Orchestration Works. Chat mode also affects citation depth: Ask mode typically produces fewer citations (it runs a single agent), while Research and Investigate modes produce richer citation sets drawn from multiple sources.