Document Search Pipeline

When you ask a question, Thallus searches your documents in stages, combining multiple relevance signals to find the most relevant passages. Narrowing the search to the most promising documents first is faster and more accurate than searching every chunk in your entire knowledge base.

How search works

Thallus first identifies the most relevant documents in your knowledge base, then retrieves the best-matching passages from those documents. Each stage uses both semantic similarity and keyword matching, and combines the results of the two methods into a single ranking.

The initial document selection uses a loose threshold to avoid missing documents that might contain relevant passages. Passage retrieval applies stricter thresholds, and results are capped per document to prevent any single document from dominating.
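
The two stages described above can be sketched roughly as follows. This is a minimal illustration, not Thallus's implementation: the threshold values, the per-document cap, the score scale, and the function names are all assumptions chosen to make the example concrete.

```python
from collections import defaultdict

# Hypothetical values for illustration only; the real thresholds and cap
# used by Thallus are not documented here.
DOC_THRESHOLD = 0.30      # loose: keep any document that might be relevant
PASSAGE_THRESHOLD = 0.60  # strict: keep only strong passage matches
MAX_PER_DOC = 2           # cap so one document cannot dominate the results


def select_documents(doc_scores, threshold=DOC_THRESHOLD):
    """Stage 1: keep documents whose score clears the loose threshold."""
    return {doc_id for doc_id, score in doc_scores.items() if score >= threshold}


def retrieve_passages(passages, selected_docs,
                      threshold=PASSAGE_THRESHOLD, max_per_doc=MAX_PER_DOC):
    """Stage 2: keep strong passages from selected documents, capped per document."""
    kept = []
    per_doc = defaultdict(int)
    # Visit passages best-first so the cap keeps each document's top passages.
    for doc_id, text, score in sorted(passages, key=lambda p: p[2], reverse=True):
        if doc_id not in selected_docs or score < threshold:
            continue
        if per_doc[doc_id] >= max_per_doc:
            continue
        per_doc[doc_id] += 1
        kept.append((doc_id, text, score))
    return kept


doc_scores = {"handbook": 0.72, "comp_plan": 0.35, "offsite_notes": 0.10}
passages = [
    ("handbook", "Quota is set annually per role.", 0.91),
    ("handbook", "Quota attainment is reviewed quarterly.", 0.84),
    ("handbook", "Quota changes require VP approval.", 0.77),
    ("comp_plan", "Commission accelerates above 100% of quota.", 0.66),
    ("comp_plan", "Expense reports are due monthly.", 0.20),
    ("offsite_notes", "Team discussed quota over lunch.", 0.95),
]

selected = select_documents(doc_scores)
results = retrieve_passages(passages, selected)
for doc_id, text, score in results:
    print(f"{score:.2f}  {doc_id}: {text}")
```

In this sketch the offsite notes contain a high-scoring passage but never reach stage two, because the document itself fell below the loose selection threshold. The third handbook passage is dropped by the per-document cap, which leaves room for the compensation plan.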


Query expansion

Your question is automatically expanded into multiple search variants to improve coverage across different terminology and phrasings:

Your question: "What is our sales quota policy?"

Expanded into search variants:

  • sales quota policy
  • quota attainment targets
  • sales compensation plan
  • quota rules by role
  • sales performance metrics

Variants cover different phrasings and aspects of your question to improve coverage.
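
To see why variants improve coverage, consider the toy sketch below. It is only an illustration: the matcher is a naive all-words keyword check, the documents are invented, and the variant list is supplied by hand. How Thallus actually generates variants is not described here.

```python
def keyword_search(query, documents):
    """Toy matcher: return IDs of documents containing every word of the query."""
    words = set(query.lower().split())
    return {
        doc_id
        for doc_id, text in documents.items()
        if words <= set(text.lower().split())
    }


def search_with_variants(variants, documents):
    """Run each variant and record which variants found each document."""
    found_by = {}
    for variant in variants:
        for doc_id in keyword_search(variant, documents):
            found_by.setdefault(doc_id, []).append(variant)
    return found_by


# Invented sample documents, for illustration only.
documents = {
    "handbook": "our sales quota policy applies to all account executives",
    "comp_plan": "the sales compensation plan pays commission on closed deals",
    "scorecard": "quota attainment targets are reviewed each quarter",
}
variants = ["sales quota policy", "quota attainment targets", "sales compensation plan"]

original_only = keyword_search("sales quota policy", documents)
found_by = search_with_variants(variants, documents)
print(sorted(original_only))
print(sorted(found_by))
```

Searching with the original phrasing alone finds only the handbook. The variants also reach the compensation plan and the scorecard, which discuss the same topic in different words.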


How results are ranked

Documents are ranked by relevance to your question using multiple signals. A document found by several query variants and by both search methods (semantic and keyword) ranks higher than one found by a single variant or method. Passages are ranked similarly, combining relevance signals to surface the best matches.
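
One common way to get this behavior is reciprocal rank fusion (RRF), sketched below. This is an assumption for illustration: the text above does not say which fusion method Thallus uses, and the constant k = 60 and the sample rankings are hypothetical. RRF gives each document a small score for every list it appears in, so documents found by more variants and methods accumulate higher totals.

```python
from collections import defaultdict


def fuse_rankings(ranked_lists, k=60):
    """Combine several ranked lists of document IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each appearance adds 1 / (k + rank); higher ranks add slightly more.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# One ranked list per (query variant, search method) pair; sample data only.
ranked_lists = [
    ["handbook", "comp_plan"],  # variant 1, semantic
    ["handbook", "faq"],        # variant 1, keyword
    ["comp_plan", "handbook"],  # variant 2, semantic
    ["press_release"],          # variant 2, keyword
]

fused = fuse_rankings(ranked_lists)
print(fused)
```

Here the handbook appears in three lists and ranks first, the compensation plan appears in two and ranks second, and the press release, although top of its only list, ranks below both.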


Why a document might not appear

If a document you expected doesn't show up in results, check these common causes:

  • Below discovery threshold — The synopsis doesn't match closely enough
  • Outranked — The document matched but didn't rank high enough to advance
  • Collection excluded — The collection has Include in research turned off (see Collections)
  • Access restricted — The document is in another user's personal collection
  • Not yet processed — The document is still pending or processing (see Processing)
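
The checklist above can be read as a sequence of checks, sketched below. Every field name, the threshold value, and the cutoff for how many documents advance are hypothetical stand-ins for the conditions in the list, not actual Thallus settings or API fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentState:
    # All fields are hypothetical stand-ins for the conditions listed above.
    processed: bool
    in_other_users_personal_collection: bool
    collection_included_in_research: bool
    synopsis_score: float
    rank: int


def why_missing(doc: DocumentState,
                discovery_threshold: float = 0.30,
                documents_advanced: int = 10) -> Optional[str]:
    """Return the first cause that explains a missing document, or None."""
    if not doc.processed:
        return "Not yet processed"
    if doc.in_other_users_personal_collection:
        return "Access restricted"
    if not doc.collection_included_in_research:
        return "Collection excluded"
    if doc.synopsis_score < discovery_threshold:
        return "Below discovery threshold"
    if doc.rank > documents_advanced:
        return "Outranked"
    return None


doc = DocumentState(
    processed=True,
    in_other_users_personal_collection=False,
    collection_included_in_research=False,
    synopsis_score=0.80,
    rank=3,
)
print(why_missing(doc))
```

The sketch checks availability first (processing, access, collection settings) because a document that is not searchable at all never receives a score or a rank.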

Deep research

For complex questions, Thallus runs deeper searches to give more comprehensive coverage, then filters the results for relevance.