Document Search Pipeline
When you ask a question, Thallus searches your documents using multiple signals and stages to find the most relevant passages. This approach is faster and more accurate than searching every chunk in your entire knowledge base.
How search works
Thallus first identifies the most relevant documents in your knowledge base, then retrieves the best-matching passages from those documents. Each stage uses both semantic similarity and keyword matching, and the results of the two methods are combined into a single ranking.
The initial document selection uses a loose threshold to avoid missing documents that might contain relevant passages. Passage retrieval applies stricter thresholds, and results are capped per document to prevent any single document from dominating.
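The two stages described above can be sketched roughly as follows. This is an illustrative sketch only, not Thallus's implementation: the threshold values, the per-document cap, and the names `select_documents` and `retrieve_passages` are all assumptions made for the example, and scores are assumed to be precomputed relevance values between 0 and 1.

```python
from collections import defaultdict
from dataclasses import dataclass

# All values below are hypothetical, chosen only to illustrate the idea.
DOC_THRESHOLD = 0.2      # loose: document selection favors recall
PASSAGE_THRESHOLD = 0.5  # stricter: passage retrieval favors precision
MAX_PER_DOC = 2          # cap on passages kept from any one document


@dataclass
class Passage:
    doc_id: str
    text: str
    score: float  # combined relevance score, assumed precomputed


def select_documents(doc_scores: dict[str, float]) -> set[str]:
    """Stage 1: keep every document that clears the loose threshold."""
    return {doc_id for doc_id, s in doc_scores.items() if s >= DOC_THRESHOLD}


def retrieve_passages(passages: list[Passage], selected: set[str]) -> list[Passage]:
    """Stage 2: apply the stricter threshold, then cap results per document."""
    kept: list[Passage] = []
    per_doc: dict[str, int] = defaultdict(int)
    # Visit passages best-first so the cap keeps each document's top matches.
    for p in sorted(passages, key=lambda p: p.score, reverse=True):
        if p.doc_id not in selected or p.score < PASSAGE_THRESHOLD:
            continue
        if per_doc[p.doc_id] >= MAX_PER_DOC:
            continue
        per_doc[p.doc_id] += 1
        kept.append(p)
    return kept


doc_scores = {"handbook": 0.8, "faq": 0.25, "memo": 0.1}
passages = [
    Passage("handbook", "h1", 0.9),
    Passage("handbook", "h2", 0.8),
    Passage("handbook", "h3", 0.7),
    Passage("faq", "f1", 0.6),
    Passage("faq", "f2", 0.3),
    Passage("memo", "m1", 0.95),
]
selected = select_documents(doc_scores)
results = retrieve_passages(passages, selected)
```

In this example, `memo` is dropped at the first stage even though one of its passages scores highly, `h3` is dropped by the per-document cap, and `f2` is dropped by the stricter passage threshold.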
Query expansion
Your question is automatically expanded into multiple search variants. The variants cover different phrasings and aspects of your question, which improves coverage when your documents use different terminology than you did.
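As a rough illustration of what expansion means, the sketch below generates variants by swapping terms from a small synonym table. How Thallus actually generates variants is not described here, so the table, the `expand_query` function, and the `max_variants` limit are all hypothetical.

```python
# Hypothetical synonym table, for illustration only. A real system might
# generate variants in a completely different way.
SYNONYMS = {
    "vacation": ["leave", "time off"],
    "policy": ["guidelines"],
}


def expand_query(question: str, max_variants: int = 4) -> list[str]:
    """Return the original question plus variants with one term swapped."""
    variants = [question]
    lowered = question.lower()
    for term, alternatives in SYNONYMS.items():
        if term not in lowered:
            continue
        for alt in alternatives:
            variant = lowered.replace(term, alt)
            if variant not in variants:
                variants.append(variant)
    return variants[:max_variants]


variants = expand_query("vacation policy")
```

Each variant would then be searched separately, so a document that says "time off" can still be found by a question that says "vacation".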
How results are ranked
Documents and passages are ranked by relevance to your question using multiple signals. Documents found by multiple query variants and by both search methods (semantic + keyword) rank higher than those found by a single variant or method. Passages are similarly ranked using a combination of relevance signals to surface the best matches.
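One widely used way to merge several ranked lists so that items found by more lists rank higher is reciprocal rank fusion (RRF). The sketch below uses RRF purely as an illustration of the behavior described above; this page does not say which fusion method Thallus uses, and the function name and the constant `k = 60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists; each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for stability.
    return sorted(scores, key=lambda d: (-scores[d], d))


rankings = [
    ["a", "b"],  # semantic search, query variant 1
    ["b", "c"],  # keyword search, query variant 1
    ["b", "a"],  # semantic search, query variant 2
]
fused = reciprocal_rank_fusion(rankings)
```

Here document `b` appears in all three lists and ends up first, `a` appears in two and comes second, and `c` appears in only one and comes last.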
Why a document might not appear
If a document you expected doesn't show up in results, check these common causes:
- Below discovery threshold — The synopsis doesn't match closely enough
- Outranked — The document matched but didn't rank high enough to advance
- Collection excluded — The collection has Include in research turned off (see Collections)
- Access restricted — The document is in another user's personal collection
- Not yet processed — The document is still pending or processing (see Processing)
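The checklist above can be read as a sequence of checks, sketched below. The `DocumentState` fields, the status values, the numeric thresholds, and the order of the checks are all hypothetical, chosen only to make the causes concrete. They do not describe Thallus's internal data model.

```python
from dataclasses import dataclass
from typing import Optional

DISCOVERY_THRESHOLD = 0.2  # hypothetical value
ADVANCE_TOP_N = 10         # hypothetical number of documents that advance


@dataclass
class DocumentState:
    status: str                    # "pending", "processing", or "ready"
    include_in_research: bool      # the collection's Include in research setting
    personal_owner: Optional[str]  # set if the document is in a personal collection
    synopsis_score: float          # how closely the synopsis matches the question
    rank: int                      # 1-based position among matching documents


def why_missing(doc: DocumentState, user: str) -> Optional[str]:
    """Return the first cause that applies, or None if the document should appear."""
    if doc.status != "ready":
        return "Not yet processed"
    if doc.personal_owner is not None and doc.personal_owner != user:
        return "Access restricted"
    if not doc.include_in_research:
        return "Collection excluded"
    if doc.synopsis_score < DISCOVERY_THRESHOLD:
        return "Below discovery threshold"
    if doc.rank > ADVANCE_TOP_N:
        return "Outranked"
    return None


doc = DocumentState("ready", True, None, synopsis_score=0.6, rank=14)
reason = why_missing(doc, user="alice")
```

In this example the document is processed, accessible, and in scope, and its synopsis matches well enough, so the only remaining cause is that it was outranked.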
Deep research
For complex questions, Thallus runs deeper searches to ensure comprehensive coverage and filters results for relevance.
Related pages
- How Documents Are Processed — The pipeline that creates chunks and synopses
- Citations & Sources — How search results become citations in responses
- Document Collections — Research scope and collection management