Uploading Documents
Thallus processes your documents into searchable knowledge that agents can reference when answering questions. Upload PDFs, Word docs, spreadsheets, and text files to build your knowledge base.
Supported file types
| Type | Extensions | Best for |
|---|---|---|
| PDF | .pdf | Reports, policies, contracts |
| Word | .docx, .doc | Proposals, guides, handbooks |
| Text | .txt, .md | Notes, documentation, READMEs |
| Spreadsheet | .csv, .xlsx, .xls | Data tables, reports, exports |
How uploading works
When you upload a file, Thallus runs several checks before queueing it for processing:
- Select file — Choose a file from your device. Maximum file size is 50 MB.
- Validate — Thallus checks the file extension and inspects the content bytes to confirm the file type matches (e.g., a .pdf must start with valid PDF headers).
- Dedup check — A content hash is computed and compared against existing documents in the target collection to prevent duplicates.
- Reserve chunks — The system estimates how many chunks the document will produce and reserves capacity against your plan limits.
- Processing queued — The file is saved and a background task begins extracting, chunking, and embedding the content.
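The size, validation, and hashing steps above can be illustrated with a minimal Python sketch. It is not Thallus's implementation: the function name `validate_upload`, the reading of 50 MB as 50 × 1024 × 1024 bytes, and the choice of SHA-256 for the content hash are all assumptions made for illustration. The leading-byte signatures are the standard ones for PDF, ZIP-based Office files, and legacy OLE Office files.

```python
import hashlib
from pathlib import PurePath

# Assumption: "50 MB" is read as 50 * 1024 * 1024 bytes.
MAX_FILE_SIZE = 50 * 1024 * 1024

PDF_MAGIC = b"%PDF-"                             # PDF files begin with this marker
ZIP_MAGIC = b"PK\x03\x04"                        # .docx and .xlsx are ZIP containers
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")    # legacy .doc and .xls (OLE compound file)

# Plain-text formats have no fixed signature, so only the extension is checked.
MAGIC_BY_EXTENSION = {
    ".pdf": PDF_MAGIC,
    ".docx": ZIP_MAGIC,
    ".xlsx": ZIP_MAGIC,
    ".doc": OLE_MAGIC,
    ".xls": OLE_MAGIC,
    ".txt": None,
    ".md": None,
    ".csv": None,
}


def validate_upload(filename: str, content: bytes) -> str:
    """Return a content hash if the file passes the checks, else raise ValueError."""
    if len(content) > MAX_FILE_SIZE:
        raise ValueError("file exceeds the 50 MB limit")

    ext = PurePath(filename).suffix.lower()
    if ext not in MAGIC_BY_EXTENSION:
        raise ValueError(f"unsupported extension: {ext or '(none)'}")

    magic = MAGIC_BY_EXTENSION[ext]
    if magic is not None and not content.startswith(magic):
        raise ValueError(f"content does not look like a {ext} file")

    # Assumption: SHA-256; the documentation only says "a content hash".
    return hashlib.sha256(content).hexdigest()
```

Running the same check locally before uploading, for example `validate_upload("report.pdf", data)`, can catch a renamed or truncated file early. The returned digest is the kind of value a dedup check could compare against existing documents.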
Chunk budget and billing
Documents are split into chunks for search. Before processing begins, Thallus estimates the chunk count based on file size and type. You can see the estimated count before confirming the upload.
Your plan defines three limits that apply to document processing:
- Storage cap — Total chunks stored across all collections
- Monthly processing cap — Chunks processed within a billing period
- Per-document cap — Maximum chunks from a single document (varies by plan)
If any limit would be exceeded, the upload is rejected with a clear error message (storage_limit_reached or processing_limit_reached). You can free up capacity by deleting documents you no longer need.
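As a rough illustration of how the three limits interact, the sketch below checks an estimated chunk count against a plan. The class names, field names, and the order of the checks are assumptions. The code `document_limit_reached` is hypothetical, since the documentation above names error codes only for the storage and processing caps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanLimits:
    storage_cap: int               # total chunks stored across all collections
    monthly_processing_cap: int    # chunks processed within a billing period
    per_document_cap: int          # maximum chunks from a single document


@dataclass
class Usage:
    stored_chunks: int
    processed_this_period: int


def check_chunk_budget(estimated: int, limits: PlanLimits, usage: Usage) -> Optional[str]:
    """Return an error code if the upload would exceed a limit, else None."""
    if estimated > limits.per_document_cap:
        # Hypothetical code: the docs do not name the per-document error.
        return "document_limit_reached"
    if usage.stored_chunks + estimated > limits.storage_cap:
        return "storage_limit_reached"
    if usage.processed_this_period + estimated > limits.monthly_processing_cap:
        return "processing_limit_reached"
    return None
```

Under these assumptions, deleting documents lowers `stored_chunks` and so can clear a `storage_limit_reached` rejection.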
Duplicate detection
Thallus detects duplicate files within a collection:
- Same file in different collections — Allowed. Each collection maintains its own search index.
- Same file in the same collection — Rejected. The upload returns an error identifying the existing document.
- Re-uploading — Delete the original document first, then upload the new version.
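The per-collection rules above amount to keying duplicates on the pair of collection and content hash. Here is a small in-memory sketch of that idea; `CollectionIndex`, `DuplicateDocumentError`, and the use of SHA-256 are illustrative assumptions, not Thallus's actual data model.

```python
import hashlib
from typing import Dict, Tuple


class DuplicateDocumentError(Exception):
    """Raised when identical content already exists in the target collection."""

    def __init__(self, existing_document_id: str) -> None:
        super().__init__(f"duplicate of document {existing_document_id}")
        self.existing_document_id = existing_document_id


class CollectionIndex:
    def __init__(self) -> None:
        # (collection_id, content_hash) -> document_id
        self._by_key: Dict[Tuple[str, str], str] = {}

    def add(self, collection_id: str, document_id: str, content: bytes) -> None:
        # Assumption: SHA-256 stands in for the unspecified content hash.
        key = (collection_id, hashlib.sha256(content).hexdigest())
        existing = self._by_key.get(key)
        if existing is not None:
            raise DuplicateDocumentError(existing)
        self._by_key[key] = document_id

    def delete(self, collection_id: str, document_id: str) -> None:
        # Drop every entry for this document so the same content can be uploaded again.
        self._by_key = {
            key: doc_id
            for key, doc_id in self._by_key.items()
            if not (key[0] == collection_id and doc_id == document_id)
        }
```

Because the collection ID is part of the key, the same bytes can be stored once in each collection. A second upload to the same collection fails until the original is deleted.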
Uploading to organization collections
Admins can upload documents to organization-wide collections that are visible to all members. Organization uploads use a separate admin endpoint and follow the same validation pipeline.
Organization documents count against the organization's chunk budget rather than the uploading user's personal allocation. For more on how collections work, see Document Collections.
What happens next
Once your file is queued, Thallus extracts its content, splits it into chunks, generates embeddings, and creates a structured synopsis. The document becomes searchable as soon as processing completes.
For details on each processing step, see How Documents Are Processed.