PII Detection
Thallus automatically identifies columns that likely contain personal information — emails, phone numbers, social security numbers — and flags them so admins can apply appropriate access controls.
Automatic detection
During schema discovery, Thallus scans column names for patterns that suggest personally identifiable information:
| Category | Example column names |
|---|---|
| Email | email, e_mail, email_address |
| Phone | phone, mobile, cell_number |
| Identity | ssn, social_security, passport, national_id |
| Financial | credit_card, card_number, bank_account, iban |
| Authentication | password, api_key, auth_token, secret |
| Address | address, street, zip_code, postal_code |
| Personal | birth_date, dob, salary, income |
How detection works
Thallus detects PII from column names and optional format patterns — not by scanning your actual data in bulk.
- Column name matching — Names are checked against known PII patterns across all categories above
- Format detection — When limited sample values are available, patterns like email formats and phone number formats are identified
- No bulk data scanning — Detection relies on column names and, at most, a limited set of sample values, so your actual data is never read in bulk. This keeps the approach lightweight and privacy-safe
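The two checks described above can be sketched in a few lines of Python. This is an illustration only, not Thallus's implementation: the regular expressions, the `detect_pii` function, and the rule that name matches take priority over format matches are all assumptions, loosely derived from the example column names in the category table.

```python
import re
from typing import Iterable, Optional

# Hypothetical name patterns, based only on the example column names above.
# The real pattern list is not documented on this page.
NAME_PATTERNS = {
    "Email": re.compile(r"e_?mail"),
    "Phone": re.compile(r"phone|mobile|cell"),
    "Identity": re.compile(r"ssn|social_security|passport|national_id"),
    "Financial": re.compile(r"credit_card|card_number|bank_account|iban"),
    "Authentication": re.compile(r"password|api_key|auth_token|secret"),
    "Address": re.compile(r"address|street|zip_code|postal_code"),
    "Personal": re.compile(r"birth_date|dob|salary|income"),
}

# Hypothetical, deliberately loose value formats for the sample-based check.
VALUE_PATTERNS = {
    "Email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "Phone": re.compile(r"^\+?[\d\s().-]{7,20}$"),
}


def detect_pii(column_name: str, samples: Iterable[str] = ()) -> Optional[str]:
    """Return a PII category for a column, or None if nothing matches."""
    name = column_name.lower()
    # Step 1: column name matching (first matching category wins).
    for category, pattern in NAME_PATTERNS.items():
        if pattern.search(name):
            return category
    # Step 2: format detection on a small sample, never the full table.
    values = [s for s in samples if s]
    if values:
        for category, pattern in VALUE_PATTERNS.items():
            if all(pattern.match(v) for v in values):
                return category
    return None
```

With these assumed patterns, `detect_pii("email_template")` returns `"Email"`, which is exactly the kind of false positive that manual overrides exist to correct.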
What happens when PII is detected
When a column is flagged, three things happen:
- Column flagged — A PII indicator appears in the schema view next to the column, showing the detected category
- Sample values hidden — Sample values are never cached or displayed for PII columns, even in admin views
- Admin notification — Admins can review flagged columns and decide what access controls to apply
Sensitivity levels
Each table can be assigned one of three sensitivity levels to indicate the type of data it contains:
| Level | Meaning | Use for |
|---|---|---|
| Normal | No special handling | General business data |
| Sensitive | Contains PII or personal data | Customer info, employee records |
| Restricted | High-value confidential data | Financial records, credentials |
Sensitivity levels are informational — they help admins make informed decisions about access control but don't automatically restrict access on their own.
Manual overrides
Automatic detection catches common patterns, but admins have full control to adjust:
- Mark columns as PII that weren't auto-detected — useful for columns with non-standard names
- Clear false positives — Remove the PII flag from safe columns (e.g., a column named email_template that stores template text, not email addresses)
- Exclude columns — Hide them from agent visibility entirely, regardless of PII status
- Add descriptions — Clarify what a column contains to help agents generate more accurate queries
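One way to picture how these overrides combine with automatic detection is the sketch below. The `Column` fields and both helper functions are hypothetical names chosen for illustration. The sketch assumes that a manual decision always takes precedence over the automatic result, and that exclusion is independent of PII status.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    name: str
    auto_category: Optional[str] = None  # result of automatic detection, if any
    manual_pii: Optional[bool] = None    # None = no override, True = marked, False = cleared
    excluded: bool = False               # hidden from agents regardless of PII status
    description: str = ""                # admin-supplied hint for query generation


def is_pii(col: Column) -> bool:
    """Assumed rule: a manual override wins over the automatic result."""
    if col.manual_pii is not None:
        return col.manual_pii
    return col.auto_category is not None


def visible_to_agents(columns: List[Column]) -> List[Column]:
    """Excluded columns are dropped whether or not they are flagged as PII."""
    return [c for c in columns if not c.excluded]


columns = [
    Column("email_template", auto_category="Email", manual_pii=False),  # cleared false positive
    Column("cust_ref_x", manual_pii=True),                              # non-standard name, marked by hand
    Column("internal_notes", excluded=True),                            # excluded, never flagged
]
print([c.name for c in columns if is_pii(c)])
print([c.name for c in visible_to_agents(columns)])
```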
PII and access control
PII detection informs admin decisions but doesn't automatically restrict access. The recommended workflow:
- Review PII flags — Check which columns were flagged during schema discovery
- Set sensitivity levels — Classify tables based on data sensitivity
- Configure RBAC rules — Use Data Access Control to deny access to sensitive columns for specific groups or users
This separation keeps you in control. Automatic detection surfaces the data that needs attention, and you decide what restrictions to apply.
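The separation between flagging and restricting can be illustrated with a small sketch. Everything here is hypothetical (the `DenyRule` shape, the `allowed_columns` helper, and the group and table names) and it does not reflect the actual Data Access Control rule format. The point is only that a PII flag changes nothing until an admin adds a deny rule.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class DenyRule:
    group: str   # group the rule applies to
    table: str   # table containing the column
    column: str  # column the group may not read


def allowed_columns(
    table: str,
    columns: Iterable[str],
    user_groups: Set[str],
    rules: Iterable[DenyRule],
) -> List[str]:
    """Return the columns a user may see after applying explicit deny rules."""
    denied = {
        r.column for r in rules if r.table == table and r.group in user_groups
    }
    return [c for c in columns if c not in denied]


# Both email and ssn might be flagged as PII, but only ssn has a deny rule,
# so email stays visible to analysts until an admin decides otherwise.
rules = [DenyRule(group="analysts", table="customers", column="ssn")]
print(allowed_columns("customers", ["id", "email", "ssn"], {"analysts"}, rules))
```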
Audit logging
PII-related actions are logged for compliance and traceability:
- When columns are flagged as PII (automatically or manually)
- When PII flags are removed
- When sensitivity levels are changed
These events appear in your organization's audit logs, providing a complete history of PII-related configuration changes.
Related pages
- Data Access Control — Restrict access to sensitive columns
- Schema Discovery — When PII detection runs