PII Detection

Thallus automatically identifies columns that likely contain personal information — emails, phone numbers, social security numbers — and flags them so admins can apply appropriate access controls.

Automatic detection

During schema discovery, Thallus scans column names for patterns that suggest personally identifiable information:

Category	Example column names
Email	email, e_mail, email_address
Phone	phone, mobile, cell_number
Identity	ssn, social_security, passport, national_id
Financial	credit_card, card_number, bank_account, iban
Authentication	password, api_key, auth_token, secret
Address	address, street, zip_code, postal_code
Personal	birth_date, dob, salary, income
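Name-based matching can be sketched in Python. This is a minimal illustration, not Thallus's implementation. It assumes matching on underscore-delimited tokens against the example names in the table. `PII_NAME_PATTERNS`, `normalize`, and `classify_column_name` are hypothetical names, and the real rule set is likely broader.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical pattern table built from the example column names above.
# Thallus's actual rules are not documented here; treat these as illustrative.
PII_NAME_PATTERNS: Dict[str, Tuple[str, ...]] = {
    "email": ("email", "e_mail", "email_address"),
    "phone": ("phone", "mobile", "cell_number"),
    "identity": ("ssn", "social_security", "passport", "national_id"),
    "financial": ("credit_card", "card_number", "bank_account", "iban"),
    "authentication": ("password", "api_key", "auth_token", "secret"),
    "address": ("address", "street", "zip_code", "postal_code"),
    "personal": ("birth_date", "dob", "salary", "income"),
}


def normalize(name: str) -> str:
    """Lower-case a column name; camelCase, dashes and spaces become underscores."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # customerEmail -> customer_Email
    return re.sub(r"[\s\-]+", "_", name).lower()


def classify_column_name(name: str) -> Optional[str]:
    """Return the first matching PII category, or None if no pattern matches.

    Patterns must match whole underscore-delimited tokens, so "dob" matches
    "user_dob" but not "adobe_id". Categories are checked in dict order.
    """
    padded = f"_{normalize(name)}_"
    for category, patterns in PII_NAME_PATTERNS.items():
        if any(f"_{p}_" in padded for p in patterns):
            return category
    return None
```

Because matching looks only at names, a column like `email_template` is flagged as email even if it holds template text. That is the kind of false positive the manual overrides are meant to clear.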

How detection works

Thallus detects PII from column names and, when limited sample values are available, from the format of those values. It does not scan your actual data in bulk.

Column name matching — Names are checked against known PII patterns across all categories above
Format detection — When limited sample values are available, patterns like email formats and phone number formats are identified
No bulk scanning — Your actual data is not read in bulk during PII detection, which keeps the approach lightweight and privacy-safe
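Format detection on a small sample might look like the Python sketch below. It is only an illustration under stated assumptions. The regexes are deliberately crude and not RFC-complete. The 80% match threshold and the names `detect_format`, `looks_like_email`, and `looks_like_phone` are hypothetical, not Thallus settings.

```python
import re
from typing import Callable, Dict, Iterable, Optional

# Simple email shape: local@domain.tld (not a full RFC 5322 check).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")
# Only digits, spaces, dashes, dots and parentheses, with an optional leading "+".
PHONE_CHARS_RE = re.compile(r"^\+?[\d\s\-().]+$")


def looks_like_email(value: str) -> bool:
    return EMAIL_RE.match(value) is not None


def looks_like_phone(value: str) -> bool:
    if PHONE_CHARS_RE.match(value) is None:
        return False
    digits = sum(ch.isdigit() for ch in value)
    # Require a "+" or a separator so bare integer IDs are not taken for phones.
    has_marker = value.startswith("+") or any(ch in " -()." for ch in value)
    return 10 <= digits <= 15 and has_marker


FORMAT_CHECKS: Dict[str, Callable[[str], bool]] = {
    "email": looks_like_email,
    "phone": looks_like_phone,
}


def detect_format(samples: Iterable[str], threshold: float = 0.8) -> Optional[str]:
    """Return a category if at least `threshold` of the non-empty samples match it."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return None  # nothing to inspect; fall back to name matching alone
    for category, check in FORMAT_CHECKS.items():
        matches = sum(1 for v in values if check(v))
        if matches / len(values) >= threshold:
            return category
    return None
```

Requiring most samples to match, rather than any single one, keeps a stray value in a free-text column from flagging the whole column.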

What happens when PII is detected

For example, a schema view for a table with PII columns might show:

Column	Type or PII indicator
order_id	integer
product_name	varchar
customer_email	PII · email
phone_number	PII · phone
ssn	PII · identity

When a column is flagged, three things happen:

Column flagged — A PII indicator appears in the schema view next to the column, showing the detected category
Sample values hidden — Sample values are never cached or displayed for PII columns, even in admin views
Admin notification — Admins can review flagged columns and decide what access controls to apply
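The first two effects can be modeled as a small column record. The Python sketch below is hypothetical: `ColumnMetadata` and its methods are not Thallus APIs. It shows one way to guarantee that samples are never cached or shown once a column carries a PII flag.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class ColumnMetadata:
    name: str
    data_type: str
    pii_category: Optional[str] = None  # e.g. "email"; None means not flagged
    _samples: List[str] = field(default_factory=list, init=False, repr=False)

    @property
    def is_pii(self) -> bool:
        return self.pii_category is not None

    def flag_pii(self, category: str) -> None:
        """Flag the column and discard any samples cached before it was flagged."""
        self.pii_category = category
        self._samples.clear()

    def cache_samples(self, values: Iterable[str]) -> None:
        """Cache sample values, unless the column is flagged as PII."""
        if self.is_pii:
            return  # never cache samples for PII columns
        self._samples = list(values)

    def display_samples(self) -> List[str]:
        """Samples shown in the UI; always empty for PII columns, admins included."""
        return [] if self.is_pii else list(self._samples)

    def indicator(self) -> str:
        """Label shown next to the column in a schema view."""
        return f"PII · {self.pii_category}" if self.is_pii else self.data_type
```

Clearing the cache inside `flag_pii` matters: without it, samples cached before a column was flagged would still be shown.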

Sensitivity levels

Tables can be classified at three sensitivity levels to indicate the type of data they contain:


Level	Meaning	Use for
Normal	No special handling	General business data
Sensitive	Contains PII or personal data	Customer info, employee records
Restricted	High-value confidential data	Financial records, credentials

Sensitivity levels are informational — they help admins make informed decisions about access control but don't automatically restrict access on their own.
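Because the levels are informational, an admin-side consistency check is one way to use them together with PII flags. The Python sketch below is hypothetical: `Sensitivity` and `tables_needing_review` are illustrative names, not Thallus features. It lists tables that contain PII-flagged columns but are still marked Normal, and changes no access on its own.

```python
from enum import Enum
from typing import Dict, List


class Sensitivity(Enum):
    NORMAL = "normal"          # no special handling: general business data
    SENSITIVE = "sensitive"    # contains PII or personal data
    RESTRICTED = "restricted"  # high-value confidential data


def tables_needing_review(
    levels: Dict[str, Sensitivity],
    pii_columns: Dict[str, List[str]],
) -> List[str]:
    """List tables that have PII-flagged columns but are still classified NORMAL.

    Advisory only: like the sensitivity levels themselves, this does not
    change who can query a table.
    """
    return sorted(
        table
        for table, level in levels.items()
        if level is Sensitivity.NORMAL and pii_columns.get(table)
    )
```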

Manual overrides

Automatic detection catches common patterns, and admins can adjust the results in several ways:

Mark columns as PII that weren't auto-detected — useful for columns with non-standard names
Clear false positives — Remove the PII flag from safe columns (e.g., a column named email_template that stores template text, not email addresses)
Exclude columns — Hide them from agent visibility entirely, regardless of PII status
Add descriptions — Clarify what a column contains to help agents generate more accurate queries
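One way to picture how the four overrides combine with automatic results is the Python sketch below. `ColumnOverrides` and `apply_overrides` are hypothetical, and the precedence shown (exclusion first, then cleared flags, then manual flags) is an assumption made for the example, not documented Thallus behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ColumnOverrides:
    mark_pii: Dict[str, str] = field(default_factory=dict)      # column -> category
    clear_pii: Set[str] = field(default_factory=set)            # false positives
    exclude: Set[str] = field(default_factory=set)              # hidden from agents
    descriptions: Dict[str, str] = field(default_factory=dict)  # column -> text


def apply_overrides(
    detected: Dict[str, Optional[str]], overrides: ColumnOverrides
) -> Dict[str, dict]:
    """Merge auto-detected PII categories with admin overrides.

    Returns only the columns agents can see, each with its final PII
    category (or None) and an optional description.
    """
    result: Dict[str, dict] = {}
    for column, category in detected.items():
        if column in overrides.exclude:
            continue  # excluded regardless of PII status
        if column in overrides.clear_pii:
            category = None  # admin cleared a false positive
        if column in overrides.mark_pii:
            category = overrides.mark_pii[column]  # manual flag wins
        result[column] = {
            "pii": category,
            "description": overrides.descriptions.get(column),
        }
    return result
```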

PII and access control

PII detection informs admin decisions but doesn't automatically restrict access. The recommended workflow:

Review PII flags — Check which columns were flagged during schema discovery
Set sensitivity levels — Classify tables based on data sensitivity
Configure RBAC rules — Use Data Access Control to deny access to sensitive columns for specific groups or users

This separation keeps you in control. Automatic detection surfaces the data that needs attention, and you decide what restrictions to apply.
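The last workflow step, denying sensitive columns to specific groups or users, can be illustrated with a small rule evaluator. This Python sketch does not reflect the Data Access Control rule format. `DenyRule` and `visible_columns` are hypothetical names. Note that the rules are written by an admin, and the PII flag itself plays no part in evaluation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class DenyRule:
    """Deny one column to the listed groups and/or users (hypothetical shape)."""
    table: str
    column: str
    groups: FrozenSet[str] = frozenset()
    users: FrozenSet[str] = frozenset()

    def applies_to(self, user: str, user_groups: Set[str]) -> bool:
        return user in self.users or bool(self.groups & user_groups)


def visible_columns(
    table: str,
    columns: Iterable[str],
    user: str,
    user_groups: Set[str],
    rules: Iterable[DenyRule],
) -> List[str]:
    """Columns of `table` the user may query once admin-authored deny rules apply."""
    denied = {
        r.column for r in rules if r.table == table and r.applies_to(user, user_groups)
    }
    return [c for c in columns if c not in denied]
```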

Audit logging

PII-related actions are logged for compliance and traceability:

When columns are flagged as PII (automatically or manually)
When PII flags are removed
When sensitivity levels are changed

These events appear in your organization's audit logs, providing a complete history of PII-related configuration changes.
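The three kinds of logged change can be pictured as structured events. This Python sketch is illustrative only. The action names, fields, and `AuditLog` class are hypothetical and do not describe the schema of Thallus audit logs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical action names for the three kinds of PII-related change.
PII_ACTIONS = {"pii_flagged", "pii_flag_removed", "sensitivity_changed"}


@dataclass(frozen=True)
class AuditEvent:
    action: str
    target: str               # "table.column" for PII flags, "table" for sensitivity
    actor: str                # "system" for automatic detection, else the admin
    old_value: Optional[str]
    new_value: Optional[str]
    at: datetime


class AuditLog:
    """Append-only, in-memory log; a real store would persist events."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, action: str, target: str, actor: str,
               old_value: Optional[str], new_value: Optional[str]) -> AuditEvent:
        if action not in PII_ACTIONS:
            raise ValueError(f"unknown PII audit action: {action}")
        event = AuditEvent(action, target, actor, old_value, new_value,
                           datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, target: str) -> List[AuditEvent]:
        """All recorded events for one table or column, oldest first."""
        return [e for e in self._events if e.target == target]
```

Recording both the old and new value is what lets the log reconstruct a full history of changes rather than only the current state.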

Related

Data Access Control — Restrict access to sensitive columns
Schema Discovery — When PII detection runs