PII Detection

Thallus automatically identifies columns that likely contain personal information — emails, phone numbers, social security numbers — and flags them so admins can apply appropriate access controls.

Automatic detection

During schema discovery, Thallus scans column names for patterns that suggest personally identifiable information:

Category | Example column names
Email | email, e_mail, email_address
Phone | phone, mobile, cell_number
Identity | ssn, social_security, passport, national_id
Financial | credit_card, card_number, bank_account, iban
Authentication | password, api_key, auth_token, secret
Address | address, street, zip_code, postal_code
Personal | birth_date, dob, salary, income
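Column name matching can be pictured as a lookup from each category to a set of name patterns. The sketch below is a minimal illustration, not Thallus's implementation: the `NAME_PATTERNS` table and the `detect_pii_category` function are hypothetical, and the patterns are simply derived from the example names in the table above. Each pattern must sit on an underscore or string boundary, so `dob` matches `user_dob` but not `adobe_id`.

```python
import re
from typing import Optional

# Hypothetical category -> name-pattern table, built from the example column
# names in the table above. Thallus's real pattern list may differ.
NAME_PATTERNS: dict[str, list[str]] = {
    "email": [r"e_?mail"],
    "phone": [r"phone", r"mobile", r"cell"],
    "identity": [r"ssn", r"social_security", r"passport", r"national_id"],
    "financial": [r"credit_card", r"card_number", r"bank_account", r"iban"],
    "authentication": [r"password", r"api_key", r"auth_token", r"secret"],
    "address": [r"address", r"street", r"zip_code", r"postal_code"],
    "personal": [r"birth_date", r"dob", r"salary", r"income"],
}


def detect_pii_category(column_name: str) -> Optional[str]:
    """Return the first PII category whose pattern matches the column name."""
    # Normalize: lowercase, and collapse any run of non-alphanumerics into "_".
    name = re.sub(r"[^a-z0-9]+", "_", column_name.lower()).strip("_")
    # Categories are checked in insertion order, so "email_address"
    # resolves to "email" rather than "address".
    for category, patterns in NAME_PATTERNS.items():
        for pattern in patterns:
            # Anchor on "_" or the string edges so "dob" won't match "adobe".
            if re.search(rf"(?:^|_){pattern}(?:_|$)", name):
                return category
    return None


for col in ["order_id", "customer_email", "phone_number", "ssn", "email_template"]:
    print(f"{col}: {detect_pii_category(col)}")
```

Note that `email_template` is matched as well. Name-based matching cannot tell template text from addresses, which is why admins can clear false positives (see Manual overrides).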

How detection works

Thallus detects PII from column names and, when a few sample values are available, from their formats. It does not scan your actual data in bulk.

  • Column name matching — Names are checked against known PII patterns across all categories above
  • Format detection — When a small set of sample values is available, Thallus checks them for recognizable formats such as email addresses and phone numbers
  • No bulk data scanning — Your actual data is not read in bulk during PII detection, which keeps the approach lightweight and privacy-safe
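Format detection can be sketched as a few checks run against a small, bounded set of sample values. In this minimal sketch, `detect_format`, `looks_like_email`, `looks_like_phone`, the five-value cap, and the 80% agreement threshold are all assumptions chosen for illustration; Thallus's actual format rules are not documented here. The function only inspects the values passed to it and stores nothing.

```python
import re
from typing import Callable, Optional

# Deliberately loose email shape: something@something.tld, no spaces.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value: str) -> bool:
    return EMAIL_RE.fullmatch(value) is not None


def looks_like_phone(value: str) -> bool:
    # Allow only an optional leading "+", digits, and common separators.
    if not re.fullmatch(r"\+?[\d\s().-]+", value):
        return False
    # 7 to 15 digits covers most local and international numbers.
    digits = re.sub(r"\D", "", value)
    return 7 <= len(digits) <= 15


FORMAT_CHECKS: dict[str, Callable[[str], bool]] = {
    "email": looks_like_email,
    "phone": looks_like_phone,
}


def detect_format(
    samples: list[str], max_samples: int = 5, threshold: float = 0.8
) -> Optional[str]:
    """Guess a PII format from a handful of sample values, or return None."""
    # Consider only the first few non-empty values, never the full column.
    window = [s.strip() for s in samples[:max_samples] if s and s.strip()]
    if not window:
        return None
    for category, check in FORMAT_CHECKS.items():
        hits = sum(1 for v in window if check(v))
        # Require most of the sampled values to agree before flagging.
        if hits / len(window) >= threshold:
            return category
    return None


print(detect_format(["alice@example.com", "bob@example.org"]))
```

Requiring agreement across several values, rather than a single hit, is one way to keep a stray email address in a free-text column from flagging the whole column.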

What happens when PII is detected

Example schema view: order_id (integer), product_name (varchar), customer_email (PII · email), phone_number (PII · phone), ssn (PII · identity)

When a column is flagged, three things happen:
  1. Column flagged — A PII indicator appears in the schema view next to the column, showing the detected category
  2. Sample values hidden — Sample values are never cached or displayed for PII columns, even in admin views
  3. Admin notification — Admins can review flagged columns and decide what access controls to apply
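The three outcomes can be modeled with a small column record. This sketch is hypothetical (`ColumnInfo`, `prepare_for_display`, `indicator`, and `flagged_for_review` are illustrative names, not Thallus APIs): a flagged column keeps its PII category for the schema-view indicator, loses its sample values before anything is cached or shown, and appears in the list an admin reviews.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ColumnInfo:
    name: str
    data_type: str
    pii_category: Optional[str] = None  # e.g. "email", "phone", "identity"
    sample_values: list[str] = field(default_factory=list)

    @property
    def is_pii(self) -> bool:
        return self.pii_category is not None


def prepare_for_display(column: ColumnInfo) -> ColumnInfo:
    """Return a copy that is safe to cache or show: PII columns carry no samples."""
    samples = [] if column.is_pii else list(column.sample_values)
    return ColumnInfo(column.name, column.data_type, column.pii_category, samples)


def indicator(column: ColumnInfo) -> str:
    # Mirrors the "PII · email" style label shown in the schema view.
    return f"PII · {column.pii_category}" if column.is_pii else column.data_type


def flagged_for_review(columns: list[ColumnInfo]) -> list[str]:
    """Names of flagged columns an admin should review."""
    return [c.name for c in columns if c.is_pii]


columns = [
    ColumnInfo("order_id", "integer", sample_values=["1001", "1002"]),
    ColumnInfo("customer_email", "varchar", "email", ["alice@example.com"]),
]
for col in map(prepare_for_display, columns):
    print(col.name, indicator(col), col.sample_values)
print(flagged_for_review(columns))
```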

Sensitivity levels

Tables can be classified at three sensitivity levels to indicate the type of data they contain:

Level | Meaning | Use for
Normal | No special handling | General business data
Sensitive | Contains PII or personal data | Customer info, employee records
Restricted | High-value confidential data | Financial records, credentials

Sensitivity levels are informational — they help admins make informed decisions about access control but don't automatically restrict access on their own.
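Because sensitivity is a label rather than a control, it maps naturally onto a plain enum attached to table metadata. The sketch below assumes a hypothetical `suggest_sensitivity` helper that proposes a starting level from the PII categories found in a table's columns, loosely following the "Use for" column above (credentials and financial data as Restricted, other personal data as Sensitive). An admin would still confirm the level, and nothing here blocks a query.

```python
from enum import Enum


class Sensitivity(Enum):
    NORMAL = "normal"          # No special handling: general business data
    SENSITIVE = "sensitive"    # Contains PII or personal data
    RESTRICTED = "restricted"  # High-value confidential data


# Hypothetical: categories that suggest "Restricted" rather than "Sensitive".
RESTRICTED_HINTS = {"financial", "authentication"}


def suggest_sensitivity(pii_categories: set[str]) -> Sensitivity:
    """Suggest a starting level for a table; purely informational."""
    if pii_categories & RESTRICTED_HINTS:
        return Sensitivity.RESTRICTED
    if pii_categories:
        return Sensitivity.SENSITIVE
    return Sensitivity.NORMAL


print(suggest_sensitivity({"email", "phone"}).value)
```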


Manual overrides

Automatic detection catches common patterns, but admins have full control to adjust:

  • Mark undetected columns as PII — Useful for columns with non-standard names
  • Clear false positives — Remove the PII flag from safe columns (e.g., a column named email_template that stores template text, not email addresses)
  • Exclude columns — Hide them from agent visibility entirely, regardless of PII status
  • Add descriptions — Clarify what a column contains to help agents generate more accurate queries
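The four overrides amount to small edits on per-column settings. In this hypothetical sketch (`ColumnSettings` and the helper functions are illustrative, not Thallus APIs), clearing a false positive and excluding a column are independent: an excluded column disappears from what agents see whether or not it is flagged as PII.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnSettings:
    name: str
    pii_category: Optional[str] = None  # from auto-detection or an admin
    excluded: bool = False              # hidden from agents regardless of PII
    description: str = ""               # admin-written context for agents


def mark_pii(col: ColumnSettings, category: str) -> None:
    """Flag a column that auto-detection missed."""
    col.pii_category = category


def clear_pii(col: ColumnSettings) -> None:
    """Remove a false-positive PII flag."""
    col.pii_category = None


def exclude(col: ColumnSettings) -> None:
    """Hide a column from agents entirely."""
    col.excluded = True


def describe(col: ColumnSettings, text: str) -> None:
    """Attach a description that helps agents write accurate queries."""
    col.description = text


def agent_visible(columns: list[ColumnSettings]) -> list[ColumnSettings]:
    """Columns an agent may see: everything not explicitly excluded."""
    return [c for c in columns if not c.excluded]


template = ColumnSettings("email_template", pii_category="email")
clear_pii(template)
describe(template, "Body text of outbound email templates, not addresses")
```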

PII and access control

PII detection informs admin decisions but doesn't automatically restrict access. The recommended workflow:

  1. Review PII flags — Check which columns were flagged during schema discovery
  2. Set sensitivity levels — Classify tables based on data sensitivity
  3. Configure RBAC rules — Use Data Access Control to deny access to sensitive columns for specific groups or users

This separation keeps you in control. Automatic detection surfaces the data that needs attention, and you decide what restrictions to apply.
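Step 3 can be pictured as explicit deny rules keyed by table, column, and principal. This is a sketch under assumptions: `DenyRule`, `allowed_columns`, and the `user:`/`group:` principal prefixes are hypothetical and do not describe Data Access Control's actual rule format. PII flags and sensitivity levels play no part in the check itself; only rules an admin has written restrict access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DenyRule:
    table: str
    column: str
    principal: str  # "user:<id>" or "group:<name>" (hypothetical convention)


def allowed_columns(
    table: str,
    columns: list[str],
    user: str,
    groups: list[str],
    rules: list[DenyRule],
) -> list[str]:
    """Return the columns of `table` this user may query."""
    principals = {f"user:{user}"} | {f"group:{g}" for g in groups}
    denied = {
        r.column for r in rules if r.table == table and r.principal in principals
    }
    return [c for c in columns if c not in denied]


# An admin reviews the flagged columns and chooses to deny both to contractors.
flagged = ["customer_email", "ssn"]
rules = [DenyRule("customers", col, "group:contractors") for col in flagged]
print(allowed_columns("customers", ["order_id", *flagged], "alice", ["contractors"], rules))
```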


Audit logging

PII-related actions are logged for compliance and traceability:

  • When columns are flagged as PII (automatically or manually)
  • When PII flags are removed
  • When sensitivity levels are changed

These events appear in your organization's audit logs, providing a complete history of PII-related configuration changes.
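A minimal sketch of the three event types as append-only, JSON-encoded records. The event names, field names, and the `record_pii_event` helper are assumptions for illustration; the actual audit log schema is not documented here. Automatic detection is attributed to a hypothetical `system` actor so that automatic and manual flags can be told apart.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical names for the three logged action types.
PII_ACTIONS = {"pii_flagged", "pii_flag_removed", "sensitivity_changed"}


@dataclass(frozen=True)
class PiiAuditEvent:
    action: str               # one of PII_ACTIONS
    target: str               # "table.column" for flags, "table" for levels
    actor: str                # "system" for auto-detection, else an admin id
    old_value: Optional[str]  # e.g. previous category or sensitivity level
    new_value: Optional[str]
    timestamp: str            # ISO 8601, UTC


def record_pii_event(
    log: list[str],
    action: str,
    target: str,
    actor: str,
    old_value: Optional[str] = None,
    new_value: Optional[str] = None,
) -> PiiAuditEvent:
    """Validate an event and append it to an in-memory, append-only log."""
    if action not in PII_ACTIONS:
        raise ValueError(f"unknown PII audit action: {action}")
    event = PiiAuditEvent(
        action, target, actor, old_value, new_value,
        datetime.now(timezone.utc).isoformat(),
    )
    log.append(json.dumps(asdict(event)))
    return event


audit_log: list[str] = []
record_pii_event(audit_log, "pii_flagged", "customers.customer_email", "system", None, "email")
record_pii_event(audit_log, "sensitivity_changed", "customers", "admin@example.com", "normal", "sensitive")
```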