Overview
Vigilo ships three primitives for satisfying GDPR / CCPA / equivalent regulations: a self-service data export (the user gets a zip of every PII record about them), a right-to-be-forgotten flow (the user's PII is anonymized in place while preserving audit history), and per-workspace data residency that pins data to a region. The export and forget paths are delivered by WC.9; residency is delivered by WC.10; audit retention sweeps are W0.8.
A workspace owner can configure all three. An end user can self-serve the export of their own data and request to be forgotten from the same UI; each request is queued and processed by Celery.
Data export (DataExportRequest)
Model
DataExportRequest rows are created when a user invokes the export flow. Fields:
- user: the subject of the export
- workspace: the workspace scope (export is per-workspace, not global)
- requested_at
- status: pending, running, completed, failed
- output_path: path to the generated zip (when completed)
- expires_at: typically 7 days after completion; the file is deleted after this
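To make the lifecycle concrete, here is a minimal plain-Python sketch of these fields and their status transitions. It is not the actual Django model: the class name DataExportRequestSketch, the integer IDs, the transition rules, and the fixed 7-day EXPORT_TTL are assumptions based on the field list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

# Assumed TTL: the article says exports typically expire 7 days after completion.
EXPORT_TTL = timedelta(days=7)


class ExportStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class DataExportRequestSketch:
    """Plain-Python stand-in for the DataExportRequest fields (not the real model)."""
    user_id: int
    workspace_id: int  # exports are scoped to one workspace
    requested_at: datetime
    status: ExportStatus = ExportStatus.PENDING
    output_path: Optional[str] = None
    expires_at: Optional[datetime] = None

    def mark_running(self) -> None:
        if self.status is not ExportStatus.PENDING:
            raise ValueError(f"cannot start an export in state {self.status.value}")
        self.status = ExportStatus.RUNNING

    def mark_completed(self, output_path: str, completed_at: datetime) -> None:
        if self.status is not ExportStatus.RUNNING:
            raise ValueError(f"cannot complete an export in state {self.status.value}")
        self.status = ExportStatus.COMPLETED
        self.output_path = output_path
        # The zip is deleted once this timestamp has passed.
        self.expires_at = completed_at + EXPORT_TTL

    def mark_failed(self) -> None:
        self.status = ExportStatus.FAILED

    def is_expired(self, now: datetime) -> bool:
        return self.expires_at is not None and now >= self.expires_at
```

Anchoring expires_at to the completion time rather than requested_at means a slow export still gives the user the full download window.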
Celery task: export_user_data
The task vigilo.security.export_user_data runs the heavy lifting:
- Walk every app in INSTALLED_APPS.
- For each model that has a user FK or actor FK matching the subject, dump matching rows to JSON.
- Apply the per-app PII filter (so we export the user's profile but not other users' rows the subject happened to author; those are anonymized in the dump as actor=<user-id>).
- Bundle into export-{user_id}-{request_id}.zip with one JSON file per app.
- Set output_path and email the user a signed download URL valid until expires_at.
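The collection and bundling steps can be sketched in plain Python, assuming the rows have already been loaded into a hypothetical {app_label: {model_name: [row, ...]}} mapping. The real task walks Django models instead, and this sketch leaves out the per-app PII filter; collect_subject_rows and build_export_zip are illustrative names.

```python
import io
import json
import zipfile
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
# Hypothetical in-memory shape: {app_label: {model_name: [row, ...]}}
Tables = Dict[str, Dict[str, List[Row]]]


def collect_subject_rows(tables: Tables, subject_id: int) -> Tables:
    """Keep rows whose `user` or `actor` FK matches the subject, grouped per app and model."""
    dump: Tables = {}
    for app_label, models in tables.items():
        for model_name, rows in models.items():
            matching = [r for r in rows if subject_id in (r.get("user"), r.get("actor"))]
            if matching:
                dump.setdefault(app_label, {})[model_name] = matching
    return dump


def build_export_zip(dump: Tables, user_id: int, request_id: int) -> Tuple[str, bytes]:
    """Bundle the dump into export-{user_id}-{request_id}.zip with one JSON file per app."""
    name = f"export-{user_id}-{request_id}.zip"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for app_label, models in sorted(dump.items()):
            # default=str keeps datetimes and UUIDs serialisable.
            zf.writestr(f"{app_label}.json", json.dumps(models, indent=2, default=str))
    return name, buf.getvalue()
```

Building the zip in memory keeps the sketch self-contained; for large exports, streaming to a temporary file or directly to object storage would be the more natural choice.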
Files include: profile.json, memberships.json, notifications.json, audit_log_actions_by_user.json, change_requests_authored.json, incidents_authored.json, comments.json, mfa_metadata.json (without the actual secret), api_tokens.json (token names only, never values).
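For the two sensitive files, an allowlist of exportable fields is safer than stripping known secrets, because a secret field added later is excluded by default. A minimal sketch, where the field names in MFA_EXPORT_FIELDS and the function names are assumptions rather than the actual schema:

```python
from typing import Any, Dict, List

# Hypothetical allowlist; the real mfa_metadata fields may differ.
MFA_EXPORT_FIELDS = frozenset({"mfa_enabled", "method", "enrolled_at"})


def sanitize_mfa_metadata(mfa: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted MFA metadata; secrets such as the TOTP seed are never copied."""
    return {k: v for k, v in mfa.items() if k in MFA_EXPORT_FIELDS}


def sanitize_api_tokens(tokens: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Export token names only, never token values."""
    return [{"name": t["name"]} for t in tokens]
```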
Endpoint
POST /workspaces/me/export-data/ — creates a DataExportRequest, queues the task, returns 202 Accepted with the request ID. The user gets an email when the export is ready.
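A framework-free sketch of the endpoint's contract: persist a pending request, hand its ID to the worker, and return 202 with the ID. post_export_data, the list-backed store, and the enqueue callback are hypothetical stand-ins for the ORM and for Celery, where the call would typically be something like export_user_data.delay(request_id).

```python
import itertools
from typing import Any, Callable, Dict, List, Tuple

_next_id = itertools.count(1)  # stand-in for a database primary key


def post_export_data(
    user_id: int,
    workspace_id: int,
    store: List[Dict[str, Any]],
    enqueue: Callable[[int], None],
) -> Tuple[int, Dict[str, Any]]:
    """Create a pending export request, queue it, and answer 202 Accepted."""
    request_id = next(_next_id)
    store.append({"id": request_id, "user": user_id, "workspace": workspace_id, "status": "pending"})
    # The worker does the heavy lifting; the HTTP request returns immediately.
    enqueue(request_id)
    return 202, {"request_id": request_id}
```

In a Django view, deferring the enqueue with transaction.on_commit avoids a worker picking up the ID before the row has been committed.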
UI
Settings → Privacy → Export my data. The button kicks off the request; the page lists existing requests with their status and download link.
Right to be forgotten (forget_user)
Celery task: forget_user
The task vigilo.security.forget_user performs PII anonymization, preserving structural records so audit, change, and incident history stay intact:
- UserProfile.email → redacted-<uuid>@deleted.local
- UserProfile.display_name → [Redacted User]
- UserProfile.phone → NULL
- UserProfile.avatar → deleted from storage; field set to null
- UserProfile.last_login_ip → NULL
- UserProfile.totp_secret → NULL, mfa_enabled=False, backup_codes=[]
- WorkspaceMembership.* for this user → deleted
- ApiToken.* for this user → deleted (so post-anonymization tokens cannot still be used)
- Comment.author_id is preserved (so the conversation makes sense), but the FK now points to a redacted profile
- AuditLog.actor_id is preserved with a marker (metadata.redacted=True) so auditors can still verify the chain
- ChangeRequest.requested_by and Incident.reported_by are preserved with the same marker
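The field mapping above can be expressed as a small in-memory sketch. The dict-of-tables layout of db, the table names in MARKED_FKS, the use of uuid4 for the redacted address, the metadata dict on structural rows, and the delete_avatar callback are all assumptions; the real task operates on Django models and file storage.

```python
import uuid
from typing import Any, Callable, Dict

# Hypothetical table/FK names for records that keep their FK but gain a redaction marker.
MARKED_FKS = {"audit_log": "actor_id", "change_requests": "requested_by", "incidents": "reported_by"}


def anonymize_profile(profile: Dict[str, Any], delete_avatar: Callable[[str], None]) -> Dict[str, Any]:
    """Return a redacted copy of a profile row, following the field mapping above."""
    redacted = dict(profile)
    redacted["email"] = f"redacted-{uuid.uuid4()}@deleted.local"
    redacted["display_name"] = "[Redacted User]"
    redacted["phone"] = None
    if profile.get("avatar"):
        delete_avatar(profile["avatar"])  # remove the file from storage
    redacted["avatar"] = None
    redacted["last_login_ip"] = None
    redacted["totp_secret"] = None
    redacted["mfa_enabled"] = False
    redacted["backup_codes"] = []
    return redacted


def forget_user_in_memory(db: Dict[str, Any], user_id: int, delete_avatar: Callable[[str], None]) -> None:
    """Apply the forget mapping to a dict-of-tables stand-in for the database."""
    db["profiles"][user_id] = anonymize_profile(db["profiles"][user_id], delete_avatar)
    # Memberships and API tokens are deleted outright, so old tokens stop working.
    db["memberships"] = [m for m in db.get("memberships", []) if m["user"] != user_id]
    db["api_tokens"] = [t for t in db.get("api_tokens", []) if t["user"] != user_id]
    # Comment.author_id is left untouched: it now resolves to the redacted profile.
    for table, fk in MARKED_FKS.items():
        for row in db.get(table, []):
            if row.get(fk) == user_id:
                row.setdefault("metadata", {})["redacted"] = True
```

In the real task the database updates would presumably run inside one transaction (for example Django's transaction.atomic), so a failure never leaves a half-redacted profile.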
Why preserve some records
A complete delete breaks audit trails — auditors need to know "an actor approved this change", even after the actor is gone. The compromise: structural records keep the FK but the underlying profile is redacted, so re-identification is impossible from Vigilo data alone.
This is consistent with how GDPR Article 17 applies in practice: the right to erasure is not absolute, and legitimate grounds such as audit obligations, legal claims, and fraud prevention can justify retaining pseudonymized records.
Endpoint and access
POST /workspaces/{slug}/forget-user/ — body {user_id, reason}. Restricted to workspace owners; the user cannot self-forget without owner approval (this prevents an actor from erasing themselves to escape accountability).
The action is irreversible. The owner must confirm via a typed phrase in the dialog.
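The access rules can be sketched as a guard the endpoint runs before queueing forget_user. validate_forget_request, ForgetRejected, the role string "owner", and the confirmation phrase passed in as expected_phrase are all hypothetical; the article does not specify the exact phrase.

```python
from typing import Any, Dict


class ForgetRejected(Exception):
    """Raised when a forget-user request fails a precondition."""


def validate_forget_request(actor_role: str, body: Dict[str, Any], typed_phrase: str, expected_phrase: str) -> None:
    # Only workspace owners may run the irreversible forget flow.
    if actor_role != "owner":
        raise ForgetRejected("only workspace owners can forget a user")
    # The endpoint body is {user_id, reason}; both are required here.
    if body.get("user_id") is None or not str(body.get("reason", "")).strip():
        raise ForgetRejected("user_id and reason are required")
    # The dialog's typed confirmation must match exactly.
    if typed_phrase.strip() != expected_phrase:
        raise ForgetRejected("confirmation phrase does not match")
```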
Data residency (WC.10)
Workspace.region
Workspace.region is a CharField with three values: us, eu, apac. Set at workspace creation and immutable thereafter (changing region requires a full export + new-workspace-import flow because of underlying database residency).
The region indicates which physical PostgreSQL cluster (and which FastAPI scanner DB) hosts the workspace's data. In multi-region deployments, each region runs its own full stack with replication only of identity (UserProfile, federation metadata) and tenant lifecycle records.
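A plain-Python sketch of the region rules: only us, eu, and apac are valid, and once set the value cannot change. Region, RegionChangeNotAllowed, and resolve_region are illustrative names; in a Django codebase the choices would more likely be declared with models.TextChoices.

```python
from enum import Enum
from typing import Optional


class Region(str, Enum):
    US = "us"
    EU = "eu"
    APAC = "apac"


class RegionChangeNotAllowed(Exception):
    """Changing region needs an export plus a new-workspace import, not an update."""


def resolve_region(current: Optional[str], requested: str) -> Region:
    """Validate a region on create; reject any change once the workspace exists."""
    region = Region(requested)  # raises ValueError for anything other than us/eu/apac
    if current is not None and Region(current) is not region:
        raise RegionChangeNotAllowed(f"workspace is pinned to {current!r}")
    return region
```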
DataResidencyMiddleware
DataResidencyMiddleware enforces region access at the edge:
- Resolve the workspace from the URL.
- Compare workspace.region to the current host's region (VIGILO_HOST_REGION env var, e.g. us).
- If they don't match, return a 307 redirect to the region-correct host: Location: https://eu.vigilo.example.com{path}. The redirect is a 307 (not 302) so POST bodies are preserved.
This means a user who bookmarks us.vigilo.example.com/ws/eu-workspace/... is automatically bounced to the EU host on the next request. CDN configuration treats each region as a separate origin.
For development with a single host, set VIGILO_STRICT_RESIDENCY=false to disable the middleware; cross-region access is allowed for local debugging.
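The middleware's decision logic, separated from any web framework, might look like this. It assumes region hosts follow the https://{region}.vigilo.example.com pattern from the example above and that strict mode is on unless VIGILO_STRICT_RESIDENCY is explicitly false; residency_redirect and strict_residency_enabled are hypothetical helper names.

```python
import os
from typing import Mapping, Optional, Tuple

# Assumption: every region's host follows the pattern used in the example above.
HOST_TEMPLATE = "https://{region}.vigilo.example.com"


def strict_residency_enabled(env: Mapping[str, str]) -> bool:
    """Strict by default; only an explicit false-like value disables enforcement."""
    return env.get("VIGILO_STRICT_RESIDENCY", "true").strip().lower() not in {"false", "0", "no"}


def residency_redirect(
    workspace_region: str, path: str, env: Mapping[str, str] = os.environ
) -> Optional[Tuple[int, str]]:
    """Return (307, Location) when the request hit the wrong region, else None."""
    if not strict_residency_enabled(env):
        return None  # local debugging: cross-region access allowed
    host_region = env["VIGILO_HOST_REGION"]  # KeyError if the host is misconfigured
    if workspace_region == host_region:
        return None
    # 307 (not 302) so clients replay the same method and body.
    return 307, HOST_TEMPLATE.format(region=workspace_region) + path
```

In a Django middleware, a non-None result would become an HttpResponse with status=307 and a Location header (Django's HttpResponseRedirect sends 302), and the path passed in should include the query string, for example via request.get_full_path().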
Audit retention (W0.8)
Per-workspace retention policy
Workspace.settings['audit_retention'] is a dict:
{
"retention_days": 730,
"archive_target": "s3://acme-audit/{workspace_slug}/{year}/{month}/"
}
retention_days is how long audit rows are kept online. The default is 365; the regulatory floor for some industries is 2555 days (7 years). After this period, rows are archived.
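Computing the cutoff and the archive prefix from the policy is straightforward. A sketch, assuming a missing policy falls back to the 365-day default and that {month} is zero-padded (the article does not say); retention_cutoff and archive_prefix are illustrative names.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Mapping

DEFAULT_RETENTION_DAYS = 365


def retention_cutoff(settings: Mapping[str, Any], now: datetime) -> datetime:
    """Rows created before the returned timestamp are due for archiving."""
    policy = settings.get("audit_retention") or {}
    days = int(policy.get("retention_days", DEFAULT_RETENTION_DAYS))
    if days <= 0:
        raise ValueError("retention_days must be positive")
    return now - timedelta(days=days)


def archive_prefix(settings: Mapping[str, Any], workspace_slug: str, when: datetime) -> str:
    """Fill the archive_target template for the month the rows belong to."""
    template = settings["audit_retention"]["archive_target"]
    # Assumption: months are zero-padded so object keys sort chronologically.
    return template.format(workspace_slug=workspace_slug, year=when.year, month=f"{when.month:02d}")
```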
Archive sweep
The Celery task vigilo.audit.archive_sweep runs nightly. For each workspace:
- Read the retention policy.
- Find audit rows older than retention_days.
- Stream them to the configured archive_target (S3 today; more backends are on the roadmap) as gzipped JSON Lines.
- Delete the rows from the live table once the archive write is confirmed (read-after-write check).
- Write a meta-audit entry recording the sweep: rows archived, byte count, archive object key.
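The sweep's ordering (write, verify, delete, then record) is what keeps it safe: a failed upload leaves the live rows in place. A minimal sketch against an in-memory dict standing in for S3; archive_sweep_once, the created_at and id row fields, the meta-audit action name, and the byte-equality read-back check are assumptions, and a real implementation would read the object back from the bucket.

```python
import gzip
import json
from datetime import datetime
from typing import Any, Dict, List


def archive_sweep_once(
    live_rows: List[Dict[str, Any]],
    cutoff: datetime,
    store: Dict[str, bytes],
    key: str,
    meta_audit: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Archive rows older than cutoff to store[key]; return the rows that stay live."""
    old = [r for r in live_rows if r["created_at"] < cutoff]
    if not old:
        return live_rows
    # One JSON object per line (JSON Lines), then gzip the whole payload.
    payload = "".join(json.dumps(r, sort_keys=True, default=str) + "\n" for r in old)
    blob = gzip.compress(payload.encode("utf-8"))
    store[key] = blob
    # Read-after-write check: only delete if the stored object decodes to the same payload.
    if gzip.decompress(store[key]).decode("utf-8") != payload:
        raise RuntimeError("archive verification failed; live rows left untouched")
    archived_ids = {r["id"] for r in old}
    remaining = [r for r in live_rows if r["id"] not in archived_ids]
    # Meta-audit entry: rows archived, byte count, archive object key (hypothetical action name).
    meta_audit.append(
        {"action": "audit.archive_sweep", "rows_archived": len(old), "bytes": len(blob), "archive_key": key}
    )
    return remaining
```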
The archive object includes the hash chain, so the integrity of the audit log can be verified offline.
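The article does not describe the hash-chain format. Purely as an illustration, the sketch below assumes each row stores prev_hash and hash, where hash is SHA-256 over the previous hash plus the row's canonical JSON; GENESIS, row_hash, and verify_chain are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict, Iterable

GENESIS = "0" * 64  # hypothetical value the first row chains from


def row_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """SHA-256 over the previous hash plus the row's canonical JSON."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def verify_chain(rows: Iterable[Dict[str, Any]], start: str = GENESIS) -> bool:
    """Walk rows in sequence order; any edited, dropped, or reordered row breaks the chain."""
    prev = start
    for row in rows:
        payload = {k: v for k, v in row.items() if k not in ("hash", "prev_hash")}
        if row["prev_hash"] != prev or row["hash"] != row_hash(prev, payload):
            return False
        prev = row["hash"]
    return True
```

Under this scheme, full-period verification is a single pass over the archived rows followed by the live rows, in sequence order.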
Common workflows
1. A user requests their data
- User goes to Settings → Privacy → Export my data → Request.
- An email usually arrives within minutes with a signed download URL.
- Zip contains one JSON file per relevant app.
2. A user requests deletion
- User opens Settings → Privacy → Delete my account.
- Vigilo lists the user's workspaces.
- For each workspace, the user can request to be forgotten. The request goes to that workspace's owners as an in-app and email notification.
- The owner reviews and either approves (running forget_user) or denies (with a reason; commonly "must retain for active legal hold").
3. Pin a new workspace to the EU
- A platform admin creates the workspace with region=eu via the API or vigilo workspace create --region eu.
- The workspace is provisioned on the EU stack only; the US stack never receives the workspace's data rows.
- Members worldwide can use it; their requests are auto-routed via the residency middleware.
4. Configure 7-year audit retention
- Settings → Compliance → Audit retention.
- Set retention_days = 2555.
- Configure archive_target to an S3 bucket in the workspace's region.
- The nightly sweep handles the rest.
Permissions
| Action | Role |
|---|---|
| Self-serve data export | Any member (for their own data) |
| Approve forget request | Owner |
| Set audit retention policy | Owner |
| Change workspace region | Not supported (create new workspace + migrate) |
Troubleshooting
Export email never arrives. Most likely the task ran but the email send failed. Check DataExportRequest.status: if it is completed, the file exists and can be downloaded directly from the Settings → Privacy page; if it is failed, the export itself did not finish.
Forget didn't remove the user from a workspace I belong to. Forget is per-workspace; it only runs in the workspaces whose owners approved the request. The user must repeat the request per workspace.
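Cross-region redirect loops. At least one cluster is reporting the wrong VIGILO_HOST_REGION: for example, if the EU host reports us, every request for an EU workspace is redirected back to the EU host itself. Check the env vars on each cluster.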
Audit archive sweep never runs. Celery Beat isn't picking up vigilo.audit.archive_sweep. Verify it's in beat_schedule.py with a cron entry (default: 02:30 UTC daily).
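For reference, a beat entry for the sweep could look like the following config sketch. The entry key is hypothetical; the task name and the 02:30 UTC default come from this article. Celery interprets crontab schedules in the app's timezone setting, so that setting needs to be "UTC" for the time to match.

```python
# beat_schedule.py (sketch): the entry key is hypothetical.
from celery.schedules import crontab

beat_schedule = {
    "vigilo-audit-archive-sweep": {
        "task": "vigilo.audit.archive_sweep",
        # 02:30 every day, in the Celery app's configured timezone.
        "schedule": crontab(minute=30, hour=2),
    },
}
```

The dict must also be wired into the app (for example app.conf.beat_schedule = beat_schedule), and a celery beat process has to be running for any entry to fire.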
An archived audit row I need is no longer in the UI. Pull it from S3. The archive object name follows {workspace_slug}/{year}/{month}/{date}.jsonl.gz. The chain verification tool can re-merge archived rows with live rows for full-period verification.
Related articles
- CMEK and encryption — secrets are encrypted; exports never include them as plaintext.
- RBAC and tenancy — owners are who approve forget requests.
- Audit log — the retention + sweep applies to this data.