Security

GDPR, residency, and data lifecycle

Overview

Vigilo ships three primitives for satisfying GDPR, CCPA, and equivalent regulations: a self-service data export (the user gets a zip of every PII record about them), a right-to-be-forgotten flow (the user's PII is anonymized in place while preserving audit history), and per-workspace data residency that pins data to a region. The export and forget paths are delivered by WC.9, residency by WC.10, and the related audit retention sweeps by W0.8.

A workspace owner can configure all three. An end user can self-serve an export of their own data and submit a forget request from the same UI; the resulting work is queued and processed by Celery.

Data export (DataExportRequest)

Model

DataExportRequest rows are created when a user invokes the export flow. Fields:

  • user — the subject of the export
  • workspace — the workspace scope (export is per-workspace, not global)
  • requested_at
  • status: one of pending, running, completed, failed
  • output_path — path to the generated zip (when completed)
  • expires_at — typically 7 days after completion; the file is deleted after this
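
The field list above can be sketched as a plain-Python stand-in. This is not the actual model definition (presumably a Django model): the helper names mark_completed and is_downloadable are hypothetical, and the fixed 7-day window is taken from the "typically 7 days" wording, so treat it as an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: a fixed 7-day window, based on "typically 7 days" in the text.
EXPORT_TTL = timedelta(days=7)
STATUSES = ("pending", "running", "completed", "failed")


@dataclass
class DataExportRequest:
    """Plain-Python stand-in for the model described above."""
    user_id: int
    workspace_id: int
    requested_at: datetime
    status: str = "pending"
    output_path: Optional[str] = None
    expires_at: Optional[datetime] = None

    def mark_completed(self, output_path: str, now: datetime) -> None:
        # Hypothetical helper: record the zip path and start the expiry clock.
        self.status = "completed"
        self.output_path = output_path
        self.expires_at = now + EXPORT_TTL

    def is_downloadable(self, now: datetime) -> bool:
        # The file is only served while the request is completed and unexpired.
        return (
            self.status == "completed"
            and self.expires_at is not None
            and now < self.expires_at
        )
```

Keeping the expiry on the row, rather than deriving it at read time, lets a cleanup job find expired files with a single query on expires_at.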

Celery task: export_user_data

The task vigilo.security.export_user_data runs the heavy lifting:

  1. Walk every app in INSTALLED_APPS.
  2. For each model that has a user FK or actor FK matching the subject, dump matching rows to JSON.
  3. Apply the per-app PII filter (so we export the user's profile but not other users' rows the subject happened to author — those are anonymized in the dump as actor=<user-id>).
  4. Bundle into export-{user_id}-{request_id}.zip with one JSON file per app.
  5. Set output_path and email the user a signed download URL valid until expires_at.

Files include: profile.json, memberships.json, notifications.json, audit_log_actions_by_user.json, change_requests_authored.json, incidents_authored.json, comments.json, mfa_metadata.json (without the actual secret), api_tokens.json (token names only, never values).
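
The bundling step can be illustrated with the standard library alone. This is a minimal sketch, not the task's actual code: bundle_export and redact_token_rows are hypothetical names, and the input is assumed to be a dict that maps each output file stem (for example "profile") to rows that have already been filtered.

```python
import io
import json
import zipfile


def bundle_export(user_id: int, request_id: int, per_app_rows: dict) -> tuple:
    """Return (archive_name, zip_bytes) with one JSON file per key."""
    archive_name = f"export-{user_id}-{request_id}.zip"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, rows in sorted(per_app_rows.items()):
            # default=str keeps dates and UUIDs serializable.
            zf.writestr(f"{name}.json", json.dumps(rows, indent=2, default=str))
    return archive_name, buffer.getvalue()


def redact_token_rows(tokens: list) -> list:
    # api_tokens.json carries token names only, never values.
    return [{"name": t["name"]} for t in tokens]
```

Dropping secret fields before serialization, as redact_token_rows does, means a token value cannot reach the zip even if a later step changes.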

Endpoint

POST /workspaces/me/export-data/ — creates a DataExportRequest, queues the task, returns 202 Accepted with the request ID. The user gets an email when the export is ready.
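
A client-side call might look like the sketch below. Only the path, the method, and the 202 response come from this article; the base URL and the bearer-token header are placeholders, so substitute whatever host and authentication your deployment actually uses.

```python
import urllib.request

# Assumptions: BASE_URL and bearer-token auth are placeholders, not documented
# Vigilo behavior.
BASE_URL = "https://us.vigilo.example.com"


def build_export_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that starts an export."""
    return urllib.request.Request(
        f"{BASE_URL}/workspaces/me/export-data/",
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def submit(request: urllib.request.Request) -> int:
    # Sends the request; a 202 status means the export was queued.
    with urllib.request.urlopen(request) as response:
        return response.status
```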

UI

Settings → Privacy → Export my data. Button kicks off the request. The page shows existing requests with status and download link.

Right to be forgotten (forget_user)

Celery task: forget_user

The task vigilo.security.forget_user performs PII anonymization, preserving structural records so audit, change, and incident history stay intact:

  • UserProfile.email → redacted-<uuid>@deleted.local
  • UserProfile.display_name → [Redacted User]
  • UserProfile.phone → NULL
  • UserProfile.avatar → deleted from storage; field set to null
  • UserProfile.last_login_ip → NULL
  • UserProfile.totp_secret → NULL, mfa_enabled=False, backup_codes=[]
  • WorkspaceMembership.* for this user → deleted
  • ApiToken.* for this user → deleted (so post-anonymization tokens cannot still be used)
  • Comment.author_id is preserved (so the conversation makes sense) but the FK now points to a redacted profile
  • AuditLog.actor_id is preserved with a marker (metadata.redacted=True) so auditors can still verify the chain
  • ChangeRequest.requested_by and Incident.reported_by preserved with the same marker
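
The field-level rules above can be sketched on plain dictionaries. This is an illustration of the mapping only, not the task's code: anonymize_profile and mark_redacted are hypothetical names, and the real task would operate on model instances and also delete the avatar file, memberships, and API tokens.

```python
import uuid

REDACTED_NAME = "[Redacted User]"


def anonymize_profile(profile: dict) -> dict:
    """Return a redacted copy of a profile dict; the primary key is kept."""
    redacted = dict(profile)
    redacted.update(
        email=f"redacted-{uuid.uuid4()}@deleted.local",
        display_name=REDACTED_NAME,
        phone=None,
        avatar=None,  # the stored file would also be deleted from storage
        last_login_ip=None,
        totp_secret=None,
        mfa_enabled=False,
        backup_codes=[],
    )
    return redacted


def mark_redacted(audit_row: dict) -> dict:
    """Keep actor_id but flag the row so auditors can see the actor was redacted."""
    metadata = {**audit_row.get("metadata", {}), "redacted": True}
    return {**audit_row, "metadata": metadata}
```

Because the profile's primary key survives, foreign keys such as Comment.author_id keep resolving, but only to the redacted values.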

Why preserve some records

A complete delete breaks audit trails — auditors need to know "an actor approved this change", even after the actor is gone. The compromise: structural records keep the FK but the underlying profile is redacted, so re-identification is impossible from Vigilo data alone.

This is consistent with GDPR Article 17 in practice — the right to erasure is not absolute; legitimate interest (audit, legal, fraud prevention) can be cited to retain pseudonymized records.

Endpoint and access

POST /workspaces/{slug}/forget-user/ — body {user_id, reason}. Restricted to workspace owners; the user cannot self-forget without owner approval (this prevents an actor from erasing themselves to escape accountability).

The action is irreversible. The owner must confirm via a typed phrase in the dialog.

Data residency (WC.10)

Workspace.region

Workspace.region is a CharField with three values: us, eu, apac. Set at workspace creation and immutable thereafter (changing region requires a full export + new-workspace-import flow because of underlying database residency).
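
The set-once rule can be expressed as a small validation function. This is a sketch under assumptions: validate_region_change and RegionLockedError are hypothetical names, and the source does not say where Vigilo enforces the rule (model, serializer, or database constraint).

```python
ALLOWED_REGIONS = ("us", "eu", "apac")


class RegionLockedError(ValueError):
    """Raised when code tries to move an existing workspace between regions."""


def validate_region_change(current, requested: str) -> str:
    # current is None only while the workspace is being created.
    if requested not in ALLOWED_REGIONS:
        raise ValueError(f"unknown region: {requested!r}")
    if current is not None and requested != current:
        raise RegionLockedError("region is immutable; export and re-import instead")
    return requested
```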

The region indicates which physical PostgreSQL cluster (and which FastAPI scanner DB) hosts the workspace's data. In multi-region deployments, each region runs its own full stack with replication only of identity (UserProfile, federation metadata) and tenant lifecycle records.

DataResidencyMiddleware

DataResidencyMiddleware enforces region access at the edge:

  1. Resolve the workspace from the URL.
  2. Compare workspace.region to the current host's region (VIGILO_HOST_REGION env var, e.g. us).
  3. If they don't match, return a 307 redirect to the region-correct host: Location: https://eu.vigilo.example.com{path}. The redirect is a 307 (not 302) so POST bodies are preserved.

This means a user who bookmarks us.vigilo.example.com/ws/eu-workspace/... is automatically bounced to the EU host on next request. CDN configuration treats each region as a separate origin.

For development with a single host, set VIGILO_STRICT_RESIDENCY=false to disable the middleware; cross-region access is allowed for local debugging.
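
The middleware's decision can be reduced to a pure function, which is easier to test than the middleware itself. The sketch below assumes that regional hosts follow the {region}.vigilo.example.com pattern from the redirect example and that VIGILO_STRICT_RESIDENCY is compared case-insensitively against "false"; residency_redirect is a hypothetical name.

```python
import os
from typing import Optional

# Assumption: host naming follows the redirect example in the steps above.
HOST_TEMPLATE = "https://{region}.vigilo.example.com"


def residency_redirect(workspace_region: str, path: str, env=os.environ) -> Optional[tuple]:
    """Return (307, location) when the request hit the wrong region, else None."""
    if env.get("VIGILO_STRICT_RESIDENCY", "true").lower() == "false":
        return None  # development mode: cross-region access is allowed
    if env.get("VIGILO_HOST_REGION") == workspace_region:
        return None  # already on the right host
    # 307 keeps the method and body, so a redirected POST stays a POST.
    return 307, HOST_TEMPLATE.format(region=workspace_region) + path
```

For example, with VIGILO_HOST_REGION=us, a request for an EU workspace at /ws/eu-workspace/ yields (307, "https://eu.vigilo.example.com/ws/eu-workspace/").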

Audit retention (W0.8)

Per-workspace retention policy

Workspace.settings['audit_retention'] is a dict:

{
  "retention_days": 730,
  "archive_target": "s3://acme-audit/{workspace_slug}/{year}/{month}/"
}

retention_days is how long audit rows stay in the live table. The default is 365; the regulatory floor for some industries is 2555 (7 years). Rows older than retention_days are moved to archive_target by the archive sweep.
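
To make the policy concrete, the sketch below computes the cutoff timestamp and expands the archive_target template. The function names are hypothetical, and zero-padding the month is an assumption, since the source does not specify how {month} is rendered.

```python
from datetime import datetime, timedelta

DEFAULT_RETENTION_DAYS = 365


def retention_cutoff(policy: dict, now: datetime) -> datetime:
    """Rows with a timestamp earlier than the returned value are due for archiving."""
    days = policy.get("retention_days", DEFAULT_RETENTION_DAYS)
    return now - timedelta(days=days)


def archive_prefix(policy: dict, workspace_slug: str, when: datetime) -> str:
    # Assumption: the month is zero-padded; the source does not say.
    return policy["archive_target"].format(
        workspace_slug=workspace_slug, year=when.year, month=f"{when.month:02d}"
    )
```

With the 730-day policy shown above and a sweep running on 2025-03-01, the cutoff is 2023-03-02, because the window spans the leap day in 2024.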

Archive sweep

The Celery task vigilo.audit.archive_sweep runs nightly. For each workspace:

  1. Read the retention policy.
  2. Find audit rows older than retention_days.
  3. Stream them to the configured archive_target as gzipped JSON Lines (S3 is the only backend today; more are on the roadmap).
  4. Delete the rows from the live table once the archive write is confirmed (read-after-write check).
  5. Write a meta-audit entry recording the sweep: rows archived, byte count, archive object key.

The archive object includes the hash chain so the audit chain can be verified offline.
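
The serialize, verify, then delete ordering can be sketched in memory. The article does not describe the hash chain's construction, so the scheme here (SHA-256 over the previous hash plus the canonical row JSON, stored in a "hash" field) is an assumption made for illustration, and all function names are hypothetical. A real sweep would stream to S3 rather than build the blob in memory.

```python
import gzip
import hashlib
import json


def chain_hash(prev_hash: str, row: dict) -> str:
    # Assumption: each link hashes the previous hash plus the canonical row JSON.
    payload = prev_hash + json.dumps(row, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def write_archive(rows: list) -> bytes:
    """Serialize rows (each carrying its chain hash) as gzipped JSON Lines."""
    lines = "".join(json.dumps(r, sort_keys=True) + "\n" for r in rows)
    return gzip.compress(lines.encode("utf-8"))


def read_archive(blob: bytes) -> list:
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


def verify_chain(rows: list, start_hash: str = "") -> bool:
    """Recompute every link; any edited row breaks the chain from that point on."""
    prev = start_hash
    for row in rows:
        body = {k: v for k, v in row.items() if k != "hash"}
        if chain_hash(prev, body) != row["hash"]:
            return False
        prev = row["hash"]
    return True


def safe_to_delete(original: list, blob: bytes) -> bool:
    # Read-after-write check: delete live rows only if the archive round-trips.
    return read_archive(blob) == original
```

Deleting only after the read-back succeeds means a failed or truncated upload leaves the live rows in place for the next nightly run.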

Common workflows

1. A user requests their data

  1. User goes to Settings → Privacy → Export my data → Request.
  2. An email with a signed download URL usually arrives within minutes.
  3. Zip contains one JSON file per relevant app.

2. A user requests deletion

  1. User opens Settings → Privacy → Delete my account.
  2. Vigilo lists the user's workspaces.
  3. For each, the user can request forget. The request goes to that workspace's owners (an in-app + email notification).
  4. The owner reviews and either approves (running forget_user) or denies (with a reason; commonly "must retain for active legal hold").

3. Pin a new workspace to the EU

  1. Platform admin creates the workspace with region=eu via the API or vigilo workspace create --region eu.
  2. The workspace is provisioned on the EU stack only; the US stack never receives a row.
  3. Members worldwide can use it; their requests are auto-routed via the residency middleware.

4. Configure 7-year audit retention

  1. Settings → Compliance → Audit retention.
  2. Set retention_days = 2555.
  3. Configure archive_target to an S3 bucket in the workspace's region.
  4. The nightly sweep handles the rest.

Permissions

Action | Role
Self-serve data export | Any member (for their own data)
Approve forget request | Owner
Set audit retention policy | Owner
Change workspace region | Not supported (create new workspace + migrate)

Troubleshooting

Export email never arrives. The task may have completed while the email send failed. Check DataExportRequest.status: if it is completed, the file exists and can be downloaded directly from the Settings → Privacy page.

Forget didn't remove the user from a workspace I belong to. Forget is per-workspace; it only runs in the workspaces whose owners approved the request. The user must repeat the request per workspace.

Cross-region redirect loops. A loop means each host is redirecting to the other, which happens when the clusters report the wrong VIGILO_HOST_REGION. Check the env var on each cluster.

Audit archive sweep never runs. Celery Beat isn't picking up vigilo.audit.archive_sweep. Verify it's in beat_schedule.py with a cron entry (default: 02:30 UTC daily).

An archived audit row I need is no longer in the UI. Pull it from S3. The archive object name follows {workspace_slug}/{year}/{month}/{date}.jsonl.gz. The chain verification tool can re-merge archived rows with live rows for full-period verification.
