Cost attribution

Overview

Cost attribution gives platform admins a defensible answer to "which workspace is using what, and what is it costing us?". It is delivered by the WorkspaceUsageSnapshot model (WD.11), a once-a-day Celery task that takes a measurement per workspace, a configurable rate card sourced from the VIGILO_USAGE_UNIT_COSTS environment variable, and a UI at /platform/cost with a leaderboard, time-series charts, and CSV export.

The page lists every workspace, the snapshot metrics for the chosen period, the computed cost, and the percentage of total org spend that workspace represents. Sort by any column, switch the period (7 / 30 / 90 / 365 days), and export the raw rows for finance to load into a chargeback spreadsheet.
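
The percentage column is each workspace's cost divided by the org total for the period. A minimal sketch of that arithmetic in Python, assuming the per-workspace costs have already been computed; the function name and the sample workspace slugs are illustrative and not part of Vigilo:

```python
def share_of_total(cost_by_workspace: dict[str, float]) -> dict[str, float]:
    """Return each workspace's percentage of total spend, rounded to 1 decimal."""
    total = sum(cost_by_workspace.values())
    if total == 0:
        # No spend in the period: report 0% rather than dividing by zero.
        return {slug: 0.0 for slug in cost_by_workspace}
    return {
        slug: round(100 * cost / total, 1)
        for slug, cost in cost_by_workspace.items()
    }


# Hypothetical period totals in USD, keyed by workspace slug.
costs = {"payments": 400.0, "search": 350.0, "sandbox": 250.0}
shares = share_of_total(costs)
```

Because the shares are rounded independently, they may not sum to exactly 100 when the inputs are less tidy than in this sample.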

Why it exists

Multi-tenant platforms get bigger by accident. A team spins up a workspace for a one-off project, the project succeeds, the workspace fills with members, audit logs grow, integrations multiply, and three quarters later the workspace is responsible for 40% of platform load with no one tracking it. Cost attribution exposes that drift, lets finance issue accurate cross-charges, and gives the platform team data to negotiate a hosting upgrade with leadership before capacity becomes a fire.

Key concepts

  • WorkspaceUsageSnapshot: One row per (workspace, snapshot_date). Fields include member_count, api_call_count_24h, storage_bytes, change_count_24h, incident_count_24h, monitored_host_count, audit_event_count_24h, webhook_delivery_count_24h, celery_task_count_24h. Fields ending in _24h are deltas over the previous 24 hours; the gauges (member_count, storage_bytes, monitored_host_count) are point-in-time values.
  • Daily snapshot task: vigilo.platform.usage_snapshot runs at 00:10 UTC via Celery Beat. It iterates every active workspace, computes the metrics under a temporary RLS context, and upserts a row. The task is idempotent: re-running it for today simply overwrites today's row.
  • Rate card: VIGILO_USAGE_UNIT_COSTS is a JSON env var, for example {"member_count": 5.00, "storage_gb": 0.10, "api_call": 0.000002, ...}. The cost engine multiplies each metric by its unit price; missing keys default to 0. You can change the rate card at any time; stored snapshots are unchanged, but the displayed cost is recomputed on the next page load.
  • Leaderboard: Default view. One row per workspace, sortable by any column, with a percentage-of-total bar in the rightmost column.
  • Time-series chart: Click any workspace row to expand a 90-day sparkline per metric. Useful for spotting the moment a workspace started growing.
  • CSV export: The Export CSV button gives you the raw snapshot rows for the selected period. Columns match the model field names, plus a computed cost_usd per row. Open in Excel or pipe into a chargeback script.
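
The rate-card rule described above (multiply each metric by its unit price, treat missing keys as 0) can be sketched in Python as follows. This is not Vigilo's actual cost engine. The mapping from rate-card keys to snapshot fields, the decimal-gigabyte conversion for storage, and the function names are all assumptions made for illustration:

```python
import json
import os

# Hypothetical mapping from rate-card key to (snapshot field, divisor).
# The real mapping is not documented here; storage is assumed to be
# priced per decimal gigabyte (10**9 bytes).
RATE_KEY_TO_FIELD = {
    "member_count": ("member_count", 1),
    "storage_gb": ("storage_bytes", 10**9),
    "api_call": ("api_call_count_24h", 1),
    "celery_task": ("celery_task_count_24h", 1),
    "monitored_host": ("monitored_host_count", 1),
    "webhook_delivery": ("webhook_delivery_count_24h", 1),
}


def load_rate_card() -> dict:
    """Parse VIGILO_USAGE_UNIT_COSTS; an unset variable yields an empty card."""
    return json.loads(os.environ.get("VIGILO_USAGE_UNIT_COSTS", "{}"))


def snapshot_cost(snapshot: dict, rate_card: dict) -> float:
    """Sum quantity * unit price for one snapshot row; missing keys cost 0."""
    total = 0.0
    for key, (field, divisor) in RATE_KEY_TO_FIELD.items():
        quantity = snapshot.get(field, 0) / divisor
        total += quantity * rate_card.get(key, 0)
    return round(total, 6)


# One illustrative snapshot row and a partial rate card.
snapshot = {
    "member_count": 12,
    "storage_bytes": 50 * 10**9,
    "api_call_count_24h": 1_000_000,
}
rate_card = {"member_count": 5.00, "storage_gb": 0.10, "api_call": 0.000002}
cost = snapshot_cost(snapshot, rate_card)  # 12*5.00 + 50*0.10 + 1,000,000*0.000002
```

Keeping the prices outside the stored rows is what lets a rate-card change reprice every historical snapshot without rewriting any data.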

Common workflows

1. Find your most expensive workspace this month

  1. Open Platform → Cost.
  2. Set the period to 30 days.
  3. Sort by Cost (USD) descending.
  4. The top row is your biggest contributor. Click to expand the time-series — if the slope is steep, that workspace is still growing; if flat, it's just naturally large.

2. Tune your rate card

  1. SSH to a Django host and update VIGILO_USAGE_UNIT_COSTS in the environment.
  2. Restart the Django/Gunicorn process so the workers pick up the new environment. The rate card is applied at request time on the cost page rather than stored with each snapshot, so the new prices take effect as soon as the restart completes.
  3. Reload the cost page — the Cost column now reflects the new pricing for every historical snapshot.

A reasonable starting rate card for an internal chargeback model:

{
  "member_count": 5.00,
  "storage_gb": 0.10,
  "api_call": 0.000002,
  "celery_task": 0.0001,
  "monitored_host": 0.50,
  "webhook_delivery": 0.00005
}

Adjust until the org total matches your actual hosting bill.
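
One way to do that adjustment, sketched in Python, is to scale every unit price by the same factor so that the computed org total lands on the real bill. This is only one possible approach: it preserves the relative weights between metrics, which may or may not be what you want. The function name and the sample figures are hypothetical:

```python
def scale_rate_card(
    rate_card: dict[str, float],
    computed_total: float,
    actual_bill: float,
) -> dict[str, float]:
    """Scale all unit prices uniformly so the org total matches the bill."""
    if computed_total <= 0:
        raise ValueError("computed_total must be positive")
    factor = actual_bill / computed_total
    return {key: price * factor for key, price in rate_card.items()}


# Hypothetical figures: the cost page totals 8,000 USD for the period,
# but the hosting bill for the same period was 10,000 USD.
card = {"member_count": 5.00, "storage_gb": 0.10}
scaled = scale_rate_card(card, computed_total=8000.0, actual_bill=10000.0)
```

Here the factor is 1.25, so member_count moves from 5.00 to 6.25 and storage_gb from 0.10 to 0.125.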

3. Export for finance

  1. Set the period to the calendar month you are billing for (use Custom range).
  2. Click Export CSV.
  3. The download vigilo-usage-{YYYYMMDD}-{YYYYMMDD}.csv contains one row per workspace per snapshot day, with raw counters and the computed cost_usd.
  4. Pivot in Excel by workspace_slug and sum cost_usd to get monthly chargeback per team.
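
If you would rather script the pivot than do it in Excel, the following Python sketch sums cost_usd per workspace_slug. It assumes only the two column names mentioned above; the snapshot_date column and the sample rows are illustrative, and the function name is hypothetical:

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def monthly_chargeback(csv_file) -> dict[str, Decimal]:
    """Sum cost_usd per workspace_slug from an exported usage CSV."""
    totals: dict[str, Decimal] = defaultdict(Decimal)  # Decimal() is 0
    for row in csv.DictReader(csv_file):
        # Decimal avoids float rounding drift when summing currency values.
        totals[row["workspace_slug"]] += Decimal(row["cost_usd"])
    return dict(totals)


# Illustrative stand-in for a downloaded export file.
sample = io.StringIO(
    "workspace_slug,snapshot_date,cost_usd\n"
    "payments,2024-05-01,12.50\n"
    "payments,2024-05-02,13.25\n"
    "search,2024-05-01,7.00\n"
)
chargeback = monthly_chargeback(sample)
```

With a real export, pass a file opened with open(path, newline="") in place of the StringIO object.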

4. Spot a runaway metric

  1. Sort the leaderboard by API calls / 24h.
  2. Click the top workspace to expand the sparkline.
  3. A vertical step in the chart is usually an integration misconfiguration — a polling loop with a 1-second interval, or a webhook fan-out without backoff. Open the workspace's Integrations page to find the culprit.
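
The same check can be run against exported data. The Python sketch below flags the first day on which a metric jumps by at least a given multiple of the previous day. The threshold of 3x is an arbitrary assumption, not a Vigilo setting, and the series is made-up sample data (86,400 calls per day is what a single 1-second polling loop produces):

```python
from typing import Optional


def first_step(series: list[int], factor: float = 3.0) -> Optional[int]:
    """Return the index of the first value >= factor * the previous value."""
    for i in range(1, len(series)):
        prev, cur = series[i - 1], series[i]
        # Skip zero baselines so a workspace's first activity is not flagged.
        if prev > 0 and cur >= factor * prev:
            return i
    return None


# Hypothetical api_call_count_24h values for six consecutive snapshot days.
api_calls = [10_200, 9_800, 10_500, 86_400, 86_400, 86_400]
step_day = first_step(api_calls)
```

In this sample the jump is found at index 3, the day the series moves from 10,500 to 86,400.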

Permissions

Cost attribution is platform-admin only. The endpoint returns 403 with code='platform_admin_required' for everyone else.

Action            Required role
View cost page    platform_admin
Edit rate card    platform_admin + shell access (it's env, not UI)
CSV export        platform_admin

Per-workspace owners can see their own workspace's snapshots from Settings → Usage but cannot see peer workspaces.

Troubleshooting

The leaderboard is empty for today. The daily Celery snapshot task has not yet run. Check Celery Beat logs for vigilo.platform.usage_snapshot; manually trigger it with celery -A vigilo_celery call vigilo.platform.usage_snapshot to backfill.

Costs look way off. Check VIGILO_USAGE_UNIT_COSTS. The most common mistake is using api_call: 0.002 (per-thousand pricing) when the formula expects per-call. Divide by 1000 and reload.
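
To see how large the error is, here is the arithmetic for a hypothetical workspace making one million API calls in a day, written as a short Python check. The call volume is an illustrative figure; the prices are the ones from the paragraph above:

```python
from decimal import Decimal

calls = 1_000_000  # hypothetical api_call_count_24h for one workspace

# Per-thousand price mistakenly entered as the per-call price.
per_thousand_price = Decimal("0.002")
wrong_cost = calls * per_thousand_price  # equals 2000 USD

# Correct per-call price: divide the per-thousand price by 1000.
per_call_price = per_thousand_price / 1000
right_cost = calls * per_call_price  # equals 2 USD
```

The mistake inflates the API-call component by exactly a factor of 1000, which usually makes it dominate the Cost column.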

Storage bytes are zero for every workspace. The storage probe uses pg_total_relation_size against the workspace's RLS-filtered rows. If the database user does not have pg_read_all_stats, the probe falls back to zero. Grant the role and re-run the snapshot task.

A workspace I just created has no snapshot. The task only runs at 00:10 UTC daily; new workspaces appear after the next run. Trigger the task manually to backfill immediately.

Roadmap note: real cloud billing integration

The current cost engine uses a unit rate card multiplied by Vigilo-side counters — it does not pull real spend from your cloud provider. Direct integration with AWS Cost & Usage Reports (CUR) and Google Cloud Billing Export is on the roadmap and will land as a second cost stream alongside the snapshot-based one. Until then, treat the displayed cost as an internal chargeback estimate, not an actual cloud bill.
