Overview
The Analytics module turns Vigilo's operational ledger — changes, incidents, approvals, runbook executions — into the four DORA metrics (Deployment Frequency, Lead Time for Changes, Change Failure Rate, Mean Time to Resolve) plus a library of derived service- and team-level views. It lives at /ws/{slug}/analytics and renders the DoraPage with tabs for Overview, By team, By service, By change type, and Goals.
Because every CHG and INC is already a first-class object in the workspace, DORA scoring is deterministic from the same database the rest of the product uses — no separate event pipeline, no third-party telemetry collector, and no opportunity for the four numbers to disagree with the underlying records.
Why it exists
DORA metrics are the de-facto language for measuring delivery and reliability performance. Most organisations stitch them together from build systems (deploy frequency), ticket systems (lead time), incident tools (MTTR), and Excel (failure rate), with no shared definition or audit trail. Vigilo computes all four from a single source of truth — the ChangeRequest and Incident tables — so the dashboard you stare at in the morning and the export your auditor sees come from the same query.
Key concepts
DoraMetricsApi
The endpoint GET /ws/{slug}/api/v1/analytics/dora/ returns a structured payload with the four metrics over a configurable window (default: the last 30 days). Supported query parameters are ?start=, ?end=, ?group_by=, and ?filter=.
Response shape (simplified):
{
  "window": {"start": "...", "end": "..."},
  "deploy_frequency_per_day": 4.2,
  "lead_time_hours": {"p50": 18.0, "p90": 72.0},
  "change_failure_rate": 0.07,
  "mttr_hours": {"p50": 1.5, "p90": 6.4}
}
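A request for this payload could be assembled as in the sketch below. Only the path and the parameter names come from this page; the host, the helper name build_dora_url, and the ISO date format for start and end are assumptions made for illustration. The ?filter= parameter is left out because its syntax is not described here.

```python
from urllib.parse import urlencode


def build_dora_url(base, slug, start=None, end=None, group_by=None):
    """Build the DORA endpoint URL, omitting parameters that are not set.

    `base` is a hypothetical host; the date format is an assumption.
    """
    params = {"start": start, "end": end, "group_by": group_by}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base}/ws/{slug}/api/v1/analytics/dora/"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(build_dora_url("https://vigilo.example.com", "acme",
                         start="2024-01-01", end="2024-01-31",
                         group_by="team"))
```

Calling the helper with no optional arguments yields the bare endpoint, which should fall back to the default 30-day window.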
Metric definitions
- deploy_frequency_per_day — Count of ChangeRequest rows with closure_code='successful' (or successful_with_issues) completed in the window, divided by window length in days.
- lead_time_hours — Per-change duration from created_at to completed_at for successfully closed changes; reported as p50 and p90.
- change_failure_rate — Count of changes with closure_code='unsuccessful' OR FSM state rolled_back OR with a linked incident opened within 24 hours of completion, divided by total closed changes in the window.
- mttr_hours — Per-incident duration from opened_at to resolved_at for incidents resolved in the window; reported as p50 and p90.
The 24-hour incident-link window for failure rate is configurable on Workspace.settings.dora_failure_window_hours.
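The definitions above can be sketched over in-memory records as follows. This is an illustration, not Vigilo's query: the Change dataclass is a hypothetical stand-in for a ChangeRequest row, the window is treated as half-open (start inclusive, end exclusive), and the linear-interpolation percentile is an assumption, since this page does not state which percentile method is used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

SUCCESS_CODES = {"successful", "successful_with_issues"}


@dataclass
class Change:  # hypothetical stand-in for a ChangeRequest row
    created_at: datetime
    completed_at: datetime
    closure_code: str
    rolled_back: bool = False
    incident_opened_at: Optional[datetime] = None  # earliest linked incident


def percentile(values, pct):
    """Linear-interpolated percentile (assumed method); None if empty."""
    if not values:
        return None
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def hours(delta):
    return delta.total_seconds() / 3600


def is_failure(change, window_hours=24):
    """Unsuccessful, rolled back, or followed by a linked incident in time."""
    if change.closure_code == "unsuccessful" or change.rolled_back:
        return True
    if change.incident_opened_at is None:
        return False
    gap = change.incident_opened_at - change.completed_at
    return timedelta(0) <= gap <= timedelta(hours=window_hours)


def dora_metrics(changes, incidents, start, end, window_hours=24):
    """incidents is a list of (opened_at, resolved_at) pairs."""
    closed = [c for c in changes if start <= c.completed_at < end]
    deployed = [c for c in closed if c.closure_code in SUCCESS_CODES]
    lead = [hours(c.completed_at - c.created_at) for c in deployed]
    repair = [hours(r - o) for o, r in incidents if start <= r < end]
    failed = sum(is_failure(c, window_hours) for c in closed)
    days = (end - start) / timedelta(days=1)
    return {
        "deploy_frequency_per_day": len(deployed) / days,
        "lead_time_hours": {"p50": percentile(lead, 50),
                            "p90": percentile(lead, 90)},
        "change_failure_rate": failed / len(closed) if closed else None,
        "mttr_hours": {"p50": percentile(repair, 50),
                       "p90": percentile(repair, 90)},
    }
```

Under these definitions a change closed as successful_with_issues that has a linked incident inside the window counts both as a deployment and as a failure. The window_hours argument plays the role of dora_failure_window_hours.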
Grouping
?group_by=team|service|change_type returns the metrics computed per group instead of workspace-wide (WA.10). The response becomes an array of {key, ...metrics} rows. The UI uses this on the By team / By service / By change type tabs to render small-multiples — one mini-card per group with a sparkline.
- team group key — WorkspaceMembership.team (if set).
- service group key — Asset.id for the primary affected_asset on the change (or incident.service).
- change_type group key — ChangeRequest.change_type (standard | normal | emergency).
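The per-group response shape can be illustrated with a small grouping helper. This is a sketch under assumptions: group_rows, key_fn, and metrics_fn are hypothetical names, records are plain dicts, and skipping records that have no group key is a guess, because this page does not say how unset keys (for example a missing team) are handled.

```python
from collections import defaultdict


def group_rows(records, key_fn, metrics_fn):
    """Return one {key, ...metrics} row per group, sorted by key."""
    groups = defaultdict(list)
    for record in records:
        key = key_fn(record)
        if key is not None:  # assumption: records without a key are skipped
            groups[key].append(record)
    return [{"key": key, **metrics_fn(items)}
            for key, items in sorted(groups.items())]


if __name__ == "__main__":
    changes = [
        {"change_type": "standard", "closure_code": "successful"},
        {"change_type": "emergency", "closure_code": "unsuccessful"},
        {"change_type": "standard", "closure_code": "successful"},
    ]
    rows = group_rows(
        changes,
        key_fn=lambda c: c.get("change_type"),
        metrics_fn=lambda items: {"closed_changes": len(items)},
    )
    print(rows)
```

Any metric function that accepts a list of records can be passed as metrics_fn, so the same helper covers the team, service, and change_type groupings.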
Regression alerts
A scheduled Celery task (WB.20) runs nightly and compares the rolling 7-day metric values to the rolling 28-day baseline per group. If a metric regresses by more than the threshold (defaults: deploy_frequency down 30%, lead_time up 30%, failure_rate up 50%, mttr up 30%) the system creates a MetricRegressionAlert row and dispatches the metric.regression webhook event. Subscribed channels (Slack, email) receive a formatted summary with a link to the analytics tab.
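The comparison for a single group can be sketched as below. The default thresholds are the ones listed above; everything else is an assumption for illustration, including the function names, the short metric keys, and the reading of "regresses by more than the threshold" as a relative change against the 28-day baseline.

```python
# (direction that counts as a regression, relative threshold)
DEFAULT_THRESHOLDS = {
    "deploy_frequency": ("down", 0.30),
    "lead_time": ("up", 0.30),
    "failure_rate": ("up", 0.50),
    "mttr": ("up", 0.30),
}


def relative_change(recent, baseline):
    """Fractional change vs baseline; None when it cannot be computed."""
    if recent is None or baseline is None or baseline == 0:
        return None
    return (recent - baseline) / baseline


def find_regressions(recent, baseline, thresholds=DEFAULT_THRESHOLDS):
    """Map each regressed metric to its signed relative change."""
    regressions = {}
    for metric, (direction, limit) in thresholds.items():
        change = relative_change(recent.get(metric), baseline.get(metric))
        if change is None:
            continue
        worsening = -change if direction == "down" else change
        if worsening > limit:
            regressions[metric] = change
    return regressions


if __name__ == "__main__":
    rolling_7d = {"deploy_frequency": 2.0, "lead_time": 20.0,
                  "failure_rate": 0.12, "mttr": 2.9}
    rolling_28d = {"deploy_frequency": 4.0, "lead_time": 18.0,
                   "failure_rate": 0.07, "mttr": 2.0}
    print(find_regressions(rolling_7d, rolling_28d))
```

In this sample, lead time rose by about 11% and stays under its threshold, while the other three metrics are reported. A metric with a zero or missing baseline is skipped rather than flagged, which is a design choice of the sketch.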
Thresholds are tunable per workspace under Settings → Analytics → Regression thresholds. Acknowledging an alert silences re-fires for the same metric+group for 24 hours.
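The suppression rule can be pictured as a lookup keyed by metric and group. This is a hypothetical sketch: the should_fire helper and the dictionary of acknowledgement times are illustrative, and whether suppression ends exactly at the 24-hour mark is an assumption.

```python
from datetime import datetime, timedelta

SUPPRESSION = timedelta(hours=24)


def should_fire(metric, group, now, acknowledged_at):
    """Return True unless this metric+group was acknowledged in the last 24h.

    `acknowledged_at` maps (metric, group) to the last acknowledgement time.
    """
    last_ack = acknowledged_at.get((metric, group))
    return last_ack is None or now - last_ack >= SUPPRESSION


if __name__ == "__main__":
    acks = {("mttr", "payments"): datetime(2024, 1, 1, 9, 0)}
    now = datetime(2024, 1, 1, 20, 0)
    print(should_fire("mttr", "payments", now, acks))  # same metric and group
    print(should_fire("mttr", "checkout", now, acks))  # different group
```

Because the key includes the group, acknowledging an alert for one team does not silence the same metric for another team.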
DORA goals
Each workspace can declare quarterly goals, for example deploy_frequency_per_day ≥ 5, lead_time_p50 ≤ 24h, change_failure_rate ≤ 0.10, and mttr_p50 ≤ 2h. Goals are stored in Workspace.settings.dora_goals and render as a target line on the trend charts. Goals are not enforced (the workspace will not refuse changes that breach them), but breaches surface a yellow chip on the DORA card and feed into the executive summary widget.
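A breach check against goals of this kind might look like the sketch below. The storage format of Workspace.settings.dora_goals is not described on this page, so the EXAMPLE_GOALS structure (a comparison operator plus a target per metric) and the breached_goals helper are assumptions made for illustration.

```python
import operator

# Hypothetical representation: metric -> (comparison that must hold, target)
EXAMPLE_GOALS = {
    "deploy_frequency_per_day": (operator.ge, 5),
    "lead_time_p50": (operator.le, 24),        # hours
    "change_failure_rate": (operator.le, 0.10),
    "mttr_p50": (operator.le, 2),              # hours
}


def breached_goals(current, goals=EXAMPLE_GOALS):
    """Return the sorted names of metrics whose current value misses its goal."""
    return sorted(
        name
        for name, (meets, target) in goals.items()
        if name in current and not meets(current[name], target)
    )


if __name__ == "__main__":
    current = {
        "deploy_frequency_per_day": 4.2,
        "lead_time_p50": 18.0,
        "change_failure_rate": 0.07,
        "mttr_p50": 1.5,
    }
    print(breached_goals(current))
```

With the sample values from the response shape earlier on this page, only deploy_frequency_per_day misses its target. Metrics without a current value are ignored rather than treated as breaches.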
DoraPage UI tabs
- Overview — the four headline numbers, each with a 7-day sparkline and the goal line.
- By team — small-multiples by team.
- By service — small-multiples by service (driven by affected_assets).
- By change type — segmented by standard | normal | emergency.
- Goals — read-only summary of current goals, last quarter result, and an admin-only edit button.
Common workflows
Read the current DORA score
- Open /ws/{slug}/analytics. The Overview tab loads with the default 30-day window.
- Each card shows the current value, the absolute change vs the previous window, and the goal target line.
- Hover the sparkline to see daily values.
Investigate a regression alert
- The Slack channel pings: "MTTR (p50) for team payments regressed +45% vs 28-day baseline."
- Click through to /ws/{slug}/analytics?group_by=team&team=payments.
- Inspect the MTTR sparkline — the spike usually corresponds to one or two long incidents. Hover the spikes to deep-link to the INC numbers.
- Acknowledge the alert (button on the alert detail) once investigated.
Set quarterly goals
- Admin opens Settings → Analytics → Goals.
- Set the four targets and the quarter end date. Click Save.
- The goal line appears on every DORA chart.
Export
DORA payloads are exportable as CSV or JSON from the Overview tab action menu. The export honours current filters and grouping. To schedule the same export, see Saved and scheduled reports.
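If you fetch the JSON payload yourself and want a CSV, a client-side conversion could look like the sketch below. It does not describe the column layout of Vigilo's built-in export, which is not specified on this page; the rows_to_csv helper and the name_p50 / name_p90 column naming are assumptions.

```python
import csv
import io


def rows_to_csv(rows):
    """Flatten metric rows (including nested p50/p90 dicts) into CSV text."""
    flat = []
    for row in rows:
        out = {}
        for name, value in row.items():
            if isinstance(value, dict):
                # e.g. {"mttr_hours": {"p50": 1.5}} -> column "mttr_hours_p50"
                for sub, inner in value.items():
                    out[f"{name}_{sub}"] = inner
            else:
                out[name] = value
        flat.append(out)
    buffer = io.StringIO()
    fieldnames = list(flat[0]) if flat else []
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(flat)
    return buffer.getvalue()


if __name__ == "__main__":
    grouped = [
        {"key": "payments", "change_failure_rate": 0.07,
         "mttr_hours": {"p50": 1.5, "p90": 6.4}},
    ]
    print(rows_to_csv(grouped))
```

The sketch takes its column order from the first row, so it expects every row to carry the same keys, as the grouped response rows do.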
Permissions
- Viewers can read all analytics views and the DORA card.
- Engineers / approvers can acknowledge regression alerts.
- Admins / owners can edit goals, edit regression thresholds, and disable specific groups.
Troubleshooting
DORA card shows "—" for change_failure_rate — Either no changes closed in the window, or every change in the window closed with closure_code='successful' and has no linked incident within the failure window. The card displays a sample-size badge so you can distinguish "zero failures" from "insufficient data".
Regression alert keeps re-firing — The 24-hour suppression only blocks re-fires for the same metric and group. If multiple groups regress, each gets its own alert. If the suppression is genuinely broken, check the Celery beat log for the evaluate_metric_regressions task.
Numbers don't match my external tool — Likely a definition mismatch. Vigilo's deploy_frequency counts successful changes only (mirroring DORA's "successful deploy" definition), whereas many third-party tools count all attempts. To bring change_failure_rate closer to your other tool, adjust dora_failure_window_hours; for the other metrics, filter the API directly.
Goal line missing from sparkline — No goal set for that metric, or the goal expired (past the quarter end date). Set a new goal under Settings → Analytics → Goals.