Overview
An Incident (Incident, keyed INC-NNNN per workspace) is Vigilo's record of an unplanned interruption to service. The model carries everything you need to run the response — severity, status, commander, responders, root cause, action items, structured timeline — plus the cross-system glue: caused-by-change link, Slack channel, PagerDuty mirror, and the postmortem one-to-one.
Incidents live under /ws/{slug}/incidents. The list groups by status with severity chips; the detail page splits across Overview, Timeline, Responders, Root cause, Action items, and (after resolution) Postmortem.
Why it exists
The on-call doesn't have time to copy-paste from Slack into a ticket into a runbook into a postmortem. The Incident model is the single record that all those surfaces read from and write to: Slack channel posts append to the timeline, the PagerDuty page mutates status, and the postmortem inherits the same timeline. One source of truth, one audit trail, one URL to share.
Key concepts
- severity: p1 (critical), p2 (high), p3 (medium, default), p4 (low). Drives auto-paging, status-page auto-publish, and dashboard prominence.
- status: open, investigating, mitigated, resolved, post_mortem. Enforced by the FSM via investigate(), mitigate(), resolve(), open_post_mortem().
- commander: The single accountable person for the response. Picked at incident creation or promoted from a responder via the Promote to commander action.
- IncidentResponder: Through-model on the responders M2M. Each responder has a role of commander, responder, or observer and a joined_at timestamp.
- caused_by_change: Optional FK to a ChangeRequest. Auto-suggested by Suggest cause (WA.14), which scans recent changes against the incident's affected services. One click links them.
- timeline: JSON array of {at, source, label, actor, payload} events. Auto-populated from the audit log; manually editable via the timeline editor (T2.9).
- slack_channel_id: Set by the Slack lifecycle handler when SlackIntegration.auto_create_channel=True (WA.16). The channel is created on open → investigating and archived on resolve().
- pagerduty_incident_id: Mirrors a PagerDuty incident id (WB.7). Lets inbound PD webhooks (pagerduty_events) acknowledge or resolve the Vigilo incident when the PD side flips.
- IncidentStorm: Aggregation row created by vigilo.incidents.detect_storms (WD.8) when 3+ related incidents fire in a short window. Surfaced as a banner on the incidents page so responders see the bigger pattern, not just the individual symptoms.
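To make the status FSM concrete, here is a minimal sketch of how the four transition methods could guard the status field. It assumes a strictly linear flow (open, investigating, mitigated, resolved, post_mortem); the class names IncidentSketch and TransitionError are hypothetical, and the real model may allow other paths, such as resolving directly from investigating.

```python
from enum import Enum


class Status(str, Enum):
    OPEN = "open"
    INVESTIGATING = "investigating"
    MITIGATED = "mitigated"
    RESOLVED = "resolved"
    POST_MORTEM = "post_mortem"


# Assumed linear transition table: method name -> (required source, target).
TRANSITIONS = {
    "investigate": (Status.OPEN, Status.INVESTIGATING),
    "mitigate": (Status.INVESTIGATING, Status.MITIGATED),
    "resolve": (Status.MITIGATED, Status.RESOLVED),
    "open_post_mortem": (Status.RESOLVED, Status.POST_MORTEM),
}


class TransitionError(Exception):
    """Raised when a transition is attempted from the wrong status."""


class IncidentSketch:
    """Hypothetical stand-in for the Incident model's status handling."""

    def __init__(self):
        self.status = Status.OPEN

    def _transition(self, name):
        source, target = TRANSITIONS[name]
        if self.status is not source:
            raise TransitionError(f"cannot {name}() from {self.status.value}")
        self.status = target

    def investigate(self):
        self._transition("investigate")

    def mitigate(self):
        self._transition("mitigate")

    def resolve(self):
        self._transition("resolve")

    def open_post_mortem(self):
        self._transition("open_post_mortem")
```

Calling the methods out of order, for example open_post_mortem() on an incident that is still investigating, raises TransitionError in this sketch rather than silently changing the status.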
Common workflows
Opening an incident
- Click New incident in the incidents list (or have an integration POST to /incidents/).
- Fill in title, description, severity, and detected_at. The commander defaults to the creating user.
- On save, several things happen automatically:
  - The KB-suggest panel (WA.7) shows runbooks whose titles or tags match the incident.
  - The Slack lifecycle handler creates a dedicated channel (#inc-INC-0042-...) and posts the situation report (WA.16).
  - If severity meets the workspace's status_page_auto_publish_severity, a public StatusPageIncident is created and kept in sync.
  - If Twilio paging is configured (WD.7), the current on-call is SMS'd.
- Use Add responders to pull in additional engineers. Each addition writes an IncidentResponder row.
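The on-save behavior can be sketched as a function that decides which automatic actions fire. This is an illustration only: WorkspaceSettings, twilio_configured, the action labels, and the default threshold of p2 are hypothetical, and "meets the threshold" is assumed to mean "at least as severe as", with p1 being the most severe.

```python
from dataclasses import dataclass

# Lower rank means more severe: p1 outranks p4.
SEVERITY_RANK = {"p1": 1, "p2": 2, "p3": 3, "p4": 4}


@dataclass
class WorkspaceSettings:
    """Hypothetical bundle of the workspace flags mentioned above."""
    auto_create_channel: bool = True
    status_page_auto_publish_severity: str = "p2"  # assumed default
    twilio_configured: bool = False


def meets_threshold(severity: str, threshold: str) -> bool:
    """True when `severity` is at least as severe as `threshold`."""
    return SEVERITY_RANK[severity] <= SEVERITY_RANK[threshold]


def on_save_actions(severity: str, settings: WorkspaceSettings) -> list:
    """Return the automatic actions that would fire for a new incident."""
    actions = ["suggest_runbooks"]  # the KB-suggest panel always runs
    if settings.auto_create_channel:
        actions.append("create_slack_channel")
    if meets_threshold(severity, settings.status_page_auto_publish_severity):
        actions.append("publish_status_page_incident")
    if settings.twilio_configured:
        actions.append("sms_on_call")
    return actions


print(on_save_actions("p1", WorkspaceSettings()))
print(on_save_actions("p3", WorkspaceSettings(twilio_configured=True)))
```

With the assumed p2 threshold, a p1 incident publishes to the status page while a p3 incident does not.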
Running the incident
- Click Start investigating to move from open to investigating. The Slack channel auto-announces the transition.
- Post updates via the timeline editor or via Slack. Slack messages tagged /vigilo log (or any message in the incident channel, depending on the integration setting) append to the structured timeline.
- Click Mark mitigated when the user-facing impact has stopped. mitigated_at is stamped.
- Click Resolve when the underlying cause is verified fixed. resolved_at is stamped, the Slack channel is archived (if auto_archive is on), the PagerDuty mirror is closed, and the status-page incident is moved to resolved.
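As a rough illustration of how updates from different surfaces could land in the structured timeline, the sketch below builds events with the {at, source, label, actor, payload} keys and keeps the list ordered by time. The helper names make_event and append_event, the source values, and the choice of ISO 8601 strings for at are assumptions, not Vigilo's documented format.

```python
from datetime import datetime, timezone


def make_event(source, label, actor, payload=None, at=None):
    """Build one timeline event with the five keys used by the timeline."""
    at = at or datetime.now(timezone.utc)
    return {
        "at": at.isoformat(),  # assumed storage format: ISO 8601 string
        "source": source,      # e.g. "slack", "pagerduty", "manual"
        "label": label,
        "actor": actor,
        "payload": payload or {},
    }


def append_event(timeline, event):
    """Return a new timeline with the event added, ordered by `at`."""
    return sorted(
        timeline + [event],
        key=lambda e: datetime.fromisoformat(e["at"]),
    )


timeline = []
timeline = append_event(timeline, make_event(
    "manual", "Marked mitigated", "alice",
    at=datetime(2024, 1, 1, 10, 30, tzinfo=timezone.utc)))
timeline = append_event(timeline, make_event(
    "slack", "Rolled back deploy", "bob", {"channel": "inc-example"},
    at=datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc)))
print([e["label"] for e in timeline])
```

Sorting on insert means a late-arriving Slack message still appears in its chronological position, which matters when the postmortem later inherits the same list.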
Linking the causing change
- On the incident detail page, the Probable cause panel surfaces suggestions from WA.14: recent changes whose affected_assets intersect the incident's tagged services.
- Click Link on the right suggestion. caused_by_change is set; the linked change's detail page now shows a back-link in its Related incidents card.
- If no suggestion is right, use the Pick a change search to bind manually.
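The suggestion logic described above amounts to a set intersection filtered by recency. The sketch below shows one way that could look; the function name suggest_causes, the deployed_at field, the 24-hour lookback, and the most-recent-first ranking are all assumptions for illustration, not the behavior of WA.14 itself.

```python
from datetime import datetime, timedelta


def suggest_causes(incident_services, detected_at, changes,
                   lookback=timedelta(hours=24)):
    """Return recent changes whose affected_assets overlap the incident's services.

    `changes` is a list of dicts with `id`, `deployed_at`, and `affected_assets`.
    Results are ordered most recent first (an assumed ranking).
    """
    services = set(incident_services)
    candidates = []
    for change in changes:
        overlap = services & set(change["affected_assets"])
        age = detected_at - change["deployed_at"]
        # Keep only changes that landed before detection, inside the lookback.
        if overlap and timedelta(0) <= age <= lookback:
            candidates.append({
                "change_id": change["id"],
                "overlap": sorted(overlap),
                "age": age,
            })
    return sorted(candidates, key=lambda c: c["age"])


changes = [
    {"id": "CHG-1", "deployed_at": datetime(2024, 1, 1, 9, 0), "affected_assets": ["db", "api"]},
    {"id": "CHG-2", "deployed_at": datetime(2024, 1, 1, 11, 30), "affected_assets": ["api"]},
    {"id": "CHG-3", "deployed_at": datetime(2023, 12, 25, 9, 0), "affected_assets": ["api"]},
    {"id": "CHG-4", "deployed_at": datetime(2024, 1, 1, 11, 0), "affected_assets": ["billing"]},
]
result = suggest_causes(["api"], datetime(2024, 1, 1, 12, 0), changes)
print([c["change_id"] for c in result])
```

In this example CHG-3 is dropped for being too old and CHG-4 for touching an unrelated asset, leaving the two changes a responder would plausibly want to inspect first.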
Opening a postmortem
- From a resolved incident, click Open postmortem. The FSM moves to post_mortem and a Postmortem row is created via the OneToOne.
- Vigilo auto-populates the postmortem's timeline from the incident's structured timeline (T1.9) and offers an AI root-cause first draft (WA.3) if AI integrations are configured.
- See Postmortems for the full publish flow.
Handling an incident storm
When IncidentStorm detection fires, the incidents page shows a banner: "5 related incidents in the last 30 minutes — possible storm." Click the banner to open the storm detail, which lists every member incident, the common tags / assets that grouped them, and a single Acknowledge all action that pulls every member into a shared commander.
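For intuition about what "related incidents in a short window" could mean, here is a minimal sliding-window sketch that groups incidents by shared tag. It is not the implementation of vigilo.incidents.detect_storms: the function name find_storms_sketch, the 30-minute window (borrowed from the banner example), the dict shape of an incident, and grouping purely by tag are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_storms_sketch(incidents, window=timedelta(minutes=30), min_members=3):
    """Return {tag: [incident keys]} for tags with a burst of incidents.

    `incidents` is a list of dicts with `key`, `detected_at`, and `tags`.
    For each tag, the largest run of incidents whose detection times fit
    inside `window` is kept, provided it has at least `min_members` entries.
    """
    by_tag = defaultdict(list)
    for incident in incidents:
        for tag in incident["tags"]:
            by_tag[tag].append(incident)

    storms = {}
    for tag, members in by_tag.items():
        members.sort(key=lambda i: i["detected_at"])
        start = 0
        for end in range(len(members)):
            # Shrink the window from the left until it spans <= `window`.
            while members[end]["detected_at"] - members[start]["detected_at"] > window:
                start += 1
            size = end - start + 1
            if size >= min_members and size > len(storms.get(tag, [])):
                storms[tag] = [m["key"] for m in members[start:end + 1]]
    return storms


incidents = [
    {"key": "INC-0001", "detected_at": datetime(2024, 1, 1, 10, 0), "tags": ["db"]},
    {"key": "INC-0002", "detected_at": datetime(2024, 1, 1, 10, 10), "tags": ["db", "api"]},
    {"key": "INC-0003", "detected_at": datetime(2024, 1, 1, 10, 20), "tags": ["db"]},
    {"key": "INC-0004", "detected_at": datetime(2024, 1, 1, 12, 30), "tags": ["db", "api"]},
]
storms = find_storms_sketch(incidents)
print(storms)
```

Here the three db incidents within 20 minutes form a storm, while the fourth, two hours later, does not join it and the api tag never reaches three members.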
Permissions
- Viewers can read incidents but cannot transition or post updates.
- Engineers / responders (incidents.respond) can post updates, edit the timeline, and transition through investigate → mitigate → resolve.
- Commanders are the only ones who can promote, demote, or remove responders.
- Admins / owners can edit any field, force a status transition, and unlink the postmortem.
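The rules above can be read as a role-to-action table. The sketch below encodes them literally; the role keys, action names, and the can helper are hypothetical labels, and it assumes a commander also holds the responder abilities. Only incidents.respond is a permission name taken from this page.

```python
# Hypothetical action labels derived from the permission rules above.
RESPOND = {"read", "post_update", "edit_timeline", "transition"}

ROLE_ACTIONS = {
    "viewer": {"read"},
    # Holders of the incidents.respond permission.
    "responder": set(RESPOND),
    # Assumed: a commander can do everything a responder can, plus manage responders.
    "commander": RESPOND | {"manage_responders"},
    # Admins / owners: edit any field, force transitions, unlink the postmortem.
    # Managing responders stays commander-only, as stated above.
    "admin": RESPOND | {"edit_any_field", "force_transition", "unlink_postmortem"},
}


def can(role: str, action: str) -> bool:
    """Return True if `role` is allowed to perform `action`; unknown roles get nothing."""
    return action in ROLE_ACTIONS.get(role, set())


print(can("viewer", "post_update"))
print(can("commander", "manage_responders"))
```

Unknown roles fall through to an empty set, so the check fails closed.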
Troubleshooting
No Slack channel was created — Either the workspace's SlackIntegration.auto_create_channel is False, the Slack bot lacks channels:manage, or the channel name collided. Check Settings → Integrations → Slack for the last error.
No SMS page went out — Twilio is unconfigured, the on-call schedule has no current slot, or the on-call user has no phone number on their profile. See On-call.
The status page didn't update — auto_publish_to_status is False on the incident, or severity is below the workspace's threshold. Both can be edited mid-incident.
My PagerDuty resolution didn't close the Vigilo incident — Either pagerduty_incident_id was never set (the Vigilo → PD push failed), or the inbound webhook isn't pointed at /integrations/pagerduty_events/. The integration log under Settings → Integrations will show the last inbound delivery.
The storm banner won't dismiss — Storm rows resolve themselves when no new member incidents have joined for 1 hour. You can also mark the storm resolved manually from its detail page.
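When debugging the PagerDuty case above, it can help to picture what the inbound handler has to decide. The sketch below assumes the PagerDuty V3 webhook body shape (an event object carrying event_type and data.id) and looks up the Vigilo incident by pagerduty_incident_id. The function name, the lookup dict, and the mapping of incident.acknowledged to investigate() are assumptions; Vigilo's actual handler may differ.

```python
# Assumed mapping from PagerDuty event types to Vigilo transition names.
PD_EVENT_TO_TRANSITION = {
    "incident.acknowledged": "investigate",
    "incident.resolved": "resolve",
}


def route_pagerduty_event(body, incidents_by_pd_id):
    """Decide what an inbound PagerDuty webhook should do.

    `incidents_by_pd_id` maps pagerduty_incident_id -> Vigilo incident key.
    Returns (transition_name, incident_key) or ("ignored", reason).
    """
    event = body.get("event", {})
    transition = PD_EVENT_TO_TRANSITION.get(event.get("event_type"))
    if transition is None:
        return ("ignored", "unhandled event type")

    pd_id = event.get("data", {}).get("id")
    incident_key = incidents_by_pd_id.get(pd_id)
    if incident_key is None:
        # Matches the first troubleshooting cause: the id was never stored.
        return ("ignored", "no incident with this pagerduty_incident_id")

    return (transition, incident_key)


lookup = {"PABC123": "INC-0042"}
resolved = {"event": {"event_type": "incident.resolved", "data": {"id": "PABC123"}}}
unknown = {"event": {"event_type": "incident.resolved", "data": {"id": "PZZZ999"}}}
print(route_pagerduty_event(resolved, lookup))
print(route_pagerduty_event(unknown, lookup))
```

The second call shows why a failed Vigilo → PD push leaves the incident open: the webhook arrives, but there is no stored id to match it against.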