Overview
Every change in Vigilo lives on a finite state machine. The states aren't just a UI hint — they're enforced server-side. The Django change_request.transition_to(next_state, actor) call is the only way to change state. It runs the guard conditions for that transition, writes an audit row, fires events through dispatch_event, and rolls back atomically if any guard fails.
This article walks through every state, every transition, the guard each enforces, and the events each emits. If you're integrating with the change API or building an automation around it, read this carefully.
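The contract described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in apps/changes/fsm.py: the Change class, the TransitionError exception, the title_present guard, and the in-memory AUDIT_LOG and EVENTS lists are hypothetical stand-ins for the Django model, the audit table, and dispatch_event.

```python
from dataclasses import dataclass


class TransitionError(Exception):
    """Hypothetical error type raised when a transition is rejected."""


AUDIT_LOG = []  # stand-in for the audit table
EVENTS = []     # stand-in for dispatch_event


@dataclass
class Change:
    state: str = "draft"
    title: str = ""


def title_present(change):
    """Example guard: the title must be non-empty."""
    return bool(change.title.strip())


# (from_state, to_state) -> (guards, event name). Only one row is shown.
TRANSITIONS = {
    ("draft", "review"): ([title_present], "change.submitted_for_review"),
}


def transition_to(change, next_state, actor):
    key = (change.state, next_state)
    if key not in TRANSITIONS:
        raise TransitionError(
            f"transition_not_allowed: state={change.state}, target={next_state}"
        )
    guards, event = TRANSITIONS[key]
    # Run every guard before touching any state, so a failure changes nothing.
    for guard in guards:
        if not guard(change):
            raise TransitionError(f"guard failed: {guard.__name__}")
    # Audit row first, then the state change, then the event.
    AUDIT_LOG.append(
        {"actor": actor, "from_state": change.state, "to_state": next_state}
    )
    change.state = next_state
    EVENTS.append((event, {"actor": actor}))
```

In the real system the rollback guarantee would come from a database transaction rather than from ordering alone; the sketch only shows the sequence of steps.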
Why it exists
A change process that lives in code is one that fires the same way every time, can be audited mechanically, and survives staff turnover. The point of formalising it as an FSM (rather than a tangle of if status == checks) is that the transition table IS the policy — when somebody asks "can we skip approval on standard low-risk changes?" the answer is a row in the table, not a code-base safari.
Key concepts
- State: one of draft, review, approved, scheduled, in_progress, completed, failed, rolled_back, cancelled.
- Transition: a directed edge between two states, plus a guard function and an event name. Defined in apps/changes/fsm.py.
- Guard: a predicate that must return true for the transition to fire. Examples: "all required approvals are in," "no freeze window blocks the planned start," "all task-gate tasks are done."
- Event: fired via dispatch_event(<name>, payload) after the transition succeeds. Names follow change.<verb> (e.g. change.approved, change.started, change.completed).
- Audit row: written before the event fires. It captures actor, from_state, to_state, comment, timestamp, and a frozen JSON snapshot of the change record at transition time.
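To make the audit row concrete, here is a small sketch of the fields listed above. The AuditRow class and build_audit_row helper are hypothetical names, the change record is modelled as a plain dict, and taking the snapshot before the state is updated is an assumption rather than documented behaviour.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRow:
    actor: str
    from_state: str
    to_state: str
    comment: str
    timestamp: str
    snapshot: str  # JSON text, so later edits to the change cannot alter it


def build_audit_row(change, actor, to_state, comment=""):
    """Build an audit row from a change record held as a dict (assumption)."""
    return AuditRow(
        actor=actor,
        from_state=change["state"],
        to_state=to_state,
        comment=comment,
        timestamp=datetime.now(timezone.utc).isoformat(),
        # Serialising to a string is what makes the snapshot "frozen".
        snapshot=json.dumps(change, sort_keys=True, default=str),
    )
```

Storing the snapshot as serialised JSON rather than a reference to the live record is what lets an auditor see the change exactly as it looked when the transition fired.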
The state diagram
draft ──submit──► review ──approve──► approved ──schedule──► scheduled
  │                                                              │
  │                                                            start
  │                                                              ▼
  └─cancel──► cancelled                                     in_progress
                                                                 │
                                                                 ├─complete──► completed
                                                                 │
                                                                 ├─fail──────► failed
                                                                 │
                                                                 └─rollback──► rolled_back
Reject from review sends back to draft (with the approver's comment attached). Failing during in_progress can transition to either failed (no rollback executed) or rolled_back (rollback plan was executed). Cancel is available from any pre-in_progress state.
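The diagram and the notes under it can be written down as an adjacency map, which is handy when an integration wants to check a move before calling the API. The ALLOWED map and can_transition helper are hypothetical, and treating completed, failed, rolled_back, and cancelled as terminal is an assumption (the article lists no outgoing transitions for them).

```python
# Edges taken from the diagram, plus reject (review -> draft) and
# cancel from every pre-in_progress state.
ALLOWED = {
    "draft": {"review", "cancelled"},
    "review": {"approved", "draft", "cancelled"},
    "approved": {"scheduled", "cancelled"},
    "scheduled": {"in_progress", "cancelled"},
    "in_progress": {"completed", "failed", "rolled_back"},
    # Assumed terminal: no outgoing edges are documented.
    "completed": set(),
    "failed": set(),
    "rolled_back": set(),
    "cancelled": set(),
}


def can_transition(current, target):
    """Return True if the diagram has an edge from current to target."""
    return target in ALLOWED.get(current, set())
```

This only answers whether an edge exists; the guards for that edge still run server-side and can reject the move.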
Common workflows
draft → review (submit)
Trigger — Author or admin clicks "Submit for review."
Guards:
- Title, description, planned start, planned end are non-empty.
- Rollback plan is non-empty when risk in {medium, high}.
- Affected assets list is non-empty when type = emergency.
Side effects:
- The approval policy is matched against the change (risk, type, affected services). For each matched rule, ApprovalStep rows are created with status pending.
- Email + in-app notifications fire to each pending approver.
- Event: change.submitted_for_review.
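The submit guards above can be sketched as one function that collects every violation instead of stopping at the first. The ChangeDraft class, its field names, and the error strings are assumptions made for illustration; the real model may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ChangeDraft:
    title: str = ""
    description: str = ""
    planned_start: Optional[datetime] = None
    planned_end: Optional[datetime] = None
    risk: str = "low"       # low, medium, or high
    type: str = "normal"    # e.g. normal or emergency
    rollback_plan: str = ""
    affected_assets: list = field(default_factory=list)


def submit_guard_errors(change):
    """Return a list of guard violations; an empty list means submit may proceed."""
    errors = []
    for name in ("title", "description"):
        if not getattr(change, name).strip():
            errors.append(f"{name} is required")
    for name in ("planned_start", "planned_end"):
        if getattr(change, name) is None:
            errors.append(f"{name} is required")
    if change.risk in {"medium", "high"} and not change.rollback_plan.strip():
        errors.append("rollback_plan is required for medium and high risk")
    if change.type == "emergency" and not change.affected_assets:
        errors.append("affected_assets is required for emergency changes")
    return errors
```

Returning all violations at once lets a form show the author everything that needs fixing in a single round trip.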
review → approved (approve / reject)
Trigger — Each approver clicks Approve or Reject in their queue.
Guards:
- On approve, the actor's user is listed in a pending ApprovalStep.
- The change is in state review (no race).
Side effects:
- The approval step is marked approved (or rejected) with the comment.
- If rejected, the change moves back to draft and the author is notified. Other pending approvals are cancelled.
- If all required approvals are now in, the change moves to approved and event change.approved fires.
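The approve/reject rules above can be sketched as a function that records one decision and reports the state the change should end up in. This ApprovalStep is a simplified stand-in for the real model, record_decision is a hypothetical helper, and the sketch assumes every step is required.

```python
from dataclasses import dataclass


@dataclass
class ApprovalStep:
    approver: str
    status: str = "pending"  # pending, approved, rejected, or cancelled
    comment: str = ""


def record_decision(steps, actor, approve, comment=""):
    """Apply one approver's decision; return 'review', 'approved', or 'draft'."""
    step = next(
        (s for s in steps if s.approver == actor and s.status == "pending"), None
    )
    if step is None:
        raise PermissionError(f"{actor} has no pending approval step")
    step.comment = comment
    if not approve:
        step.status = "rejected"
        # A rejection cancels the other pending approvals and returns to draft.
        for other in steps:
            if other.status == "pending":
                other.status = "cancelled"
        return "draft"
    step.status = "approved"
    # Assumption: every step is required, so all must be approved to advance.
    if all(s.status == "approved" for s in steps):
        return "approved"
    return "review"
```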
approved → scheduled (schedule)
Trigger — Author clicks "Schedule" with the confirmed planned window.
Guards (T0.4 — freeze enforcement):
- No active freeze window matches the planned start/end AND the affected assets.
- No conflicting change is already scheduled or in_progress over the same window touching overlapping assets (warning only on normal/low, hard block on emergency).
Side effects:
- change.scheduled event fires.
- The change appears on the change calendar in the workspace.
- The CAB is auto-notified if the approval policy required a CAB review.
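The freeze guard comes down to an interval-overlap test combined with an asset-overlap test. The sketch below is one way to express it; the FreezeWindow class and blocking_freezes helper are hypothetical, intervals are treated as half-open, and an empty asset set meaning "applies to every asset" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FreezeWindow:
    name: str
    start: datetime
    end: datetime
    assets: frozenset = frozenset()  # assumption: empty means all assets


def blocking_freezes(planned_start, planned_end, assets, windows):
    """Return the names of freeze windows that block the planned window."""
    hits = []
    for window in windows:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps_time = planned_start < window.end and window.start < planned_end
        overlaps_assets = not window.assets or bool(window.assets & set(assets))
        if overlaps_time and overlaps_assets:
            hits.append(window.name)
    return hits
```

The same overlap test could serve the conflicting-change check, with other scheduled or in_progress changes in place of freeze windows.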
scheduled → in_progress (start)
Trigger — Author clicks "Start" (manually) or the scheduled-start Celery sweep fires.
Guards (T1.2 — task gate):
- Every task in the change's "implementation tasks" list is in state done.
- The current time is between planned_start - tolerance and planned_end + tolerance (configurable, default 1 hour each side).
Side effects:
- change.started event fires.
- The change card flips colour on dashboards; CAB calendar shows live status.
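Both start guards are simple enough to sketch in a few lines. The start_guard_errors helper is hypothetical; the task_gate_blocked message mirrors the one shown under Troubleshooting, while outside_planned_window is an invented error code used only for this example.

```python
from datetime import datetime, timedelta


def start_guard_errors(task_states, planned_start, planned_end, now,
                       tolerance=timedelta(hours=1)):
    """Return guard violations for scheduled -> in_progress; empty means go."""
    errors = []
    # Task gate: every implementation task must be done.
    not_done = sum(1 for state in task_states if state != "done")
    if not_done:
        errors.append(f"task_gate_blocked: {not_done} tasks not done")
    # Time window: the planned window widened by the tolerance on each side.
    earliest = planned_start - tolerance
    latest = planned_end + tolerance
    if not (earliest <= now <= latest):
        errors.append("outside_planned_window")  # hypothetical error code
    return errors
```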
in_progress → completed / failed / rolled_back
Trigger — Author clicks "Complete," "Mark failed," or "Roll back."
Guards:
- The change is in state in_progress.
Side effects:
- For rolled_back (T1.3), the rollback plan text is copied into the audit row as a snapshot, an Incident is auto-created (linked) if risk = high, and event change.rolled_back fires.
- For completed, event change.completed fires. Playbooks matching that event run, typically "close associated tasks," "post to status page," "notify stakeholders."
- For failed, event change.failed fires. An incident is auto-created and linked.
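The three outcomes differ only in their side effects, which a small lookup function can summarise. The finish_side_effects helper and the shape of its return value are assumptions made for illustration; only the event names and the incident rules come from the list above.

```python
def finish_side_effects(outcome, risk, rollback_plan=""):
    """Describe what should happen when a change leaves in_progress."""
    if outcome not in {"completed", "failed", "rolled_back"}:
        raise ValueError(f"unknown outcome: {outcome}")
    effects = {
        "event": f"change.{outcome}",
        "create_incident": False,
        "audit_extra": {},
    }
    if outcome == "rolled_back":
        # The rollback plan text is snapshotted into the audit row.
        effects["audit_extra"]["rollback_plan"] = rollback_plan
        effects["create_incident"] = risk == "high"
    elif outcome == "failed":
        # A failed change always opens a linked incident.
        effects["create_incident"] = True
    return effects
```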
any pre-in_progress → cancelled
Trigger — Author or admin clicks "Cancel."
Guards:
- The change is in draft, review, approved, or scheduled.
Side effects:
- Pending approvals are cancelled.
- Event change.cancelled fires.
Permissions & gating
| Transition | Roles allowed |
|---|---|
| submit (draft→review) | Author, admin, owner |
| approve / reject | Approver, admin, owner (matched by policy) |
| schedule (approved→scheduled) | Author, admin, owner |
| start (scheduled→in_progress) | Author, admin, owner |
| complete / fail / rollback | Author, admin, owner |
| cancel | Author, admin, owner |
| override freeze window | owner (loud audit entry) |
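The table above maps directly onto a lookup keyed by action. The sketch below is a hypothetical client-side mirror of it: the action names and the is_allowed helper are invented for illustration, and the "matched by policy" condition on approvals is not modelled.

```python
AUTHOR_SET = {"author", "admin", "owner"}

# Action -> roles allowed, transcribed from the table.
ROLES_ALLOWED = {
    "submit": AUTHOR_SET,
    "approve": {"approver", "admin", "owner"},
    "reject": {"approver", "admin", "owner"},
    "schedule": AUTHOR_SET,
    "start": AUTHOR_SET,
    "complete": AUTHOR_SET,
    "fail": AUTHOR_SET,
    "rollback": AUTHOR_SET,
    "cancel": AUTHOR_SET,
    "override_freeze": {"owner"},
}


def is_allowed(action, roles):
    """Return True if any of the actor's roles may perform the action."""
    return bool(ROLES_ALLOWED.get(action, set()) & set(roles))
```

A check like this is useful for hiding buttons in a custom UI; the server remains the authority on what is actually permitted.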
Troubleshooting
- "transition_not_allowed: state=draft, target=approved." You tried to skip review. Submit first.
- "freeze_violation: window=black-friday-2026." The planned window overlaps a freeze. Reschedule, or have an owner override.
- "task_gate_blocked: 2 tasks not done." Open the change's Tasks panel and finish the listed tasks.
- "approval_step_already_approved." Two approvers clicked at once. The second one no-ops; the change advanced on the first click.
- "Event didn't fire after a transition." Check Celery beat / worker logs. The transition itself succeeded (it's atomic) but the post-commit event hook may have failed; events also write to a durable outbox table and re-fire on the next sweep.
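The outbox behaviour mentioned in the last item can be sketched as follows. This is a generic illustration of the outbox pattern, not Vigilo's implementation: the OutboxRow class, its fields, and the sweep_outbox helper are hypothetical, and the dispatch argument stands in for dispatch_event.

```python
from dataclasses import dataclass


@dataclass
class OutboxRow:
    event: str
    payload: dict
    delivered: bool = False
    attempts: int = 0


def sweep_outbox(rows, dispatch):
    """Try to deliver every undelivered row; return how many succeeded."""
    delivered = 0
    for row in rows:
        if row.delivered:
            continue
        row.attempts += 1
        try:
            dispatch(row.event, row.payload)
        except Exception:
            # Leave the row undelivered so the next sweep retries it.
            continue
        row.delivered = True
        delivered += 1
    return delivered
```

Because a row may be dispatched more than once if a consumer fails partway through, playbooks that react to these events are safest when they are idempotent.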