Change FSM in depth

Overview

Every change in Vigilo lives on a finite state machine. The states aren't just a UI hint; they're enforced server-side. The Django call change_request.transition_to(next_state, actor) is the only way to change state. It runs the guard conditions for that transition and, if any guard fails, rolls back atomically so nothing is written. When every guard passes, it writes an audit row and then fires events through dispatch_event.
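
A minimal, framework-free sketch of what a call like transition_to does, assuming a plain dataclass in place of the Django model and an in-memory list in place of the event bus. ChangeRequest, TRANSITIONS, TransitionError, has_title, and DISPATCHED are hypothetical names, and this dispatch_event is a stand-in that only records events. Instead of a database transaction, the sketch evaluates every guard before mutating anything, which gives the same all-or-nothing outcome for this toy case.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# Hypothetical stand-in for the event bus: dispatched events are just recorded here.
DISPATCHED: list[tuple[str, dict]] = []


def dispatch_event(name: str, payload: dict) -> None:
    DISPATCHED.append((name, payload))


class TransitionError(Exception):
    """Raised when no edge exists or a guard fails; the change is left untouched."""


@dataclass
class ChangeRequest:
    state: str
    title: str = ""
    audit_rows: list[dict] = field(default_factory=list)

    def transition_to(self, next_state: str, actor: str) -> None:
        key = (self.state, next_state)
        if key not in TRANSITIONS:
            raise TransitionError(f"transition_not_allowed: state={self.state}, target={next_state}")
        guards, event_name = TRANSITIONS[key]
        # Evaluate every guard before mutating anything, so a failure changes nothing.
        for guard in guards:
            if not guard(self):
                raise TransitionError(f"guard_failed: {guard.__name__}")
        from_state = self.state
        self.state = next_state
        # The audit row is written first; the event fires only after it exists.
        self.audit_rows.append({
            "actor": actor,
            "from_state": from_state,
            "to_state": next_state,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        dispatch_event(event_name, {"from_state": from_state, "to_state": next_state, "actor": actor})


def has_title(change: ChangeRequest) -> bool:
    return bool(change.title.strip())


# (from_state, to_state) -> (guards, event name); only one edge is filled in here.
TRANSITIONS: dict[tuple[str, str], tuple[list[Callable[[ChangeRequest], bool]], str]] = {
    ("draft", "review"): ([has_title], "change.submitted_for_review"),
}

change = ChangeRequest(state="draft", title="Rotate TLS certificates")
change.transition_to("review", actor="alice")
print(change.state, DISPATCHED[-1][0])  # review change.submitted_for_review
```

Checking guards up front only imitates the real atomic rollback. In a Django implementation the state change and audit row would typically sit inside a database transaction, so that a failure after the first write is also undone.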

This article walks through every state, every transition, the guard each enforces, and the events each emits. If you're integrating with the change API or building an automation around it, read this carefully.

Why it exists

A change process that lives in code is one that fires the same way every time, can be audited mechanically, and survives staff turnover. The point of formalising it as an FSM (rather than a tangle of if status == checks) is that the transition table IS the policy — when somebody asks "can we skip approval on standard low-risk changes?" the answer is a row in the table, not a code-base safari.

Key concepts

  • State — One of draft, review, approved, scheduled, in_progress, completed, failed, rolled_back, cancelled.
  • Transition — A directed edge between two states, plus a guard function and an event name. Defined in apps/changes/fsm.py.
  • Guard — A predicate that must return true for the transition to fire. Examples: "all required approvals are in," "no freeze window blocks the planned start," "all task-gate tasks are done."
  • Event — Fired via dispatch_event(<name>, payload) after the transition succeeds. Names follow change.<verb> (e.g. change.approved, change.started, change.completed).
  • Audit row — Written before the event fires, captures actor, from_state, to_state, comment, timestamp, and a frozen JSON snapshot of the change record at transition time.
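
To make the audit-row definition concrete, here is a sketch of an immutable audit record whose snapshot is serialised to JSON at transition time, so later edits to the change cannot alter it. AuditRow and make_audit_row are hypothetical names, and the change is modelled as a plain dict rather than Vigilo's real model.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditRow:
    actor: str
    from_state: str
    to_state: str
    comment: str
    timestamp: str
    snapshot: str  # JSON text of the change as it was when the transition ran


def make_audit_row(change: dict, actor: str, to_state: str, comment: str = "") -> AuditRow:
    return AuditRow(
        actor=actor,
        from_state=change["state"],
        to_state=to_state,
        comment=comment,
        timestamp=datetime.now(timezone.utc).isoformat(),
        # Serialising copies the data; default=str handles values such as datetimes.
        snapshot=json.dumps(change, sort_keys=True, default=str),
    )


change = {"state": "review", "title": "Rotate TLS certificates", "risk": "medium"}
row = make_audit_row(change, actor="alice", to_state="approved", comment="LGTM")
change["title"] = "Edited later"
print(json.loads(row.snapshot)["title"])  # Rotate TLS certificates
```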

The state diagram

draft ──submit──► review ──approve──► approved ──schedule──► scheduled
  │                                                            │
  │                                                          start
  │                                                            ▼
  └─cancel──► cancelled                                   in_progress
                                                              │
                                                ┌─complete──► completed
                                                │
                                                ├─fail──────► failed
                                                │
                                                └─rollback──► rolled_back

Rejecting a change in review sends it back to draft, with the approver's comment attached. A change that fails while in_progress moves to either failed (no rollback was executed) or rolled_back (the rollback plan was executed). Cancel is available from any state before in_progress: draft, review, approved, or scheduled.
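
The diagram and the notes above can be transcribed into a lookup table. This is a sketch, not the contents of apps/changes/fsm.py: ALLOWED, TERMINAL, check_transition, and TransitionNotAllowed are hypothetical names, guards are omitted, and the four end states are treated as terminal because the diagram shows no edges leaving them. The error string mirrors the transition_not_allowed message listed under Troubleshooting.

```python
# Outgoing edges per state: {from_state: {to_state: transition name}}.
ALLOWED: dict[str, dict[str, str]] = {
    "draft": {"review": "submit", "cancelled": "cancel"},
    "review": {"approved": "approve", "draft": "reject", "cancelled": "cancel"},
    "approved": {"scheduled": "schedule", "cancelled": "cancel"},
    "scheduled": {"in_progress": "start", "cancelled": "cancel"},
    "in_progress": {"completed": "complete", "failed": "fail", "rolled_back": "rollback"},
}
# No edges leave these states in the diagram, so they never appear as keys above.
TERMINAL = frozenset({"completed", "failed", "rolled_back", "cancelled"})


class TransitionNotAllowed(Exception):
    pass


def check_transition(state: str, target: str) -> str:
    """Return the transition name for state -> target, or raise if no such edge exists."""
    name = ALLOWED.get(state, {}).get(target)
    if name is None:
        raise TransitionNotAllowed(f"transition_not_allowed: state={state}, target={target}")
    return name


print(check_transition("review", "draft"))  # reject
```

Keeping the edges as data is what lets "can we skip approval?" be answered by reading one row rather than tracing conditionals.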

Common workflows

draft → review (submit)

Trigger — Author or admin clicks "Submit for review."

Guards:

  • Title, description, planned start, planned end are non-empty.
  • Rollback plan is non-empty when risk in {medium, high}.
  • Affected assets list is non-empty when type = emergency.
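
The submit guards can be expressed as one function that collects every failure instead of stopping at the first. The field names (title, description, planned_start, planned_end, rollback_plan, risk, type, affected_assets) and submit_guard_errors itself are assumptions about how the change record might be shaped.

```python
REQUIRED_FIELDS = ("title", "description", "planned_start", "planned_end")


def submit_guard_errors(change: dict) -> list[str]:
    """Return every reason the change cannot be submitted; an empty list means it can."""
    errors = [f"{name} is required" for name in REQUIRED_FIELDS if not change.get(name)]
    if change.get("risk") in {"medium", "high"} and not change.get("rollback_plan"):
        errors.append("rollback_plan is required for medium or high risk")
    if change.get("type") == "emergency" and not change.get("affected_assets"):
        errors.append("affected_assets is required for emergency changes")
    return errors


draft = {
    "title": "Upgrade Postgres",
    "description": "Minor version bump",
    "planned_start": "2026-03-01T02:00Z",
    "planned_end": "2026-03-01T03:00Z",
    "risk": "medium",
    "type": "normal",
}
print(submit_guard_errors(draft))  # ['rollback_plan is required for medium or high risk']
```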

Side effects:

  • The approval policy is matched against the change (risk, type, affected services). For each matched rule, ApprovalStep rows are created with status pending.
  • Email + in-app notifications fire to each pending approver.
  • Event: change.submitted_for_review.
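
A sketch of the policy-matching step, assuming each rule lists the risks, types, and services it applies to (an empty set meaning "any") plus the approvers it requires, with one pending ApprovalStep created per approver. PolicyRule, its fields, rule_matches, create_approval_steps, and this ApprovalStep dataclass are illustrative; Vigilo's real policy format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    name: str
    risks: frozenset[str] = frozenset()     # empty set = any risk
    types: frozenset[str] = frozenset()     # empty set = any type
    services: frozenset[str] = frozenset()  # matches if any affected service is listed; empty = any
    approvers: tuple[str, ...] = ()


@dataclass
class ApprovalStep:
    rule: str
    approver: str
    status: str = "pending"


def rule_matches(rule: PolicyRule, change: dict) -> bool:
    return (
        (not rule.risks or change["risk"] in rule.risks)
        and (not rule.types or change["type"] in rule.types)
        and (not rule.services or not rule.services.isdisjoint(change["services"]))
    )


def create_approval_steps(rules: list[PolicyRule], change: dict) -> list[ApprovalStep]:
    # One pending step per approver on every matching rule.
    return [
        ApprovalStep(rule=rule.name, approver=approver)
        for rule in rules
        if rule_matches(rule, change)
        for approver in rule.approvers
    ]


rules = [
    PolicyRule("high-risk", risks=frozenset({"high"}), approvers=("cab-chair",)),
    PolicyRule("payments", services=frozenset({"payments"}), approvers=("payments-lead", "security")),
]
change = {"risk": "high", "type": "normal", "services": ["payments", "web"]}
for step in create_approval_steps(rules, change):
    print(step.rule, step.approver, step.status)
```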

review → approved / draft (approve / reject)

Trigger — Each approver clicks Approve or Reject in their queue.

Guards:

  • On approve, the acting user is listed on a pending ApprovalStep.
  • The change is still in state review when the decision is recorded, which prevents races between concurrent decisions.

Side effects:

  • The approval step is marked approved (or rejected) with the comment.
  • If rejected, the change moves back to draft and the author is notified. Other pending approvals are cancelled.
  • If all required approvals are now in, the change moves to approved and event change.approved fires.
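
The approve/reject bookkeeping, sketched over plain dicts. record_decision and the dict keys are hypothetical; the sketch treats every ApprovalStep as required and raises PermissionError for an actor with no pending step.

```python
def record_decision(change: dict, approver: str, decision: str, comment: str = "") -> str:
    """Apply one approver's decision and return the change's resulting state."""
    if decision not in ("approved", "rejected"):
        raise ValueError(f"unknown decision: {decision}")
    if change["state"] != "review":
        raise ValueError(f"change is in {change['state']}, not review")
    step = next(
        (s for s in change["steps"] if s["approver"] == approver and s["status"] == "pending"),
        None,
    )
    if step is None:
        raise PermissionError(f"{approver} has no pending approval step")
    step["status"] = decision
    step["comment"] = comment
    if decision == "rejected":
        # A single rejection cancels the remaining approvals and returns the change to draft.
        for other in change["steps"]:
            if other["status"] == "pending":
                other["status"] = "cancelled"
        change["state"] = "draft"
        change["rejection_comment"] = comment
    elif all(s["status"] == "approved" for s in change["steps"]):
        change["state"] = "approved"  # the point where change.approved would fire
    return change["state"]


change = {
    "state": "review",
    "steps": [{"approver": "alice", "status": "pending"}, {"approver": "bob", "status": "pending"}],
}
print(record_decision(change, "alice", "approved"))  # review
print(record_decision(change, "bob", "approved"))    # approved
```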

approved → scheduled (schedule)

Trigger — Author clicks "Schedule" with the confirmed planned window.

Guards (T0.4 — freeze enforcement):

  • No active freeze window matches the planned start/end AND the affected assets.
  • No conflicting change is already scheduled or in_progress over the same window touching overlapping assets (warning only on normal/low, hard block on emergency).
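
A sketch of the two scheduling checks, assuming windows are half-open intervals (a window ending at 10:00 does not overlap one starting at 10:00) and that a freeze applies only when its asset list shares at least one asset with the change. It reads "warning only on normal/low" as "every non-emergency change gets a warning"; overlaps, schedule_check, and the dict keys are hypothetical. The freeze message follows the freeze_violation format listed under Troubleshooting.

```python
from datetime import datetime


def overlaps(a_start: datetime, a_end: datetime, b_start: datetime, b_end: datetime) -> bool:
    # Half-open intervals [start, end): touching endpoints do not overlap.
    return a_start < b_end and b_start < a_end


def schedule_check(change: dict, freezes: list[dict], others: list[dict]) -> tuple[list[str], list[str]]:
    """Return (blocking_errors, warnings) for scheduling `change`."""
    errors: list[str] = []
    warnings: list[str] = []
    assets = set(change["assets"])
    for freeze in freezes:
        in_window = overlaps(change["start"], change["end"], freeze["start"], freeze["end"])
        if in_window and assets & set(freeze["assets"]):
            errors.append(f"freeze_violation: window={freeze['name']}")
    for other in others:
        if (
            other["state"] in {"scheduled", "in_progress"}
            and overlaps(change["start"], change["end"], other["start"], other["end"])
            and assets & set(other["assets"])
        ):
            message = f"conflict: {other['id']} touches the same assets in this window"
            if change["type"] == "emergency":
                errors.append(message)
            else:
                warnings.append(message)
    return errors, warnings


freeze = {"name": "black-friday-2026", "start": datetime(2026, 11, 27), "end": datetime(2026, 12, 1), "assets": ["checkout"]}
change = {"type": "normal", "assets": ["checkout"], "start": datetime(2026, 11, 28, 2), "end": datetime(2026, 11, 28, 3)}
print(schedule_check(change, [freeze], []))  # (['freeze_violation: window=black-friday-2026'], [])
```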

Side effects:

  • change.scheduled event fires.
  • The change appears on the change calendar in the workspace.
  • The CAB is auto-notified if the approval policy required a CAB review.

scheduled → in_progress (start)

Trigger — Author clicks "Start" (manually) or the scheduled-start Celery sweep fires.

Guards (T1.2 — task gate):

  • Every task in the change's "implementation tasks" list is in state done.
  • The current time falls between planned_start - tolerance and planned_end + tolerance (the tolerance is configurable; default 1 hour on each side).
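
The start guards as a single check. start_gate_errors, the task dict shape, and the out-of-window message are hypothetical; the bounds are treated as inclusive, and the one-hour default comes from the text above. The task-gate message matches the one listed under Troubleshooting.

```python
from datetime import datetime, timedelta


def start_gate_errors(
    tasks: list[dict],
    planned_start: datetime,
    planned_end: datetime,
    now: datetime,
    tolerance: timedelta = timedelta(hours=1),
) -> list[str]:
    errors = []
    not_done = [t for t in tasks if t["state"] != "done"]
    if not_done:
        errors.append(f"task_gate_blocked: {len(not_done)} tasks not done")
    # Inclusive window: [planned_start - tolerance, planned_end + tolerance].
    if not (planned_start - tolerance <= now <= planned_end + tolerance):
        errors.append("current time is outside the planned window plus tolerance")
    return errors


start = datetime(2026, 3, 1, 2, 0)
end = datetime(2026, 3, 1, 3, 0)
tasks = [
    {"name": "Take backup", "state": "done"},
    {"name": "Drain node", "state": "open"},
    {"name": "Notify users", "state": "open"},
]
print(start_gate_errors(tasks, start, end, now=datetime(2026, 3, 1, 1, 30)))  # ['task_gate_blocked: 2 tasks not done']
```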

Side effects:

  • change.started event fires.
  • The change card changes colour on dashboards, and the CAB calendar shows its live status.

in_progress → completed / failed / rolled_back

Trigger — Author clicks "Complete," "Mark failed," or "Roll back."

Guards:

  • The change is in state in_progress.

Side effects:

  • For rolled_back (T1.3), a snapshot of the rollback plan text is copied into the audit row, a linked Incident is auto-created if risk = high, and event change.rolled_back fires.
  • For completed, event change.completed fires. Playbooks matching that event run — typically "close associated tasks," "post to status page," "notify stakeholders."
  • For failed, event change.failed fires and a linked Incident is auto-created.
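
The three closing transitions differ only in the event they emit, whether the rollback plan is snapshotted, and whether an incident is opened. close_change is a hypothetical helper that returns those decisions rather than performing them, so the rules above can be checked in isolation.

```python
CLOSING_EVENTS = {
    "completed": "change.completed",
    "failed": "change.failed",
    "rolled_back": "change.rolled_back",
}


def close_change(change: dict, outcome: str) -> dict:
    """Move an in_progress change to its outcome and describe the side effects to run."""
    if change["state"] != "in_progress":
        raise ValueError(f"transition_not_allowed: state={change['state']}, target={outcome}")
    if outcome not in CLOSING_EVENTS:
        raise ValueError(f"unknown outcome: {outcome}")
    audit = {"from_state": "in_progress", "to_state": outcome}
    if outcome == "rolled_back":
        # Store the plan text in the audit record itself, independent of later edits.
        audit["rollback_plan"] = change.get("rollback_plan", "")
    open_incident = outcome == "failed" or (outcome == "rolled_back" and change.get("risk") == "high")
    change["state"] = outcome
    return {"event": CLOSING_EVENTS[outcome], "audit": audit, "open_incident": open_incident}


result = close_change({"state": "in_progress", "risk": "high", "rollback_plan": "Revert to v1.4"}, "rolled_back")
print(result["event"], result["open_incident"])  # change.rolled_back True
```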

any pre-in_progress → cancelled

Trigger — Author or admin clicks "Cancel."

Guards:

  • The change is in draft, review, approved, or scheduled.

Side effects:

  • Pending approvals are cancelled.
  • Event change.cancelled fires.

Permissions & gating

Transition                          Roles allowed
submit (draft → review)             author, admin, owner
approve / reject                    approver, admin, owner (matched by policy)
schedule (approved → scheduled)     author, admin, owner
start (scheduled → in_progress)     author, admin, owner
complete / fail / rollback          author, admin, owner
cancel                              author, admin, owner
override freeze window              owner (loud audit entry)
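
A sketch of the role table as data, assuming a user's roles for a given change arrive as a set of strings in which "author" means the user wrote this particular change. ROLES_ALLOWED and can_perform are hypothetical; the policy match for approvers (being listed on a pending ApprovalStep) is a separate guard and is not modelled here.

```python
AUTHOR_ADMIN_OWNER = frozenset({"author", "admin", "owner"})
APPROVER_ADMIN_OWNER = frozenset({"approver", "admin", "owner"})

ROLES_ALLOWED: dict[str, frozenset[str]] = {
    "submit": AUTHOR_ADMIN_OWNER,
    "approve": APPROVER_ADMIN_OWNER,
    "reject": APPROVER_ADMIN_OWNER,
    "schedule": AUTHOR_ADMIN_OWNER,
    "start": AUTHOR_ADMIN_OWNER,
    "complete": AUTHOR_ADMIN_OWNER,
    "fail": AUTHOR_ADMIN_OWNER,
    "rollback": AUTHOR_ADMIN_OWNER,
    "cancel": AUTHOR_ADMIN_OWNER,
    "override_freeze": frozenset({"owner"}),
}


def can_perform(transition: str, user_roles: set[str]) -> bool:
    # Unknown transitions map to an empty set, so they are denied rather than allowed.
    return not ROLES_ALLOWED.get(transition, frozenset()).isdisjoint(user_roles)


print(can_perform("override_freeze", {"admin"}))  # False
```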

Troubleshooting

  • "transition_not_allowed: state=draft, target=approved." — You tried to skip review. Submit first.
  • "freeze_violation: window=black-friday-2026." — The planned window overlaps a freeze. Reschedule, or have an owner override.
  • "task_gate_blocked: 2 tasks not done." — Open the change's Tasks panel and finish the listed tasks.
  • "approval_step_already_approved." — Two approvers clicked at once. The second one no-ops; the change advanced on the first click.
  • "Event didn't fire after a transition." — Check Celery beat / worker logs. The transition itself succeeded (it's atomic) but the post-commit event hook may have failed; events also write to a durable outbox table and re-fire on the next sweep.
