Overview
Vigilo distinguishes two superficially similar concepts that solve different problems:
- A FreezeWindow blocks change requests from being submitted or scheduled during its interval.
- A MaintenanceWindow suppresses monitoring alerts (and optionally auto-approves standard changes) during its interval.
The same week of the year might be both — a holiday code freeze plus a planned datacenter maintenance — but the two are configured separately because they protect against different failure modes. Freeze windows protect against humans deploying when they shouldn't; maintenance windows protect against humans being paged when the system is intentionally noisy.
Both live under Settings → Maintenance at /ws/{slug}/settings/maintenance and ship with their own forms, calendar overlay, and audit history.
Why it exists
Every operations team has a Black Friday, a year-end close, a tax-filing deadline. Trying to enforce "no changes this week" by emailing the engineering list is how outages happen. The FreezeWindow model encodes the policy in the FSM so the system, not a human, says no — and records the override when leadership says yes.
Maintenance windows solve the inverse: when you've taken a service down deliberately, you don't want to wake the on-call. Suppressing alerts for the planned interval (and only that interval) keeps the monitoring signal honest.
Key concepts
FreezeWindow
- start_time / end_time — The closed interval [start, end] during which submission and scheduling are blocked.
- allow_override — When True, admins and owners can push a change through with a justification. When False, this is a hard lockout — no one can override (typical for P0 incident lockouts).
- applies_to_priorities — Optional list. Empty means "applies to all changes"; populated means "only blocks changes whose priority is in this list". A common pattern is to freeze ["low", "medium"] during a release week so emergency hotfixes still flow.
- applies_to_change_types — Optional list of change types the freeze applies to. Empty = all types. Letting emergency through is a common configuration.
- reason — Free-text shown in the block error and the audit log. Required by convention.
- is_active — Soft toggle. An inactive freeze never blocks regardless of dates.
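The scoping rules above can be sketched in plain Python. This is a minimal illustration, not Vigilo's model: the dataclass, the blocks() helper, and the "high" priority value are hypothetical, and the sketch assumes that when both scope lists are populated a change must match both to be blocked.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class FreezeWindow:
    # Field names follow the list above; the class is a stand-in, not the real model.
    title: str
    start_time: datetime
    end_time: datetime
    reason: str = ""
    allow_override: bool = True
    applies_to_priorities: list[str] = field(default_factory=list)
    applies_to_change_types: list[str] = field(default_factory=list)
    is_active: bool = True

    def blocks(self, planned_start, planned_end, priority, change_type) -> bool:
        """Hypothetical helper: would this freeze block the given change?"""
        if not self.is_active:
            return False  # an inactive freeze never blocks
        # Closed interval [start, end]: touching an endpoint counts as overlap.
        if not (planned_start <= self.end_time and planned_end >= self.start_time):
            return False
        # An empty scope list means "applies to everything".
        if self.applies_to_priorities and priority not in self.applies_to_priorities:
            return False
        if self.applies_to_change_types and change_type not in self.applies_to_change_types:
            return False
        return True


release_week = FreezeWindow(
    title="Release week",
    start_time=datetime(2025, 11, 24, 0, 0),
    end_time=datetime(2025, 11, 28, 23, 59),
    reason="Release stabilisation",
    applies_to_priorities=["low", "medium"],
)

start, end = datetime(2025, 11, 25, 9, 0), datetime(2025, 11, 25, 10, 0)
print(release_week.blocks(start, end, "low", "standard"))   # True
print(release_week.blocks(start, end, "high", "standard"))  # False
```

Because the priority list is populated with only "low" and "medium", the higher-priority change passes even though its window sits inside the freeze.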
MaintenanceWindow
- start_time / end_time — The interval during which monitoring alerts are suppressed.
- affected_hosts — List of host UUIDs the window applies to. Empty means all hosts in the workspace.
- suppress_ssl_alerts — When True (default), SSL/certificate alerts emitted for affected_hosts during the window are dropped.
- auto_approve_standard — When True, standard changes whose planned window falls inside this maintenance window auto-approve on submit.
- recurrence — One of none, weekly, monthly. Recurring windows are evaluated by covers(when) against the current period.
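One way to read "evaluated against the current period" is to project the original interval forward by whole periods and test the latest projection. The sketch below does that for none and weekly only. The class, the suppresses_ssl_alert() helper, and the host UUIDs are hypothetical, and monthly is deliberately left unimplemented because this page does not say how months lacking the start day are handled.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class MaintenanceWindow:
    start_time: datetime
    end_time: datetime
    affected_hosts: list[uuid.UUID] = field(default_factory=list)
    suppress_ssl_alerts: bool = True
    recurrence: str = "none"  # "none", "weekly", or "monthly"

    def covers(self, when: datetime) -> bool:
        if when < self.start_time:
            return False  # nothing recurs before the first occurrence
        if self.recurrence == "none":
            return when <= self.end_time
        if self.recurrence == "weekly":
            week = timedelta(weeks=1)
            # Shift the original interval forward by the whole weeks elapsed.
            elapsed_weeks = (when - self.start_time) // week
            period_start = self.start_time + elapsed_weeks * week
            return when <= period_start + (self.end_time - self.start_time)
        raise NotImplementedError("monthly projection is not sketched here")

    def suppresses_ssl_alert(self, host_id: uuid.UUID, when: datetime) -> bool:
        """Hypothetical helper: should an SSL alert for this host be dropped?"""
        if not self.suppress_ssl_alerts:
            return False
        # An empty affected_hosts list means every host in the workspace.
        if self.affected_hosts and host_id not in self.affected_hosts:
            return False
        return self.covers(when)


db_host = uuid.UUID("11111111-1111-1111-1111-111111111111")
web_host = uuid.UUID("22222222-2222-2222-2222-222222222222")

weekly_patch = MaintenanceWindow(
    start_time=datetime(2025, 1, 7, 2, 0),
    end_time=datetime(2025, 1, 7, 4, 0),
    affected_hosts=[db_host],
    recurrence="weekly",
)

print(weekly_patch.suppresses_ssl_alert(db_host, datetime(2025, 1, 14, 3, 0)))   # True
print(weekly_patch.suppresses_ssl_alert(web_host, datetime(2025, 1, 14, 3, 0)))  # False
print(weekly_patch.suppresses_ssl_alert(db_host, datetime(2025, 1, 15, 3, 0)))   # False
```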
FSM enforcement
Two transitions on ChangeRequest call assert_not_frozen from apps.changes.services:
- submit() — Only checks when planned_start and planned_end are set. Engineers can still draft titles and descriptions during a freeze; they just can't lock a window that lands inside one.
- schedule() — Always checks. The change has been through review and now needs a concrete slot, so we always run the guard.
When a freeze is hit, the FSM raises a DRF ValidationError with code freeze_window_blocked and a blockers array listing each overlapping window's id, title, start_time, end_time, and allow_override flag.
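The guard and the error payload can be sketched without Django. Everything here is illustrative: the real assert_not_frozen signature is not shown on this page, FreezeBlocked is a local stand-in for the DRF ValidationError, and the dict-based change and freeze records and the status strings are hypothetical.

```python
from datetime import datetime


class FreezeBlocked(Exception):
    """Stand-in for the DRF ValidationError described above."""

    code = "freeze_window_blocked"

    def __init__(self, blockers):
        super().__init__(self.code)
        self.blockers = blockers


def assert_not_frozen(planned_start, planned_end, freezes):
    # Collect every active freeze whose closed interval overlaps the planned window.
    blockers = [
        {
            "id": f["id"],
            "title": f["title"],
            "start_time": f["start_time"].isoformat(),
            "end_time": f["end_time"].isoformat(),
            "allow_override": f["allow_override"],
        }
        for f in freezes
        if f["is_active"]
        and planned_start <= f["end_time"]
        and planned_end >= f["start_time"]
    ]
    if blockers:
        raise FreezeBlocked(blockers)


def submit(change, freezes):
    # submit() only runs the guard when both planned times are set.
    if change.get("planned_start") and change.get("planned_end"):
        assert_not_frozen(change["planned_start"], change["planned_end"], freezes)
    change["status"] = "submitted"  # hypothetical status name
    return change


def schedule(change, freezes):
    # schedule() always runs the guard.
    assert_not_frozen(change["planned_start"], change["planned_end"], freezes)
    change["status"] = "scheduled"  # hypothetical status name
    return change


year_end = {
    "id": 7,
    "title": "Year-end close",
    "start_time": datetime(2025, 12, 29),
    "end_time": datetime(2026, 1, 2),
    "allow_override": False,
    "is_active": True,
}

# A draft with no planned window submits cleanly during the freeze.
draft = submit({"title": "Rotate API keys"}, [year_end])

try:
    schedule(
        {
            "title": "Rotate API keys",
            "planned_start": datetime(2025, 12, 30, 9, 0),
            "planned_end": datetime(2025, 12, 30, 10, 0),
        },
        [year_end],
    )
except FreezeBlocked as exc:
    print(exc.code, [b["id"] for b in exc.blockers])
```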
Common workflows
Declaring a freeze window
- Go to Settings → Maintenance → Freeze windows → New.
- Fill in title, reason, start time, end time. Pick scope: leave applies_to_priorities and applies_to_change_types blank for a total freeze, or narrow it.
- Choose allow_override. The default True lets admins override with justification; flip to False for a hard lockout.
- Save. Active freezes show as a red banner on the changes list during their interval and a pink overlay on the change calendar.
Overriding a freeze
- From a blocked change, click Override freeze. The action is only visible to users with the freeze_windows.override permission (admin / owner).
- The override dialog requires a justification of at least 20 characters. This text is recorded in the audit log alongside the user, timestamp, and the IDs of the freezes overridden.
- The change's submit() or schedule() is re-run with override=True. If every overlapping freeze has allow_override=True, the transition completes. If any freeze has allow_override=False, the override is refused regardless of caller permission.
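The override decision in the steps above reduces to three checks. The sketch below is an assumption-laden outline: check_override() and its return shape are hypothetical, and trimming whitespace before counting justification characters is a guess rather than documented behaviour.

```python
MIN_JUSTIFICATION_CHARS = 20


def check_override(user_permissions, justification, blockers):
    """Hypothetical helper returning (allowed, reason) for an override attempt."""
    if "freeze_windows.override" not in user_permissions:
        return False, "missing freeze_windows.override permission"
    # Assumption: surrounding whitespace does not count toward the minimum.
    if len(justification.strip()) < MIN_JUSTIFICATION_CHARS:
        return False, "justification too short"
    # A single hard freeze refuses the override, whatever the caller's permission.
    hard = [b["id"] for b in blockers if not b["allow_override"]]
    if hard:
        return False, f"hard freeze: {hard}"
    return True, "ok"


blockers = [
    {"id": 3, "allow_override": True},
    {"id": 7, "allow_override": False},
]
admin = {"freeze_windows.override"}
reason = "Hotfix for checkout outage, approved by the CTO"

print(check_override(admin, reason, blockers[:1]))  # (True, 'ok')
print(check_override(admin, reason, blockers))      # (False, 'hard freeze: [7]')
print(check_override(admin, "urgent", blockers[:1]))  # (False, 'justification too short')
```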
Declaring a maintenance window
- Go to Settings → Maintenance → Maintenance windows → New.
- Pick start, end, and recurrence. For a one-off, leave recurrence as none.
- Add affected_hosts from the asset picker. Leave empty to apply to all hosts.
- Choose suppress_ssl_alerts (default on) and auto_approve_standard (default off).
- Save. The window appears on the monitoring dashboard with a muted-alerts badge.
Permissions
- Viewers can see active freezes and maintenance windows but cannot edit.
- Engineers can read the policy and submit changes that respect it.
- Approvers / admins / owners can create, edit, and delete windows.
- Only admins and owners can override a freeze, and only when the freeze allows it.
Troubleshooting
freeze_window_blocked on submit — Your planned window overlaps an active freeze. The error payload lists each blocker. Either move the change window outside the freeze, or ask an admin to override (only possible if every blocker has allow_override=True).
Override refused with "hard freeze" — One of the overlapping freezes has allow_override=False. There is no recourse short of an admin deactivating the freeze record, which is itself an audit-worthy action.
Standard changes still need approval inside maintenance — auto_approve_standard only applies to changes flagged is_standard=True whose entire planned window falls inside the maintenance window. Partial overlaps don't qualify.
SSL alerts firing during planned maintenance — Confirm the host UUID is in affected_hosts, the window has is_active=True, and the current time falls inside [start_time, end_time]. Recurring windows use the current period's projection, not the original start_time literally.
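When debugging the auto-approval case above, it can help to spell out the difference between full containment and partial overlap. This is a minimal sketch: qualifies_for_auto_approve() is a hypothetical helper, and treating the window endpoints as inclusive is an assumption.

```python
from datetime import datetime


def qualifies_for_auto_approve(
    is_standard, planned_start, planned_end,
    window_start, window_end, auto_approve_standard,
):
    """Hypothetical check: the whole planned window must sit inside the maintenance window."""
    return (
        auto_approve_standard
        and is_standard
        and window_start <= planned_start
        and planned_end <= window_end
    )


window = (datetime(2025, 3, 1, 1, 0), datetime(2025, 3, 1, 5, 0))

# Fully inside the maintenance window: qualifies.
inside = qualifies_for_auto_approve(
    True, datetime(2025, 3, 1, 2, 0), datetime(2025, 3, 1, 3, 0), *window, True
)
# Starts inside but runs 30 minutes past the end: partial overlap, does not qualify.
partial = qualifies_for_auto_approve(
    True, datetime(2025, 3, 1, 4, 0), datetime(2025, 3, 1, 5, 30), *window, True
)
print(inside, partial)  # True False
```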