Notifications: inbox, preferences, channels

Overview

Vigilo's notification system has three moving parts that every operator interacts with:

  • The bell in the topbar — a per-user inbox of recent events the user cares about, with a red unread badge.
  • The preferences matrix under Settings → General → Notifications — a per-user grid of which events to receive on which channel (app inbox, email, Slack, SMS, webhook), with a severity floor and a quiet-hours bypass flag per row.
  • Quiet hours under Settings → General → Quiet hours — a per-user "do not disturb" window that suppresses everything except rows explicitly marked "bypass quiet hours".

The system is driven by a single backend pipeline: when any event fires (a change is approved, a cert crosses an expiry threshold, an incident is opened), the platform fans out to every workspace member whose preferences match, and routes each delivery through the channels they've selected.

Why it exists

Operators drown in tooling notifications. Vigilo's matrix puts each user in control of what they receive and where, with a workspace-level guarantee that critical events (P1 incidents, cert expiries) can be set to bypass quiet hours so the page still reaches the on-call person at 3 AM. The single-fire-per-threshold cooldown stops cert alerts from re-spamming every hour while a cert sits in its 30-day expiry window.

Key concepts

The event catalogue

Every routable event is registered in a central catalogue (apps/notifications/event_catalog.py). Each entry declares the event name (e.g. cert.expiring_soon), a human-readable label, a default severity, and a list of default channels seeded for every new user. Events outside the catalogue still fire webhooks + playbooks but won't appear in user preferences — adding a row to the catalogue is what makes an event addressable in the matrix UI.

The catalogue covers Changes (created / approved / rejected / scheduled / completed), Incidents (created / acknowledged / mitigated / resolved), Approvals (requested / SLA warning / SLA breached), Certificates (expiring_soon / expired / renewed), and Membership (added / removed / role_changed).
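
As a rough illustration of what a catalogue entry declares, the sketch below models the four fields described above. The class name, field names, the incident.created event name, and the sample labels, severities and channels are all assumptions made for the example; they are not taken from event_catalog.py.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    """One routable event. Field names are illustrative, not Vigilo's schema."""
    name: str                          # e.g. "cert.expiring_soon"
    label: str                         # human-readable label shown in the matrix
    default_severity: Optional[str]    # "P1".."P4", or None
    default_channels: Tuple[str, ...]  # channels seeded for every new user


# Hypothetical contents: labels, severities and channels are made up.
EVENT_CATALOG = {
    entry.name: entry
    for entry in (
        CatalogEntry("cert.expiring_soon", "Certificate expiring soon", "P3", ("app", "email")),
        CatalogEntry("incident.created", "Incident opened", "P2", ("app",)),
    )
}


def is_addressable(event_name: str) -> bool:
    """Only catalogued events can appear as rows in the preferences matrix."""
    return event_name in EVENT_CATALOG


def seed_rules(event_name: str) -> List[Tuple[str, str]]:
    """Default (event, channel) pairs a new user would start with."""
    entry = EVENT_CATALOG[event_name]
    return [(entry.name, channel) for channel in entry.default_channels]
```

In this sketch, an event missing from EVENT_CATALOG is simply not addressable, which mirrors the rule that uncatalogued events never show up in user preferences.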

Channels

  • app — writes a row to the in-app inbox. Surfaced by the bell.
  • email — queues an async Celery task that sends a localised email through the workspace SMTP (or the platform default).
  • slack / sms / webhook — defined in the matrix but rely on the workspace having a configured Slack / SMS / webhook integration to actually deliver. Without an integration, these fall back to a log-only "delivery attempted" record so the operator sees the rule fired but no message went out.

Webhooks have a separate, workspace-level subscription mechanism (see Webhooks) that's independent of the per-user matrix — workspace webhooks fire on every dispatched event regardless of any user's preferences.
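
The log-only fallback for channels without an integration can be pictured as a lookup in a registry of senders. This is a minimal sketch under assumptions: the deliver function, the senders registry and the shape of the returned record are hypothetical; only the no_sender reason code appears in the audit log described later.

```python
from typing import Callable, Dict

# A sender takes (user, message) and performs the actual delivery.
Sender = Callable[[str, str], None]


def deliver(channel: str, user: str, message: str, senders: Dict[str, Sender]) -> dict:
    """Deliver through the channel's sender, or fall back to a log-only record."""
    sender = senders.get(channel)
    if sender is None:
        # No integration configured: record that the rule fired, send nothing.
        return {"channel": channel, "user": user,
                "delivered": False, "suppressed_reason": "no_sender"}
    sender(user, message)
    return {"channel": channel, "user": user,
            "delivered": True, "suppressed_reason": None}
```

A workspace with only the app channel wired up would return delivered = False with no_sender for a Slack rule, which is the record an operator would expect to find when the rule fired but no message went out.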

Per-rule severity floor

Each matrix row carries a severity_min (P4 / P3 / P2 / P1 or null). A rule with severity_min = P2 only fires for events whose severity is P2 or P1. Useful for incident rows where the user wants to be paged for high-sev only.
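
The floor comparison is easy to get backwards because a lower P-number means a higher severity. A minimal sketch, assuming severities are compared by rank (the function and constant names are illustrative):

```python
from typing import Optional

# Lower rank means more severe: P1 is the highest severity, P4 the lowest.
SEVERITY_RANK = {"P1": 1, "P2": 2, "P3": 3, "P4": 4}


def passes_severity_floor(event_severity: str, severity_min: Optional[str]) -> bool:
    """True if the event is at least as severe as the rule's floor."""
    if severity_min is None:
        return True  # null floor: the rule fires for any severity
    return SEVERITY_RANK[event_severity] <= SEVERITY_RANK[severity_min]
```

With severity_min = "P2", this passes P1 and P2 events and rejects P3 and P4, matching the example above.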

Quiet hours + Bypass

When a user is inside their quiet-hours window or has an active DND snooze, the routing engine suppresses every rule for that user — unless the rule has quiet_hours_bypass = true. The matrix UI exposes this per row, so a user can mark "incident P1 → SMS" with bypass and leave everything else under quiet hours.

P1 / critical events have an additional platform-level escape hatch: QUIET_HOURS_BYPASS_SEVERITY = "critical" means the platform overrides quiet hours for critical events regardless of the per-rule flag. This is a deliberate operational decision; flipping it requires updating on-call runbooks.
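
The two bypass layers can be sketched as a single decision function. Several details here are assumptions rather than documented behaviour: the function names, the half-open window (start inclusive, end exclusive), the support for windows that wrap past midnight, and the direct string comparison against QUIET_HOURS_BYPASS_SEVERITY. The sketch also assumes now has already been converted to the user's timezone.

```python
from datetime import time

# Platform-level setting named in the text above.
QUIET_HOURS_BYPASS_SEVERITY = "critical"


def in_quiet_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls inside [start, end), including overnight windows."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 07:00.
    return now >= start or now < end


def suppressed_by_quiet_hours(now: time, start: time, end: time, snoozed: bool,
                              rule_bypass: bool, event_severity: str) -> bool:
    """Decide whether quiet hours or a DND snooze suppress this rule."""
    if not (snoozed or in_quiet_window(now, start, end)):
        return False  # user is reachable, nothing to suppress
    if rule_bypass:
        return False  # per-rule quiet_hours_bypass = true
    if event_severity == QUIET_HOURS_BYPASS_SEVERITY:
        return False  # platform-level override for critical events
    return True
```

Checking the per-rule flag before the platform-level override makes no difference to the result; the order only reflects the sequence in which the text introduces them.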

Per-rule cooldown

Each matrix row has a cooldown_minutes (default 0). When set, the routing engine refuses to fire the same rule again within the cooldown window, even if the underlying event fires multiple times. This is why cert-expiry alerts don't re-spam on every hourly scan: the row records last_sent_at on each successful delivery, and later attempts inside the window are suppressed with suppressed_reason = "cooldown". Each suppressed attempt surfaces as a deduplication line in the audit log.

For event types that fire continuously (cert sitting in the 30-day window, SLO burn-rate inside the alerting threshold), set a sensible cooldown (24 h is typical) on the row to prevent fatigue without losing the initial alert.
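
The cooldown check reduces to comparing the current time against last_sent_at plus the window. The sketch below assumes that a cooldown of 0 disables the check and that the window is exclusive at its end; the function names are illustrative.

```python
from datetime import datetime, timedelta
from typing import Optional


def cooldown_lifts_at(last_sent_at: Optional[datetime],
                      cooldown_minutes: int) -> Optional[datetime]:
    """When the rule may fire again, or None if no cooldown applies."""
    if last_sent_at is None or cooldown_minutes <= 0:
        return None
    return last_sent_at + timedelta(minutes=cooldown_minutes)


def check_cooldown(now: datetime, last_sent_at: Optional[datetime],
                   cooldown_minutes: int) -> Optional[str]:
    """Return the suppressed_reason "cooldown", or None if the rule may fire."""
    lifts = cooldown_lifts_at(last_sent_at, cooldown_minutes)
    if lifts is not None and now < lifts:
        return "cooldown"
    return None
```

With a 24 h cooldown (1440 minutes), an hourly scan that re-fires the event one hour after the last delivery would be suppressed, while the same event 24 hours later would go through.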

Common workflows

1. Read the inbox

  1. Click the bell icon in the top bar. The badge shows the unread count, capped at "9+".
  2. The popover lists the 25 most recent rows. Unread rows are highlighted with a left dot and a soft accent background, and row text is tinted by severity (P1 red, P2 orange, P3 amber, P4 sky).
  3. Click any row to mark it read and navigate to the entity (incident, change, cert detail page) the notification refers to.
  4. Use "Mark all read" in the popover header to clear every unread row in one shot.

2. Tune your preferences matrix

  1. Open Settings → General → Notifications.
  2. The grid lists every catalogued event in rows, with one checkbox column per channel (App / Email / Slack / SMS / Webhook) and two trailing columns: Sev floor (dropdown: Any / P4+ / P3+ / P2+ / P1) and Bypass QH (checkbox).
  3. Toggle channels on / off per event. The change persists optimistically — the cell ticks immediately, then reconciles against the server.
  4. Set Sev floor + Bypass QH per row (not per channel). They apply to every channel on that row.

3. Set quiet hours

  1. Open Settings → General → Quiet hours.
  2. Set start + end times, plus a timezone (uses your profile timezone by default).
  3. Optional: Snooze all for 1h / 4h / until tomorrow 9 AM / custom datetime to quickly mute the platform without changing the recurring window.
  4. While the window is active, every routing rule without Bypass QH is suppressed for you.
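
The snooze presets in step 3 translate into an end timestamp. A minimal sketch, assuming the preset keys shown here (they are hypothetical, not UI or API identifiers) and assuming "tomorrow 9 AM" is interpreted in the timezone of the datetime passed in:

```python
from datetime import datetime, time, timedelta


def snooze_until(preset: str, now: datetime) -> datetime:
    """Translate a snooze preset into the moment the snooze ends."""
    if preset == "1h":
        return now + timedelta(hours=1)
    if preset == "4h":
        return now + timedelta(hours=4)
    if preset == "tomorrow_9am":
        tomorrow = (now + timedelta(days=1)).date()
        # Keep the timezone (if any) of the datetime that was passed in.
        return datetime.combine(tomorrow, time(9, 0), tzinfo=now.tzinfo)
    raise ValueError(f"unknown snooze preset: {preset!r}")
```

A custom datetime needs no translation, so the sketch leaves it out.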

4. Diagnose a missing notification

  1. Open the audit log (Governance → Audit).
  2. Filter by action = "notification_routed" or by the entity in question.
  3. Each row shows the rules matched, which were delivered, and which were suppressed with the reason (quiet_hours, cooldown, no_sender).
  4. Common causes:
    • The rule's channel sender isn't registered (Slack/SMS without an integration → log-only).
    • The rule is in cooldown — the last_sent_at timestamp tells you when the cooldown will lift.
    • The user is inside quiet hours and the rule doesn't bypass.
    • The event isn't in the catalogue — fix by adding it to event_catalog.py.
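
The checklist above can be turned into a small triage helper. The reason codes (quiet_hours, cooldown, no_sender) are the ones the audit log reports; the shape of the entry dictionary and the diagnose function are assumptions for illustration, not the audit log's actual export format.

```python
from typing import Dict, List

# Reason codes come from the audit log; the hints paraphrase the causes above.
HINTS: Dict[str, str] = {
    "no_sender": "no sender registered for this channel; configure the integration",
    "cooldown": "rule is in cooldown; last_sent_at shows when it will lift",
    "quiet_hours": "user is inside quiet hours and the rule has no bypass",
}


def diagnose(entry: dict) -> List[str]:
    """Explain why matched rules in one notification_routed entry did not deliver."""
    rules = entry.get("rules", [])
    if not rules:
        # Possible cause, not a certainty: the event may be missing from the catalogue.
        return ["no rule matched; check that the event is in event_catalog.py"]
    lines = []
    for rule in rules:
        reason = rule.get("suppressed_reason")
        if reason is not None:
            lines.append(f"{rule['channel']}: {HINTS.get(reason, reason)}")
    return lines
```

Rules that were delivered produce no line, so an empty result means every matched rule went out.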

Permissions

Action                                       Required permission
Read your own inbox                          Authenticated workspace member
Edit your own preferences                    Authenticated workspace member
Configure workspace-level SMTP, Slack, etc.  Admin / Owner
Add events to the catalogue                  Platform admin (code change)

Troubleshooting

"I toggled Bypass QH but it doesn't seem to persist." In earlier builds of the matrix UI, the toggle was a no-op if no channel was enabled on that row. The current build applies the bypass flag to every existing rule on the row (and creates a placeholder rule if none exists), so the flag sticks.

"Cert alerts only fire on day 30, not on day 14 / 7 / 1." Check the cert's notify_days_before (Edit Certificate dialog) and confirm it lists every threshold you expect. Each threshold fires exactly once per cert lifecycle; the memo resets on renewal. If you see no alert at all, check the per-rule cooldown on the matching matrix row: a 24h cooldown that is still running when the next threshold is crossed can swallow that threshold's alert.
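
The once-per-lifecycle behaviour of cert thresholds can be modelled with a small memo of thresholds that have already fired. This is a sketch under assumptions: the class and method names are hypothetical, and it assumes that a scan fires every crossed threshold that has not fired yet, which may differ from how the platform handles a cert first seen late in its lifecycle.

```python
from typing import Iterable, List, Set


class ThresholdMemo:
    """Tracks which notify_days_before thresholds have fired for one cert."""

    def __init__(self, notify_days_before: Iterable[int]) -> None:
        # Largest threshold first, e.g. [30, 14, 7, 1].
        self.thresholds: List[int] = sorted(notify_days_before, reverse=True)
        self.fired: Set[int] = set()

    def scan(self, days_left: int) -> List[int]:
        """Return thresholds crossed since the last scan and mark them fired."""
        due = [t for t in self.thresholds
               if days_left <= t and t not in self.fired]
        self.fired.update(due)
        return due

    def renew(self) -> None:
        """Renewal starts a new lifecycle, so every threshold can fire again."""
        self.fired.clear()
```

For a cert with thresholds 30 / 14 / 7 / 1, repeated scans at 29 days left return nothing after the day-30 alert has fired, and the next alert is due only when 14 days remain.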

"My inbox shows old rows but no new ones." The bell polls /notifications/inbox/unread-count/ every 30 seconds. Hard-refresh the page to drop a stale cache. If the issue persists, check the browser console for WebSocket errors, since live push is delivered over Channels.

Related

  • Settings overview — where the matrix + quiet hours surfaces live
  • SMTP configuration — the relay that delivers email-channel notifications
  • Webhooks — workspace-level outbound delivery (independent of user preferences)
  • Integrations — Slack / Teams / PagerDuty / OpsGenie senders for the matching channels