Alert rules turn raw scan results into notifications. They are the single configuration surface that decides who hears about an expiring certificate, a failed scan, or an issuer change — and how often.
Where to find them. Alert rules no longer have a dedicated sidebar entry. Each module owns its own rules now — open Certificates → Bell icon in the page header to manage certificate-scoped rules, or Monitoring → Bell icon for host/uptime-scoped rules. Both icons deep-link to the same Alert Rules page with a pre-filtered scope.
Overview
An AlertRule is a workspace-scoped row that links a trigger (a condition observable on monitoring data) to one or more channels (email recipients and webhook endpoints), constrained by an alert scope (the whole workspace or a specific list of hosts) and a cooldown (a minimum interval between fires). Each rule also carries an owner module — certificate or monitoring — that determines which page's Bell icon surfaces it.
Rules are evaluated by the FastAPI dispatcher (alert_dispatch.py) at the end of every probe, every renewal run, and every periodic sweep. The dispatcher walks active rules for the workspace, asks each one whether it matches the latest snapshot, and — if it does — calls the channel layer to send the message.
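The evaluation loop can be sketched in a few lines of Python. This is a minimal illustration, not the contents of alert_dispatch.py. The `Snapshot` and `AlertRule` classes, the `send` callback and the reduced field set are assumptions, and only three of the built-in triggers are modelled. The trigger conditions follow the definitions given in this article.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Snapshot:
    # Hypothetical, simplified view of the latest probe result for one host.
    host_id: str
    days_until_expiry: Optional[int] = None
    consecutive_failures: int = 0


@dataclass
class AlertRule:
    # Subset of the AlertRule fields; names mirror the article, types are assumed.
    name: str
    trigger_on: str
    days_threshold: Optional[int] = None
    alert_scope: str = "workspace"  # "workspace" or "host"
    scope_hosts: list[str] = field(default_factory=list)
    is_active: bool = True


def in_scope(rule: AlertRule, snap: Snapshot) -> bool:
    # Host-scoped rules only apply to the UUIDs listed in scope_hosts.
    return rule.alert_scope == "workspace" or snap.host_id in rule.scope_hosts


def matches(rule: AlertRule, snap: Snapshot) -> bool:
    # Only three of the six built-in triggers are modelled in this sketch.
    if rule.trigger_on == "expiring_soon":
        return (snap.days_until_expiry is not None
                and rule.days_threshold is not None
                and snap.days_until_expiry <= rule.days_threshold)
    if rule.trigger_on == "expired":
        return snap.days_until_expiry is not None and snap.days_until_expiry < 0
    if rule.trigger_on == "scan_failed":
        return snap.consecutive_failures >= 2
    return False


def dispatch(rules: list[AlertRule], snap: Snapshot,
             send: Callable[[AlertRule, Snapshot], None]) -> list[str]:
    """Walk active rules, test each against the snapshot, send on a match."""
    fired = []
    for rule in rules:
        if rule.is_active and in_scope(rule, snap) and matches(rule, snap):
            send(rule, snap)
            fired.append(rule.name)
    return fired
```

Read literally, `days_until_expiry <= days_threshold` also holds for a certificate that has already expired, so in this sketch an `expiring_soon` rule keeps matching alongside `expired`. Cooldown is deliberately left out here.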
Why it exists
Hard-coded notifications never fit every team. Some operators want a single weekly digest; others want a Slack ping at 30, 14 and 1 days before expiry plus a PagerDuty call once a certificate has expired. Alert rules give you a small, composable building block so the on-call rotation can tune signal-to-noise without involving engineering.
Key concepts
- AlertRule fields: `name`, `trigger_on`, `days_threshold` (only used by certain triggers), `scope` (the owner module: `certificate` or `monitoring`; drives which Bell icon surfaces the rule), `alert_scope` (target breadth: `workspace` or `host`), `scope_hosts[]` (host UUIDs, used when `alert_scope` is `host`), `recipients[]` (email addresses and webhook IDs), `cooldown_minutes` (default 1440), `last_fired_at` (auto-managed) and `is_active`.
- Triggers: Vigilo ships six built-in triggers.
  - `expiring_soon` fires when `days_until_expiry <= days_threshold` for any in-scope host.
  - `expired` fires when `days_until_expiry < 0` (a fresh fire after each renewal attempt).
  - `scan_failed` fires when a probe raises an error on two consecutive runs.
  - `renewal_failed` fires when a `CertificateRenewalPolicy` run ends in `failed` or `needs_manual`.
  - `cert_issuer_changed` fires when consecutive snapshots show a different issuer DN (WA.15).
  - `cert_anomaly` fires when WB.27 detects a change in protocol, key, cipher or SAN.
- Channels: email is the default. Webhook channels reuse the `Webhook` rows from the Integrations app, so an existing Slack or PagerDuty hook can be selected from a dropdown rather than re-entered.
- Cooldown semantics: the dispatcher records `last_fired_at` per `(rule, host)` pair. A rule will not re-fire for the same host inside the cooldown window, even if the condition stays true. The cooldown resets as soon as the condition flips back to false.
- Rate limiting: on top of cooldown, the dispatcher caps email volume at 60 messages per minute per workspace to protect the SMTP relay. Excess events are coalesced into a single digest queued for the next minute.
- Workspace SMTP fallback: if the workspace has not configured outbound SMTP under Admin → SMTP, alerts fall through to the platform default sender. The rule still fires; only the From: address differs.
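The cooldown and rate-limit rules above can be modelled as two small in-memory classes. This is a hedged sketch under stated assumptions: state lives in dictionaries rather than in `last_fired_at` columns, the rate limit uses fixed calendar-minute windows (the article does not say whether the window is fixed or sliding), and `pop_digest` is a hypothetical hook for sending the coalesced digest.

```python
from datetime import datetime, timedelta


class CooldownTracker:
    """Per-(rule, host) cooldown; an illustration, not Vigilo's implementation."""

    def __init__(self) -> None:
        self.last_fired_at: dict[tuple[str, str], datetime] = {}

    def should_fire(self, rule_id: str, host_id: str, condition: bool,
                    cooldown_minutes: int, now: datetime) -> bool:
        key = (rule_id, host_id)
        if not condition:
            # Condition flipped back to false: reset the cooldown for this pair.
            self.last_fired_at.pop(key, None)
            return False
        last = self.last_fired_at.get(key)
        if last is not None and now - last < timedelta(minutes=cooldown_minutes):
            return False  # suppressed by cooldown
        self.last_fired_at[key] = now
        return True


class MinuteRateLimiter:
    """Cap emails per workspace per minute; overflow is queued for a digest."""

    def __init__(self, limit: int = 60) -> None:
        self.limit = limit
        self.window_start: dict[str, datetime] = {}
        self.sent: dict[str, int] = {}
        self.digest: dict[str, list[str]] = {}

    def submit(self, workspace_id: str, message: str, now: datetime) -> bool:
        """Return True if the email can go out now, False if it was queued."""
        minute = now.replace(second=0, microsecond=0)
        if self.window_start.get(workspace_id) != minute:
            # New calendar minute: start a fresh counting window.
            self.window_start[workspace_id] = minute
            self.sent[workspace_id] = 0
        if self.sent[workspace_id] < self.limit:
            self.sent[workspace_id] += 1
            return True
        self.digest.setdefault(workspace_id, []).append(message)
        return False

    def pop_digest(self, workspace_id: str) -> list[str]:
        # Hypothetical hook: called in the next minute to send one combined email.
        return self.digest.pop(workspace_id, [])
```

Because the tracker is keyed by `(rule, host)`, a rule that is cooling down for one host can still fire immediately for another.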
Common workflows
1. Create an alert rule for 30/14/1-day expiry
- Open Certificates and click the Bell icon in the page header (or, for host and uptime rules, Monitoring → Bell). The Alert Rules page opens pre-filtered to that module's scope, so the rule you create is automatically tagged `scope=certificate` (or `scope=monitoring`).
- Click + New rule.
- Name: `Cert expiry - production`.
- Trigger: `expiring_soon`.
- Days threshold: pick a single integer. You can create three rules for 30/14/1, or rely on per-host `notify_days_before` with a single threshold of 30.
- Alert scope: `workspace`, or `host` with the host picker if you only care about a subset.
- Recipients: type email addresses (validated inline) and pick webhook channels from the dropdown.
- Cooldown: leave at 1440 minutes (24 hours) so the same host fires at most once a day while it stays inside the threshold band.
- Save. The rule becomes active immediately and is surfaced under the same Bell icon you opened it from.
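If you choose the three-rule approach, keep in mind that `expiring_soon` compares with `<=`, so the bands overlap rather than partition: a host 10 days from expiry satisfies both the 30-day and the 14-day rule. The sketch below uses a hypothetical `matching_thresholds` helper to show which of the three rules match at a given distance from expiry; each matching rule then applies its own per-host cooldown.

```python
THRESHOLDS = (30, 14, 1)  # one expiring_soon rule per band, as in the workflow above


def matching_thresholds(days_until_expiry: int,
                        thresholds: tuple[int, ...] = THRESHOLDS) -> list[int]:
    # expiring_soon fires when days_until_expiry <= days_threshold, so a host deep
    # inside the window also satisfies every larger threshold.
    return [t for t in thresholds if days_until_expiry <= t]
```

For example, `matching_thresholds(10)` returns `[30, 14]`, and on the final day all three rules match.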
2. Wire a rule to a Slack channel via webhook
- Open Integrations → Webhooks and add an Incoming Webhook with the Slack URL. Test it once with the Send test button; a successful test appears as a `WebhookDelivery` row with status `delivered`.
- Back on the Alert Rules page, click + New rule and pick the webhook from the dropdown in the Recipients field.
- When the rule fires, the dispatcher posts a JSON payload (`alert_rule_id`, `trigger`, `host`, `snapshot_id`, `days_until_expiry`) to the webhook target. Slack formats it via the workspace incoming-webhook templating.
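A receiver can be exercised with a request shaped like that payload. The following standard-library sketch builds, but does not send, the POST. The five field names come from this article; the value types, the `Content-Type` header and the `build_webhook_request` helper are assumptions, not Vigilo's documented wire format.

```python
import json
import urllib.request


def build_webhook_request(url: str, alert_rule_id: str, trigger: str, host: str,
                          snapshot_id: str, days_until_expiry: int) -> urllib.request.Request:
    # Field names follow the payload described above; value types are assumptions.
    payload = {
        "alert_rule_id": alert_rule_id,
        "trigger": trigger,
        "host": host,
        "snapshot_id": snapshot_id,
        "days_until_expiry": days_until_expiry,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
```

To actually send it, pass the request to `urllib.request.urlopen(req, timeout=10)`, ideally against a scratch receiver rather than a production Slack channel.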
3. Send a test alert without waiting for a real event
- Open the rule's detail page.
- Click Send test (E3).
- The dispatcher invokes the rule's channel layer with a synthetic snapshot describing a fake host. Recipients receive a clearly labelled `[TEST]` message.
- The result panel shows per-channel status (delivered, retrying, failed) so you can debug a misconfigured webhook before a real event hits.
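Conceptually, Send test is a loop over the rule's channels with a synthetic payload. The sketch below assumes each channel is a plain callable taking a subject and a body (a hypothetical interface) and collapses the retry path: a channel that raises is reported as `failed` immediately, whereas the product shows `retrying` while backoff is in progress.

```python
from typing import Callable


def send_test(rule_name: str,
              channels: dict[str, Callable[[str, str], None]]) -> dict[str, str]:
    """Send a [TEST]-labelled message on every channel; report per-channel status."""
    # Synthetic snapshot; the host name is a placeholder, not a real target.
    subject = f"[TEST] {rule_name}: expiring_soon on test-host.invalid"
    body = "Synthetic snapshot generated by Send test. No real host is affected."
    status = {}
    for name, deliver in channels.items():
        try:
            deliver(subject, body)
            status[name] = "delivered"
        except Exception:
            # No retry loop here: the first failure is reported directly.
            status[name] = "failed"
    return status
```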
4. Temporarily disable a noisy rule
- Open the rule detail.
- Toggle Active off, or extend Cooldown to a larger value (e.g. 10080 minutes = 7 days).
- Disabled rules remain visible in the list with a muted style.
5. Audit which rule fired and when
- From either Certificates → Bell or Monitoring → Bell, switch the Alert Rules page to the Activity tab. The tab respects the current `?scope=` filter, so you only see fires from the module you're investigating.
- The activity table lists every fire across every rule with `fired_at`, `trigger`, `host`, `recipient`, and a link to the resulting `WebhookDelivery` or `EmailLog`.
- Filter by date, rule or host. Drop the `?scope=` query string to see fires across both modules. Export to CSV for incident retrospectives.
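The same filters and the CSV export can be approximated offline once you have the rows. This sketch uses the standard `csv` module; the `rule` and `scope` columns, the `ActivityRow` class and the `export_activity` helper are assumptions, and the date filter is omitted for brevity. Passing `scope=None` corresponds to dropping the `?scope=` query string.

```python
import csv
import io
from dataclasses import asdict, dataclass

FIELDS = ["fired_at", "rule", "scope", "trigger", "host", "recipient"]


@dataclass
class ActivityRow:
    fired_at: str   # ISO-8601 timestamp
    rule: str
    scope: str      # owner module: "certificate" or "monitoring"
    trigger: str
    host: str
    recipient: str


def export_activity(rows: list[ActivityRow], scope=None, rule=None, host=None) -> str:
    """Filter activity rows like the Activity tab and return CSV text."""
    selected = [
        r for r in rows
        if (scope is None or r.scope == scope)  # None mimics dropping ?scope=
        and (rule is None or r.rule == rule)
        and (host is None or r.host == host)
    ]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for r in selected:
        writer.writerow(asdict(r))
    return buf.getvalue()
```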
Permissions
| Action | Roles |
|---|---|
| View rules | All workspace members |
| Create or edit rule | Operator, Admin, Owner |
| Delete rule | Admin, Owner |
| Send test | Operator, Admin, Owner |
| Configure workspace SMTP | Admin, Owner (link from the rule editor) |
| View activity log | All authenticated workspace members |
Rule endpoints inherit WorkspaceScopedMixin. A rule from workspace A is never visible to workspace B.
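The table and the workspace boundary can be expressed as a lookup plus a filter. This is an illustrative sketch only: `PERMISSIONS`, `can` and `visible_rules` are hypothetical names, the action keys paraphrase the table rows, and `ANY_MEMBER` stands in for "all workspace members" because the table does not list every role. It does not describe how `WorkspaceScopedMixin` is implemented.

```python
ANY_MEMBER = "any workspace member"  # sentinel: the table does not enumerate every role

# Transcription of the permissions table above.
PERMISSIONS = {
    "view_rules": ANY_MEMBER,
    "edit_rule": {"Operator", "Admin", "Owner"},
    "delete_rule": {"Admin", "Owner"},
    "send_test": {"Operator", "Admin", "Owner"},
    "configure_smtp": {"Admin", "Owner"},
    "view_activity": ANY_MEMBER,
}


def can(role: str, action: str) -> bool:
    allowed = PERMISSIONS[action]
    # Short-circuit for the sentinel so it is never treated as a role set.
    return allowed == ANY_MEMBER or role in allowed


def visible_rules(rules: list[dict], workspace_id: str) -> list[dict]:
    # Mirrors the effect of workspace scoping: other workspaces' rows never appear.
    return [r for r in rules if r["workspace_id"] == workspace_id]
```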
Troubleshooting
Rule looks correct but never fires.
Check three things in order. First, confirm the rule's `is_active` flag is on. Second, check `alert_scope`: a rule with `alert_scope=host` must list that host's UUID in `scope_hosts`. Third, check `last_fired_at` for that rule and host: the cooldown may have suppressed the fire. The Activity table shows suppressed evaluations as `cooldown` rows.
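The three checks, in the same order, fit in a small helper. It is a sketch: the rule is a plain dict, `why_not_firing` is a hypothetical name, and `last_fired_at` is passed in for the specific host because the cooldown is tracked per `(rule, host)` pair.

```python
from datetime import datetime, timedelta
from typing import Optional


def why_not_firing(rule: dict, host_id: str, last_fired_at: Optional[datetime],
                   now: datetime) -> Optional[str]:
    """Run the three checks in order and return the first problem found."""
    if not rule["is_active"]:
        return "rule is inactive (is_active is false)"
    if rule["alert_scope"] == "host" and host_id not in rule["scope_hosts"]:
        return "host UUID is missing from scope_hosts"
    if last_fired_at is not None and now - last_fired_at < timedelta(minutes=rule["cooldown_minutes"]):
        return "suppressed by cooldown"
    return None  # none of the three explain it; inspect the trigger condition next
```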
I can't find a rule I know I created.
Check the Bell icon you opened the Alert Rules page from. A rule created via the Certificates Bell carries scope=certificate and is hidden when you open the page via the Monitoring Bell (and vice versa). To list every rule regardless of owner module, navigate directly to /monitoring/alerts without the ?scope= query parameter.
Webhook delivery shows failed for hours.
Webhook deliveries retry with exponential backoff up to 5 attempts (1 min, 5 min, 30 min, 2 h, 6 h). After that the delivery is dead-lettered. Open the delivery row to see the response status and body, then fix the receiver and click Retry.
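Reading the five delays as the gaps between successive retries (an assumption; the article does not say whether they are measured from the first failure or from the previous attempt), a delivery that never succeeds is dead-lettered 1 + 5 + 30 + 120 + 360 = 516 minutes, about 8 h 36 min, after the first failed attempt. A sketch of that schedule, with hypothetical helper names:

```python
from datetime import timedelta
from typing import Optional

# Delays listed above; assumed to be measured from the previous attempt.
RETRY_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=6),
]


def next_retry_delay(failed_retries: int) -> Optional[timedelta]:
    """Delay before the next retry, or None once the delivery is dead-lettered."""
    if failed_retries < len(RETRY_DELAYS):
        return RETRY_DELAYS[failed_retries]
    return None


def time_until_dead_letter() -> timedelta:
    # Total wait across all retries, starting from timedelta(0).
    return sum(RETRY_DELAYS, timedelta())
```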
Emails arrive from noreply@vigilo.dev instead of our own domain.
The workspace SMTP profile has not been saved. Visit Admin → SMTP and configure the host, port, credentials and From: address. Once the profile is saved, future fires use it.
An expiring-soon rule fires once at 30 days then nothing at 14 or 7.
Either you only configured one days_threshold, or cooldown is too long. The cleanest model is one rule per threshold band; alternatively, set notify_days_before on each host to [30, 14, 7, 1] and have the rule fire on transitions through any of those points.
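For the `notify_days_before` approach, a point counts as crossed when the previous snapshot was above it and the current one is at or below it. A minimal sketch of that test, using a hypothetical `crossed_points` helper; it also shows why a long gap between probes can cross several points at once.

```python
def crossed_points(prev_days: int, cur_days: int, notify_days_before: list[int]) -> list[int]:
    """Notification points passed between two consecutive snapshots.

    A point p is crossed when prev_days > p >= cur_days.
    """
    return sorted((p for p in notify_days_before if prev_days > p >= cur_days), reverse=True)
```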
scan_failed fires immediately on a brand new host.
A single failed probe does not fire the rule — two consecutive failures must occur. If you see otherwise, check whether the host already had failure history before you re-added it.
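The two-consecutive-failures rule amounts to a per-host streak counter that any successful probe resets. This sketch (a hypothetical `FailureCounter` class with in-memory state) also reproduces the re-added-host case: if a failure streak survives from earlier history, the first visible failure is already the second in the streak.

```python
class FailureCounter:
    """Signal scan_failed from the second consecutive probe error onward (a sketch)."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.consecutive: dict[str, int] = {}

    def record(self, host_id: str, probe_ok: bool) -> bool:
        """Record one probe result; return True when scan_failed should fire."""
        if probe_ok:
            self.consecutive[host_id] = 0  # any success breaks the streak
            return False
        self.consecutive[host_id] = self.consecutive.get(host_id, 0) + 1
        # Later failures keep returning True; cooldown is what throttles repeats.
        return self.consecutive[host_id] >= self.threshold
```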
Related articles
- Host monitoring — the data source for every trigger except SLO burn alerts.
- Service Level Objectives — for availability and latency alerts, not certificate alerts.
- Status page — auto-publish a public note when a rule fires above a severity threshold.