Overview
A "certificate" in Vigilo is really a monitored host + port pair. You give Vigilo a hostname (and optionally a non-standard port), and a Celery beat job scans it on an interval, captures everything about the TLS handshake — issuer, subject, SAN list, expiry, signature algorithm, key size, cipher suites, OCSP status — and stores each scan as a CertificateSnapshot row. You can then layer alert rules on top: "warn me 30 days before expiry," "page me on chain validation failure," "ticket me if the TLS report card drops below B."
Adding a host takes about thirty seconds. The first scan runs immediately so you see data right away; subsequent scans run on the cadence you pick.
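Here is a minimal sketch of what a single scan involves, using only Python's standard `ssl` and `socket` modules. It is not Vigilo's scanner. `fetch_peer_cert` and `summarize` are hypothetical names. The sketch handles implicit-TLS ports only (no STARTTLS upgrade) and extracts just a few of the fields listed above. The stdlib's parsed certificate dict does not include the signature algorithm or key size, so a fuller scanner would parse the DER bytes from `getpeercert(binary_form=True)` with a library such as `cryptography`.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 5.0):
    """Hypothetical helper: handshake over implicit TLS, return (cert, protocol, cipher)."""
    ctx = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            # With verification on, getpeercert() returns a parsed dict.
            return tls.getpeercert(), tls.version(), tls.cipher()


def _flatten(rdns) -> dict:
    """Collapse getpeercert()'s nested RDN tuples into one flat dict."""
    return {key: value for rdn in rdns for key, value in rdn}


def summarize(cert: dict) -> dict:
    """Hypothetical helper: pick a few snapshot fields out of the cert dict."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])  # e.g. "Jun 15 12:00:00 2030 GMT"
    return {
        "subject_cn": _flatten(cert["subject"]).get("commonName"),
        "issuer_cn": _flatten(cert["issuer"]).get("commonName"),
        # Keep DNS names only; IP-address SANs are skipped in this sketch.
        "sans": [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"],
        "not_after": datetime.fromtimestamp(expires, tz=timezone.utc),
    }
```

Against a live host, `summarize(fetch_peer_cert("api.acme.com")[0])` would return the subject CN, issuer CN, DNS SANs and expiry as a UTC `datetime`.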
Why it exists
Most certificate "monitoring" is one of: a cron job that emails the team when an expiry is close, a spreadsheet, or a vendor SaaS that charges per cert. The cron job and the spreadsheet rot quickly; the SaaS doesn't know about your changes, your incidents, or your audit log. Vigilo's cert monitor is in the same workspace as the change request that'll rotate the cert, the incident if it fails to rotate, the asset record for the underlying host, and the audit log that the auditor wants to see — one query instead of four.
Key concepts
- MonitoredHost — A hostname + port + workspace row. Drives the scan schedule. Active or paused.
- CertificateSnapshot — One row per scan. Immutable. Stored in the FastAPI Postgres DB (separate from the Django main DB) so the high-frequency writes don't compete with ITSM workloads.
- Scan interval — How often the Celery beat job re-scans. Choices: 5 min, 15 min, 1 hour, 6 hours, 24 hours. Shorter intervals (more frequent scans) consume more plan quota.
- Alert rule — A predicate over snapshots that, when it transitions from false to true, raises an alert. Alerts dispatch via the same event bus as changes and incidents (so you can pipe them to Slack, PagerDuty, or a webhook).
- TLS report card — A graded summary (A+ → F) of the host's TLS posture: protocol versions, cipher suites, key strength, OCSP stapling, chain completeness. Derived from each snapshot.
- Snooze — Suppress alerts for a host (or a single alert rule) for a fixed window. Auto-expires; logs to the audit trail.
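The "transitions from false to true" wording in the alert-rule definition is what stops a rule from firing on every scan. Below is a minimal sketch of that edge-triggered evaluation. It assumes hypothetical `Snapshot` and `AlertRule` types, not Vigilo's models, and it leaves out dispatch to the event bus.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Snapshot:
    """Hypothetical subset of a CertificateSnapshot row."""
    host: str
    days_remaining: float
    grade: str


@dataclass
class AlertRule:
    name: str
    predicate: Callable[[Snapshot], bool]
    # Last predicate result per host, kept so transitions can be detected.
    last: Dict[str, bool] = field(default_factory=dict)

    def evaluate(self, snap: Snapshot) -> Optional[str]:
        """Return "open" on a false->true edge, "close" on true->false, else None."""
        now_true = self.predicate(snap)
        was_true = self.last.get(snap.host, False)
        self.last[snap.host] = now_true
        if now_true and not was_true:
            return "open"
        if was_true and not now_true:
            return "close"
        return None
```

Remembering the previous result per host is the key design choice. A level-triggered check that fired whenever the predicate held would re-alert on every scan.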
Common workflows
1. Add a host
Sidebar → Certificates → Add host. Fill in:
- Hostname — e.g. `api.acme.com`. No scheme, no path.
- Port — defaults to 443 (HTTPS). Other built-ins: 465 (SMTPS), 993 (IMAPS), 995 (POP3S), 587 (STARTTLS submission), 636 (LDAPS), or custom for anything else. For STARTTLS ports, Vigilo issues the STARTTLS upgrade before grabbing the cert; you don't need to configure the protocol explicitly.
- Scan interval — start with 1 hour. You can dial it down later.
- Tags — free-form labels for grouping in lists and filters (e.g. `prod`, `payments`, `external-facing`).
Click Add. The first scan kicks off immediately and you'll see the snapshot land within ~5 seconds.
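The form rules above can be mirrored in a small validator. This is a sketch under stated assumptions, not Vigilo's form code. `validate_host_form`, the `starttls` flag, and the interval values in minutes are illustrative. Only port 587 is treated as STARTTLS because it is the only built-in labelled that way. The `scans_per_day` figure shows why shorter intervals cost more quota: 5 minutes works out to 288 scans a day per host, against 24 at 1 hour.

```python
from typing import Optional

BUILTIN_PORTS = {
    443: "HTTPS",
    465: "SMTPS",
    993: "IMAPS",
    995: "POP3S",
    587: "STARTTLS submission",
    636: "LDAPS",
}
# Assumption: of the built-ins above, only 587 is listed as STARTTLS.
STARTTLS_PORTS = {587}
INTERVAL_MINUTES = (5, 15, 60, 360, 1440)  # 5 min, 15 min, 1 h, 6 h, 24 h


def validate_host_form(hostname: str, port: Optional[int] = None,
                       interval_minutes: int = 60) -> dict:
    """Hypothetical: normalise the Add host fields; raise ValueError on bad input."""
    name = hostname.strip().lower()
    if not name:
        raise ValueError("hostname is required")
    if "://" in name or "/" in name:
        raise ValueError("enter a bare hostname: no scheme, no path")
    port = 443 if port is None else port  # default to HTTPS
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if interval_minutes not in INTERVAL_MINUTES:
        raise ValueError(f"unsupported interval: {interval_minutes} min")
    return {
        "hostname": name,
        "port": port,
        "protocol": BUILTIN_PORTS.get(port, "custom"),
        "starttls": port in STARTTLS_PORTS,
        "scans_per_day": 24 * 60 // interval_minutes,
    }
```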
2. Inspect a snapshot
Click any host row → the right-side detail panel opens with:
- Validity — Not Before / Not After, days remaining (colour-coded: green > 30d, amber 7-30d, red < 7d, dark red if expired).
- Chain — Subject, issuer, SAN list, full chain with intermediates.
- Crypto — Public key algorithm + size, signature algorithm, OCSP stapling presence.
- Posture — TLS protocol versions accepted, cipher suite list, HSTS header, the computed report-card grade.
- History — Sparkline of report-card grades over the last 30 days. Useful for spotting regressions after a server change.
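The colour thresholds on the Validity row can be expressed as a small function. This sketch assumes the boundaries are inclusive as written, so exactly 7 or 30 days counts as amber. It also assumes "expired" means Not After is at or before now. `expiry_colour` is a hypothetical name.

```python
from datetime import datetime, timedelta


def expiry_colour(not_after: datetime, now: datetime) -> str:
    """Hypothetical mapping of time-to-expiry onto the panel's colour bands."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "dark red"  # already expired
    days = remaining / timedelta(days=1)  # fractional days left
    if days > 30:
        return "green"
    if days >= 7:
        return "amber"  # 7-30 days, inclusive at both ends
    return "red"
```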
3. Create an alert rule
Sidebar → Monitoring → Alerts → Create rule. Pick:
- Trigger — e.g. "expires in less than" + 30 days, "report card grade below" + B, "chain validation fails," "OCSP stapling stops."
- Scope — All hosts, a tag (`prod`), or a specific host.
- Severity — info / warning / critical. Drives the colour and the routing.
- Channels — In-app only, or wire to an integration (Slack, webhook, PagerDuty).
Save. The rule evaluates on every snapshot. If the condition flips true, an alert opens and the relevant event fires. The alert closes automatically when the condition clears.
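Two parts of the rule form translate naturally into code: comparing report-card grades, and matching a rule's scope against a host. The names in this sketch are hypothetical. The grade scale (`A+` best to `F` worst, with no intermediate grades such as `A-`) is an assumption, since this page only gives the A+ to F range.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Hypothetical grade scale, best first.
GRADES = ("A+", "A", "B", "C", "D", "F")


def grade_below(grade: str, threshold: str) -> bool:
    """True when `grade` is strictly worse than `threshold`."""
    return GRADES.index(grade) > GRADES.index(threshold)


@dataclass
class Host:
    hostname: str
    tags: Set[str] = field(default_factory=set)


def in_scope(scope: Tuple[str, Optional[str]], host: Host) -> bool:
    """scope is ("all", None), ("tag", "prod") or ("host", "api.acme.com")."""
    kind, value = scope
    if kind == "all":
        return True
    if kind == "tag":
        return value in host.tags
    if kind == "host":
        return host.hostname == value
    raise ValueError(f"unknown scope kind: {kind!r}")
```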
4. Snooze a noisy host
From the host detail panel → Snooze → choose a window (1h, 1d, 1w, custom). All alert rules for that host are suppressed for the window. The snooze is recorded in the audit log and auto-expires.
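If suppression is a plain time comparison at evaluation time, snooze expiry needs no cleanup job. This minimal sketch assumes a hypothetical `Snooze` record and uses an in-memory list as a stand-in for the audit trail. `rule=None` covers the whole-host case described here, and a rule name covers the single-rule case from the Key concepts list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Snooze:
    host: str
    until: datetime
    rule: Optional[str] = None  # None means every rule for the host


audit_log: List[dict] = []  # stand-in for the real audit trail


def add_snooze(snoozes: List[Snooze], host: str, window: timedelta, now: datetime,
               actor: str, rule: Optional[str] = None) -> Snooze:
    """Hypothetical: record a snooze and log who set it and until when."""
    s = Snooze(host=host, until=now + window, rule=rule)
    snoozes.append(s)
    audit_log.append({"action": "snooze", "actor": actor, "host": host,
                      "rule": rule, "until": s.until})
    return s


def is_suppressed(snoozes: List[Snooze], host: str, rule: str, now: datetime) -> bool:
    # Expiry is only a time check, so nothing has to run to lift a snooze.
    return any(
        s.host == host and (s.rule is None or s.rule == rule) and now < s.until
        for s in snoozes
    )
```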
5. Pause a host
If a host is being decommissioned, you don't want alerts but you also don't want to lose the history. Detail panel → Pause monitoring. The host stays in the list (greyed) and you can resume any time.
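One plausible way for a periodic scheduler to honour pausing is to skip inactive hosts when deciding which hosts are due for a scan. This sketch uses a hypothetical `MonitoredHost` dataclass and `due_for_scan` function rather than Vigilo's actual Celery beat task. Pausing only flips `active`, so existing snapshots are left untouched.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class MonitoredHost:
    hostname: str
    port: int
    interval: timedelta
    active: bool = True  # False while paused; history is kept either way
    last_scanned: Optional[datetime] = None


def due_for_scan(hosts: List[MonitoredHost], now: datetime) -> List[MonitoredHost]:
    """Hypothetical: active hosts never scanned, or whose interval has elapsed."""
    return [
        h for h in hosts
        if h.active and (h.last_scanned is None or now - h.last_scanned >= h.interval)
    ]
```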
Permissions & gating
| Action | Roles allowed |
|---|---|
| View hosts/snapshots | All workspace members |
| Add a host | member, approver, admin, owner |
| Edit a host | member, approver, admin, owner |
| Pause/resume | member, approver, admin, owner |
| Snooze alerts | member, approver, admin, owner |
| Create alert rule | admin, owner |
| Delete a host | admin, owner |
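The table maps directly onto a lookup. The names in this sketch (`PERMISSIONS`, `can`) are hypothetical. "All workspace members" is modelled as `None`, meaning any role passes, on the assumption that workspace membership is checked elsewhere.

```python
EDITORS = frozenset({"member", "approver", "admin", "owner"})
ADMINS = frozenset({"admin", "owner"})

PERMISSIONS = {
    "view": None,  # None = every workspace member
    "add_host": EDITORS,
    "edit_host": EDITORS,
    "pause_resume": EDITORS,
    "snooze": EDITORS,
    "create_alert_rule": ADMINS,
    "delete_host": ADMINS,
}


def can(role: str, action: str) -> bool:
    """Hypothetical check; unknown actions raise KeyError instead of passing."""
    allowed = PERMISSIONS[action]
    return allowed is None or role in allowed
```

Raising on an unknown action means a typo fails loudly instead of silently granting access.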
Troubleshooting
- "Scan stuck on 'Pending' forever." — The Celery worker isn't picking up the queue. Check that `vigilo-celery` is running; in dev, that's `start-dev.ps1`.
- "Handshake failed: hostname mismatch." — Common when an internal hostname is monitored from outside the perimeter. Either move the worker inside the network, or monitor by the externally-resolvable name.
- "Snapshot shows 'TLS_AGENT' as issuer." — There's a TLS-terminating proxy in front of the host. You're seeing the proxy's certificate, not the origin's. Monitor the origin directly, or live with it.
- "Report card is grade F but nothing's wrong." — Check the Posture panel: usually it's TLS 1.0/1.1 still enabled, or RSA-1024 in the chain. Either one lowers the grade automatically.
- "I get duplicate alerts every interval." — The alert didn't close because the condition is still true. Check the rule definition, and consider snoozing while you fix the underlying issue.
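For the hostname-mismatch case, it can help to check the monitored name against the SAN list shown in the Chain panel. This is a simplified sketch of standard wildcard matching, where `*` stands in for exactly one whole left-most label. `san_covers` is a hypothetical helper. It ignores IP-address SANs and partial-label wildcards such as `api*.acme.com`.

```python
from typing import Iterable


def san_covers(sans: Iterable[str], hostname: str) -> bool:
    """Hypothetical: does any DNS SAN match hostname (case-insensitive)?"""
    host = hostname.lower().rstrip(".")
    for san in sans:
        pattern = san.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            suffix = pattern[1:]  # "*.acme.com" -> ".acme.com"
            head, dot, rest = host.partition(".")
            # The wildcard must replace exactly one non-empty label.
            if dot and head and "." + rest == suffix:
                return True
    return False
```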