Vigilo continuously watches every TLS certificate served by hosts you have added to a workspace. The Certificates page is the operational nerve centre for keeping public-facing endpoints, internal services and partner integrations from breaking when a cert silently expires.
Overview
Certificate data is sourced from two systems that the page merges into a single list:
- Django Certificate rows: the canonical "this is a cert we care about" entity. Created manually (Add Certificate) or auto-promoted from a discovered SAN/CT log. Each row has a source of either manual or network and is the long-lived identity for renewal policies, ownership, and audit linkage.
- FastAPI CertificateSnapshot rows: immutable point-in-time observations from the scanner, one per (host, port, leaf_fingerprint) triple captured on every probe. These are the live signal feeding days_until_expiry, chain validation, grade calculation, and anomaly detection.
The Certificates page (/ws/{slug}/certificates) merges both: each Django row is enriched with the most recent matching snapshot so you see the canonical record plus the latest observation in one row. Snapshots without a matching Django row (e.g. a freshly discovered host) appear as source=network rows you can either adopt (promote to a Django Certificate) or ignore.
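The merge described above can be sketched in plain Python. This is a minimal illustration, not Vigilo's implementation: the dataclasses, the observed_at field, the managed flag and the merge_rows helper are all assumptions, and rows are assumed to match on (host, port).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Certificate:
    """Hypothetical stand-in for a Django Certificate row."""
    host: str
    port: int
    source: str  # "manual" or "network"


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a FastAPI CertificateSnapshot row."""
    host: str
    port: int
    leaf_fingerprint: str
    observed_at: datetime


@dataclass(frozen=True)
class MergedRow:
    host: str
    port: int
    source: str
    latest: Optional[Snapshot]
    managed: bool  # False for snapshot-only rows that can be adopted or ignored


def merge_rows(certs: list[Certificate], snapshots: list[Snapshot]) -> list[MergedRow]:
    # Keep only the most recent snapshot per (host, port).
    latest: dict[tuple[str, int], Snapshot] = {}
    for snap in snapshots:
        key = (snap.host, snap.port)
        if key not in latest or snap.observed_at > latest[key].observed_at:
            latest[key] = snap

    # Managed rows first, each enriched with its latest snapshot (if any).
    managed_keys = {(c.host, c.port) for c in certs}
    rows = [
        MergedRow(c.host, c.port, c.source, latest.get((c.host, c.port)), True)
        for c in certs
    ]
    # Snapshots with no managed row surface as source=network.
    rows += [
        MergedRow(s.host, s.port, "network", s, False)
        for key, s in latest.items()
        if key not in managed_keys
    ]
    return rows
```

A managed row whose host has never been scanned keeps latest=None in this sketch, which corresponds to a row that is in the inventory but has no observation yet.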
A snapshot stores the leaf certificate plus the full chain, the SAN list, signature algorithm, key type and size, negotiated protocol/cipher, the issuer DN, OCSP and CRL pointers, the parsed not_before / not_after, and a derived days_until_expiry value recomputed on every render. The row also carries tls_grade (A+, A, B, C, F) and status (healthy, expiring, expired, error).
The page header carries a Bell icon that opens the Alert Rules page pre-filtered to scope=certificate — every certificate-related rule (expiring_soon, expired, cert_anomaly, cert_issuer_changed) lives under this Bell, separate from the host/uptime rules under the Monitoring page's Bell.
Why it exists
Most cert-related outages are not technical — they are calendar problems. Somebody forgot to renew, the renewal pipeline broke six weeks ago, or a SAN was dropped during a re-key. Vigilo collapses every certificate across every workspace into one ranked queue so the right person fixes the right cert before users notice. The same data feeds the SLO engine, the status page, the dependency map and the cert-renewal Kanban.
Key concepts
- CertificateSnapshot: immutable point-in-time observation. Old snapshots are never deleted, which lets you diff renewals and trigger anomaly alerts (cert_anomaly) when issuer, protocol, key, cipher or SAN changes between two consecutive snapshots (WB.27).
- days_until_expiry: derived on the fly from not_after - now(). Negative values mean the cert is already expired. This is the value alert rules compare against days_threshold.
- notify_days_before: list of day thresholds at which the scanner fires an expiry alert (default [30, 14, 7, 1], editable per certificate via the Add / Edit dialogs). Each threshold fires once per cert lifecycle, so a cert sitting inside the 30-day window for a month doesn't re-spam every hourly scan. The "already fired" memo is reset automatically when valid_until shifts forward (i.e. when the cert is actually renewed), so the new cycle alerts again.
- alerted_thresholds: the row's memo of which thresholds have already fired since the last renewal. Visible as amber chips in the Detail panel's "Expiry notifications" section (grey = pending, amber = already alerted this cycle).
- status: valid (more than 30 days to expiry), expiring_soon (≤ 30 days), expired (past not_after), error (scan failed; the error message names the exception class so failures are triageable from the UI), unknown (cert in inventory but not yet scanned).
- cert.expired / cert.expiring_soon / cert.renewed events: dispatched per threshold crossing and on renewal detection. Webhooks and per-user notification rules subscribe to these.
- TLS grade: Vigilo runs its own grader that combines protocol versions, cipher suites, key size, chain trust and forward secrecy into a single letter. Anything below B raises a warning badge on the row.
- SAN expansion (T1.5): when a single cert covers many hosts via SAN entries or a *.example.com wildcard, Vigilo expands each SAN into a virtual host row in the monitoring list, so the Hosts page can show coverage even for hosts you never added by hand.
- CT alerts: the scanner also subscribes to Certificate Transparency logs for every domain represented in your hosts table. A new cert minted by an issuer you do not recognise raises a cert_anomaly alert, which is useful for catching unauthorised issuance.
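The once-per-cycle behaviour of notify_days_before and alerted_thresholds can be modelled in a few lines. This is an illustrative sketch only: the CertRow class and the due_thresholds and observe_renewal helpers are hypothetical names, and it assumes a threshold fires as soon as days_until_expiry is at or below it.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CertRow:
    """Hypothetical model of the fields described above."""
    valid_until: datetime
    notify_days_before: list[int] = field(default_factory=lambda: [30, 14, 7, 1])
    alerted_thresholds: set[int] = field(default_factory=set)


def days_until_expiry(row: CertRow, now: datetime) -> int:
    # Whole days remaining; negative once the cert has expired.
    return (row.valid_until - now).days


def due_thresholds(row: CertRow, now: datetime) -> list[int]:
    """Return the thresholds that should fire on this scan and memoise them."""
    remaining = days_until_expiry(row, now)
    due = [
        t
        for t in sorted(row.notify_days_before, reverse=True)
        if remaining <= t and t not in row.alerted_thresholds
    ]
    row.alerted_thresholds.update(due)
    return due


def observe_renewal(row: CertRow, new_valid_until: datetime) -> None:
    """Reset the memo when valid_until moves forward, i.e. the cert was renewed."""
    if new_valid_until > row.valid_until:
        row.valid_until = new_valid_until
        row.alerted_thresholds.clear()
```

Note that in this sketch a cert first scanned with 10 days left fires both the 30-day and the 14-day threshold on the same scan; whether the real scanner collapses those into a single alert is not specified here.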
Common workflows
1. Add a certificate manually
- Open Reliability → Certificates and click the + icon in the header.
- Fill the Add Certificate dialog: domain (required), port (default 443), notes, owner. Vigilo immediately schedules a first scan and the row appears in the list with source=manual.
- Use this when you want to track a cert before it ships, e.g. a staging host that doesn't yet have a public DNS record. The row stays in the list even when scans fail, so you can wire alerts during a migration window.
2. Trigger an on-demand scan or remove a row
Per-row actions appear on every Django-sourced row (you'll see the ⚡ Scan now and 🗑 Delete icons in the actions column for rows where source ≠ network):
- ⚡ Scan now queues an immediate scrape for that one cert; the row updates as soon as the FastAPI scanner returns. Useful right after a renewal so you don't wait for the next periodic sweep.
- 🗑 Delete opens a confirm dialog that drops the Django row. Past snapshots are kept (so audit history survives), but the row no longer surfaces in the list and no further scans are queued.
Network-discovered rows (source=network) don't show these actions — they're owned by the scanner, not the workspace. Promote one to a managed Certificate by clicking Adopt in the row's detail drawer.
3. Find every cert that expires in the next 30 days
- Open Reliability → Certificates.
- Click the Status column header twice to sort by ascending days_until_expiry.
- Click the Expiring chip in the filter bar (it sets ?status=expiring&within=30).
- The list now shows only certs that need attention. Use the heatmap (top right) or switch to Expiry forecast (the Gantt-style view from WA.17/18) to see clustering.
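To bookmark or share the filtered view, the same URL can be built programmatically. A small sketch, assuming the chip's query parameters are simply appended to the page path given in the Overview; the expiring_url helper is a hypothetical name.

```python
from urllib.parse import quote, urlencode


def expiring_url(slug: str, within: int = 30) -> str:
    """Build the Certificates page path with the Expiring filter applied."""
    query = urlencode({"status": "expiring", "within": within})
    return f"/ws/{quote(slug, safe='')}/certificates?{query}"
```

For a workspace with the slug acme, this yields /ws/acme/certificates?status=expiring&within=30.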
4. Inspect, copy and download a single certificate
- Click the host name in any row to open the Certificate detail drawer.
- The top tab Leaf shows subject, issuer, serial, not_before / not_after, SAN list and the SHA-256 fingerprint. The Copy fingerprint button copies the fingerprint to the clipboard.
- The Chain tab shows every intermediate up to the trust anchor.
- Click Download .pem (from E2) to save the leaf or the full chain. The button is gated by the cert.download permission, so Viewers do not see it.
- The History tab lists every snapshot Vigilo ever took for this host:port, so you can see the previous issuer or fingerprint at a glance.
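After downloading a .pem, you may want to confirm that it matches the fingerprint shown on the Leaf tab. The sketch below uses only the Python standard library and assumes the displayed value is the SHA-256 digest of the DER-encoded leaf, which is the usual convention, and that the leaf is the first certificate in the file. The function names and the tolerance for colon-separated or upper-case input are illustrative choices, not documented Vigilo behaviour.

```python
import hashlib
import ssl

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def sha256_fingerprint(pem_text: str) -> str:
    """Return the lowercase hex SHA-256 digest of the first certificate in pem_text."""
    # Slice out the first PEM block so a full-chain file also works.
    start = pem_text.index(PEM_HEADER)
    end = pem_text.index(PEM_FOOTER, start) + len(PEM_FOOTER)
    der = ssl.PEM_cert_to_DER_cert(pem_text[start:end])
    return hashlib.sha256(der).hexdigest()


def matches(pem_text: str, shown_fingerprint: str) -> bool:
    """Compare against a fingerprint copied from the UI, ignoring colons and case."""
    normalised = shown_fingerprint.replace(":", "").strip().lower()
    return sha256_fingerprint(pem_text) == normalised
```

Read the downloaded file as text and pass its contents to matches together with the value from Copy fingerprint. A ValueError is raised if the text contains no certificate block.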
5. Bulk renew or snooze a batch of expiring certs
- Tick the checkbox on every row you want to act on, or use the Select all expiring quick action in the toolbar.
- Click Bulk actions → Renew now to queue a CertificateRenewalPolicy run for each selected host. If a host has no policy, the action is disabled and the toolbar tells you which hosts need a policy first.
- Choose Bulk actions → Snooze until… instead if a cert is intentionally being retired. Snoozing writes to MonitoredHost.snoozed_until (from E1) and suppresses alerts without stopping scans.
- Watch the Renewal Kanban (WB.28) for live status: queued, running, succeeded, failed, needs manual action.
6. Export a TLS report card for an auditor
- Open Certificates → Reports.
- Filter to the scope you need (e.g. environment = prod, criticality >= high).
- Click Export TLS report card (PDF) (WA.13). The PDF includes one page per host, grade history, current chain, and a workspace-level summary chart.
7. Investigate a cert anomaly alert
- The alert email links straight to Anomaly → cert-XYZ.
- The detail view shows the previous snapshot and the new snapshot side by side, highlighting which fields changed (issuer, protocol, key, cipher, SAN list — the WA.15 + WB.27 detectors).
- Confirm the change is legitimate, then click Acknowledge to suppress further alerts for that fingerprint, or Open incident to escalate.
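The side-by-side comparison amounts to a field-level diff between two consecutive snapshots. A minimal sketch follows; the dictionary keys and the changed_fields helper are hypothetical, chosen to mirror the fields named above (issuer, protocol, key, cipher, SAN list), and do not reflect the actual detector code.

```python
from typing import Any

# Fields compared between consecutive snapshots; the key names are assumptions.
WATCHED_FIELDS = ("issuer", "protocol", "key", "cipher", "san")


def changed_fields(
    previous: dict[str, Any], current: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Map each watched field that differs to its (old, new) pair."""
    changes: dict[str, tuple[Any, Any]] = {}
    for name in WATCHED_FIELDS:
        old, new = previous.get(name), current.get(name)
        if name == "san":
            # Compare SAN lists as sets so that reordering alone is not flagged.
            differs = set(old or ()) != set(new or ())
        else:
            differs = old != new
        if differs:
            changes[name] = (old, new)
    return changes
```

An empty result means nothing in the watched set moved between the two observations; any non-empty result corresponds to the highlighted fields in the detail view.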
Permissions
| Action | Roles |
|---|---|
| View certificate list and detail | All authenticated workspace members |
| Copy fingerprint | All authenticated members |
| Add certificate (manual entry) | Operator, Admin, Owner |
| Scan now (single row) | Operator, Admin, Owner |
| Delete certificate (Django row) | Admin, Owner |
| Download .pem | Operator, Admin, Owner |
| Bulk renew | Operator, Admin, Owner |
| Bulk snooze | Operator, Admin, Owner |
| Export TLS report card | Auditor, Admin, Owner |
| Acknowledge anomaly | Operator, Admin, Owner |
All certificate API endpoints inherit WorkspaceScopedMixin, so a row for workspace A is never returned to a session bound to workspace B even if you craft the URL manually.
Troubleshooting
A host shows status error but the website works in a browser.
Open the detail drawer — the Last scan panel shows the raw handshake error. The most common causes are SNI mismatch (the scanner asked for the hostname but the host returned a default cert), connection refused on the configured port, or a private CA the scanner does not trust. For private CAs, upload the root to Settings → Trust anchors.
days_until_expiry looks wrong by exactly one day.
The value is computed in UTC. Hosts whose not_after falls within a few hours of midnight in your timezone can show off-by-one in the UI. Hover the value to see the exact UTC timestamp.
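The off-by-one effect can be reproduced with a small example. This is a sketch under the assumption that the displayed value is the number of whole days between now and not_after; the dates, the UTC+10 zone and the helper name are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days between two timezone-aware instants."""
    return (not_after - now).days


# A cert that expires at 22:00 UTC on 10 March, checked at 10:00 UTC on 1 March.
not_after = datetime(2025, 3, 10, 22, 0, tzinfo=timezone.utc)
now = datetime(2025, 3, 1, 10, 0, tzinfo=timezone.utc)

utc_days = days_until_expiry(not_after, now)  # 9

# For a reader in UTC+10 the same expiry instant falls on 11 March local time,
# so counting calendar dates on a wall calendar gives one day more.
local = timezone(timedelta(hours=10))
local_calendar_days = (
    not_after.astimezone(local).date() - now.astimezone(local).date()
).days  # 10
```

Neither number is wrong; they answer slightly different questions, which is why the exact UTC timestamp is the value to trust.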
SAN expansion is missing a hostname.
Wildcard expansion only covers a single label (*.example.com matches api.example.com but not v2.api.example.com). Add the deeper host manually if you need to monitor it.
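The single-label rule can be expressed in a few lines. The san_covers helper below is a hypothetical illustration of the matching rule described above, not Vigilo's code; it treats a leading *. as standing for exactly one DNS label and compares case-insensitively.

```python
def san_covers(san: str, hostname: str) -> bool:
    """Return True if a SAN entry covers hostname under the single-label wildcard rule."""
    san = san.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not san.startswith("*."):
        return san == hostname
    suffix = san[1:]  # e.g. ".example.com"
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    # The wildcard must stand for exactly one non-empty label, so no dots allowed.
    return bool(label) and "." not in label
```

With this rule, *.example.com covers api.example.com but neither v2.api.example.com nor the bare example.com.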
Bulk renew button is greyed out.
Either no rows are selected, or one of the selected hosts has no CertificateRenewalPolicy. Hover the button to see the blocking host names, then add a policy from Cert renewal before retrying.
Related articles
- Host monitoring — manage the hosts that feed snapshots into this page.
- Alert rules — wire expiring/expired/anomaly triggers to email and webhook channels.
- Certificate renewal — automate the renew step instead of doing it by hand.