Webhooks (outbound and inbound)

Overview

A Webhook is the generic way to push a Vigilo event to any HTTP endpoint outside the product, and a WebhookDelivery is the per-attempt receipt of that push. Together they form the audit-grade, retry-aware outbound event pipeline that integrations and customer-built automation rely on.

This article covers both directions: outbound webhooks (Vigilo → your endpoint, with HMAC-signed payloads, retries, dead-letter handling, and key rotation) and inbound webhooks (your system → Vigilo, with signature verification and trigger semantics).

For specific tool integrations (Slack, Jira, PagerDuty) see Integrations; this article is about the generic webhook surface beneath them.

Why it exists

Every operational platform eventually needs to feed events into systems we haven't built (legacy ticketing, internal Slack bots, an SRE team's dashboard). Hardcoded integrations don't scale to that long tail. Generic outbound webhooks let any admin point a Vigilo event at any URL with signed payloads and retry guarantees; generic inbound webhooks let any external system trigger a Vigilo action with verifiable signatures. The pair forms the contract between Vigilo and "everything else".

Key concepts

Webhook model (outbound)

Webhook (workspace, name, url, secret (encrypted, 64 chars), events (JSON list), is_active, headers (JSON dict), created_at, created_by).

  • events — array of event keys the webhook subscribes to (e.g. ["change.scheduled", "incident.opened", "incident.resolved"]). Wildcards supported (change.*).
  • secret — used to compute the HMAC-SHA256 signature on every delivery.
  • headers — optional custom headers (e.g. an internal X-Tenant-ID for the receiving system's routing).

A webhook is created via Settings → Webhooks → New webhook.
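
To illustrate how a subscription list containing wildcards such as change.* could be matched against an incoming event key, here is a minimal sketch. It assumes shell-style wildcard semantics as provided by Python's fnmatch module; the function name event_matches is hypothetical, and Vigilo's actual matcher may behave differently (for example, around nested keys).

```python
from fnmatch import fnmatchcase
from typing import List


def event_matches(subscriptions: List[str], event_key: str) -> bool:
    """Return True if any subscribed pattern matches the event key.

    Assumption: wildcards follow shell-style globbing, so "change.*"
    matches every key that starts with "change.".
    """
    return any(fnmatchcase(event_key, pattern) for pattern in subscriptions)


subs = ["change.*", "incident.opened"]
matched = event_matches(subs, "change.scheduled")    # wildcard match
skipped = event_matches(subs, "incident.resolved")   # not subscribed
```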

WebhookDelivery + retry

Each event match produces a WebhookDelivery row (webhook, event_type, payload JSON, status_code, response_body (truncated 2 KB), attempt, next_retry_at, delivered_at, error_class). The delivery worker is a Celery task with exponential backoff:

  • max_retries — default 6, configurable per workspace via Workspace.settings.webhook_max_retries.
  • backoff schedule — 30s, 2m, 8m, 30m, 2h, 8h (cumulative). Each backoff is jittered ±20% to prevent thundering herds.

A delivery is considered successful on any 2xx response. A 4xx response other than 429 terminates the retry chain immediately: the receiver has rejected the request itself, so resending the same payload would not help. 5xx and 429 responses schedule the next retry according to the backoff schedule.
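
The retry rules above can be sketched as follows. This is an illustration, not the delivery worker's code: the names classify and retry_delay are hypothetical, the listed schedule values are treated as the wait before each successive retry, and statuses the article does not mention (such as 3xx) are reported as "unspecified" rather than guessed.

```python
import random
from typing import Optional

# Base delays in seconds from the schedule above: 30s, 2m, 8m, 30m, 2h, 8h.
BACKOFF_SECONDS = [30, 120, 480, 1800, 7200, 28800]
JITTER = 0.20  # each delay is varied by up to 20% in either direction


def classify(status_code: int) -> str:
    """Map an HTTP status to 'success', 'retry', 'fail', or 'unspecified'."""
    if 200 <= status_code < 300:
        return "success"
    if status_code == 429 or 500 <= status_code < 600:
        return "retry"
    if 400 <= status_code < 500:
        return "fail"  # terminates the retry chain
    return "unspecified"  # not covered by the article


def retry_delay(attempt: int, rng: Optional[random.Random] = None) -> Optional[float]:
    """Jittered delay before retry number `attempt` (1-based).

    Returns None once the schedule is exhausted, which is the point at
    which a delivery would be dead-lettered.
    """
    if attempt < 1 or attempt > len(BACKOFF_SECONDS):
        return None
    rng = rng or random.Random()
    base = BACKOFF_SECONDS[attempt - 1]
    return base * rng.uniform(1 - JITTER, 1 + JITTER)
```

With these defaults the first retry lands between 24 and 36 seconds after the failed attempt, and a seventh retry is never scheduled.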

Dead-letter queue (WA.11)

After max_retries, the delivery is moved to the dead-letter queue — a separate filterable view at /ws/{slug}/settings/webhooks/dlq showing every permanently failed delivery with its payload, last error, and full response history. Admins can:

  • Inspect the payload and response history.
  • Replay a single delivery to retry with the current (possibly rotated) secret and URL.
  • Replay bulk by date range — useful after a downstream outage (T2.11).
  • Purge old DLQ rows (default retention: 90 days).

The DLQ also dispatches an event of kind webhook.dead_lettered, so a dead-lettered delivery can itself trigger a playbook (e.g. open an incident if the DLQ grows past N rows).

Signing key rotation (WA.12)

Signing key rotation is dual-active: rotating generates a new secret, both secrets sign every outgoing payload during the overlap (two signature headers, primary and secondary), and the old secret is purged after a configurable overlap window (default 24 hours).

This lets the receiver upgrade its verifier without a window of broken signatures. The overlap window is enforced server-side and cannot be extended beyond 7 days. Rotation is audited and dispatches webhook.rotated.

HMAC-SHA256 for outbound

Every outbound payload carries:

  • X-Vigilo-Event — the event key.
  • X-Vigilo-Delivery — UUID of the WebhookDelivery row (idempotency key).
  • X-Vigilo-Signature — sha256=<hex(hmac_sha256(secret, body))>. During rotation, also X-Vigilo-Signature-Next (or -Prev).
  • X-Vigilo-Timestamp — UNIX seconds, used to prevent replay attacks (receivers should reject deliveries older than 5 minutes).

The body is the JSON payload of the event — never form-encoded, never URL-encoded, always raw JSON.
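
When testing a receiver locally, it can help to reproduce the headers listed above for a sample payload. The sketch below does that under stated assumptions: build_headers is a hypothetical helper, the payload fields are invented for illustration, and the Content-Type header is an assumption that follows from the body being raw JSON.

```python
import hashlib
import hmac
import json
import time
import uuid
from typing import Dict


def build_headers(secret: bytes, event: str, body: bytes,
                  delivery_id: str, timestamp: int) -> Dict[str, str]:
    """Build the X-Vigilo-* headers for one delivery of `body`."""
    # The signature covers the exact bytes sent on the wire.
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return {
        "Content-Type": "application/json",  # assumption, not stated in the article
        "X-Vigilo-Event": event,
        "X-Vigilo-Delivery": delivery_id,
        "X-Vigilo-Signature": "sha256=" + signature,
        "X-Vigilo-Timestamp": str(timestamp),
    }


# Hypothetical payload; real event payloads will have different fields.
body = json.dumps({"event": "incident.opened", "id": 42}).encode("utf-8")
headers = build_headers(b"example-secret", "incident.opened", body,
                        str(uuid.uuid4()), int(time.time()))
```

Serializing the payload once and signing those same bytes keeps the signature and the transmitted body in agreement.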

Inbound webhooks (verification utility)

Vigilo accepts inbound webhooks at /ws/{slug}/integrations/{kind}/webhook/ for known integrations and at /ws/{slug}/automation/inbound/{token}/ for generic inbound. Inbound flows always verify the sender's signature using a per-row signing secret stored encrypted.

The signature verification utility lives at apps/integrations/utils/verify_signature.py (T0.6) and implements:

  • HMAC-SHA256 against the raw body.
  • Constant-time compare.
  • Timestamp window check (default ±300s) to prevent replay.

Inbound webhook authors should pick a unique token per integration and rotate it on the same cadence as outbound webhooks.
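
The three checks listed above can be combined as in the following sketch. It is not the contents of apps/integrations/utils/verify_signature.py: the function name verify_inbound and its parameters are hypothetical, and how the sender transmits the signature and timestamp is left to the caller.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_inbound(secret: bytes, raw_body: bytes, signature_hex: str,
                   timestamp: int, now: Optional[float] = None,
                   window: int = 300) -> bool:
    """Return True only if the request is fresh and correctly signed."""
    now = time.time() if now is None else now
    # Timestamp window check: reject requests too far in the past or future.
    if abs(now - timestamp) > window:
        return False
    # HMAC-SHA256 over the raw bytes, before any JSON parsing.
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison; encoding both sides avoids a TypeError
    # if the supplied signature contains non-ASCII characters.
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature_hex.encode("utf-8"))
```

Checking the timestamp first means stale requests are rejected without computing an HMAC at all.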

Per-row actions (PR3)

Each webhook row in the list view exposes a per-row action menu with Test (sends a synthetic event), Rotate secret, Edit, Pause/Resume, View deliveries, and Delete. The same actions are repeated in the detail page header.

Common workflows

Create an outbound webhook

  1. Settings → Webhooks → New webhook, name it, paste the URL.
  2. Pick events from the multi-select (e.g. change.scheduled, incident.opened).
  3. Copy the generated secret to the receiver's signature verifier.
  4. Click Save, then Test to send a synthetic event.

Verify a delivery on the receiver side

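import hashlib
import hmac

def verify(secret: bytes, body: bytes, header_sig: str) -> bool:
    """Check X-Vigilo-Signature against the raw request body."""
    # Recompute the signature over the exact bytes received, not re-serialized JSON.
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest compares in constant time, which avoids timing side channels.
    return hmac.compare_digest(expected, header_sig)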

Also reject any request whose timestamp is more than 300 seconds old, that is, where now() - X-Vigilo-Timestamp > 300.

Replay a single failed delivery

  1. Open the webhook detail, Deliveries tab, filter to failed.
  2. Click a row, then Replay. The delivery is requeued. If it succeeds, the original row is updated; if it fails again, the row's attempt counter is incremented.

Bulk replay a date range

  1. Open /ws/{slug}/settings/webhooks/dlq, filter by webhook and date range.
  2. Click Bulk replay, confirm the count. The system requeues all selected rows in a paced batch (default 10 per second).
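
For intuition about what pacing at 10 per second means for a large replay, here is a rough sketch. It is an assumption about the general technique, not Vigilo's implementation: replay_paced, batches, and the requeue callback are hypothetical names, and pacing is approximated by pausing one second between batches.

```python
import time
from typing import Callable, Iterator, Sequence


def batches(ids: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` delivery IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def replay_paced(ids: Sequence[str], requeue: Callable[[str], None],
                 per_second: int = 10,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Requeue every ID, pausing one second between batches."""
    requeued = 0
    for index, batch in enumerate(batches(ids, per_second)):
        if index > 0:
            sleep(1.0)  # keeps throughput near `per_second`
        for delivery_id in batch:
            requeue(delivery_id)
            requeued += 1
    return requeued
```

Under this model, replaying 6,000 dead-lettered rows takes roughly ten minutes, which is worth keeping in mind before confirming a large count.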

Rotate a signing key

  1. Open the webhook detail, click Rotate secret. The new secret is shown once; copy it.
  2. Update the receiver to accept both secrets. The receiver should check both X-Vigilo-Signature and X-Vigilo-Signature-Next headers.
  3. After the 24-hour overlap, the old secret is purged.
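
During the overlap, a receiver can accept a delivery if any signature header verifies against any secret it currently trusts. The sketch below takes that approach; verify_any is a hypothetical helper, and passing in every candidate header value keeps it independent of exactly which header carries which signature.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def verify_any(secrets: Iterable[bytes], body: bytes,
               header_sigs: Iterable[Optional[str]]) -> bool:
    """Return True if any (secret, signature header) pair matches."""
    sigs = [s.encode("utf-8") for s in header_sigs if s]
    matched = False
    for secret in secrets:
        digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
        expected = ("sha256=" + digest).encode("utf-8")
        for sig in sigs:
            # Constant-time compare for every pair, without early exit.
            if hmac.compare_digest(expected, sig):
                matched = True
    return matched


# Example: the receiver trusts both secrets and reads both headers.
body = b'{"event": "incident.opened"}'
old_secret, new_secret = b"old-secret", b"new-secret"
request_headers = {
    "X-Vigilo-Signature": "sha256=" + hmac.new(old_secret, body, hashlib.sha256).hexdigest(),
    "X-Vigilo-Signature-Next": "sha256=" + hmac.new(new_secret, body, hashlib.sha256).hexdigest(),
}
accepted = verify_any(
    [old_secret, new_secret],
    body,
    [request_headers.get("X-Vigilo-Signature"),
     request_headers.get("X-Vigilo-Signature-Next")],
)
```

Once the overlap ends, dropping the old secret from the list is the only change the receiver needs.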

Permissions

  • Owners and admins can create, rotate, and delete webhooks.
  • Engineers can read and replay deliveries.
  • Approvers / viewers cannot see secrets.

Troubleshooting

All deliveries return 401 from the receiver — Signature verifier mismatch. Confirm the secret was copied without leading/trailing whitespace and that the verifier hashes the raw body, not the parsed JSON.
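
The raw-body point is easy to reproduce. In this sketch (the secret and payload are made up for illustration), the signature verifies against the bytes as received but not against JSON that has been parsed and re-serialized, because re-serialization changes the whitespace.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    """Compute the sha256=<hex> signature value for a body."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


secret = b"example-secret"
# Bytes exactly as they arrived; note the double space before "id".
raw_body = b'{"event": "incident.opened",  "id": 42}'
header_sig = sign(secret, raw_body)

# Parsing and dumping yields equivalent JSON but different bytes.
reserialized = json.dumps(json.loads(raw_body)).encode("utf-8")

raw_ok = hmac.compare_digest(sign(secret, raw_body), header_sig)
reserialized_ok = hmac.compare_digest(sign(secret, reserialized), header_sig)
```

Here raw_ok is True and reserialized_ok is False. In most web frameworks this means reading the request body as bytes before any JSON middleware touches it.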

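Deliveries succeed but receiver does not see the event — The receiver is probably acknowledging with 200 before it has finished processing, so a failure during processing is invisible to Vigilo. Acknowledge only after processing succeeds, and deduplicate on X-Vigilo-Delivery so that retries are safe.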
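
One way a receiver might combine acknowledging only after processing with deduplication on X-Vigilo-Delivery is sketched below. The handle function and the in-memory set are illustrative assumptions; a real receiver would keep seen delivery IDs in a durable store shared across its workers.

```python
from typing import Callable, Dict, Set


def handle(headers: Dict[str, str], body: bytes,
           process: Callable[[bytes], None], seen: Set[str]) -> int:
    """Process a delivery at most once and return the HTTP status to send."""
    delivery_id = headers["X-Vigilo-Delivery"]
    if delivery_id in seen:
        return 200  # duplicate of a delivery that was already processed
    try:
        process(body)
    except Exception:
        # A 5xx tells the sender to retry; the ID is not recorded,
        # so the retry will be processed rather than skipped.
        return 500
    seen.add(delivery_id)
    return 200
```

Recording the ID only after process returns is the important ordering: marking it earlier would cause a retry of a failed delivery to be silently dropped.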

DLQ row keeps re-firing after replay — The receiver is consistently returning 5xx. Investigate the receiver; bulk replay is for transient outages, not persistent breakage.

Rotation broke signatures — During the overlap, two signatures are sent and the receiver must accept either. A receiver that checks only X-Vigilo-Signature against a single secret can reject rotated payloads until it has been updated to accept the new secret. To roll back, use Cancel rotation within the overlap window.

Inbound webhook 401 from Vigilo — The signature verifier rejected the request. Confirm the integration's signing secret in Vigilo matches the one the sender uses. Use the Re-verify button on the inbound webhook log row to retry verification with the current secret.

Related