AI assistant: chat, search, and actions

Overview

The AI assistant is Vigilo's workspace-aware copilot. It answers questions about your own data ("which changes are scheduled this weekend?"), surfaces relevant knowledge ("what's the cert-rotation runbook?"), and — with explicit confirmation — performs safe actions on your behalf ("open an incident for the API gateway", "create a freeze window for next Tuesday").

It lives in two places:

AIChatView at /ws/{slug}/ai/chat — a conversational interface with message history per user.
AISearchView at /ws/{slug}/ai/search — a single-shot question box that returns ranked, cited snippets without conversational state.

Both surfaces share the same underlying RAG pipeline and the same workspace-RLS gate, so a question never leaks data from another workspace.

Why it exists

Operators waste hours searching across changes, incidents, docs, runbooks, and KB articles for context that already exists somewhere in the workspace. A workspace-scoped assistant collapses that search into natural language while keeping every retrieval inside the RLS perimeter. The action layer goes a step further: instead of switching tabs to perform a 30-second mechanical task ("create a freeze window"), the user can express the intent in chat and confirm the resulting structured action.

Key concepts

RAG over EmbeddedContent

The assistant retrieves context from the EmbeddedContent table — a workspace-scoped index of (entity_type, entity_id, chunk_index, embedding, content) rows produced by the embedding worker. Indexable entities include kb_article, doc_page, runbook, change_request, incident, decision_record, and meeting_note.

At query time, the user prompt is embedded, a pgvector cosine search returns the top-k matching chunks scoped to the current workspace, and the chunks are stuffed into the LLM prompt as a <context> block with provenance citations. Citations render in the response as numbered chips that link to the source entity.
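
The retrieval step above can be sketched in plain Python. This is an illustrative in-memory stand-in, not the production query: the real search runs in pgvector, and the Chunk class, the retrieve and build_context helpers, and the citation line format are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    # Mirrors the EmbeddedContent columns named above, plus the workspace scope.
    workspace_id: int
    entity_type: str
    entity_id: str
    chunk_index: int
    embedding: list
    content: str


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, workspace_id, k=3):
    """Return the k most similar chunks, considering only the current workspace."""
    scoped = [c for c in chunks if c.workspace_id == workspace_id]
    scoped.sort(key=lambda c: cosine(query_vec, c.embedding), reverse=True)
    return scoped[:k]


def build_context(hits):
    """Render hits as a numbered <context> block (hypothetical line format)."""
    lines = ["<context>"]
    for n, c in enumerate(hits, start=1):
        lines.append(f"[{n}] ({c.entity_type}:{c.entity_id}#{c.chunk_index}) {c.content}")
    lines.append("</context>")
    return "\n".join(lines)
```

The workspace filter is applied before ranking, so a chunk from another workspace can never appear in the context block regardless of how similar it is.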

Intent classification

Before retrieval, the prompt is classified by intent (WB.34) into one of four buckets:

question — informational, answer with cited context only.
list_query — asks for a filtered list of entities ("changes in review this week"). Routed to a structured-query path that hits the API instead of RAG.
create — asks the system to create an entity ("open an incident…"). Routed to the action proposer.
action — asks the system to perform a verb on an existing entity ("close INC-42", "revoke Maria"). Routed to the action proposer.

The classifier is a small zero-shot LLM call when an OpenAI key is configured (see "Configuring the OpenAI key" below); without a key, it falls back to a rule-based regex matcher.
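
As a rough illustration of the rule-based fallback, the sketch below maps a prompt to one of the four buckets with ordered regular expressions. The patterns and the classify_intent name are hypothetical; the shipped rules are not documented here and are likely broader.

```python
import re

# Hypothetical rules, checked in order; the first match wins.
RULES = [
    ("create", re.compile(r"^\s*(open|create|add|schedule)\b", re.IGNORECASE)),
    ("action", re.compile(r"^\s*(close|revoke|resolve|remove)\b", re.IGNORECASE)),
    (
        "list_query",
        re.compile(
            r"^\s*(list|show)\b|\b(which|what)\s+(changes|incidents)\b",
            re.IGNORECASE,
        ),
    ),
]


def classify_intent(prompt: str) -> str:
    """Return one of: question, list_query, create, action."""
    for intent, pattern in RULES:
        if pattern.search(prompt):
            return intent
    # Anything that matches no rule is treated as an informational question.
    return "question"
```

Defaulting to question is the conservative choice: an unrecognised prompt is answered from cited context and never reaches the action proposer.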

Configuring the OpenAI key

Vigilo resolves the OpenAI API key in two stages:

Per-workspace key — Admin / Owner roles can set a workspace-specific key under Settings → General → AI assistant. The settings page exposes a masked input with a show/hide toggle, status badge (workspace key configured / platform default in use / not configured), and a "Remove workspace key" action. The key itself is CMEK-encrypted at rest; the GET endpoint only ever returns the is_configured flag and the source, never the value.
Platform default — falls back to the OPENAI_API_KEY environment variable on the Django process when no workspace key is set. Single-tenant deployments typically configure only the env var; multi-tenant deployments where each workspace brings its own OpenAI account configure per workspace.

When neither is set, the assistant returns retrieved sources with a message pointing the operator at the settings page — it never silently does nothing.
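
The two-stage resolution can be sketched as follows. The function and type names are assumptions for illustration; only the lookup order, the OPENAI_API_KEY variable, and the is_configured and source fields come from the description above. The source labels "workspace", "platform", and "none" are placeholders, not the real API values.

```python
import os
from typing import Mapping, NamedTuple, Optional


class ResolvedKey(NamedTuple):
    value: Optional[str]
    source: str  # hypothetical labels: "workspace", "platform", or "none"


def resolve_openai_key(
    workspace_key: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> ResolvedKey:
    """Prefer the workspace key, then the platform env var, else nothing."""
    if workspace_key:
        return ResolvedKey(workspace_key, "workspace")
    platform_key = env.get("OPENAI_API_KEY", "")
    if platform_key:
        return ResolvedKey(platform_key, "platform")
    return ResolvedKey(None, "none")


def status_payload(resolved: ResolvedKey) -> dict:
    """What a GET on the settings endpoint might expose: never the key itself."""
    return {"is_configured": resolved.value is not None, "source": resolved.source}
```

Passing env explicitly keeps the function easy to test; in a running process the default reads the real environment.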

Action proposals (confirm-before-execute)

When the classifier picks create or action, the assistant produces an action proposal (WB.35): a structured JSON object describing the proposed call — action name, target entity, parameters, and a human-readable summary. The proposal is rendered in chat as a card with Confirm and Cancel buttons. Nothing executes until the user clicks Confirm. Once confirmed, the assistant invokes the action via the standard API path so all permission, audit, RLS, and webhook plumbing fires as if the user had used the UI directly.
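
A minimal sketch of the confirm-before-execute gate, assuming a proposal carries the four fields listed above plus a status flag. The ActionProposal class, its status values, and the execute helper are hypothetical; the actual proposal schema may differ.

```python
from dataclasses import dataclass


@dataclass
class ActionProposal:
    action: str       # action name, e.g. "open_incident"
    target: str       # target entity
    parameters: dict  # parsed parameters, editable before confirmation
    summary: str      # human-readable summary shown on the card
    status: str = "pending"  # hypothetical lifecycle: pending -> confirmed | cancelled

    def confirm(self) -> None:
        if self.status != "pending":
            raise ValueError(f"cannot confirm a {self.status} proposal")
        self.status = "confirmed"

    def cancel(self) -> None:
        if self.status != "pending":
            raise ValueError(f"cannot cancel a {self.status} proposal")
        self.status = "cancelled"


def execute(proposal: ActionProposal, call_api):
    """Invoke the API only for confirmed proposals; call_api is injected by the caller."""
    if proposal.status != "confirmed":
        raise PermissionError("proposal has not been confirmed")
    return call_api(proposal.action, proposal.parameters)
```

Because execute refuses anything that is not confirmed, a proposal the user ignores or cancels can never reach the API path.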

Action catalog

The current allow-list of confirmable actions:

create_freeze_window(start, end, scope, reason) — opens a FreezeWindow draft.
create_change(title, change_type, planned_start, planned_end) — opens a CHG draft.
open_incident(title, severity, service) — opens an INC.
revoke_member(user_id, reason) — removes a WorkspaceMembership (admin/owner only).
close_incident(incident_id, resolution_notes) — closes an open incident.

The catalog is intentionally narrow: every action has a clean reversible counterpart, none of them touch billing or auth provider state, and each requires the underlying RBAC permission. Adding actions requires a backend release; user-defined actions are deferred (use Playbooks for that).
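
One way to enforce an allow-list like this is to validate every proposal against a static table before rendering the card. The sketch below takes the action names and parameters from the catalog above; treating every listed parameter as required, and the validate_action helper itself, are assumptions.

```python
# Action name -> parameter names, copied from the catalog above.
CATALOG = {
    "create_freeze_window": ("start", "end", "scope", "reason"),
    "create_change": ("title", "change_type", "planned_start", "planned_end"),
    "open_incident": ("title", "severity", "service"),
    "revoke_member": ("user_id", "reason"),
    "close_incident": ("incident_id", "resolution_notes"),
}


def validate_action(action: str, params: dict) -> list:
    """Return a list of problems; an empty list means the proposal is well-formed."""
    if action not in CATALOG:
        return [f"unknown action: {action}"]
    expected = CATALOG[action]
    # Assumption: every catalog parameter is required.
    errors = [f"missing parameter: {name}" for name in expected if name not in params]
    errors += [
        f"unexpected parameter: {name}" for name in sorted(params) if name not in expected
    ]
    return errors
```

Rejecting unknown action names outright means a hallucinated action never becomes a confirmable card.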

OpenAI integration

For full-quality behaviour the assistant needs an OpenAI key, resolved as described under "Configuring the OpenAI key": embeddings use text-embedding-3-small, chat uses gpt-4o-mini (configurable via AI_CHAT_MODEL), and the intent classifier runs as a zero-shot call.

Whichever key is in use, a workspace key or the platform default, usage is metered into the AIUsageLog table per workspace for chargeback and quota enforcement.

Token-Jaccard fallback

When no OpenAI key is available, or the key is rate-limited, the assistant falls back to a token-Jaccard retriever: it tokenises the prompt and each candidate chunk, computes the Jaccard similarity between the two token sets, and returns the top-k chunks. Quality is dramatically lower (no synonymy, no semantic distance), but the assistant remains usable for keyword-heavy questions. The intent classifier also falls back to regex rules. The chat tab shows a yellow "Running in fallback mode" banner so the user knows what they're getting.
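
The fallback retriever is simple enough to sketch in full. The tokeniser (lowercased alphanumeric runs) and the choice to drop zero-score chunks are assumptions; the Jaccard score itself is the standard |A ∩ B| / |A ∪ B| over token sets.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_top_k(prompt: str, chunks: list, k: int = 3) -> list:
    """Return up to k (score, chunk) pairs, best first, skipping zero-overlap chunks."""
    query_tokens = tokenize(prompt)
    scored = [(jaccard(query_tokens, tokenize(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, chunk) for score, chunk in scored[:k] if score > 0]
```

Because the score depends only on shared tokens, a prompt about "certificates" will not match a chunk that says "certs", which is the synonymy gap noted above.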

Common workflows

Ask a question

Open /ws/{slug}/ai/chat. Type a natural-language question — "what changed in production yesterday?".
The classifier picks list_query, the structured-query path runs GET /api/v1/changes/?completed_at__date={yesterday}, and the assistant summarises the result with citations to each CHG.
Click a citation chip to jump to the entity.
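
For illustration, the structured-query URL in the second step could be assembled like this. The helper name is hypothetical, and the ISO date format for the completed_at__date value is an assumption about what the API accepts.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def yesterday_changes_url(today: date) -> str:
    """Build the changes query for the day before `today`."""
    yesterday = today - timedelta(days=1)
    # Assumption: the filter takes an ISO 8601 date (YYYY-MM-DD).
    query = urlencode({"completed_at__date": yesterday.isoformat()})
    return "/api/v1/changes/?" + query
```

Taking today as an argument, rather than calling date.today() inside the function, keeps the result deterministic in tests.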

Confirm an action

Type "open an incident for the auth service, sev-2, gateway returning 502s".
The intent classifier picks create. The assistant renders an action card with the parsed parameters (title, severity=sev2, service=auth-service).
Edit any field inline (the parsed parameters are user-editable before confirmation), then click Confirm.
The action calls POST /api/v1/incidents/; once the incident is created, the response card flips to "Created INC-104" with a deep link.

Search

Use AISearchView when you want a non-conversational answer — for example, embedding the search in a Confluence-style sidebar.
The same RAG path runs, but no conversational history is appended. Each search call is stateless.

Permissions

Viewers can ask questions and use search. The assistant filters retrieval to entities the viewer is allowed to read.
Engineers and above can confirm create_change, open_incident, create_freeze_window, and close_incident actions.
Admins / owners can confirm revoke_member.
Action confirmation always re-checks the underlying RBAC permission at execution time, regardless of the surface the user came from.
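
The role rules above can be expressed as a small lookup. The strict ordering viewer < engineer < admin < owner and the can_confirm helper are assumptions for the sketch; the real check is the underlying RBAC permission, which may be finer-grained than a role rank.

```python
# Assumed role ordering; "Engineers and above" is read as rank >= engineer.
ROLE_RANK = {"viewer": 0, "engineer": 1, "admin": 2, "owner": 3}

# Minimum role per action, taken from the permission list above.
MIN_ROLE = {
    "create_change": "engineer",
    "open_incident": "engineer",
    "create_freeze_window": "engineer",
    "close_incident": "engineer",
    "revoke_member": "admin",
}


def can_confirm(role: str, action: str) -> bool:
    """True if the role meets the minimum for the action; unknown inputs are denied."""
    required = MIN_ROLE.get(action)
    if required is None or role not in ROLE_RANK:
        return False
    return ROLE_RANK[role] >= ROLE_RANK[required]
```

Denying unknown roles and unknown actions by default matches the re-check-at-execution rule: anything the table does not explicitly allow is refused.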

Troubleshooting

Yellow "fallback mode" banner: No usable OpenAI key was found. Either no workspace key is set and OPENAI_API_KEY is missing from the platform .env, or the configured key is invalid or rate-limited. Set a workspace key under Settings → General → AI assistant, or set the platform key and restart the Celery worker; the assistant returns to full quality on the next message.

Assistant cites a document I cannot open: A citation can point to a document you could read when the answer was generated but that a writer has since revoked your access to. The citation chip falls back to a "no longer accessible" tooltip. Refresh the chat to drop the stale citation.

Action card stuck on "Confirming…": The underlying action API returned a 4xx. Open the chat panel's developer drawer to see the response payload; common causes are missing required parameters (the LLM hallucinated a value the API rejected) or a permission error.

Assistant returns "I don't have enough context": The RAG retriever found no chunks above the relevance threshold. Either the data is unindexed (kick off a reindex via admin), or the question is too vague; rephrase it with concrete entity names.