Asset discovery and reconciliation

Manual CMDB maintenance loses to entropy. Vigilo's discovery loop talks to your cloud providers and service catalogues directly, populates the Asset table, and surfaces drift between Vigilo's view and the source of truth so reconciliation is a daily five-minute job, not a quarterly mass-cleanup.

Overview

A CloudAccount row carries the credentials and configuration for one source of truth: an AWS account, a GCP project, an Azure subscription, a Kubernetes cluster, or a Consul cluster. The vigilo.assets.discover_cloud Celery task wakes on a configurable schedule (default every 30 minutes), iterates every active CloudAccount, asks the appropriate adapter for its current resource list, and upserts Asset rows keyed by external_id.
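
The loop described above can be sketched in a few lines. This is a minimal in-memory illustration, not Vigilo's implementation: the Account class, the adapters mapping, the fake_aws adapter, and the dict standing in for the Asset table are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Resource = Dict[str, str]


@dataclass
class Account:
    # Hypothetical stand-in for a CloudAccount row.
    name: str
    provider: str
    is_active: bool = True


def discover(
    accounts: List[Account],
    adapters: Dict[str, Callable[[Account], List[Resource]]],
    assets: Dict[str, Resource],
) -> Set[str]:
    """Upsert every resource reported by active accounts; return the IDs seen."""
    seen: Set[str] = set()
    for account in accounts:
        if not account.is_active:
            continue  # inactive accounts are skipped entirely
        for resource in adapters[account.provider](account):
            ext_id = resource["external_id"]
            # Upsert: create the row if absent, otherwise update it in place.
            assets.setdefault(ext_id, {}).update(resource)
            seen.add(ext_id)
    return seen


def fake_aws(account: Account) -> List[Resource]:
    # Canned adapter output, for the example only.
    return [{"external_id": "i-0abc", "type": "ec2", "account": account.name}]


assets: Dict[str, Resource] = {}
accounts = [Account("acme-prod", "aws"), Account("old", "aws", is_active=False)]
first = discover(accounts, {"aws": fake_aws}, assets)
second = discover(accounts, {"aws": fake_aws}, assets)  # re-run adds no rows
```

Because rows are keyed by external_id, the second call updates the same row instead of creating a duplicate.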

A second sweep — the reconciler — compares the upsert result against the existing CMDB and writes drift_status:

ok — Vigilo has the row, source still reports it.
missing_in_source — Vigilo has the row but the source no longer returns it. Probable manual deletion upstream.
new_in_source — Vigilo did not have a row before this run. Often an auto-create, but rows in environments that require human gating land here for approval.
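
The three statuses reduce to a set comparison between the external_ids Vigilo held before the run and the ones the source reported during it. The function below is a sketch under that assumption; classify_drift and its parameter names are hypothetical, not part of Vigilo's API.

```python
from typing import Dict, Set


def classify_drift(known_before: Set[str], seen_now: Set[str]) -> Dict[str, str]:
    """Map each external_id to a drift_status value."""
    status: Dict[str, str] = {}
    for ext_id in known_before | seen_now:
        if ext_id not in known_before:
            status[ext_id] = "new_in_source"
        elif ext_id not in seen_now:
            status[ext_id] = "missing_in_source"
        else:
            status[ext_id] = "ok"
    return status


drift = classify_drift(known_before={"i-1", "i-2"}, seen_now={"i-2", "i-3"})
```

Here i-1 comes out as missing_in_source, i-2 as ok, and i-3 as new_in_source.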

Why it exists

Cloud inventories change continuously, and a CMDB that lags by a week is worse than no CMDB at all: engineers stop trusting it and keep their own spreadsheets. Putting upsert-by-external-id and drift detection at the heart of the inventory keeps Vigilo within one successful discovery cycle of the truth, and stale rows are surfaced for cleanup rather than silently accumulating.

Key concepts

CloudAccount fields — name, provider (aws, gcp, azure, k8s, consul), credentials (encrypted JSON), regions (list, for multi-region providers), discovery_interval_minutes (default 30), is_active, last_discovered_at, last_status.
Adapter status — AWS, Kubernetes, and Consul adapters are fully implemented and tested in production. GCP and Azure adapters are stubs: the wiring is in place, the schema is correct, but the live API calls return canned data until WD.20 ships. The CloudAccount form for stubbed providers shows a "Stub adapter" badge so you do not configure something that will not work yet.
Idempotent upsert by external_id — the discover task uses external_id as the natural key. Re-running discovery will not create duplicate rows; existing rows are updated in place. This is also what makes the reconciler safe to re-run on demand.
Credentials encryption — CloudAccount.credentials is encrypted at rest using workspace CMEK (customer-managed encryption keys; see Architecture → Encryption). A platform administrator with database access cannot read the credentials without a workspace-issued data key.
Asset Reconciliation page (WB.25) — dedicated UI for triaging drift. Three tabs: New, Missing, Conflicts. Bulk-accept and bulk-reject actions speed up clean-up.
Discovery dry-run — every CloudAccount has a Test discovery action that runs the adapter without writing to the CMDB and returns a sample of resources plus any auth errors.
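
To make the scheduling fields concrete, the sketch below models a few of the CloudAccount fields and one plausible reading of how is_active, last_discovered_at, and discovery_interval_minutes could decide whether an account is due for a run. The CloudAccountSketch class and the is_due helper are illustrative assumptions; the source does not spell out the exact rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CloudAccountSketch:
    # Subset of the documented fields; credentials and regions are omitted.
    name: str
    provider: str
    discovery_interval_minutes: int = 30
    is_active: bool = True
    last_discovered_at: Optional[datetime] = None


def is_due(account: CloudAccountSketch, now: datetime) -> bool:
    """Assumed rule: active, and never run or last run at least one interval ago."""
    if not account.is_active:
        return False
    if account.last_discovered_at is None:
        return True
    interval = timedelta(minutes=account.discovery_interval_minutes)
    return now - account.last_discovered_at >= interval


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = CloudAccountSketch("acme-prod", "aws", last_discovered_at=now - timedelta(minutes=10))
stale = CloudAccountSketch("acme-k8s", "k8s", last_discovered_at=now - timedelta(minutes=45))
never_run = CloudAccountSketch("acme-consul", "consul")
paused = CloudAccountSketch("acme-old", "aws", is_active=False)
```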

Common workflows

1. Connect an AWS account

Open Inventory → Cloud accounts → + Add account.
Provider: aws. Name: acme-prod.
Credentials: paste an IAM role ARN (preferred — Vigilo's instance role will assume it) or an access key pair (discouraged but supported).
Regions: pick from a multi-select; defaults to every region the credentials can reach.
Interval: leave at 30 minutes.
Click Test discovery. The result panel shows the resources Vigilo would import (EC2, ELB, RDS, ACM certificates, Route 53 records). Fix any auth errors before saving.
Click Save. Discovery runs immediately, then on schedule.
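
If you create the IAM role yourself, a read-only policy along these lines is a reasonable starting point. The action list is an assumption derived from the resource types named in the Test discovery step; this page does not state the exact permissions Vigilo needs, so treat it as a sketch and adjust it after a dry run. The Sid value is a made-up label.

```python
import json
from typing import Dict, List

# Assumed read-only actions covering EC2, ELB, RDS, ACM, and Route 53.
READ_ONLY_ACTIONS: List[str] = [
    "ec2:DescribeInstances",
    "elasticloadbalancing:DescribeLoadBalancers",
    "rds:DescribeDBInstances",
    "acm:ListCertificates",
    "acm:DescribeCertificate",
    "route53:ListHostedZones",
    "route53:ListResourceRecordSets",
]


def build_policy(actions: List[str]) -> Dict[str, object]:
    """Build an IAM policy document that allows only the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VigiloDiscoveryReadOnly",  # hypothetical label
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",
            }
        ],
    }


policy = build_policy(READ_ONLY_ACTIONS)
print(json.dumps(policy, indent=2))
```

Every action is a Describe or List call, so the role cannot modify anything in the account.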

2. Connect a Kubernetes cluster

Add a CloudAccount with Provider: k8s.
Paste a kubeconfig snippet limited to a service account with read access on the relevant namespaces.
Discovery imports Deployments, StatefulSets, Services, and Ingresses as Assets; individual pods are intentionally not imported (too noisy).
Ingress objects automatically gain a hosts relationship to their Service targets — these populate the dependency map.
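
The Ingress-to-Service relationship can be read straight off a networking.k8s.io/v1 Ingress manifest. The sketch below shows one way to derive the edges from a manifest loaded as a dict. The hosts_edges function and the (source, relation, target) tuple shape are assumptions for illustration, not Vigilo's internal format.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]


def hosts_edges(ingress: Dict) -> List[Edge]:
    """Return one (ingress, "hosts", service) edge per distinct backend Service."""
    meta = ingress["metadata"]
    namespace = meta.get("namespace", "default")
    source = f"{namespace}/{meta['name']}"
    spec = ingress.get("spec") or {}

    # Collect the default backend plus every per-path backend.
    backends = [spec.get("defaultBackend") or {}]
    for rule in spec.get("rules") or []:
        for path in (rule.get("http") or {}).get("paths") or []:
            backends.append(path.get("backend") or {})

    targets: List[str] = []
    for backend in backends:
        name = (backend.get("service") or {}).get("name")
        if name and name not in targets:
            targets.append(name)

    # Ingress backends live in the same namespace as the Ingress itself.
    return [(source, "hosts", f"{namespace}/{name}") for name in targets]


ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "shop", "namespace": "web"},
    "spec": {
        "rules": [
            {
                "host": "shop.example.com",
                "http": {
                    "paths": [
                        {"path": "/", "pathType": "Prefix",
                         "backend": {"service": {"name": "storefront", "port": {"number": 80}}}},
                        {"path": "/api", "pathType": "Prefix",
                         "backend": {"service": {"name": "shop-api", "port": {"number": 8080}}}},
                    ]
                },
            }
        ]
    },
}
edges = hosts_edges(ingress)
```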

3. Connect a Consul cluster

Add a CloudAccount with Provider: consul.
Provide the Consul HTTP API URL and an ACL token with service:read and node:read.
Discovery imports services and nodes. Services with multiple instances are deduplicated by name + datacentre.
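
Deduplication by name + datacentre amounts to grouping instances on that pair. The sketch below assumes each instance is a dict with name and datacentre keys; the instance_count field is an addition for the example and is not a documented Asset attribute.

```python
from typing import Dict, List, Tuple


def dedupe_services(instances: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Collapse service instances into one entry per (name, datacentre) pair."""
    unique: Dict[Tuple[str, str], Dict[str, object]] = {}
    for inst in instances:
        key = (inst["name"], inst["datacentre"])
        entry = unique.setdefault(
            key,
            {"name": inst["name"], "datacentre": inst["datacentre"], "instance_count": 0},
        )
        entry["instance_count"] += 1
    return list(unique.values())


instances = [
    {"name": "web", "datacentre": "dc1", "node": "n1"},
    {"name": "web", "datacentre": "dc1", "node": "n2"},
    {"name": "web", "datacentre": "dc2", "node": "n3"},
]
services = dedupe_services(instances)
```

The two web instances in dc1 become a single entry, while web in dc2 stays separate because the datacentre differs.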

4. Triage drift on the reconciliation page

Open Inventory → Reconciliation.
New in source tab — accept (creates the Asset with discovered defaults), reject (writes an exclusion so the next discovery skips it), or edit (open the asset before accepting so you can set owner and criticality).
Missing in source tab — delete (removes the Asset; the source no longer has it anyway), keep (clears the drift flag without re-importing), or investigate (open the asset detail).
Conflicts tab — surfaces rows where the discovery output disagrees with manual edits (e.g. you set environment = staging but the AWS tag says prod). Choose Trust source or Trust local.
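
The Conflicts tab can be thought of as a field-by-field comparison restricted to fields a human has edited. The sketch below illustrates that idea and the two resolution choices. The function names, the manually_edited set, and the trust_source / trust_local strings are assumptions for the example.

```python
from typing import Dict, Set, Tuple


def find_conflicts(
    local: Dict[str, str], source: Dict[str, str], manually_edited: Set[str]
) -> Dict[str, Tuple[str, str]]:
    """Return {field: (local_value, source_value)} for edited fields that disagree."""
    return {
        field: (local[field], source[field])
        for field in manually_edited
        if field in local and field in source and local[field] != source[field]
    }


def resolve(
    local: Dict[str, str], source: Dict[str, str], field: str, choice: str
) -> Dict[str, str]:
    """Apply a Trust source or Trust local decision to one field."""
    if choice == "trust_source":
        return {**local, field: source[field]}
    if choice == "trust_local":
        return dict(local)
    raise ValueError(f"unknown choice: {choice}")


local = {"environment": "staging", "owner": "team-a"}
source = {"environment": "prod", "owner": "team-a"}
conflicts = find_conflicts(local, source, manually_edited={"environment", "owner"})
after_trust_source = resolve(local, source, "environment", "trust_source")
after_trust_local = resolve(local, source, "environment", "trust_local")
```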

5. Stub a provider for staging while waiting on real support

Add a GCP CloudAccount. The form warns you that it is a stub adapter.
Save anyway. Discovery returns the canned dataset (a handful of representative resource types) so you can preview how the Asset table will look once the real adapter ships.
Mark the CloudAccount inactive when you are done previewing, so the canned data does not pollute production reports.
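
Conceptually, a stub adapter exposes the same interface as a real one but returns fixed data. The sketch below shows that shape. The Adapter protocol, the is_stub flag, the preview helper, and the canned resource types are all made up for illustration; the actual canned dataset is not listed on this page.

```python
from typing import Dict, List, Protocol


class Adapter(Protocol):
    # Hypothetical adapter interface.
    is_stub: bool

    def list_resources(self) -> List[Dict[str, str]]: ...


class StubGcpAdapter:
    """Returns canned data instead of calling a live API."""

    is_stub = True
    _CANNED = [
        {"external_id": "stub-gcp-vm-1", "type": "compute_instance"},
        {"external_id": "stub-gcp-bucket-1", "type": "storage_bucket"},
    ]

    def list_resources(self) -> List[Dict[str, str]]:
        # Return copies so callers cannot mutate the canned dataset.
        return [dict(resource) for resource in self._CANNED]


def preview(adapter: Adapter) -> List[str]:
    """Render a preview, marking rows that come from a stub adapter."""
    prefix = "[stub] " if adapter.is_stub else ""
    return [f"{prefix}{r['type']}: {r['external_id']}" for r in adapter.list_resources()]


rows = preview(StubGcpAdapter())
```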

6. Investigate a discovery failure

The CloudAccount tile on the Inventory → Cloud accounts page turns red when the last run failed.
Click the tile and open the Runs tab. Each run shows status, duration, resource counts and the raw adapter log.
Common failures: expired IAM credentials, throttling, network unreachable.
Fix and re-run via Run discovery now.

Permissions

Action	Roles
View CloudAccounts and runs	Admin, Owner
Create / edit CloudAccount	Admin, Owner
Delete CloudAccount	Owner
Trigger ad-hoc discovery	Operator, Admin, Owner
Accept new-in-source rows	Operator, Admin, Owner
Delete missing-in-source rows	Admin, Owner
Resolve conflicts	Operator, Admin, Owner

Cloud credentials never leave the workspace. The CMEK wrap key is held in the platform's key management service; the data key required to decrypt credentials is issued only to a workspace-scoped worker process at task runtime.
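
The role table above maps directly onto a lookup. The sketch below encodes it as a dict and checks a role against an action; the PERMISSIONS name and the can helper are illustrative, not part of Vigilo's API.

```python
from typing import Dict, FrozenSet

# Transcribed from the permissions table.
PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "View CloudAccounts and runs": frozenset({"Admin", "Owner"}),
    "Create / edit CloudAccount": frozenset({"Admin", "Owner"}),
    "Delete CloudAccount": frozenset({"Owner"}),
    "Trigger ad-hoc discovery": frozenset({"Operator", "Admin", "Owner"}),
    "Accept new-in-source rows": frozenset({"Operator", "Admin", "Owner"}),
    "Delete missing-in-source rows": frozenset({"Admin", "Owner"}),
    "Resolve conflicts": frozenset({"Operator", "Admin", "Owner"}),
}


def can(role: str, action: str) -> bool:
    """Return True if the role may perform the action; unknown actions are denied."""
    return role in PERMISSIONS.get(action, frozenset())


operator_can_trigger = can("Operator", "Trigger ad-hoc discovery")
admin_can_delete_account = can("Admin", "Delete CloudAccount")
```

Unknown actions are denied by default, which is the safer choice for a permission check.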

Troubleshooting

Discovery returns zero resources but the credentials test is green. Either the IAM role or token lacks read permissions on the relevant resource types, or the regions list is too narrow. Widen the permissions or the regions list, then re-run.

Same resource appears as new_in_source every run. The reconciler matches resources by external_id. If the source assigns a new ID on every run (e.g. ephemeral container IDs), the resource never matches its existing row and the drift flag fires again each cycle. Either change the adapter mapping to use a stable identifier (cluster + namespace + name for K8s) or mark the resource type as excluded.
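
A stable identifier for Kubernetes objects can be built from fields that survive recreation, unlike the object's UID. The sketch below uses cluster, namespace, and name as suggested above, and also includes the kind so that a Deployment and a Service with the same name do not collide. The kind segment, the k8s:// prefix, and the function name are assumptions of this example.

```python
def k8s_external_id(cluster: str, namespace: str, kind: str, name: str) -> str:
    """Build an external_id that stays the same when the object is recreated."""
    return f"k8s://{cluster}/{namespace}/{kind.lower()}/{name}"


# The same Deployment before and after a delete-and-recreate: the UID changes,
# but the derived external_id does not, so discovery updates the existing row.
before = {"uid": "7f3c-old", "kind": "Deployment", "namespace": "web", "name": "storefront"}
after = {"uid": "a91e-new", "kind": "Deployment", "namespace": "web", "name": "storefront"}

id_before = k8s_external_id("prod-eu", before["namespace"], before["kind"], before["name"])
id_after = k8s_external_id("prod-eu", after["namespace"], after["kind"], after["name"])
```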

Credentials look encrypted but a security scan flagged them. The encrypted blob is stored as base64 ciphertext, which scanners sometimes flag as a "high-entropy string". Configure the scanner to ignore CloudAccount.credentials fields.

GCP discovery imported nothing useful. GCP is a stub adapter today. Once WD.20 ships, the same CloudAccount row will start importing real data without reconfiguration.

Consul discovery is missing services I see in the UI. The ACL token used must have service:read on every datacentre. A token scoped to one DC will silently exclude the rest.

Related pages

Asset CMDB — the table that discovery writes to.
Dependency map — many edges come from K8s and Consul discovery output.
Vendor inventory and risk — discovered cloud accounts can be linked to vendor records.