Samples /proc/{fc_pid}/stat (CPU%), /proc/{fc_pid}/status (VmRSS), and
stat() on CoW files every 500ms for each running sandbox. Three tiered
ring buffers hold the raw samples plus 30s and 5min averages, giving
10min/2h/24h retention respectively. Metrics are flushed to the DB on pause (all tiers) and destroy
(24h only). New GetSandboxMetrics and FlushSandboxMetrics RPCs on the
host agent, proxied through GET /v1/sandboxes/{id}/metrics?range= on
the control plane. The endpoint returns live data for running sandboxes,
DB data for paused ones, and 404 for stopped ones.
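
A minimal sketch of how tiered downsampling of this kind could work, not the actual sampler code. The type names (ring, tier, newTiers) are hypothetical, and the capacities are assumptions derived from the stated intervals: 1200 raw samples (10min at 500ms), 240 averages (2h at 30s), and 288 averages (24h at 5min).

```go
package main

// ring is a fixed-capacity circular buffer of samples.
type ring struct {
	buf   []float64
	head  int // index of the next write
	count int // number of valid samples
}

func newRing(capacity int) *ring { return &ring{buf: make([]float64, capacity)} }

// push appends a sample, overwriting the oldest one once the ring is full.
func (r *ring) push(v float64) {
	r.buf[r.head] = v
	r.head = (r.head + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// snapshot returns the stored samples, oldest first.
func (r *ring) snapshot() []float64 {
	out := make([]float64, 0, r.count)
	start := (r.head - r.count + len(r.buf)) % len(r.buf)
	for i := 0; i < r.count; i++ {
		out = append(out, r.buf[(start+i)%len(r.buf)])
	}
	return out
}

// tier averages every `every` inputs, stores the mean in its ring,
// and forwards the mean to the next (coarser) tier, if any.
type tier struct {
	ring  *ring
	every int
	sum   float64
	n     int
	next  *tier
}

func (t *tier) add(v float64) {
	t.sum += v
	t.n++
	if t.n < t.every {
		return
	}
	avg := t.sum / float64(t.n)
	t.sum, t.n = 0, 0
	t.ring.push(avg)
	if t.next != nil {
		t.next.add(avg)
	}
}

// newTiers wires raw -> 30s -> 5min. Capacities are assumed, see lead-in.
func newTiers() *tier {
	t5m := &tier{ring: newRing(288), every: 10}             // 10 x 30s = 5min
	t30s := &tier{ring: newRing(240), every: 60, next: t5m} // 60 x 500ms = 30s
	return &tier{ring: newRing(1200), every: 1, next: t30s} // raw samples
}
```

Each 500ms sample would be passed to add on the raw tier; the coarser tiers fill themselves as averages complete, so a range query only needs snapshot on the tier matching the requested window.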
- Replace stale snapshot read (GetCurrentMetrics) with live query
(GetLiveMetrics) against the sandboxes table, so it always returns
correct zeros when no capsules are running
- Fix CPU reserved formula: running + starting only; paused VMs no
longer contribute vCPUs (RAM reservation for paused unchanged)
- Merge top cards into 3 paired Now/Peak cards with colored accent
borders (green/blue/amber matching chart colors)
- Move Live badge from Running Capsules card to page-level header
- Add colored category dots to card and chart headers
- Charts stacked vertically, flex-1 to fill remaining page height
- vCPUs chart color changed to blue (#5a9fd4), RAM stays amber
- New sandbox_metrics_snapshots table sampled every 10s (60-day retention)
- Background MetricsSampler goroutine wired into control plane startup
- GET /v1/sandboxes/stats?range=5m|1h|6h|24h|30d endpoint with adaptive
polling intervals; reserved CPU/RAM uses ceil(paused/2) formula
- StatsPanel component: 4 stat cards + 2 Chart.js line charts (straight
lines, integer y-axis for running count, dual-axis for CPU/RAM)
- Range filter persisted in URL query param; polls update data silently
(no flicker: loading state is shown only on initial mount)
- Split /dashboard/capsules into /list and /stats sub-routes with shared
layout; capsuleRunningCount store syncs badge across routes
- CreateCapsuleDialog extracted as reusable component
Introduces an append-only audit trail for all user and system actions:
sandbox lifecycle (create/pause/resume/destroy/auto-pause), snapshots,
team rename, API key create/revoke, member add/remove/leave/role_update,
and BYOC host add/delete/marked_down/marked_up.
- New audit_logs table (migration) with team_id, actor, resource,
action, scope (team|admin), status (success|info|warning|error),
metadata, and created_at
- AuditLogger (internal/audit) with named fire-and-forget methods per
event; system actor used for background events (HostMonitor, TTL reaper)
- GET /v1/audit-logs: JWT-only, cursor pagination (max 200), multi-value
filters for resource_type and action (comma-sep or repeated params);
members see team-scoped events only, admins/owners see all
- AuthContext extended with APIKeyID + APIKeyName so API key requests
record meaningful actor identity
- HostMonitor wired with AuditLogger for auto-pause and host marked_down
- Frontend: BYOC hosts page (/dashboard/byoc) with register/delete flows,
shimmer loading, pulsing online status, animated token reveal checkmark
- Frontend: Admin section (/admin/hosts) with platform + BYOC tabs, stat
pills, skeleton loading, slide-in animations for new rows
- Frontend: AdminSidebar component with accent top bar and admin pill badge
- Frontend: BYOC nav item shown only when team.is_byoc is true (derived
from teams store, not JWT); disabled for members
- Frontend: Admin shield button in Sidebar, visible only to platform admins
- Backend: is_admin in JWT claims + requireAdmin middleware (DB-validated)
- Backend: is_byoc added to teamResponse so frontend derives visibility
from fresh team data rather than stale JWT fields
- Backend: SetBYOC admin endpoint (PUT /v1/admin/teams/{id}/byoc)
- Backend: Admin hosts list enriches BYOC entries with team_name
- Host agent: load .env file via godotenv on startup
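
The multi-value filters and the page-size cap on GET /v1/audit-logs could be parsed roughly as below. This is a sketch under assumptions, not the handler's code: the function names are hypothetical and the default page size of 50 is invented for illustration (only the max of 200 comes from the notes above).

```go
package main

import (
	"net/url"
	"strings"
)

// multiValue collects a filter that may arrive as repeated params
// (?action=create&action=pause), comma-separated (?action=create,pause),
// or a mix of both. Blank entries and duplicates are dropped.
func multiValue(q url.Values, key string) []string {
	var out []string
	seen := make(map[string]bool)
	for _, raw := range q[key] {
		for _, part := range strings.Split(raw, ",") {
			v := strings.TrimSpace(part)
			if v == "" || seen[v] {
				continue
			}
			seen[v] = true
			out = append(out, v)
		}
	}
	return out
}

// filtersFromQuery parses a raw query string and extracts one filter.
func filtersFromQuery(rawQuery, key string) ([]string, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	return multiValue(q, key), nil
}

// clampLimit applies an assumed default of 50 and the documented max of 200.
func clampLimit(n int) int {
	const defaultLimit, maxLimit = 50, 200
	if n <= 0 {
		return defaultLimit
	}
	if n > maxLimit {
		return maxLimit
	}
	return n
}
```

Normalizing both spellings into one slice keeps the query layer simple: it only ever sees a list of values for resource_type and action.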
Replaces the hardcoded CP_HOST_AGENT_ADDR single-agent setup with a
DB-driven registration system supporting multiple host agents (BYOC).
Key changes:
- Host agents register via one-time token, receive a 7-day JWT + 60-day
refresh token; heartbeat loop auto-refreshes on 401/403 and pauses all
sandboxes if refresh fails
- HostClientPool: lazy Connect RPC client cache keyed by host ID, replacing
the single static agent client throughout the API and service layers
- RoundRobinScheduler: picks an online host for each new sandbox via
ListActiveHosts; extensible for future scheduling strategies
- HostMonitor (replaces Reconciler): passive heartbeat staleness check marks
hosts unreachable and sandboxes missing after 90s; active reconciliation
per online host restores missing-but-alive sandboxes and stops orphans
- Graceful host delete: returns 409 with the affected sandbox list unless
?force=true is set; force-delete destroys sandboxes then evicts pool client
- Snapshot delete broadcasts to all online hosts (templates have no host_id)
- sandbox.Manager.PauseAll: pauses all running VMs on CP connectivity loss
- New migration: host_refresh_tokens table with token rotation (issue-then-
revoke ordering to prevent lockout on mid-rotation crash)
- New sandbox status 'missing' (reversible, unlike 'stopped') and host
status 'unreachable'; both reflected in OpenAPI spec
- Fix: refresh token auth failure now returns 401 (was 400 via generic
'invalid' substring match in serviceErrToHTTP)
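
For illustration, a round-robin pick over online hosts might look like the following. This is a hedged sketch: the Host type, the "online" status string, and the Pick signature are assumptions, and the real RoundRobinScheduler obtains its candidates via ListActiveHosts rather than taking a slice.

```go
package main

import (
	"errors"
	"sync"
)

// Host is a hypothetical, trimmed-down view of a registered host agent.
type Host struct {
	ID     string
	Status string // assumed values: "online", "unreachable"
}

// ErrNoHosts is returned when no online host can take a new sandbox.
var ErrNoHosts = errors.New("no online hosts available")

// RoundRobin hands out online hosts in rotation. It is safe for
// concurrent use because the cursor is guarded by a mutex.
type RoundRobin struct {
	mu   sync.Mutex
	next int
}

// Pick filters hosts down to the online ones and returns the next in turn.
func (s *RoundRobin) Pick(hosts []Host) (Host, error) {
	var online []Host
	for _, h := range hosts {
		if h.Status == "online" {
			online = append(online, h)
		}
	}
	if len(online) == 0 {
		return Host{}, ErrNoHosts
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	h := online[s.next%len(online)]
	s.next++
	return h, nil
}
```

Because the cursor is taken modulo the current online count, hosts that go unreachable simply drop out of the rotation; the distribution is only approximately even while the host set is changing.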
- Add name column to users (migration + sqlc regen); propagate through JWT
claims, auth context, all auth/OAuth handlers, service layer, and frontend
- Sidebar and team page show name instead of email; team page splits Name/Email
into separate columns
- Block sandbox creation in UI and API when user has no active team context
- loginTeam helper falls back to first active team when no default is set,
fixing login for invited users with no is_default membership
- Exclude soft-deleted teams from GetDefaultTeamForUser and GetBYOCTeams queries
- Guard host creation against soft-deleted teams in service/host.go
- SwitchTeam re-fetches name from DB instead of trusting stale JWT claim
- Reset teams store on login so stale data from a previous session never persists
- Update openapi.yaml: add name to SignupRequest and AuthResponse schemas
- Allow hyphens, @, and apostrophes in team names (backend regex)
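
The loginTeam fallback described above can be sketched as a single pass over the user's memberships. The Membership type and pickLoginTeam name are hypothetical, and the sketch assumes memberships arrive in a stable order so that "first active team" is well defined.

```go
package main

// Membership is a hypothetical row joining a user to a team.
type Membership struct {
	TeamID    string
	IsDefault bool
	Deleted   bool // true when the team is soft-deleted
}

// pickLoginTeam prefers the default active team and otherwise falls back
// to the first active one. The bool is false when no active team exists.
func pickLoginTeam(ms []Membership) (Membership, bool) {
	var first *Membership
	for i := range ms {
		m := &ms[i]
		if m.Deleted {
			continue // soft-deleted teams never qualify
		}
		if m.IsDefault {
			return *m, true
		}
		if first == nil {
			first = m
		}
	}
	if first != nil {
		return *first, true
	}
	return Membership{}, false
}
```

An invited user with no is_default membership gets the first active team, while a false result corresponds to the "no active team context" case in which sandbox creation is blocked.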
- After delete/leave, switch to next available team instead of logging
out; if no teams remain, show a toast prompting to create one
- Disable delete/leave button when user has only one team, with
explanatory hint to create another team first
- Show empty state on /dashboard/team when auth has no team context,
pointing user to the sidebar to create a team
- Fetch all teams in parallel with team detail on page load to power
the isLastTeam guard
- Three-role model (owner/admin/member) with owner protection invariants
- Team CRUD: create, rename (admin+), soft-delete with VM cleanup (owner only)
- Member management: add by email, remove, role updates (admin+), leave
- Switch-team endpoint re-issues JWT after DB membership verification
- User email prefix search for add-member UI autocomplete
- JWT carries role as a hint; all authorization decisions verified from DB
- Team slug: immutable, 12 hex chars split 6-6 by a hyphen (e.g. a1b2c3-d1e2f3),
kept reserved after soft-delete
- Migration adds slug + deleted_at to teams; backfills existing rows
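
One way a slug of that shape could be produced is shown below. This is an assumption-laden sketch: the notes do not say how slugs are generated, so the random source and the newTeamSlug name are illustrative only; the 6-6 hyphenated layout follows the example above.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// newTeamSlug returns 12 random lowercase hex characters formatted as
// xxxxxx-xxxxxx. Uniqueness would still need to be enforced by the
// database, since soft-deleted teams keep their slugs reserved.
func newTeamSlug() (string, error) {
	b := make([]byte, 6) // 6 bytes encode to 12 hex characters
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	h := hex.EncodeToString(b)
	return h[:6] + "-" + h[6:], nil
}
```

A caller would typically retry on a unique-constraint violation rather than check for collisions up front.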
- Snapshot delete: make agent RPC failure a hard error so DB record is
not removed when files cannot be deleted from disk
- Snapshot overwrite: call agent to delete old files before removing the
DB record, preventing stale memfile.{uuid} generations from accumulating
on disk across repeated overwrites
- Sandbox destroy: only swallow CodeNotFound from the agent (sandbox
already gone / TTL-reaped); any other error now propagates to the caller
instead of being silently ignored
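
The destroy behaviour above can be illustrated with a stdlib-only sketch. The sentinel errAgentNotFound stands in for the agent's CodeNotFound, and the agent interface, fakeAgent, and markStopped callback are hypothetical; the real code inspects the RPC error code rather than a sentinel.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for agent errors (assumptions for this sketch).
var (
	errAgentNotFound    = errors.New("agent: sandbox not found")
	errAgentUnavailable = errors.New("agent: unavailable")
)

// agent is the minimal surface this sketch needs from a host agent client.
type agent interface {
	Destroy(id string) error
}

// fakeAgent returns a fixed error, for demonstration.
type fakeAgent struct{ err error }

func (f fakeAgent) Destroy(string) error { return f.err }

// destroySandbox tolerates "not found" (already gone or TTL-reaped) and
// propagates every other agent error before touching the DB record.
func destroySandbox(a agent, id string, markStopped func(string) error) error {
	if err := a.Destroy(id); err != nil && !errors.Is(err, errAgentNotFound) {
		return fmt.Errorf("destroy sandbox %s: %w", id, err)
	}
	return markStopped(id)
}
```

The same ordering applies to snapshot delete: the DB record is only updated once the agent call has either succeeded or reported that nothing was left to remove.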
Moves business logic from API handlers into internal/service/ so that
both the REST API and the upcoming dashboard can share the same operations
without duplicating code. API handlers now delegate to the service layer
and only handle HTTP-specific concerns (request parsing, response formatting).