- Fix envRegex: remove spurious (\$)? group that swallowed $$$, handle ${}
- wrenn-init.sh: add || true to networking commands under set -e, remove dead code
- waitForHealthcheck: use context deadline for unlimited retries instead of implicit 100 cap
- Make parseSandboxEnv a package-level function (unused receiver)
- Fix WrappedCommand test: map iteration order dependency, pre-expand env values
- Fix error wrapping: %v → %w per project conventions
- test-jupyter-kernel.py: move import to top-level, fix misleading comment
healthchecks
Fix ENV instructions to expand $VAR references at set time using the
current env state, preventing self-referencing values like
PATH=/opt/venv/bin:$PATH from producing recursive expansions. Remove
expandEnv from shellPrefix to avoid double expansion.
Fetch sandbox environment variables via `env` before recipe execution
so ENV steps resolve against actual runtime values from the base
template image.
Replace hardcoded healthcheck timing with a Dockerfile-like flag parser
supporting --interval, --timeout, --start-period, and --retries. Add
start-period grace window and bounded retry counting to
waitForHealthcheck.
Add python-interpreter-v0-beta recipe and healthcheck files.
- skip_pre_post flag on builds bypasses apt update/clean pre/post steps for
faster iteration when the recipe handles its own environment setup
- POST /v1/admin/builds/{id}/cancel endpoint marks an in-progress build as
cancelled; UpdateBuildStatus now also sets completed_at for 'cancelled'
- internal/recipe: typed recipe parser and executor (RUN/ENV/COPY steps)
replacing the raw string slice approach in the build worker
- pre/post build commands prefixed with RUN to match recipe step format
- Internal ECDSA P-256 CA (WRENN_CA_CERT/WRENN_CA_KEY env vars); when absent
the system falls back to plain HTTP so dev mode works without certificates
- Host leaf cert (7-day TTL, IP SAN) issued at registration and renewed on
every JWT refresh; fingerprint + expiry stored in DB (cert_expires_at column
replaces the removed mtls_enabled flag)
- CP ephemeral client cert (24-hour TTL) via CPCertStore with atomic hot-swap;
background goroutine renews it every 12 hours without restarting the server
- Host agent uses tls.Listen + httpServer.Serve so GetCertificate callback is
respected (ListenAndServeTLS always reads cert from disk)
- Sandbox reverse proxy now uses pool.Transport() so it shares the same TLS
config as the Connect RPC clients instead of http.DefaultTransport
- Credentials file renamed host-credentials.json with cert_pem/key_pem/
ca_cert_pem fields; duplicate register/refresh response structs collapsed
to authResponse
Introduces internal/layout package for centralized path construction,
migrates templates from name-based TEXT primary keys to UUID PKs with
team-scoped directories (WRENN_DIR/images/teams/{team_id}/{template_id}).
The built-in minimal template uses sentinel zero UUIDs. Proto messages
carry team_id + template_id alongside deprecated template name field.
Team deletion now cleans up template files across all hosts.
Snapshot race fix:
- Pre-mark sandbox as "paused" in DB before issuing CreateSnapshot and
PauseSandbox RPCs, preventing the reconciler from marking it "stopped"
during the flatten window when the sandbox is gone from the host
agent's in-memory map but DB still says "running"
- Revert status to "running" on RPC failure
- Check ctx.Err() before writing response to avoid writing to dead
connections when client disconnects during long snapshot operations
Delete auth fix:
- Block non-admin deletion of platform templates (team_id = all-zeros)
at DELETE /v1/snapshots/{name} with 403, preventing file deletion
before the team ownership check fails
Sparse dd:
- Add conv=sparse to dd in FlattenSnapshot so flattened images preserve
sparseness (~200MB actual vs 5GB logical)
Default disk size:
- Change default disk_size_mb from 20GB to 5GB across migration,
manager, service, build, and EnsureImageSizes
- Disable split-button dropdown arrow for platform templates in
dashboard snapshots page (teams cannot delete platform templates)
Disk sizing:
- Add disk_size_mb column to sandboxes table (default 20480 = 20GB)
- Add disk_size_mb to CreateSandboxRequest proto, passed through the
full chain: service → RPC → host agent → sandbox manager → devicemapper
- devicemapper.CreateSnapshot takes separate cowSizeBytes param so the
sparse CoW file can be sized independently from the origin
- EnsureImageSizes() runs at host agent startup: expands any base image
smaller than 20GB via truncate + resize2fs (sparse, no extra physical
disk). Sandboxes then get the full 20GB via fast dm-snapshot path
- FlattenRootfs shrinks output images with resize2fs -M so stored
templates are compact; EnsureImageSizes re-expands on next startup
Admin templates visibility:
- Add GET /v1/admin/templates endpoint listing all templates across teams
- Frontend admin templates page uses listAdminTemplates() instead of
team-scoped listSnapshots()
- Platform templates (team_id = all-zeros UUID) now visible to all teams:
GetTemplateByTeam, ListTemplatesByTeam, ListTemplatesByTeamAndType
queries include platform team_id in WHERE clause
Consolidate 16 migrations into one with UUID columns for all entity
IDs. TEXT is kept only for polymorphic fields (audit_logs.actor_id,
resource_id) and template names. The id package now generates UUIDs
via google/uuid, with Format*/Parse* helpers for the prefixed wire
format (sb-{uuid}, usr-{uuid}, etc.). Auth context, services, and
handlers pass pgtype.UUID internally; conversion to/from prefixed
strings happens at API and RPC boundaries. Adds PlatformTeamID
(all-zeros UUID) for shared resources.
- Use context.Background() with timeout in destroySandbox/failBuild so
cleanup and DB writes survive parent context cancellation on shutdown
- Fix loop device refcount leak in FlattenRootfs when dmDevice is nil
- Replace time.After with time.NewTimer in healthcheck polling to avoid
goroutine leak when healthcheck passes early
- Capture size_bytes from CreateSnapshot/FlattenRootfs RPC responses
instead of hardcoding 0 in the templates table insert
- Avoid leaking internal error details to API clients in build handler
Introduces an end-to-end template building pipeline: admins submit a recipe
(list of shell commands) via the dashboard, a Redis-backed worker pool spins
up a sandbox, executes each command, and produces either a full snapshot
(with healthcheck) or an image-only template (rootfs flattened via a new
FlattenRootfs host-agent RPC). Build progress and per-step logs are persisted
to a new template_builds table and polled by the frontend.
Backend:
- New FlattenRootfs RPC (proto + host agent + sandbox manager)
- BuildService with Redis queue (BLPOP) and configurable worker pool (default 2)
- Admin-only REST endpoints: POST/GET /v1/admin/builds, GET /v1/admin/builds/{id}
- Migration for template_builds table with JSONB logs and recipe columns
- sqlc queries for build CRUD and progress updates
Frontend:
- /admin/templates page with Templates + Builds tabs
- Create Template dialog with recipe textarea, healthcheck, specs
- Build history with expandable per-step logs, status badges, progress bars
- Auto-polling every 3s for active builds
- AdminSidebar updated with Templates nav item
Escape LIKE metacharacters (% and _) in the email prefix before passing
to the SQL query, and enforce the documented '@' requirement to prevent
broad user enumeration. Move search logic out of TeamService into
usersHandler since it is a site-wide lookup, not team-scoped.
Samples /proc/{fc_pid}/stat (CPU%), /proc/{fc_pid}/status (VmRSS), and
stat() on CoW files at 500ms intervals per running sandbox. Three tiered
ring buffers downsample into 30s and 5min averages for 10min/2h/24h
retention. Metrics are flushed to DB on pause (all tiers) and destroy
(24h only). New GetSandboxMetrics and FlushSandboxMetrics RPCs on the
host agent, proxied through GET /v1/sandboxes/{id}/metrics?range= on
the control plane. Returns live data for running sandboxes, DB data for
paused, and 404 for stopped.
- Replace stale snapshot read (GetCurrentMetrics) with live query
(GetLiveMetrics) against sandboxes table — always returns correct
zeros when no capsules are running
- Fix CPU reserved formula: running + starting only; paused VMs no
longer contribute vCPUs (RAM reservation for paused unchanged)
- Merge top cards into 3 paired Now/Peak cards with colored accent
borders (green/blue/amber matching chart colors)
- Move Live badge from Running Capsules card to page-level header
- Add colored category dots to card and chart headers
- Charts stacked vertically, flex-1 to fill remaining page height
- vCPUs chart color changed to blue (#5a9fd4), RAM stays amber
- New sandbox_metrics_snapshots table sampled every 10s (60-day retention)
- Background MetricsSampler goroutine wired into control plane startup
- GET /v1/sandboxes/stats?range=5m|1h|6h|24h|30d endpoint with adaptive
polling intervals; reserved CPU/RAM uses ceil(paused/2) formula
- StatsPanel component: 4 stat cards + 2 Chart.js line charts (straight
lines, integer y-axis for running count, dual-axis for CPU/RAM)
- Range filter persisted in URL query param; polls update data silently
(no blink — loading state only shown on initial mount)
- Split /dashboard/capsules into /list and /stats sub-routes with shared
layout; capsuleRunningCount store syncs badge across routes
- CreateCapsuleDialog extracted as reusable component
Introduces an append-only audit trail for all user and system actions:
sandbox lifecycle (create/pause/resume/destroy/auto-pause), snapshots,
team rename, API key create/revoke, member add/remove/leave/role_update,
and BYOC host add/delete/marked_down/marked_up.
- New audit_logs table (migration) with team_id, actor, resource,
action, scope (team|admin), status (success|info|warning|error),
metadata, and created_at
- AuditLogger (internal/audit) with named fire-and-forget methods per
event; system actor used for background events (HostMonitor, TTL reaper)
- GET /v1/audit-logs: JWT-only, cursor pagination (max 200), multi-value
filters for resource_type and action (comma-sep or repeated params);
members see team-scoped events only, admins/owners see all
- AuthContext extended with APIKeyID + APIKeyName so API key requests
record meaningful actor identity
- HostMonitor wired with AuditLogger for auto-pause and host marked_down
- Frontend: BYOC hosts page (/dashboard/byoc) with register/delete flows,
shimmer loading, pulsing online status, animated token reveal checkmark
- Frontend: Admin section (/admin/hosts) with platform + BYOC tabs, stat
pills, skeleton loading, slide-in animations for new rows
- Frontend: AdminSidebar component with accent top bar and admin pill badge
- Frontend: BYOC nav item shown only when team.is_byoc is true (derived
from teams store, not JWT); disabled for members
- Frontend: Admin shield button in Sidebar, visible only to platform admins
- Backend: is_admin in JWT claims + requireAdmin middleware (DB-validated)
- Backend: is_byoc added to teamResponse so frontend derives visibility
from fresh team data rather than stale JWT fields
- Backend: SetBYOC admin endpoint (PUT /v1/admin/teams/{id}/byoc)
- Backend: Admin hosts list enriches BYOC entries with team_name
- Host agent: load .env file via godotenv on startup
Replaces the hardcoded CP_HOST_AGENT_ADDR single-agent setup with a
DB-driven registration system supporting multiple host agents (BYOC).
Key changes:
- Host agents register via one-time token, receive a 7-day JWT + 60-day
refresh token; heartbeat loop auto-refreshes on 401/403 and pauses all
sandboxes if refresh fails
- HostClientPool: lazy Connect RPC client cache keyed by host ID, replacing
the single static agent client throughout the API and service layers
- RoundRobinScheduler: picks an online host for each new sandbox via
ListActiveHosts; extensible for future scheduling strategies
- HostMonitor (replaces Reconciler): passive heartbeat staleness check marks
hosts unreachable and sandboxes missing after 90s; active reconciliation
per online host restores missing-but-alive sandboxes and stops orphans
- Graceful host delete: returns 409 with affected sandbox list without
?force=true; force-delete destroys sandboxes then evicts pool client
- Snapshot delete broadcasts to all online hosts (templates have no host_id)
- sandbox.Manager.PauseAll: pauses all running VMs on CP connectivity loss
- New migration: host_refresh_tokens table with token rotation (issue-then-
revoke ordering to prevent lockout on mid-rotation crash)
- New sandbox status 'missing' (reversible, unlike 'stopped') and host
status 'unreachable'; both reflected in OpenAPI spec
- Fix: refresh token auth failure now returns 401 (was 400 via generic
'invalid' substring match in serviceErrToHTTP)
- Add name column to users (migration + sqlc regen); propagate through JWT
claims, auth context, all auth/OAuth handlers, service layer, and frontend
- Sidebar and team page show name instead of email; team page splits Name/Email
into separate columns
- Block sandbox creation in UI and API when user has no active team context
- loginTeam helper falls back to first active team when no default is set,
fixing login for invited users with no is_default membership
- Exclude soft-deleted teams from GetDefaultTeamForUser, GetBYOCTeams queries
- Guard host creation against soft-deleted teams in service/host.go
- SwitchTeam re-fetches name from DB instead of trusting stale JWT claim
- Reset teams store on login so stale data from a previous session never persists
- Update openapi.yaml: add name to SignupRequest and AuthResponse schemas
- Allow hyphens, @, and apostrophes in team names (backend regex)
- After delete/leave, switch to next available team instead of logging
out; if no teams remain, show a toast prompting to create one
- Disable delete/leave button when user has only one team, with
explanatory hint to create another team first
- Show empty state on /dashboard/team when auth has no team context,
pointing user to the sidebar to create a team
- Fetch all teams in parallel with team detail on page load to power
the isLastTeam guard
- Three-role model (owner/admin/member) with owner protection invariants
- Team CRUD: create, rename (admin+), soft-delete with VM cleanup (owner only)
- Member management: add by email, remove, role updates (admin+), leave
- Switch-team endpoint re-issues JWT after DB membership verification
- User email prefix search for add-member UI autocomplete
- JWT carries role as a hint; all authorization decisions verified from DB
- Team slug: immutable 12-char hex (e.g. a1b2c3-d1e2f3), reserved on soft-delete
- Migration adds slug + deleted_at to teams; backfills existing rows
- Snapshot delete: make agent RPC failure a hard error so DB record is
not removed when files cannot be deleted from disk
- Snapshot overwrite: call agent to delete old files before removing the
DB record, preventing stale memfile.{uuid} generations from accumulating
on disk across repeated overwrites
- Sandbox destroy: only swallow CodeNotFound from the agent (sandbox
already gone / TTL-reaped); any other error now propagates to the caller
instead of being silently ignored
Moves business logic from API handlers into internal/service/ so that
both the REST API and the upcoming dashboard can share the same operations
without duplicating code. API handlers now delegate to the service layer
and only handle HTTP-specific concerns (request parsing, response formatting).