Escape LIKE metacharacters (% and _) in the email prefix before passing
to the SQL query, and enforce the documented '@' requirement to prevent
broad user enumeration. Move search logic out of TeamService into
usersHandler since it is a site-wide lookup, not team-scoped.
Maps each user-facing range to the appropriate underlying ring buffer
tier and applies a time cutoff filter. No new ring buffers needed —
5m/10m read from the 10m tier, 1h/2h from the 2h tier, 6h/12h/24h
from the 24h tier.
Samples /proc/{fc_pid}/stat (CPU%), /proc/{fc_pid}/status (VmRSS), and
stat() on CoW files at 500ms intervals per running sandbox. Three tiered
ring buffers downsample into 30s and 5min averages for 10min/2h/24h
retention. Metrics are flushed to DB on pause (all tiers) and destroy
(24h only). New GetSandboxMetrics and FlushSandboxMetrics RPCs on the
host agent, proxied through GET /v1/sandboxes/{id}/metrics?range= on
the control plane. Returns live data for running sandboxes, DB data for
paused, and 404 for stopped.
Poll fetches now silently update data without triggering loading
states, spinner animations, or row fadeUp re-animations. Only manual
refresh shows the spin indicator.
CPU (vCPUs) and RAM (GB) use different units and scales, so combining
them on a dual-axis chart was misleading. Each now has its own chart
card, laid out side-by-side.
SampleSandboxMetrics previously filtered WHERE status IN ('running',
'starting', 'paused'), which returned no rows when all capsules were
stopped. This caused zero snapshots to be skipped, leaving the
time-series charts with no trailing data points instead of showing
the expected zero values.
Remove the WHERE filter so the query groups by all teams that have
any sandbox row. The per-status FILTER clauses on the aggregates
already produce correct zero counts for stopped capsules.
Also includes the per-VM RAM ceiling formula change (sum(ceil(each/2))
instead of ceil(sum/2)).
- Add Metrics nav item to sidebar with bar chart icon
- Create /dashboard/metrics page wrapping StatsPanel
- Remove tabs from capsules page (list is now the only view)
- Flatten capsules route: /capsules directly shows the list,
removing the /list and /stats sub-routes
- Strip redundant title/subtitle from StatsPanel (page header
provides context)
- Replace stale snapshot read (GetCurrentMetrics) with live query
(GetLiveMetrics) against sandboxes table — always returns correct
zeros when no capsules are running
- Fix CPU reserved formula: running + starting only; paused VMs no
longer contribute vCPUs (RAM reservation for paused unchanged)
- Merge top cards into 3 paired Now/Peak cards with colored accent
borders (green/blue/amber matching chart colors)
- Move Live badge from Running Capsules card to page-level header
- Add colored category dots to card and chart headers
- Charts stacked vertically, flex-1 to fill remaining page height
- vCPUs chart color changed to blue (#5a9fd4), RAM stays amber
- New sandbox_metrics_snapshots table sampled every 10s (60-day retention)
- Background MetricsSampler goroutine wired into control plane startup
- GET /v1/sandboxes/stats?range=5m|1h|6h|24h|30d endpoint with adaptive
polling intervals; reserved CPU/RAM uses ceil(paused/2) formula
- StatsPanel component: 4 stat cards + 2 Chart.js line charts (straight
lines, integer y-axis for running count, dual-axis for CPU/RAM)
- Range filter persisted in URL query param; polls update data silently
(no blink — loading state only shown on initial mount)
- Split /dashboard/capsules into /list and /stats sub-routes with shared
layout; capsuleRunningCount store syncs badge across routes
- CreateCapsuleDialog extracted as reusable component
- app.css: replace flat --shadow-sm token with real shadows; add
--shadow-card and --shadow-dialog tokens; add @keyframes status-ping
and .animate-status-ping utility (outward ring ripple, GPU-composited
via will-change) for live running status dots
- login: headline 5rem → 6.5rem with tighter leading/tracking; expand
container to 460px; add sage-green dot grid texture layer beneath the
mouse-reactive glow for industrial depth
- capsules: upgrade all running dots (header chip + row indicators +
status bar) from opacity-fade to ring ripple; apply --shadow-dialog
to Launch and Snapshot dialogs
- keys: apply --shadow-dialog to all three dialogs
- audit: remove duplicate @keyframes fadeUp and iconFloat (redundant
with app.css definitions, audit's fadeUp also subtly diverged)
- sidebar: active indicator bar taller and thicker (h-5 w-[3px] → h-6
w-1); active bg more vivid (accent/12%); label font-medium →
font-semibold; team dialog gets --shadow-dialog
Introduces an append-only audit trail for all user and system actions:
sandbox lifecycle (create/pause/resume/destroy/auto-pause), snapshots,
team rename, API key create/revoke, member add/remove/leave/role_update,
and BYOC host add/delete/marked_down/marked_up.
- New audit_logs table (migration) with team_id, actor, resource,
action, scope (team|admin), status (success|info|warning|error),
metadata, and created_at
- AuditLogger (internal/audit) with named fire-and-forget methods per
event; system actor used for background events (HostMonitor, TTL reaper)
- GET /v1/audit-logs: JWT-only, cursor pagination (max 200), multi-value
filters for resource_type and action (comma-sep or repeated params);
members see team-scoped events only, admins/owners see all
- AuthContext extended with APIKeyID + APIKeyName so API key requests
record meaningful actor identity
- HostMonitor wired with AuditLogger for auto-pause and host marked_down
- Frontend: BYOC hosts page (/dashboard/byoc) with register/delete flows,
shimmer loading, pulsing online status, animated token reveal checkmark
- Frontend: Admin section (/admin/hosts) with platform + BYOC tabs, stat
pills, skeleton loading, slide-in animations for new rows
- Frontend: AdminSidebar component with accent top bar and admin pill badge
- Frontend: BYOC nav item shown only when team.is_byoc is true (derived
from teams store, not JWT); disabled for members
- Frontend: Admin shield button in Sidebar, visible only to platform admins
- Backend: is_admin in JWT claims + requireAdmin middleware (DB-validated)
- Backend: is_byoc added to teamResponse so frontend derives visibility
from fresh team data rather than stale JWT fields
- Backend: SetBYOC admin endpoint (PUT /v1/admin/teams/{id}/byoc)
- Backend: Admin hosts list enriches BYOC entries with team_name
- Host agent: load .env file via godotenv on startup
Replaces the hardcoded CP_HOST_AGENT_ADDR single-agent setup with a
DB-driven registration system supporting multiple host agents (BYOC).
Key changes:
- Host agents register via one-time token, receive a 7-day JWT + 60-day
refresh token; heartbeat loop auto-refreshes on 401/403 and pauses all
sandboxes if refresh fails
- HostClientPool: lazy Connect RPC client cache keyed by host ID, replacing
the single static agent client throughout the API and service layers
- RoundRobinScheduler: picks an online host for each new sandbox via
ListActiveHosts; extensible for future scheduling strategies
- HostMonitor (replaces Reconciler): passive heartbeat staleness check marks
hosts unreachable and sandboxes missing after 90s; active reconciliation
per online host restores missing-but-alive sandboxes and stops orphans
- Graceful host delete: returns 409 with affected sandbox list without
?force=true; force-delete destroys sandboxes then evicts pool client
- Snapshot delete broadcasts to all online hosts (templates have no host_id)
- sandbox.Manager.PauseAll: pauses all running VMs on CP connectivity loss
- New migration: host_refresh_tokens table with token rotation (issue-then-
revoke ordering to prevent lockout on mid-rotation crash)
- New sandbox status 'missing' (reversible, unlike 'stopped') and host
status 'unreachable'; both reflected in OpenAPI spec
- Fix: refresh token auth failure now returns 401 (was 400 via generic
'invalid' substring match in serviceErrToHTTP)
- Add name column to users (migration + sqlc regen); propagate through JWT
claims, auth context, all auth/OAuth handlers, service layer, and frontend
- Sidebar and team page show name instead of email; team page splits Name/Email
into separate columns
- Block sandbox creation in UI and API when user has no active team context
- loginTeam helper falls back to first active team when no default is set,
fixing login for invited users with no is_default membership
- Exclude soft-deleted teams from GetDefaultTeamForUser, GetBYOCTeams queries
- Guard host creation against soft-deleted teams in service/host.go
- SwitchTeam re-fetches name from DB instead of trusting stale JWT claim
- Reset teams store on login so stale data from a previous session never persists
- Update openapi.yaml: add name to SignupRequest and AuthResponse schemas
Delight (keys page):
- Animated checkmark draw + circle pop on key reveal dialog open
- Key display area pulses accent glow on open to draw eye to "copy this"
- Copy button spring-bounces on successful copy (re-triggers on repeat)
- Empty state key icon floats (iconFloat, now global)
- Row hover uses scaleY left-accent stripe (matches capsules pattern)
- New key row flashes accent on reveal dialog dismiss (matches capsule-born)
Audit fixes (all dashboard pages):
- Page titles standardized to em dash: "Wrenn — X" across all four pages
- formatDate/timeAgo extracted to src/lib/utils/format.ts (string | undefined
signatures); keys and snapshots now import from there instead of duplicating
- team formatDate gains undefined guard (kept local, date-only format differs)
- spin-once and iconFloat keyframes moved to app.css as globals; scoped copies
removed from capsules and keys
- Snapshots empty state icon was referencing undefined @keyframes float; fixed
to iconFloat
Normalization:
- Snapshots table rows: replaced ::before pseudo-element accent (opacity-only,
single color) with DOM row-stripe element using scaleY transition, type-keyed
color (green for snapshots, blue for images) — matches capsules pattern
- Create Key dialog: max-w-[400px] → max-w-[420px] to align with form dialogs
- Snapshots count and empty-state heading are now terminology-aware: shows
"templates/snapshots/images" based on active filter; empty heading for all
filter reads "No templates yet" instead of "No snapshots yet"
Not done (documented in audit, deferred):
- Sidebar nav items pointing to unimplemented routes (audit, usage, billing,
notifications, settings) — left as-is, needs product decision
- Dialog max-widths fully normalized beyond Create Key — minor, deferred
- capsules timeAgo not imported from shared util (formatTime differs intentionally)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Slug + Team ID rows collapsed into a 2-column grid for better density
- "you" badge moved inline with email instead of stacked below it
- Copy checkmark draws itself via SVG stroke-dashoffset animation
- New member row flashes accent-green on entry
- Removed member row slides out smoothly (fly transition)
- Member rows use staggered fly-in on page load
- Team name briefly highlights accent color after a successful rename
- Search result avatars get colorized initials based on email character
Teams list was fetched on every Sidebar mount (each page navigation),
causing a flash from '…' to the real name on every tab switch. Move teams
into a module-level reactive store (teams.svelte.ts) that fetches once per
session and is shared between Sidebar and the team page.
The anti-enumeration guard required @ in the email prefix, causing the
typeahead to silently return nothing until the user typed @. Replace with
a minimum 3-character length check to match the frontend trigger condition.
- Allow hyphens, @, and apostrophes in team names (backend regex)
- After delete/leave, switch to next available team instead of logging
out; if no teams remain, show a toast prompting to create one
- Disable delete/leave button when user has only one team, with
explanatory hint to create another team first
- Show empty state on /dashboard/team when auth has no team context,
pointing user to the sidebar to create a team
- Fetch all teams in parallel with team detail on page load to power
the isLastTeam guard
- New /dashboard/team page with inline team name editing, slug/ID copy,
members table with split-button (remove + make admin/member), add member
typeahead, and danger zone (delete/leave) with confirmation dialogs
- Sidebar now fetches real teams from API, supports team switching and
team creation via dialog
- Rename nav item Members → Team, route /dashboard/members → /dashboard/team
- New src/lib/api/team.ts with typed functions for all team endpoints
- Three-role model (owner/admin/member) with owner protection invariants
- Team CRUD: create, rename (admin+), soft-delete with VM cleanup (owner only)
- Member management: add by email, remove, role updates (admin+), leave
- Switch-team endpoint re-issues JWT after DB membership verification
- User email prefix search for add-member UI autocomplete
- JWT carries role as a hint; all authorization decisions verified from DB
- Team slug: immutable 12-char hex (e.g. a1b2c3-d1e2f3), reserved on soft-delete
- Migration adds slug + deleted_at to teams; backfills existing rows
- Increase content padding (p-7→p-8) and table cell padding (px-4→px-5,
py-3→py-4 for data rows) across capsules, keys, and snapshots pages
- Improve animation performance: wrenn-glow uses opacity instead of
box-shadow (compositor-only, no paint cost)
- Add prefers-reduced-motion media query covering inline style animations
- Fix OAuth error display on login page (read ?error= param on mount)
- Harden clipboard copy with try-catch and toast fallback
- Improve empty state copy, dialog microcopy, and error messages
- Add retry button to error banners on keys page
- Replace "All systems operational" footer bar with a clean 1px divider
- Fix text truncation on long capsule/snapshot names (min-w-0 + truncate)
- Snapshot delete: make agent RPC failure a hard error so DB record is
not removed when files cannot be deleted from disk
- Snapshot overwrite: call agent to delete old files before removing the
DB record, preventing stale memfile.{uuid} generations from accumulating
on disk across repeated overwrites
- Sandbox destroy: only swallow CodeNotFound from the agent (sandbox
already gone / TTL-reaped); any other error now propagates to the caller
instead of being silently ignored
- Use tini as PID 1 in wrenn-init.sh so zombie processes are reaped
and signals are forwarded correctly to envd
- Set standard PATH in wrenn-init.sh so child processes spawned by envd
can find common binaries (fixes "nice: ls command not found")
- Add envdclient.Init() to POST /init on envd after every boot/resume,
syncing the guest clock via unix.ClockSettime — critical after snapshot
resume where the guest clock is frozen
- Run Init in a background goroutine so it doesn't block the CreateSandbox
RPC response; a slow Init (vCPU busy with envd startup) was causing the
RPC context to be canceled before the response reached the control plane
- Update rootfs-from-container.sh and update-debug-rootfs.sh to inject
tini into the rootfs, checking the container image and host first,
downloading from GitHub releases as fallback
Replace AGENT_KERNEL_PATH, AGENT_IMAGES_PATH, AGENT_SANDBOXES_PATH,
AGENT_SNAPSHOTS_PATH, and AGENT_TOKEN_FILE with a single
AGENT_FILES_ROOTDIR (default /var/lib/wrenn) that derives all
subdirectory paths automatically.
Introduce three migrations: admin permissions (is_admin + permissions table),
BYOC team tracking, and multi-host support (hosts, host_tokens, host_tags).
Add Redis to dev infra and wire up client in control plane for ephemeral
host registration tokens. Add go-redis dependency.
Moves business logic from API handlers into internal/service/ so that
both the REST API and the upcoming dashboard can share the same operations
without duplicating code. API handlers now delegate to the service layer
and only handle HTTP-specific concerns (request parsing, response formatting).
Implement OAuth 2.0 login via GitHub as an alternative to email/password.
Uses a provider registry pattern (internal/auth/oauth/) so adding Google
or other providers later requires only a new Provider implementation.
Flow: GET /v1/auth/oauth/github redirects to GitHub, callback exchanges
the code for a user profile, upserts the user + team atomically, and
redirects to the frontend with a JWT token.
Key changes:
- Migration: make password_hash nullable, add oauth_providers table
- Provider registry with GitHubProvider (profile + email fallback)
- CSRF state cookie with HMAC-SHA256 validation
- Race-safe registration (23505 collision retries as login)
- Startup validation: CP_PUBLIC_URL required when OAuth is configured
Not fully tested — needs integration tests with a real GitHub OAuth app
and end-to-end testing with the frontend callback page.
Replace the existing auto-destroy TTL behavior with auto-pause: when a
sandbox exceeds its timeout_sec of inactivity, the TTL reaper now pauses
it (snapshot + teardown) instead of destroying it, preserving the ability
to resume later.
Key changes:
- TTL reaper calls Pause instead of Destroy, with fallback to Destroy if
pause fails (e.g. Firecracker process already gone)
- New PingSandbox RPC resets the in-memory LastActiveAt timer
- New POST /v1/sandboxes/{id}/ping REST endpoint resets both agent memory
and DB last_active_at
- ListSandboxes RPC now includes auto_paused_sandbox_ids so the reconciler
can distinguish auto-paused sandboxes from crashed ones in a single call
- Reconciler polls every 5s (was 30s) and marks auto-paused as "paused"
vs orphaned as "stopped"
- Resume RPC accepts timeout_sec from DB so TTL survives pause/resume cycles
- Reaper checks every 2s (was 10s) and uses a detached context to avoid
incomplete pauses on app shutdown
- Default timeout_sec changed from 300 to 0 (no auto-pause unless requested)
- Add retry with backoff to dmsetupRemove for transient "device busy"
errors caused by kernel not releasing the device immediately after
Firecracker exits. Only retries on "Device or resource busy"; other
errors (not found, permission denied) return immediately.
- Thread context.Context through RemoveSnapshot/RestoreSnapshot so
retries respect cancellation. Use context.Background() in all error
cleanup paths to prevent cancelled contexts from skipping cleanup
and leaking dm devices on the host.
- Resume vCPUs on pause failure: if snapshot creation or memfile
processing fails after freezing the VM, unfreeze vCPUs so the
sandbox stays usable instead of becoming a frozen zombie.
- Fix resource leaks in Pause when CoW rename or metadata write fails:
properly clean up network, slot, loop device, and remove from boxes
map instead of leaving a dead sandbox with leaked host resources.
- Fix Resume WaitUntilReady failure: roll back CoW file to the snapshot
directory instead of deleting it, preserving the paused state so the
user can retry.
- Skip m.loops.Release when RemoveSnapshot fails during pause since
the stale dm device still references the origin loop device.
- Fix incorrect VCPUs placeholder in Resume VMConfig that used memory
size instead of a sensible default.
Pause was logging RemoveSnapshot failures as warnings and continuing,
which left stale dm devices behind. Resume then failed trying to create
a device with the same name.
- Make RemoveSnapshot failure a hard error in Pause (clean up remaining
resources and return error instead of silently proceeding)
- Add defensive stale device cleanup in RestoreSnapshot before creating
the new dm device