wrenn-releases

Author	SHA1	Message	Date
pptx704	0ea0e7cc70	Fix expandEnv regex, init script crash, healthcheck deadline, and test issues - Fix envRegex: remove spurious (\$)? group that swallowed $$$, handle ${} - wrenn-init.sh: add || true to networking commands under set -e, remove dead code - waitForHealthcheck: use context deadline for unlimited retries instead of implicit 100 cap - Make parseSandboxEnv a package-level function (unused receiver) - Fix WrappedCommand test: map iteration order dependency, pre-expand env values - Fix error wrapping: %v → %w per project conventions - test-jupyter-kernel.py: move import to top-level, fix misleading comment	2026-04-08 02:14:53 +06:00
Rafeed M. Bhuiyan	bf05677bef	Merge branch 'dev' into feat/python-code-interpreter	2026-04-06 20:45:54 +00:00
Tasnim Kabir Sadik	4f340b8847	feat: add env expansion, sandbox env fetching, and configurable healthchecks Fix ENV instructions to expand $VAR references at set time using the current env state, preventing self-referencing values like PATH=/opt/venv/bin:$PATH from producing recursive expansions. Remove expandEnv from shellPrefix to avoid double expansion. Fetch sandbox environment variables via `env` before recipe execution so ENV steps resolve against actual runtime values from the base template image. Replace hardcoded healthcheck timing with a Dockerfile-like flag parser supporting --interval, --timeout, --start-period, and --retries. Add start-period grace window and bounded retry counting to waitForHealthcheck. Add python-interpreter-v0-beta recipe and healthcheck files.	2026-04-07 01:15:43 +06:00
pptx704	9a52b47786	Minor temporary fix for sitewide metrics	2026-04-04 13:11:18 +06:00
pptx704	948db13bed	Add skip_pre_post build option, cancel endpoint, and recipe package - skip_pre_post flag on builds bypasses apt update/clean pre/post steps for faster iteration when the recipe handles its own environment setup - POST /v1/admin/builds/{id}/cancel endpoint marks an in-progress build as cancelled; UpdateBuildStatus now also sets completed_at for 'cancelled' - internal/recipe: typed recipe parser and executor (RUN/ENV/COPY steps) replacing the raw string slice approach in the build worker - pre/post build commands prefixed with RUN to match recipe step format	2026-03-30 21:24:52 +06:00
pptx704	25ce0729d5	Add mTLS to CP→agent channel - Internal ECDSA P-256 CA (WRENN_CA_CERT/WRENN_CA_KEY env vars); when absent the system falls back to plain HTTP so dev mode works without certificates - Host leaf cert (7-day TTL, IP SAN) issued at registration and renewed on every JWT refresh; fingerprint + expiry stored in DB (cert_expires_at column replaces the removed mtls_enabled flag) - CP ephemeral client cert (24-hour TTL) via CPCertStore with atomic hot-swap; background goroutine renews it every 12 hours without restarting the server - Host agent uses tls.Listen + httpServer.Serve so GetCertificate callback is respected (ListenAndServeTLS always reads cert from disk) - Sandbox reverse proxy now uses pool.Transport() so it shares the same TLS config as the Connect RPC clients instead of http.DefaultTransport - Credentials file renamed host-credentials.json with cert_pem/key_pem/ca_cert_pem fields; duplicate register/refresh response structs collapsed to authResponse	2026-03-30 21:24:35 +06:00
pptx704	75b28ed899	Add UUID-based template IDs and team-scoped template directory layout Introduces internal/layout package for centralized path construction, migrates templates from name-based TEXT primary keys to UUID PKs with team-scoped directories (WRENN_DIR/images/teams/{team_id}/{template_id}). The built-in minimal template uses sentinel zero UUIDs. Proto messages carry team_id + template_id alongside deprecated template name field. Team deletion now cleans up template files across all hosts.	2026-03-29 00:30:10 +06:00
pptx704	34af77e0d8	Fix snapshot race, delete auth, sparse dd, default disk to 5GB Snapshot race fix: - Pre-mark sandbox as "paused" in DB before issuing CreateSnapshot and PauseSandbox RPCs, preventing the reconciler from marking it "stopped" during the flatten window when the sandbox is gone from the host agent's in-memory map but DB still says "running" - Revert status to "running" on RPC failure - Check ctx.Err() before writing response to avoid writing to dead connections when client disconnects during long snapshot operations Delete auth fix: - Block non-admin deletion of platform templates (team_id = all-zeros) at DELETE /v1/snapshots/{name} with 403, preventing file deletion before the team ownership check fails Sparse dd: - Add conv=sparse to dd in FlattenSnapshot so flattened images preserve sparseness (~200MB actual vs 5GB logical) Default disk size: - Change default disk_size_mb from 20GB to 5GB across migration, manager, service, build, and EnsureImageSizes - Disable split-button dropdown arrow for platform templates in dashboard snapshots page (teams cannot delete platform templates)	2026-03-28 14:30:18 +06:00
pptx704	3509ca90e8	Add pre/post build stages, fix exec timeout, expand guest PATH Build phases: - Pre-build (apt update) and post-build (apt clean, autoremove, rm lists) run with 10-minute timeout; user recipe commands keep 30s timeout - Log entries include phase field for UI grouping - Always send explicit TimeoutSec to host agent (0 defaulted to 30s) Frontend: - Pre-build/post-build steps show phase label without exposing commands - Recipe steps numbered independently starting from 1 Guest PATH: - Add /usr/games:/usr/local/games to wrenn-init.sh PATH export (standard Ubuntu paths, needed for packages like cowsay)	2026-03-27 00:28:32 +06:00
pptx704	c8acac92cc	Add pre/post build stages to template builds Pre-build: apt update Post-build: apt clean, apt autoremove, rm apt lists Total steps count includes pre/post commands for accurate progress bars.	2026-03-27 00:00:48 +06:00
pptx704	c0d6381bbe	Add disk_size_mb, auto-expand base images, admin templates endpoint Disk sizing: - Add disk_size_mb column to sandboxes table (default 20480 = 20GB) - Add disk_size_mb to CreateSandboxRequest proto, passed through the full chain: service → RPC → host agent → sandbox manager → devicemapper - devicemapper.CreateSnapshot takes separate cowSizeBytes param so the sparse CoW file can be sized independently from the origin - EnsureImageSizes() runs at host agent startup: expands any base image smaller than 20GB via truncate + resize2fs (sparse, no extra physical disk). Sandboxes then get the full 20GB via fast dm-snapshot path - FlattenRootfs shrinks output images with resize2fs -M so stored templates are compact; EnsureImageSizes re-expands on next startup Admin templates visibility: - Add GET /v1/admin/templates endpoint listing all templates across teams - Frontend admin templates page uses listAdminTemplates() instead of team-scoped listSnapshots() - Platform templates (team_id = all-zeros UUID) now visible to all teams: GetTemplateByTeam, ListTemplatesByTeam, ListTemplatesByTeamAndType queries include platform team_id in WHERE clause	2026-03-26 23:45:41 +06:00
pptx704	4ddd494160	Switch database IDs from TEXT to native UUID Consolidate 16 migrations into one with UUID columns for all entity IDs. TEXT is kept only for polymorphic fields (audit_logs.actor_id, resource_id) and template names. The id package now generates UUIDs via google/uuid, with Format/Parse helpers for the prefixed wire format (sb-{uuid}, usr-{uuid}, etc.). Auth context, services, and handlers pass pgtype.UUID internally; conversion to/from prefixed strings happens at API and RPC boundaries. Adds PlatformTeamID (all-zeros UUID) for shared resources.	2026-03-26 16:16:21 +06:00
pptx704	cdd89a7cee	Fix review issues: detached contexts, loop device leak, timer leak, size_bytes - Use context.Background() with timeout in destroySandbox/failBuild so cleanup and DB writes survive parent context cancellation on shutdown - Fix loop device refcount leak in FlattenRootfs when dmDevice is nil - Replace time.After with time.NewTimer in healthcheck polling to avoid goroutine leak when healthcheck passes early - Capture size_bytes from CreateSnapshot/FlattenRootfs RPC responses instead of hardcoding 0 in the templates table insert - Avoid leaking internal error details to API clients in build handler	2026-03-26 15:31:38 +06:00
pptx704	1ce62934b3	Add template build system with admin panel, async workers, and FlattenRootfs RPC Introduces an end-to-end template building pipeline: admins submit a recipe (list of shell commands) via the dashboard, a Redis-backed worker pool spins up a sandbox, executes each command, and produces either a full snapshot (with healthcheck) or an image-only template (rootfs flattened via a new FlattenRootfs host-agent RPC). Build progress and per-step logs are persisted to a new template_builds table and polled by the frontend. Backend: - New FlattenRootfs RPC (proto + host agent + sandbox manager) - BuildService with Redis queue (BLPOP) and configurable worker pool (default 2) - Admin-only REST endpoints: POST/GET /v1/admin/builds, GET /v1/admin/builds/{id} - Migration for template_builds table with JSONB logs and recipe columns - sqlc queries for build CRUD and progress updates Frontend: - /admin/templates page with Templates + Builds tabs - Create Template dialog with recipe textarea, healthcheck, specs - Build history with expandable per-step logs, status badges, progress bars - Auto-polling every 3s for active builds - AdminSidebar updated with Templates nav item	2026-03-26 15:27:21 +06:00
pptx704	6eacf0f735	Fix LIKE pattern injection in user email search Escape LIKE metacharacters (% and _) in the email prefix before passing to the SQL query, and enforce the documented '@' requirement to prevent broad user enumeration. Move search logic out of TeamService into usersHandler since it is a site-wide lookup, not team-scoped.	2026-03-25 21:53:09 +06:00
pptx704	9acdbb5ae9	Add per-sandbox CPU/memory/disk metrics collection Samples /proc/{fc_pid}/stat (CPU%), /proc/{fc_pid}/status (VmRSS), and stat() on CoW files at 500ms intervals per running sandbox. Three tiered ring buffers downsample into 30s and 5min averages for 10min/2h/24h retention. Metrics are flushed to DB on pause (all tiers) and destroy (24h only). New GetSandboxMetrics and FlushSandboxMetrics RPCs on the host agent, proxied through GET /v1/sandboxes/{id}/metrics?range= on the control plane. Returns live data for running sandboxes, DB data for paused, and 404 for stopped.	2026-03-25 20:10:33 +06:00
pptx704	47b0ed5b52	Fix metrics correctness, redesign stats page - Replace stale snapshot read (GetCurrentMetrics) with live query (GetLiveMetrics) against sandboxes table — always returns correct zeros when no capsules are running - Fix CPU reserved formula: running + starting only; paused VMs no longer contribute vCPUs (RAM reservation for paused unchanged) - Merge top cards into 3 paired Now/Peak cards with colored accent borders (green/blue/amber matching chart colors) - Move Live badge from Running Capsules card to page-level header - Add colored category dots to card and chart headers - Charts stacked vertically, flex-1 to fill remaining page height - vCPUs chart color changed to blue (#5a9fd4), RAM stays amber	2026-03-25 15:11:46 +06:00
pptx704	fee66bda50	Add live stats page with metrics sampling and route split - New sandbox_metrics_snapshots table sampled every 10s (60-day retention) - Background MetricsSampler goroutine wired into control plane startup - GET /v1/sandboxes/stats?range=5m|1h|6h|24h|30d endpoint with adaptive polling intervals; reserved CPU/RAM uses ceil(paused/2) formula - StatsPanel component: 4 stat cards + 2 Chart.js line charts (straight lines, integer y-axis for running count, dual-axis for CPU/RAM) - Range filter persisted in URL query param; polls update data silently (no blink — loading state only shown on initial mount) - Split /dashboard/capsules into /list and /stats sub-routes with shared layout; capsuleRunningCount store syncs badge across routes - CreateCapsuleDialog extracted as reusable component	2026-03-25 14:41:05 +06:00
pptx704	1be30034bd	Add audit log infrastructure and GET /v1/audit-logs endpoint Introduces an append-only audit trail for all user and system actions: sandbox lifecycle (create/pause/resume/destroy/auto-pause), snapshots, team rename, API key create/revoke, member add/remove/leave/role_update, and BYOC host add/delete/marked_down/marked_up. - New audit_logs table (migration) with team_id, actor, resource, action, scope (team|admin), status (success|info|warning|error), metadata, and created_at - AuditLogger (internal/audit) with named fire-and-forget methods per event; system actor used for background events (HostMonitor, TTL reaper) - GET /v1/audit-logs: JWT-only, cursor pagination (max 200), multi-value filters for resource_type and action (comma-sep or repeated params); members see team-scoped events only, admins/owners see all - AuthContext extended with APIKeyID + APIKeyName so API key requests record meaningful actor identity - HostMonitor wired with AuditLogger for auto-pause and host marked_down	2026-03-25 05:15:16 +06:00
pptx704	e069b3e679	Add BYOC page, admin section, and is_byoc team visibility gating - Frontend: BYOC hosts page (/dashboard/byoc) with register/delete flows, shimmer loading, pulsing online status, animated token reveal checkmark - Frontend: Admin section (/admin/hosts) with platform + BYOC tabs, stat pills, skeleton loading, slide-in animations for new rows - Frontend: AdminSidebar component with accent top bar and admin pill badge - Frontend: BYOC nav item shown only when team.is_byoc is true (derived from teams store, not JWT); disabled for members - Frontend: Admin shield button in Sidebar, visible only to platform admins - Backend: is_admin in JWT claims + requireAdmin middleware (DB-validated) - Backend: is_byoc added to teamResponse so frontend derives visibility from fresh team data rather than stale JWT fields - Backend: SetBYOC admin endpoint (PUT /v1/admin/teams/{id}/byoc) - Backend: Admin hosts list enriches BYOC entries with team_name - Host agent: load .env file via godotenv on startup	2026-03-25 03:10:41 +06:00
pptx704	9bf67aa7f7	Implement host registration, JWT refresh tokens, and multi-host scheduling Replaces the hardcoded CP_HOST_AGENT_ADDR single-agent setup with a DB-driven registration system supporting multiple host agents (BYOC). Key changes: - Host agents register via one-time token, receive a 7-day JWT + 60-day refresh token; heartbeat loop auto-refreshes on 401/403 and pauses all sandboxes if refresh fails - HostClientPool: lazy Connect RPC client cache keyed by host ID, replacing the single static agent client throughout the API and service layers - RoundRobinScheduler: picks an online host for each new sandbox via ListActiveHosts; extensible for future scheduling strategies - HostMonitor (replaces Reconciler): passive heartbeat staleness check marks hosts unreachable and sandboxes missing after 90s; active reconciliation per online host restores missing-but-alive sandboxes and stops orphans - Graceful host delete: returns 409 with affected sandbox list without ?force=true; force-delete destroys sandboxes then evicts pool client - Snapshot delete broadcasts to all online hosts (templates have no host_id) - sandbox.Manager.PauseAll: pauses all running VMs on CP connectivity loss - New migration: host_refresh_tokens table with token rotation (issue-then-revoke ordering to prevent lockout on mid-rotation crash) - New sandbox status 'missing' (reversible, unlike 'stopped') and host status 'unreachable'; both reflected in OpenAPI spec - Fix: refresh token auth failure now returns 401 (was 400 via generic 'invalid' substring match in serviceErrToHTTP)	2026-03-24 18:32:05 +06:00
pptx704	3932bc056e	Add user names, team-scoped sandbox guard, and login robustness fixes - Add name column to users (migration + sqlc regen); propagate through JWT claims, auth context, all auth/OAuth handlers, service layer, and frontend - Sidebar and team page show name instead of email; team page splits Name/Email into separate columns - Block sandbox creation in UI and API when user has no active team context - loginTeam helper falls back to first active team when no default is set, fixing login for invited users with no is_default membership - Exclude soft-deleted teams from GetDefaultTeamForUser, GetBYOCTeams queries - Guard host creation against soft-deleted teams in service/host.go - SwitchTeam re-fetches name from DB instead of trusting stale JWT claim - Reset teams store on login so stale data from a previous session never persists - Update openapi.yaml: add name to SignupRequest and AuthResponse schemas	2026-03-24 16:56:10 +06:00
pptx704	b3e8bdd171	Refine team management: name chars, danger zone, no-team state - Allow hyphens, @, and apostrophes in team names (backend regex) - After delete/leave, switch to next available team instead of logging out; if no teams remain, show a toast prompting to create one - Disable delete/leave button when user has only one team, with explanatory hint to create another team first - Show empty state on /dashboard/team when auth has no team context, pointing user to the sidebar to create a team - Fetch all teams in parallel with team detail on page load to power the isLastTeam guard	2026-03-24 14:34:20 +06:00
pptx704	8e5d426638	Add team management endpoints - Three-role model (owner/admin/member) with owner protection invariants - Team CRUD: create, rename (admin+), soft-delete with VM cleanup (owner only) - Member management: add by email, remove, role updates (admin+), leave - Switch-team endpoint re-issues JWT after DB membership verification - User email prefix search for add-member UI autocomplete - JWT carries role as a hint; all authorization decisions verified from DB - Team slug: immutable 12-char hex (e.g. a1b2c3-d1e2f3), reserved on soft-delete - Migration adds slug + deleted_at to teams; backfills existing rows	2026-03-24 13:29:54 +06:00
pptx704	5f0dbadea6	Fix snapshot and sandbox delete consistency - Snapshot delete: make agent RPC failure a hard error so DB record is not removed when files cannot be deleted from disk - Snapshot overwrite: call agent to delete old files before removing the DB record, preventing stale memfile.{uuid} generations from accumulating on disk across repeated overwrites - Sandbox destroy: only swallow CodeNotFound from the agent (sandbox already gone / TTL-reaped); any other error now propagates to the caller instead of being silently ignored	2026-03-23 02:59:30 +06:00
pptx704	97292ba0bf	Added basic frontend (#1) Reviewed-on: wrenn/sandbox#1 Co-authored-by: pptx704 <rafeed@omukk.dev> Co-committed-by: pptx704 <rafeed@omukk.dev>	2026-03-22 19:01:38 +00:00
pptx704	2c66959b92	Add host registration, heartbeat, and multi-host management Implements the full host ↔ control plane connection flow: - Host CRUD endpoints (POST/GET/DELETE /v1/hosts) with role-based access: regular hosts admin-only, BYOC hosts for admins and team owners - One-time registration token flow: admin creates host → gets token (1hr TTL in Redis + Postgres audit trail) → host agent registers with specs → gets long-lived JWT (1yr) - Host agent registration client with automatic spec detection (arch, CPU, memory, disk) and token persistence to disk - Periodic heartbeat (30s) via POST /v1/hosts/{id}/heartbeat with X-Host-Token auth and host ID cross-check - Token regeneration endpoint (POST /v1/hosts/{id}/token) for retry after failed registration - Tag management (add/remove/list) with team-scoped access control - Host JWT with typ:"host" claim, cross-use prevention in both VerifyJWT and VerifyHostJWT - requireHostToken middleware for host agent authentication - DB-level race protection: RegisterHost uses AND status='pending' with rows-affected check; Redis GetDel for atomic token consume - Migration for future mTLS support (cert_fingerprint, mtls_enabled columns) - Host agent flags: --register (one-time token), --address (required ip:port) - serviceErrToHTTP extended with "forbidden" → 403 mapping - OpenAPI spec, .env.example, and README updated	2026-03-17 05:51:28 +06:00
pptx704	f38d5812d1	Extract shared service layer for sandbox, API key, and template operations Moves business logic from API handlers into internal/service/ so that both the REST API and the upcoming dashboard can share the same operations without duplicating code. API handlers now delegate to the service layer and only handle HTTP-specific concerns (request parsing, response formatting).	2026-03-16 05:39:30 +06:00

28 Commits
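Commit 4f340b8847 expands $VAR references in ENV values at set time, resolving them against the environment as it stands just before the assignment. The Go sketch below shows that idea. It is an illustration, not the recipe package's code: it swaps in the standard library's os.Expand for the project's own envRegex, and applyEnv is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
)

// applyEnv (hypothetical) models one ENV step. The value is expanded
// against env as it is *before* the assignment, so a self-reference like
// PATH=/opt/venv/bin:$PATH captures the old PATH instead of recursing.
// Both $VAR and ${VAR} are handled; unknown variables expand to "".
func applyEnv(env map[string]string, key, value string) {
	expanded := os.Expand(value, func(name string) string { return env[name] })
	env[key] = expanded
}

func main() {
	// Assumed to be seeded from the sandbox's runtime environment (`env` output).
	env := map[string]string{"PATH": "/usr/local/bin:/usr/bin"}
	applyEnv(env, "PATH", "/opt/venv/bin:$PATH")
	applyEnv(env, "CACHE", "${XDG_CACHE_HOME}/pip")
	fmt.Println(env["PATH"])  // /opt/venv/bin:/usr/local/bin:/usr/bin
	fmt.Println(env["CACHE"]) // /pip
}
```

The stored value is already fully expanded. Passing it through a second expansion pass, such as a shell prefix, would reinterpret any literal `$` left in it. That is the double expansion the same commit removes from shellPrefix.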
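Commit 6eacf0f735 escapes the LIKE metacharacters `%` and `_` in the email prefix and requires an `@`. The minimal sketch below assumes the query is a plain `email LIKE $1` that relies on PostgreSQL's default backslash escape character. buildEmailPrefixPattern is a hypothetical helper. The sketch also escapes the backslash itself; otherwise a user-supplied `\` could cancel the escape that follows it.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// likeEscaper makes a single pass over its input, so the backslashes it
// inserts are never escaped a second time.
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

// buildEmailPrefixPattern (hypothetical) rejects queries without '@' and
// returns a pattern that matches only the literal prefix.
func buildEmailPrefixPattern(q string) (string, error) {
	if !strings.Contains(q, "@") {
		return "", errors.New("query must contain '@'")
	}
	return likeEscaper.Replace(q) + "%", nil
}

func main() {
	p, err := buildEmailPrefixPattern("a_b%@ex")
	fmt.Println(p, err) // a\_b\%@ex% <nil>
}
```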