wrenn-releases

Author	SHA1	Message	Date
Rafeed M. Bhuiyan	d3e4812e46	v0.0.1 (#8) Co-authored-by: Tasnim Kabir Sadik <tksadik92@gmail.com> Reviewed-on: wrenn/sandbox#8	2026-04-09 19:24:49 +00:00
Rafeed M. Bhuiyan	831c898b71	Merge pull request 'Added channels for external notifications' (#13) from feat/channels into dev Reviewed-on: wrenn/sandbox#13	2026-04-09 19:20:36 +00:00
pptx704	0f78982186	feat: channel audit logging, name cleaning, message formatting, and dashboard UI - Add audit log entries for channel create, update, rotate_config, delete - Clean channel names on create/update (trim, lowercase, spaces → hyphens, SafeName validation) - Format chat notifications with full event details (resource, actor, team, timestamp) instead of one-liners - Fix Discord split-line embeds by setting splitLines=No on shoutrrr URL - Add channels dashboard page and sidebar navigation	2026-04-10 01:17:03 +06:00
pptx704	84dd15d22b	feat: add notification channels with provider integrations and retry Implement a channels system for notifying teams via external providers (Discord, Slack, Teams, Google Chat, Telegram, Matrix, webhook) when lifecycle events occur (capsule/template/host state changes). - Channel CRUD API under /v1/channels (JWT-only auth) - Test endpoint to verify config before saving (POST /v1/channels/test) - Secret rotation endpoint (PUT /v1/channels/{id}/config) - AES-256-GCM encryption for provider secrets (WRENN_ENCRYPTION_KEY) - Redis stream event publishing from audit logger - Background dispatcher with consumer group and retry (10s, 30s) - Webhook delivery with HMAC-SHA256 signing (X-WRENN-SIGNATURE) - shoutrrr integration for chat providers - Secrets never exposed in API responses	2026-04-09 17:06:06 +06:00
pptx704	5148b5dd64	Updated CLAUDE.md	2026-04-09 14:28:39 +06:00
pptx704	37d85ec998	chore: relicense from BSL 1.1 to Apache 2.0 Replace Business Source License with Apache License Version 2.0 across LICENSE, envd/LICENSE, and NOTICE. Update NOTICE to remove BSL-era framing that singled out Apache-only portions.	2026-04-09 14:28:19 +06:00
pptx704	e2beef817d	Expose host up/down audit events to BYOC teams and refresh dashboard navigation Change host marked_down/marked_up audit log scope from "admin" to "team" so BYOC team members can see when their hosts go unreachable or recover. Rename BYOC sidebar entry to Hosts, add placeholder billing/usage pages, disable unimplemented notifications/settings links, and point docs to external site.	2026-04-09 14:24:20 +06:00
pptx704	a9ca13b238	Changed redis dependency to keydb	2026-04-09 00:47:19 +06:00
pptx704	e3ffa576ce	Fix review findings: IP collision, pause race, proxy path, ENV ordering, conn drain - Fix IP address collision at slot 32768+ by using bitwise shifts instead of byte-truncating division in network slot addressing - Add per-sandbox lifecycleMu to serialize concurrent Pause/Destroy calls - Sanitize proxy forwarding path with path.Clean - Sort ENV keys in recipe shell preamble for deterministic ordering - Fix ConnTracker goroutine leak by adding cancel channel to Drain/Reset - Update context_test to assert deterministic ENV ordering	2026-04-08 04:32:41 +06:00
pptx704	dd50cfdcb1	fix: security hardening from CSO audit - Add auth failure logging (login, API key, JWT) with IP/email/prefix - Move OAuth JWT from URL params to short-lived cookies to prevent token leakage via browser history, server logs, and Referer headers - Pin Swagger UI to v5.18.2 with SRI integrity hashes - Upgrade Go toolchain to 1.25.8 (fixes 5 called stdlib vulns) - Fix unchecked error in host agent credential refresh - Add .gstack to .gitignore for security report artifacts	2026-04-08 03:46:31 +06:00
pptx704	3675ecba65	chore: add gstack skill routing rules to CLAUDE.md	2026-04-08 02:28:02 +06:00
pptx704	c8615466be	Enforce mandatory mTLS for CP↔agent communication Both the control plane and host agent now refuse to start without valid mTLS configuration, closing the unauthenticated proxy/RPC attack surface that existed when running in plain HTTP fallback mode.	2026-04-08 02:25:43 +06:00
Rafeed M. Bhuiyan	2737288a2b	Merge pull request 'Changes for a python code interpreter' (#12) from feat/python-code-interpreter into dev Reviewed-on: wrenn/sandbox#12	2026-04-07 20:18:06 +00:00
pptx704	0ea0e7cc70	Fix expandEnv regex, init script crash, healthcheck deadline, and test issues - Fix envRegex: remove spurious (\$)? group that swallowed $$$, handle ${} - wrenn-init.sh: add || true to networking commands under set -e, remove dead code - waitForHealthcheck: use context deadline for unlimited retries instead of implicit 100 cap - Make parseSandboxEnv a package-level function (unused receiver) - Fix WrappedCommand test: map iteration order dependency, pre-expand env values - Fix error wrapping: %v → %w per project conventions - test-jupyter-kernel.py: move import to top-level, fix misleading comment	2026-04-08 02:14:53 +06:00
Rafeed M. Bhuiyan	11e08e5b96	Merge branch 'dev' into feat/python-code-interpreter	2026-04-07 19:35:55 +00:00
Rafeed M. Bhuiyan	4dc8cc3867	Removed incorrect example cert format	2026-04-07 19:35:26 +00:00
Tasnim Kabir Sadik	9852f96127	Modified `expandEnv` to use regex. Updated recipefile with test script to check code execution with state management	2026-04-07 22:56:56 +06:00
Rafeed M. Bhuiyan	bf05677bef	Merge branch 'dev' into feat/python-code-interpreter	2026-04-06 20:45:54 +00:00
Tasnim Kabir Sadik	4f340b8847	feat: add env expansion, sandbox env fetching, and configurable healthchecks Fix ENV instructions to expand $VAR references at set time using the current env state, preventing self-referencing values like PATH=/opt/venv/bin:$PATH from producing recursive expansions. Remove expandEnv from shellPrefix to avoid double expansion. Fetch sandbox environment variables via `env` before recipe execution so ENV steps resolve against actual runtime values from the base template image. Replace hardcoded healthcheck timing with a Dockerfile-like flag parser supporting --interval, --timeout, --start-period, and --retries. Add start-period grace window and bounded retry counting to waitForHealthcheck. Add python-interpreter-v0-beta recipe and healthcheck files.	2026-04-07 01:15:43 +06:00
Rafeed M. Bhuiyan	f57fe85492	Merge pull request 'Minor temporary fix for sitewide metrics' (#11) from patch/analytics into dev Reviewed-on: wrenn/sandbox#11	2026-04-04 07:11:49 +00:00
pptx704	9a52b47786	Minor temporary fix for sitewide metrics	2026-04-04 13:11:18 +06:00
Rafeed M. Bhuiyan	ab38c8372c	Merge pull request 'Feature: HTTP communication with sandbox' (#10) from code-interpreter into dev Reviewed-on: wrenn/sandbox#10	2026-04-02 17:41:07 +00:00
pptx704	8b5fa3438e	Replace gopsutil port scanner with direct /proc/net/tcp reading The envd port scanner used gopsutil's net.Connections() which walks /proc/{pid}/fd to enumerate socket inodes. This corrupts Go runtime semaphore state when the VM is paused mid-operation and restored from a Firecracker snapshot. Replace with a direct /proc/net/tcp + /proc/net/tcp6 parser that reads a single file per address family — no /proc/{pid}/fd walk, no goroutines, no WaitGroups. Also replace concurrent-map (smap) in the scanner with a plain sync.RWMutex-protected map, since concurrent-map's Items() spawns goroutines with a WaitGroup internally, which is equally unsafe across snapshot boundaries. Use socket inode instead of PID for the port forwarding map key, since inode is available directly from /proc/net/tcp without the fd walk.	2026-04-01 15:47:28 +06:00
pptx704	2b4c5e0176	Add pre-pause proxy connection drain and sandbox proxy caching Introduce ConnTracker (atomic.Bool + WaitGroup) to track in-flight proxy connections per sandbox. Before pausing a VM, the manager drains active connections with a 2s grace period, preventing Go runtime corruption inside the guest caused by stale TCP state surviving Firecracker snapshot/restore. Also add: - AcquireProxyConn on Manager for atomic lookup + connection tracking - Proxy cache (120s TTL) on CP SandboxProxyWrapper with single-query DB lookup (GetSandboxProxyTarget) to avoid two round-trips - Reset() on ConnTracker to re-enable connections if pause fails	2026-04-01 15:09:44 +06:00
pptx704	377e856c8f	Fix lint warnings: drop deprecated Name field from snapshot response, check errcheck in benchmark Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 21:28:57 +06:00
pptx704	948db13bed	Add skip_pre_post build option, cancel endpoint, and recipe package - skip_pre_post flag on builds bypasses apt update/clean pre/post steps for faster iteration when the recipe handles its own environment setup - POST /v1/admin/builds/{id}/cancel endpoint marks an in-progress build as cancelled; UpdateBuildStatus now also sets completed_at for 'cancelled' - internal/recipe: typed recipe parser and executor (RUN/ENV/COPY steps) replacing the raw string slice approach in the build worker - pre/post build commands prefixed with RUN to match recipe step format	2026-03-30 21:24:52 +06:00
pptx704	25ce0729d5	Add mTLS to CP→agent channel - Internal ECDSA P-256 CA (WRENN_CA_CERT/WRENN_CA_KEY env vars); when absent the system falls back to plain HTTP so dev mode works without certificates - Host leaf cert (7-day TTL, IP SAN) issued at registration and renewed on every JWT refresh; fingerprint + expiry stored in DB (cert_expires_at column replaces the removed mtls_enabled flag) - CP ephemeral client cert (24-hour TTL) via CPCertStore with atomic hot-swap; background goroutine renews it every 12 hours without restarting the server - Host agent uses tls.Listen + httpServer.Serve so GetCertificate callback is respected (ListenAndServeTLS always reads cert from disk) - Sandbox reverse proxy now uses pool.Transport() so it shares the same TLS config as the Connect RPC clients instead of http.DefaultTransport - Credentials file renamed host-credentials.json with cert_pem/key_pem/ ca_cert_pem fields; duplicate register/refresh response structs collapsed to authResponse	2026-03-30 21:24:35 +06:00
pptx704	88f919c4ca	Rename sandbox prefix to cl-, add MMDS metadata, fix proxy port routing - Change sandbox ID prefix from sb- to cl- (capsule) throughout - Fix proxy URL regex character class: base36 uses 0-9a-z, not just hex - Add MMDS V2 config and metadata to VM boot flow so envd can read WRENN_SANDBOX_ID and WRENN_TEMPLATE_ID from inside the guest - Pass TemplateID through VMConfig into both fresh and snapshot boot paths	2026-03-30 17:12:05 +06:00
pptx704	8f06fc554a	Replace Full snapshot fallback with file-level diff merge Always use Firecracker Diff snapshots (fast, only changed pages) and merge diff files at the file level when the generation cap is reached. The previous approach used Firecracker's Full snapshot type which dumps all memory to disk and can timeout, losing all snapshot data on failure. Add snapshot.MergeDiffs() which reads each block from the appropriate generation's diff file via the header mapping and writes them into a single consolidated file with a fresh generation-0 header.	2026-03-29 02:33:33 +06:00
pptx704	1ca10230a9	Prefix network namespaces with wrenn-, add stale cleanup, lower diff cap Rename ns-{idx} to wrenn-ns-{idx} and veth-{idx} to wrenn-veth-{idx} to avoid collisions with other tools. Add CleanupStaleNamespaces() at agent startup to remove orphaned namespaces, veths, iptables rules, and routes from a previous crash. Lower maxDiffGenerations from 10 to 8 to prevent Go runtime memory corruption from snapshot/restore drift.	2026-03-29 02:14:30 +06:00
pptx704	46d60fc5a5	Seed minimal template in DB and protect it from deletion Insert a minimal template row (all-zeros UUID) so it appears in both team and admin template listings. Guard delete endpoints to prevent removal of the minimal template.	2026-03-29 01:34:54 +06:00
pptx704	906cc42d13	Rename AGENT_/CP_LISTEN_ADDR env vars to WRENN_ prefix AGENT_FILES_ROOTDIR → WRENN_DIR, AGENT_LISTEN_ADDR → WRENN_HOST_LISTEN_ADDR, AGENT_CP_URL → WRENN_CP_URL, AGENT_HOST_INTERFACE → WRENN_HOST_INTERFACE, CP_LISTEN_ADDR → WRENN_CP_LISTEN_ADDR. Consolidates all env vars under a consistent WRENN_ namespace.	2026-03-29 00:30:20 +06:00
pptx704	75b28ed899	Add UUID-based template IDs and team-scoped template directory layout Introduces internal/layout package for centralized path construction, migrates templates from name-based TEXT primary keys to UUID PKs with team-scoped directories (WRENN_DIR/images/teams/{team_id}/{template_id}). The built-in minimal template uses sentinel zero UUIDs. Proto messages carry team_id + template_id alongside deprecated template name field. Team deletion now cleans up template files across all hosts.	2026-03-29 00:30:10 +06:00
pptx704	03e96629c7	Remove slug from team page UI	2026-03-28 20:45:57 +06:00
pptx704	34af77e0d8	Fix snapshot race, delete auth, sparse dd, default disk to 5GB Snapshot race fix: - Pre-mark sandbox as "paused" in DB before issuing CreateSnapshot and PauseSandbox RPCs, preventing the reconciler from marking it "stopped" during the flatten window when the sandbox is gone from the host agent's in-memory map but DB still says "running" - Revert status to "running" on RPC failure - Check ctx.Err() before writing response to avoid writing to dead connections when client disconnects during long snapshot operations Delete auth fix: - Block non-admin deletion of platform templates (team_id = all-zeros) at DELETE /v1/snapshots/{name} with 403, preventing file deletion before the team ownership check fails Sparse dd: - Add conv=sparse to dd in FlattenSnapshot so flattened images preserve sparseness (~200MB actual vs 5GB logical) Default disk size: - Change default disk_size_mb from 20GB to 5GB across migration, manager, service, build, and EnsureImageSizes - Disable split-button dropdown arrow for platform templates in dashboard snapshots page (teams cannot delete platform templates)	2026-03-28 14:30:18 +06:00
pptx704	c89a664a37	Switch API ID format from UUID to base36 for compact, E2B-style IDs DB stays native UUID; the format/parse layer now encodes 16 UUID bytes as 25-char lowercase alphanumeric (base36) strings instead of the standard 36-char hex-with-dashes format. e.g. sb-2e5glxi4g3qnhwci95qev0cg0	2026-03-27 00:53:51 +06:00
pptx704	3509ca90e8	Add pre/post build stages, fix exec timeout, expand guest PATH Build phases: - Pre-build (apt update) and post-build (apt clean, autoremove, rm lists) run with 10-minute timeout; user recipe commands keep 30s timeout - Log entries include phase field for UI grouping - Always send explicit TimeoutSec to host agent (0 defaulted to 30s) Frontend: - Pre-build/post-build steps show phase label without exposing commands - Recipe steps numbered independently starting from 1 Guest PATH: - Add /usr/games:/usr/local/games to wrenn-init.sh PATH export (standard Ubuntu paths, needed for packages like cowsay)	2026-03-27 00:28:32 +06:00
pptx704	c8acac92cc	Add pre/post build stages to template builds Pre-build: apt update Post-build: apt clean, apt autoremove, rm apt lists Total steps count includes pre/post commands for accurate progress bars.	2026-03-27 00:00:48 +06:00
pptx704	5cb37bf2a0	Add admin template deletion with broadcast to all hosts - DELETE /v1/admin/templates/{name} endpoint (admin-only) - Broadcasts DeleteSnapshot RPC to all online hosts before removing DB record - Frontend admin templates page uses deleteAdminTemplate() instead of team-scoped deleteSnapshot() - Delete button shown for all template types, not just snapshots	2026-03-26 23:53:08 +06:00
pptx704	c0d6381bbe	Add disk_size_mb, auto-expand base images, admin templates endpoint Disk sizing: - Add disk_size_mb column to sandboxes table (default 20480 = 20GB) - Add disk_size_mb to CreateSandboxRequest proto, passed through the full chain: service → RPC → host agent → sandbox manager → devicemapper - devicemapper.CreateSnapshot takes separate cowSizeBytes param so the sparse CoW file can be sized independently from the origin - EnsureImageSizes() runs at host agent startup: expands any base image smaller than 20GB via truncate + resize2fs (sparse, no extra physical disk). Sandboxes then get the full 20GB via fast dm-snapshot path - FlattenRootfs shrinks output images with resize2fs -M so stored templates are compact; EnsureImageSizes re-expands on next startup Admin templates visibility: - Add GET /v1/admin/templates endpoint listing all templates across teams - Frontend admin templates page uses listAdminTemplates() instead of team-scoped listSnapshots() - Platform templates (team_id = all-zeros UUID) now visible to all teams: GetTemplateByTeam, ListTemplatesByTeam, ListTemplatesByTeamAndType queries include platform team_id in WHERE clause	2026-03-26 23:45:41 +06:00
pptx704	4ddd494160	Switch database IDs from TEXT to native UUID Consolidate 16 migrations into one with UUID columns for all entity IDs. TEXT is kept only for polymorphic fields (audit_logs.actor_id, resource_id) and template names. The id package now generates UUIDs via google/uuid, with Format/Parse helpers for the prefixed wire format (sb-{uuid}, usr-{uuid}, etc.). Auth context, services, and handlers pass pgtype.UUID internally; conversion to/from prefixed strings happens at API and RPC boundaries. Adds PlatformTeamID (all-zeros UUID) for shared resources.	2026-03-26 16:16:21 +06:00
pptx704	cdd89a7cee	Fix review issues: detached contexts, loop device leak, timer leak, size_bytes - Use context.Background() with timeout in destroySandbox/failBuild so cleanup and DB writes survive parent context cancellation on shutdown - Fix loop device refcount leak in FlattenRootfs when dmDevice is nil - Replace time.After with time.NewTimer in healthcheck polling to avoid goroutine leak when healthcheck passes early - Capture size_bytes from CreateSnapshot/FlattenRootfs RPC responses instead of hardcoding 0 in the templates table insert - Avoid leaking internal error details to API clients in build handler	2026-03-26 15:31:38 +06:00
pptx704	1ce62934b3	Add template build system with admin panel, async workers, and FlattenRootfs RPC Introduces an end-to-end template building pipeline: admins submit a recipe (list of shell commands) via the dashboard, a Redis-backed worker pool spins up a sandbox, executes each command, and produces either a full snapshot (with healthcheck) or an image-only template (rootfs flattened via a new FlattenRootfs host-agent RPC). Build progress and per-step logs are persisted to a new template_builds table and polled by the frontend. Backend: - New FlattenRootfs RPC (proto + host agent + sandbox manager) - BuildService with Redis queue (BLPOP) and configurable worker pool (default 2) - Admin-only REST endpoints: POST/GET /v1/admin/builds, GET /v1/admin/builds/{id} - Migration for template_builds table with JSONB logs and recipe columns - sqlc queries for build CRUD and progress updates Frontend: - /admin/templates page with Templates + Builds tabs - Create Template dialog with recipe textarea, healthcheck, specs - Build history with expandable per-step logs, status badges, progress bars - Auto-polling every 3s for active builds - AdminSidebar updated with Templates nav item	2026-03-26 15:27:21 +06:00
pptx704	6898528096	Replace one-shot clock_settime with chrony for continuous guest time sync Switch from the envd /init endpoint pushing host time via syscall to chronyd reading the KVM PTP hardware clock (/dev/ptp0) continuously. This fixes clock drift between init calls and handles snapshot resume gracefully. Changes: - Add clocksource=kvm-clock kernel boot arg - Start chronyd in wrenn-init.sh before tini (PHC /dev/ptp0, makestep 1.0 -1) - Remove clock_settime logic from envd SetData and shouldSetSystemTime - Remove client.Init() clock sync calls from sandbox manager (3 sites) - Remove Init() method from envdclient (no longer needed) - Simplify rootfs scripts: socat/chrony now come from apt in the container image, only envd/wrenn-init/tini are injected by build scripts	2026-03-26 04:47:44 +06:00
pptx704	12d1e356fa	Minor UI copy updates across capsules and templates pages	2026-03-26 03:58:12 +06:00
pptx704	139f86bf9c	Fix static build: disable prerender for dynamic capsule detail route The [id] route cannot be prerendered at build time since IDs are unknown. With adapter-static's index.html fallback, the route is handled client-side.	2026-03-26 02:13:12 +06:00
pptx704	b0a8b498a8	WIP: Add Caddy reverse proxy for dev environment Add Caddy to docker-compose as the single entry point on port 8000: - localhost -> /api/* stripped and proxied to CP:8080, /* to frontend:5173 - .localhost -> proxied to CP:8080 (sandbox proxy catch-all) - Direct /v1/, /auth/*, /docs routes proxied to CP Move CP from :8000 to :8080 (its default). Caddy takes :8000. Update .env.example, vite proxy target (kept as fallback), and Makefile dev targets (pg_isready via docker exec, frontend binds 0.0.0.0). This is an intermediate state — needs further work for the full code interpreter feature.	2026-03-26 02:12:21 +06:00
pptx704	4be65b0abb	WIP: Add sandbox proxy catch-all to control plane Add SandboxProxyWrapper that intercepts requests with Host headers matching {port}-{sandbox_id}.{domain} and proxies them through the owning host agent's /proxy endpoint. Authentication is via X-API-Key only (no JWT). The API key's team must own the sandbox. Export EnsureScheme from lifecycle package for reuse. Request flow: SDK -> Caddy -> CP catch-all -> Host Agent -> sandbox VM. This is an intermediate state — needs further work for the full code interpreter feature.	2026-03-26 02:12:10 +06:00
pptx704	f4675ebfc0	WIP: Add HTTP proxy endpoint to host agent Add /proxy/{sandbox_id}/{port}/* handler that reverse-proxies HTTP requests to services running inside sandbox VMs. The sandbox's host IP (10.11.0.{idx}) is used as the upstream target. Includes port validation (1-65535) and shared HTTP transport for connection pooling. Supports WebSocket upgrades for protocols like Jupyter's streaming API. This is an intermediate state — needs further work for the full code interpreter feature.	2026-03-26 02:12:01 +06:00
pptx704	602ee470d9	WIP: Add socat injection to rootfs build scripts Inject a statically-linked socat binary into rootfs images. envd's port forwarder requires socat to bridge localhost-listening services (e.g. Jupyter kernel) to the guest TAP interface. Both scripts follow the same 3-step resolution: check rootfs, check host, build from source (http://www.dest-unreach.org/socat/ v1.8.1.1). Static linkage is verified before injection. This is an intermediate state — needs further work for the full code interpreter feature.	2026-03-26 02:11:54 +06:00
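
Commit c89a664a37 above describes the API ID format: the 16 UUID bytes are rendered as a 25-character lowercase base36 string behind a type prefix. The following is a minimal Python sketch of that idea, not the project's code (the project is written in Go). The function names, the big-endian integer interpretation of the UUID bytes, and the left zero-padding are assumptions made for illustration; the `cl` prefix follows the rename in commit 88f919c4ca.

```python
import uuid

# Lowercase base36 alphabet: digits 0-9 followed by a-z.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

# 36**24 < 2**128 <= 36**25, so 25 digits are enough for any 128-bit value.
ID_LEN = 25


def encode_base36(u: uuid.UUID) -> str:
    """Render a UUID as a fixed-width, 25-char base36 string (assumed layout)."""
    n = u.int  # the 16 bytes read as one big-endian 128-bit integer
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    # Digits were produced least-significant first; reverse and zero-pad.
    return "".join(reversed(digits)).rjust(ID_LEN, "0")


def decode_base36(s: str) -> uuid.UUID:
    """Parse a 25-char base36 string back into a UUID."""
    if len(s) != ID_LEN or any(c not in ALPHABET for c in s):
        raise ValueError(f"not a {ID_LEN}-char lowercase base36 string: {s!r}")
    n = int(s, 36)
    if n >= 1 << 128:
        raise ValueError("value does not fit in 128 bits")
    return uuid.UUID(int=n)


def format_id(prefix: str, u: uuid.UUID) -> str:
    """Build a prefixed wire ID such as 'cl-<25 base36 chars>'."""
    return f"{prefix}-{encode_base36(u)}"


def parse_id(prefix: str, s: str) -> uuid.UUID:
    """Check the prefix and decode the base36 body of a wire ID."""
    head, sep, body = s.partition("-")
    if sep != "-" or head != prefix:
        raise ValueError(f"expected prefix {prefix!r} in {s!r}")
    return decode_base36(body)


if __name__ == "__main__":
    u = uuid.uuid4()
    wire = format_id("cl", u)
    print(wire, parse_id("cl", wire) == u)
```

Fixed-width padding keeps every ID the same length, so a proxy hostname pattern can match the body with a simple `[0-9a-z]{25}` character class, consistent with the regex fix noted in commit 88f919c4ca.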


271 Commits