wrenn-releases

Author	SHA1	Message	Date
pptx704	7ef9a64613	fix: close stale TCP connections across snapshot/restore to prevent envd hangs. After Firecracker snapshot restore, zombie TCP sockets from the previous session cause Go runtime corruption inside the guest VM, making envd unresponsive. This manifests as infinite loading in the file browser and terminal timeouts (524) in production (HTTP/2 + Cloudflare) but not locally. Four-part fix: - Add ServerConnTracker to envd that tracks connections via ConnState callback, closes idle connections and disables keep-alives before snapshot, then closes all pre-snapshot zombie connections on restore (while preserving post-restore connections like the /init request) - Split envdclient into timeout (2min) and streaming (no timeout) HTTP clients; use streaming client for file transfers and process RPCs - Close host-side idle envdclient connections before PrepareSnapshot so FIN packets propagate during the 3s quiesce window - Add StreamingHTTPClient() accessor; streaming file transfer handlers in hostagent use it instead of the timeout client	2026-05-02 05:19:37 +06:00
pptx704	f3ec626d58	Envd version bump	2026-05-01 14:59:37 +06:00
pptx704	339cd7bee1	fix: security and stability fixes from code review - Scope WebSocket auth bypass to only WS endpoints by restructuring routes into separate chi Groups. Non-WS routes no longer pass through unauthenticated requests with spoofed Upgrade headers. Added optionalAPIKeyOrJWT middleware for WS routes (injects auth context from API key/JWT if present, passes through otherwise) and markAdminWS middleware for admin WS routes. - Fix nil pointer dereference in envd Handler.Wait(): p.tty.Close() was called unconditionally but p.tty is nil for non-PTY processes, crashing every non-PTY process exit. - Fix goroutine leak in sandbox Pause: stopSampler was never called, leaking one sampler goroutine per successful pause operation. - Decouple PTY WebSocket reads from RPC dispatch using a buffered channel to prevent backpressure-induced connection drops under fast typing. Includes input coalescing to reduce RPC call volume.	2026-04-24 15:48:38 +06:00
pptx704	9ea847923c	Fix concurrency, security, and correctness issues across backend and frontend - C1: Add sync.RWMutex to vm.Manager to protect concurrent vms map access - H1: Fix IP arithmetic overflow in network slot addressing (byte truncation) - H5: Fix MultiplexedChannel.Fork() TOCTOU race (move exited check inside lock) - H8: Remove snapshot overwrite — return template_name_taken conflict instead - H9: Wrap DeleteAccount DB ops in a transaction, make team deletion fatal - H10: Sanitize serviceErrToHTTP to stop leaking internal error messages - H11: Add deleted_at IS NULL to GetUserByEmail/GetUserByID queries - H12: Add id DESC to audit log composite index for cursor pagination - H15: Delete dead AuthModal.svelte component - H17: Move JWT from WebSocket URL query param to first WS message - H18: Fix $derived to $derived.by in FilesTab breadcrumbs	2026-04-16 06:11:42 +06:00
pptx704	a5ad3731f2	Refactored to maintain a separate cloud version. Moves 12 packages from internal/ to pkg/ (config, id, validate, events, db, auth, lifecycle, scheduler, channels, audit, service) so they can be imported by the enterprise repo as a Go module dependency. Introduces pkg/cpextension (shared Extension interface + ServerContext) and pkg/cpserver (Run() entrypoint with functional options) so the enterprise main.go can call cpserver.Run(cpserver.WithExtensions(...)) without duplicating the 20-step server bootstrap. Adds db/migrations/embed.go for go:embed access to OSS SQL migrations from the enterprise module. cmd/control-plane/main.go is reduced to a 10-line wrapper around cpserver.Run.	2026-04-15 21:41:48 +06:00
pptx704	962860ba74	Pre-pause snapshot signal to prevent Go runtime crash on restore. envd crashes with "fatal error: bad summary data" after Firecracker snapshot/restore because the page allocator radix tree is inconsistent when vCPUs are frozen mid-allocation. The port scanner goroutine allocates heavily every second, making it the primary trigger. Add POST /snapshot/prepare to envd; the host agent calls it before vm.Pause to quiesce continuous goroutines and force GC. On restore, PostInit restarts the port subsystem via the existing /init endpoint. - New PortSubsystem abstraction with Start/Stop/Restart lifecycle - Context-based goroutine cancellation (replaces irreversible channel close) - Context-aware Signal to prevent scanner/forwarder deadlock - Fix forwarder goroutine leak (was spinning forever on closed channel) - Kill socat children on stop to prevent orphans across snapshots - Fix double cmd.Wait panic (exec.Command instead of CommandContext)	2026-04-13 05:21:10 +06:00
pptx704	da06ecb97b	Fix file browser crash on non-regular files and connection leaks - envd: reject non-regular files (devices, pipes, sockets) in GetFiles to prevent infinite reads from /dev/zero, /dev/urandom etc. - host agent: add context cancellation check in ReadFileStream loop with proper Connect error codes - frontend: abort in-flight file reads on file switch, directory navigation, and component teardown via AbortController - frontend: guard against abort errors surfacing in UI, use try/finally for fileLoading state	2026-04-13 02:09:50 +06:00
pptx704	0189d030bb	Bump frontend and Go x/ dependencies - vite 7→8, @sveltejs/vite-plugin-svelte 6→7, typescript 5→6 - golang.org/x/crypto v0.49→v0.50, golang.org/x/sys v0.42→v0.43 (both modules)	2026-04-13 00:01:53 +06:00
pptx704	1826af37a5	Increase multiplexer fork buffer to 4096 to prevent output drops. The 64-entry buffer was too small for high-throughput PTY output (e.g. ls -laihR /). The consumer couldn't drain fast enough over the RPC stream, causing the non-blocking send fallback to silently discard data. 4096 entries (~64MB at 16KB/chunk) handles sustained output without drops while still preventing deadlock on stuck consumers.	2026-04-11 05:16:43 +06:00
pptx704	4b2ff279f7	Add terminal tab to capsule detail page and fix envd process lookup bugs - Add multi-session Terminal tab with xterm.js (session tabs, close, reconnect) - Keep terminal mounted across tab switches to preserve sessions - Persist active tab in URL (?tab=terminal) so refresh stays on terminal - Buffer keystrokes (50ms) to reduce per-character RPC overhead - Add WebSocket auth via ?token= query param for browser WS connections - Enable ws:true in Vite dev proxy for WebSocket support. envd fixes (pre-existing bugs exposed by multi-session terminals): - Fix getProcess tag Range: inverted return values caused early stop when multiple tagged processes existed, making SendInput fail with "not found" - Fix multiplexer deadlock: blocking send to cancelled fork's unbuffered channel prevented process cleanup. Now uses buffered channels (cap 64) with non-blocking fallback	2026-04-11 04:27:16 +06:00
pptx704	37d85ec998	chore: relicense from BSL 1.1 to Apache 2.0. Replace Business Source License with Apache License Version 2.0 across LICENSE, envd/LICENSE, and NOTICE. Update NOTICE to remove BSL-era framing that singled out Apache-only portions.	2026-04-09 14:28:19 +06:00
pptx704	dd50cfdcb1	fix: security hardening from CSO audit - Add auth failure logging (login, API key, JWT) with IP/email/prefix - Move OAuth JWT from URL params to short-lived cookies to prevent token leakage via browser history, server logs, and Referer headers - Pin Swagger UI to v5.18.2 with SRI integrity hashes - Upgrade Go toolchain to 1.25.8 (fixes 5 called stdlib vulns) - Fix unchecked error in host agent credential refresh - Add .gstack to .gitignore for security report artifacts	2026-04-08 03:46:31 +06:00
pptx704	8b5fa3438e	Replace gopsutil port scanner with direct /proc/net/tcp reading. The envd port scanner used gopsutil's net.Connections() which walks /proc/{pid}/fd to enumerate socket inodes. This corrupts Go runtime semaphore state when the VM is paused mid-operation and restored from a Firecracker snapshot. Replace with a direct /proc/net/tcp + /proc/net/tcp6 parser that reads a single file per address family: no /proc/{pid}/fd walk, no goroutines, no WaitGroups. Also replace concurrent-map (smap) in the scanner with a plain sync.RWMutex-protected map, since concurrent-map's Items() spawns goroutines with a WaitGroup internally, which is equally unsafe across snapshot boundaries. Use socket inode instead of PID for the port forwarding map key, since inode is available directly from /proc/net/tcp without the fd walk.	2026-04-01 15:47:28 +06:00
pptx704	4ddd494160	Switch database IDs from TEXT to native UUID. Consolidate 16 migrations into one with UUID columns for all entity IDs. TEXT is kept only for polymorphic fields (audit_logs.actor_id, resource_id) and template names. The id package now generates UUIDs via google/uuid, with Format/Parse helpers for the prefixed wire format (sb-{uuid}, usr-{uuid}, etc.). Auth context, services, and handlers pass pgtype.UUID internally; conversion to/from prefixed strings happens at API and RPC boundaries. Adds PlatformTeamID (all-zeros UUID) for shared resources.	2026-03-26 16:16:21 +06:00
pptx704	6898528096	Replace one-shot clock_settime with chrony for continuous guest time sync. Switch from the envd /init endpoint pushing host time via syscall to chronyd reading the KVM PTP hardware clock (/dev/ptp0) continuously. This fixes clock drift between init calls and handles snapshot resume gracefully. Changes: - Add clocksource=kvm-clock kernel boot arg - Start chronyd in wrenn-init.sh before tini (PHC /dev/ptp0, makestep 1.0 -1) - Remove clock_settime logic from envd SetData and shouldSetSystemTime - Remove client.Init() clock sync calls from sandbox manager (3 sites) - Remove Init() method from envdclient (no longer needed) - Simplify rootfs scripts: socat/chrony now come from apt in the container image, only envd/wrenn-init/tini are injected by build scripts	2026-03-26 04:47:44 +06:00
pptx704	a1bd439c75	Add sandbox snapshot and restore with UFFD lazy memory loading. Implement full snapshot lifecycle: pause (snapshot + free resources), resume (UFFD-based lazy restore), and named snapshot templates that can spawn new sandboxes from frozen VM state. Key changes: - Snapshot header system with generational diff mapping (inspired by e2b) - UFFD server for lazy page fault handling during snapshot restore - Stable rootfs symlink path (/tmp/fc-vm/) for snapshot compatibility - Templates DB table and CRUD API endpoints (POST/GET/DELETE /v1/snapshots) - CreateSnapshot/DeleteSnapshot RPCs in hostagent proto - Reconciler excludes paused sandboxes (expected absent from host agent) - Snapshot templates lock vcpus/memory to baked-in values - Proper cleanup of uffd sockets and pause snapshot files on destroy	2026-03-12 09:19:37 +06:00
pptx704	ec3360d9ad	Add minimal control plane with REST API, database, and reconciler - REST API (chi router): sandbox CRUD, exec, pause/resume, file write/read - PostgreSQL persistence via pgx/v5 + sqlc (sandboxes table with goose migration) - Connect RPC client to host agent for all VM operations - Reconciler syncs host agent state with DB every 30s (detects TTL-reaped sandboxes) - OpenAPI 3.1 spec served at /openapi.yaml, Swagger UI at /docs - Added WriteFile/ReadFile RPCs to hostagent proto and implementations - File upload via multipart form, download via JSON body POST - sandbox_id propagated from control plane to host agent on create	2026-03-10 16:50:12 +06:00
pptx704	d7b25b0891	updated license structure	2026-03-10 04:32:29 +06:00
pptx704	34c89e814d	Added basic license information	2026-03-10 04:28:51 +06:00
pptx704	6f0c365d44	Add host agent RPC server with sandbox lifecycle management. Implement the host agent as a Connect RPC server that orchestrates sandbox creation, destruction, pause/resume, and command execution. Includes sandbox manager with TTL-based reaper, network slot allocator, rootfs cloning, hostagent proto definition with generated stubs, and test/debug scripts. Fix Firecracker process lifetime bug where VM was tied to HTTP request context instead of background context.	2026-03-10 03:54:53 +06:00
pptx704	c31ce90306	Centralize envd proto source of truth to proto/envd/. Remove duplicate proto files from envd/spec/ and update envd's buf.gen.yaml to generate stubs from the canonical proto/envd/ location. Both modules now generate their own Connect RPC stubs from the same source protos.	2026-03-10 02:49:31 +06:00
pptx704	a3898d68fb	Port envd from e2b with internalized shared packages and Connect RPC - Copy envd source from e2b-dev/infra, internalize shared dependencies into envd/internal/shared/ (keys, filesystem, id, smap, utils) - Switch from gRPC to Connect RPC for all envd services - Update module paths to git.omukk.dev/wrenn/{sandbox,sandbox/envd} - Add proto specs (process, filesystem) with buf-based code generation - Implement full envd: process exec, filesystem ops, port forwarding, cgroup management, MMDS integration, and HTTP API - Update main module dependencies (firecracker SDK, pgx, goose, etc.) - Remove placeholder .gitkeep files replaced by real implementations	2026-03-09 21:03:19 +06:00
pptx704	bd78cc068c	Initial project structure for Wrenn Sandbox. Set up directory layout, Makefiles, go.mod files, docker-compose, and empty placeholder files for all packages.	2026-03-09 17:22:47 +06:00

23 Commits