wrenn-releases

Author	SHA1	Message	Date
pptx704	1178ab8b21	fix: accurate sandbox metrics and memory management. Three issues fixed: 1. Memory metrics read host-side VmRSS of the Firecracker process, which includes guest page cache and never decreases. Replaced readMemRSS(fcPID) with readEnvdMemUsed(client) that queries envd's /metrics endpoint for guest-side total - MemAvailable. This matches neofetch and reflects actual process memory. 2. Added Firecracker balloon device (deflate_on_oom, 5s stats) and envd-side periodic page cache reclaimer (drop_caches when >80% used). Reclaimer is gated by snapshot_in_progress flag with sync() before freeze to prevent memory corruption during pause. 3. Sampling interval 500ms → 1s, ring buffer capacities adjusted to maintain same time windows. Reduces per-host HTTP load from 240 calls/sec to 120 calls/sec at 120 capsules. Also: maxDiffGenerations 8 → 1 (merge every re-pause since UFFD lazy-loads anyway), envd mem_used formula uses total - available.	2026-05-03 12:19:01 +06:00
pptx704	f328113a2a	rename guest hostname from "sandbox" to "capsule". Terminal prompt inside VMs now shows root@capsule instead of root@sandbox, aligning with user-facing "capsule" terminology.	2026-05-03 03:32:03 +06:00
pptx704	bd98610153	fix: sandbox network responsiveness under port-binding apps. Running port-binding applications (Jupyter, http.server, Next.js) inside sandboxes caused severe PTY sluggishness and proxy navigation errors. Root cause: the CP sandbox proxy and Connect RPC pool shared a single HTTP transport. Heavy proxy traffic (Jupyter WebSocket, REST polling) interfered with PTY RPC streams via HTTP/2 flow control contention. Transport isolation (main fix): - Add dedicated proxy transport on CP (NewProxyTransport) with HTTP/2 disabled, separate from the RPC pool transport - Add dedicated proxy transport on host agent, replacing http.DefaultTransport - Add dedicated envdclient transport with tuned connection pooling - Replace http.DefaultClient in file streaming RPCs with per-sandbox envd client Proxy path rewriting (navigation fix): - Add ModifyResponse to rewrite Location headers with /proxy/{id}/{port} prefix, handling both root-relative and absolute-URL redirects - Strip prefix back out in CP subdomain proxy for correct browser behavior - Replace path.Join with string concat in CP Director to preserve trailing slashes (prevents redirect loops on directory listings) Proxy resilience: - Add dial retry with linear backoff (3 attempts) to handle socat startup delay when ports are first detected - Cache ReverseProxy instances per sandbox+port+host in sync.Map - Add EvictProxy callback wired into sandbox Manager.Destroy Buffer and server hardening: - Increase PTY and exec stream channel buffers from 16 to 256 - Add ReadHeaderTimeout (10s) and IdleTimeout (620s) to host agent HTTP server Network tuning: - Set TAP device TxQueueLen to 5000 (up from default 1000) - Add Firecracker tx_rate_limiter (200 MB/s sustained, 100 MB burst) to prevent guest traffic from saturating the TAP	2026-04-25 04:21:55 +06:00
pptx704	9ea847923c	Fix concurrency, security, and correctness issues across backend and frontend - C1: Add sync.RWMutex to vm.Manager to protect concurrent vms map access - H1: Fix IP arithmetic overflow in network slot addressing (byte truncation) - H5: Fix MultiplexedChannel.Fork() TOCTOU race (move exited check inside lock) - H8: Remove snapshot overwrite — return template_name_taken conflict instead - H9: Wrap DeleteAccount DB ops in a transaction, make team deletion fatal - H10: Sanitize serviceErrToHTTP to stop leaking internal error messages - H11: Add deleted_at IS NULL to GetUserByEmail/GetUserByID queries - H12: Add id DESC to audit log composite index for cursor pagination - H15: Delete dead AuthModal.svelte component - H17: Move JWT from WebSocket URL query param to first WS message - H18: Fix $derived to $derived.by in FilesTab breadcrumbs	2026-04-16 06:11:42 +06:00
pptx704	88f919c4ca	Rename sandbox prefix to cl-, add MMDS metadata, fix proxy port routing - Change sandbox ID prefix from sb- to cl- (capsule) throughout - Fix proxy URL regex character class: base36 uses 0-9a-z, not just hex - Add MMDS V2 config and metadata to VM boot flow so envd can read WRENN_SANDBOX_ID and WRENN_TEMPLATE_ID from inside the guest - Pass TemplateID through VMConfig into both fresh and snapshot boot paths	2026-03-30 17:12:05 +06:00
pptx704	6898528096	Replace one-shot clock_settime with chrony for continuous guest time sync. Switch from the envd /init endpoint pushing host time via syscall to chronyd reading the KVM PTP hardware clock (/dev/ptp0) continuously. This fixes clock drift between init calls and handles snapshot resume gracefully. Changes: - Add clocksource=kvm-clock kernel boot arg - Start chronyd in wrenn-init.sh before tini (PHC /dev/ptp0, makestep 1.0 -1) - Remove clock_settime logic from envd SetData and shouldSetSystemTime - Remove client.Init() clock sync calls from sandbox manager (3 sites) - Remove Init() method from envdclient (no longer needed) - Simplify rootfs scripts: socat/chrony now come from apt in the container image, only envd/wrenn-init/tini are injected by build scripts	2026-03-26 04:47:44 +06:00
pptx704	9acdbb5ae9	Add per-sandbox CPU/memory/disk metrics collection. Samples /proc/{fc_pid}/stat (CPU%), /proc/{fc_pid}/status (VmRSS), and stat() on CoW files at 500ms intervals per running sandbox. Three tiered ring buffers downsample into 30s and 5min averages for 10min/2h/24h retention. Metrics are flushed to DB on pause (all tiers) and destroy (24h only). New GetSandboxMetrics and FlushSandboxMetrics RPCs on the host agent, proxied through GET /v1/sandboxes/{id}/metrics?range= on the control plane. Returns live data for running sandboxes, DB data for paused, and 404 for stopped.	2026-03-25 20:10:33 +06:00
pptx704	80a99eec87	Add diff snapshots for re-pause to avoid UFFD fault-in storm. Use Firecracker's Diff snapshot type when re-pausing a previously resumed sandbox, capturing only dirty pages instead of a full memory dump. Chains up to 10 incremental generations before collapsing back to a Full snapshot. Multi-generation diff files (memfile.{buildID}) are supported alongside the legacy single-file format in resume, template creation, and snapshot existence checks.	2026-03-13 09:41:58 +06:00
pptx704	63e9132d38	Add device-mapper snapshots, test UI, fix pause ordering and lint errors - Replace reflink rootfs copy with device-mapper snapshots (shared read-only loop device per base template, per-sandbox sparse CoW file) - Add devicemapper package with create/restore/remove/flatten operations and refcounted LoopRegistry for base image loop devices - Fix pause ordering: destroy VM before removing dm-snapshot to avoid "device busy" error (FC must release the dm device first) - Add test UI at GET /test for sandbox lifecycle management (create, pause, resume, destroy, exec, snapshot create/list/delete) - Fix DirSize to report actual disk usage (stat.Blocks * 512) instead of apparent size, so sparse CoW files report correctly - Add timing logs to pause flow for performance diagnostics - Fix all lint errors across api, network, vm, uffd, and sandbox packages - Remove obsolete internal/filesystem package (replaced by devicemapper) - Update CLAUDE.md with device-mapper architecture documentation	2026-03-13 08:25:40 +06:00
pptx704	a1bd439c75	Add sandbox snapshot and restore with UFFD lazy memory loading. Implement full snapshot lifecycle: pause (snapshot + free resources), resume (UFFD-based lazy restore), and named snapshot templates that can spawn new sandboxes from frozen VM state. Key changes: - Snapshot header system with generational diff mapping (inspired by e2b) - UFFD server for lazy page fault handling during snapshot restore - Stable rootfs symlink path (/tmp/fc-vm/) for snapshot compatibility - Templates DB table and CRUD API endpoints (POST/GET/DELETE /v1/snapshots) - CreateSnapshot/DeleteSnapshot RPCs in hostagent proto - Reconciler excludes paused sandboxes (expected absent from host agent) - Snapshot templates lock vcpus/memory to baked-in values - Proper cleanup of uffd sockets and pause snapshot files on destroy	2026-03-12 09:19:37 +06:00
pptx704	6f0c365d44	Add host agent RPC server with sandbox lifecycle management. Implement the host agent as a Connect RPC server that orchestrates sandbox creation, destruction, pause/resume, and command execution. Includes sandbox manager with TTL-based reaper, network slot allocator, rootfs cloning, hostagent proto definition with generated stubs, and test/debug scripts. Fix Firecracker process lifetime bug where VM was tied to HTTP request context instead of background context.	2026-03-10 03:54:53 +06:00
pptx704	7753938044	Add host agent with VM lifecycle, TAP networking, and envd client. Implements Phase 1: boot a Firecracker microVM, execute a command inside it via envd, and get the output back. Uses raw Firecracker HTTP API via Unix socket (not the Go SDK) for full control over the VM lifecycle. - internal/vm: VM manager with create/pause/resume/destroy, Firecracker HTTP client, process launcher with unshare + ip netns exec isolation - internal/network: per-sandbox network namespace with veth pair, TAP device, NAT rules, and IP forwarding - internal/envdclient: Connect RPC client for envd process/filesystem services with health check retry - cmd/host-agent: demo binary that boots a VM, runs "echo hello", prints output, and cleans up - proto/envd: canonical proto files with buf + protoc-gen-connect-go code generation - images/wrenn-init.sh: minimal PID 1 init script for guest VMs - CLAUDE.md: updated architecture to reflect TAP networking (not vsock) and Firecracker HTTP API (not Go SDK)	2026-03-10 00:06:47 +06:00
pptx704	bd78cc068c	Initial project structure for Wrenn Sandbox. Set up directory layout, Makefiles, go.mod files, docker-compose, and empty placeholder files for all packages.	2026-03-09 17:22:47 +06:00

13 Commits
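
Note on commit 1178ab8b21: the guest-side memory figure is described as total minus MemAvailable. The sketch below shows one way that arithmetic could be computed from /proc/meminfo-style text. It is an illustration only, not the envd implementation: the helper name memUsedKiB and the sample values are assumptions, and the real code may read the data differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memUsedKiB is a hypothetical helper (not taken from the repository).
// It scans /proc/meminfo-style text and returns MemTotal - MemAvailable in KiB.
func memUsedKiB(meminfo string) (uint64, error) {
	var total, avail uint64
	var haveTotal, haveAvail bool

	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue // skip lines whose value is not a plain integer
		}
		switch fields[0] {
		case "MemTotal:":
			total, haveTotal = v, true
		case "MemAvailable:":
			avail, haveAvail = v, true
		}
	}
	if !haveTotal || !haveAvail {
		return 0, fmt.Errorf("meminfo: MemTotal or MemAvailable missing")
	}
	if avail > total {
		return 0, nil // guard against unsigned underflow on inconsistent input
	}
	return total - avail, nil
}

func main() {
	// Made-up sample values for illustration.
	sample := "MemTotal:       1024000 kB\n" +
		"MemFree:         100000 kB\n" +
		"MemAvailable:    824000 kB\n" +
		"Cached:          700000 kB\n"
	used, err := memUsedKiB(sample)
	fmt.Println(used, err) // prints: 200000 <nil>
}
```

Because MemAvailable counts reclaimable page cache as available, the result stays low even when Cached is large, which is the behavior the commit message contrasts with host-side VmRSS.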
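
Note on commit bd98610153: the message says path.Join was replaced with string concatenation to preserve trailing slashes. The short sketch below shows why the two differ, using Go's standard path package. The prefix value and function names are made up for illustration and are not taken from the CP Director code.

```go
package main

import (
	"fmt"
	"path"
)

// joinCleaned uses path.Join, which runs path.Clean on the result and
// therefore drops a trailing slash.
func joinCleaned(prefix, reqPath string) string {
	return path.Join(prefix, reqPath)
}

// joinPreserving concatenates the two parts, leaving the request path
// (including any trailing slash) exactly as it was received.
func joinPreserving(prefix, reqPath string) string {
	return prefix + reqPath
}

func main() {
	prefix := "/proxy/cl-example/8000" // hypothetical sandbox ID and port
	reqPath := "/files/"               // directory-style request path

	fmt.Println(joinCleaned(prefix, reqPath))    // /proxy/cl-example/8000/files
	fmt.Println(joinPreserving(prefix, reqPath)) // /proxy/cl-example/8000/files/
}
```

A server that answers a directory request without the trailing slash by redirecting to the slashed form would get the slash stripped again on the next hop, which is consistent with the redirect loop on directory listings that the commit describes.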