Linux keeps freed memory as page cache, which Firecracker snapshots as
non-zero blocks: a 16GB VM with 12GB of stale cache would write all
12GB to disk. Dropping the page cache (but not dentries/inodes) in
/snapshot/prepare, before blocking the reclaimer, shrinks snapshots to
the actual working-set size with minimal resume-latency impact.
Three issues fixed:
1. Memory metrics read host-side VmRSS of the Firecracker process,
which includes guest page cache and never decreases. Replaced
readMemRSS(fcPID) with readEnvdMemUsed(client), which queries envd's
/metrics endpoint for the guest-side MemTotal - MemAvailable. This
matches what neofetch reports and reflects actual guest memory use.
2. Added Firecracker balloon device (deflate_on_oom, 5s stats) and
envd-side periodic page cache reclaimer (drop_caches when >80%
used). Reclaimer is gated by snapshot_in_progress flag with
sync() before freeze to prevent memory corruption during pause.
3. Sampling interval 500ms → 1s; ring buffer capacities adjusted to
preserve the same time windows. Halves per-host HTTP load from 240 to
120 calls/sec at 120 capsules.
Also: maxDiffGenerations 8 → 1 (merge on every re-pause, since UFFD
lazy-loads pages anyway); envd's mem_used formula now uses
MemTotal - MemAvailable.
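The guest-side formula is just MemTotal - MemAvailable from /proc/meminfo. A small Go sketch of the computation (the parsing helper is illustrative, not the actual envd code, which is on the Rust side):

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memUsedKB computes used memory as MemTotal - MemAvailable from the text
// of /proc/meminfo — the same guest-side formula envd's /metrics reports,
// and what neofetch shows. Field names are the kernel's own.
func memUsedKB(meminfo string) (uint64, error) {
	var total, avail uint64
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		kb, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		switch fields[0] {
		case "MemTotal:":
			total = kb
		case "MemAvailable:":
			avail = kb
		}
	}
	if total == 0 || avail == 0 {
		return 0, fmt.Errorf("meminfo missing MemTotal/MemAvailable")
	}
	return total - avail, nil
}

func main() {
	used, err := memUsedKB("MemTotal: 16384000 kB\nMemAvailable: 12288000 kB\n")
	fmt.Println(used, err)
}
```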
Show "[session disconnected]" in terminal when PTY websocket closes cleanly.
Map scheduler and agent unavailability errors to 503 with user-friendly
message instead of leaking internal details.
Three bugs fixed:
1. PTY connections failed because home directory was hardcoded as
/home/{username} instead of reading from /etc/passwd. For root,
this produced /home/root/ which doesn't exist — CWD validation
rejected every PTY Start request without explicit cwd. Fixed all
6 locations to use user.dir from nix::unistd::User.
2. MMDS polling silently failed to parse metadata because the
logs_collector_address field lacked #[serde(default)]. The host
agent only sends instanceID + envID — missing "address" field
caused every deserialize attempt to fail, so .WRENN_SANDBOX_ID
and .WRENN_TEMPLATE_ID were never written. Also added error
logging and create_dir_all before file writes.
3. Metrics CPU values were non-deterministic because a fresh
sysinfo::System was created per request with a 100ms sleep
between reads. Replaced with a background thread that samples
CPU at fixed 1-second intervals via a persistent System instance,
matching gopsutil's internal caching behavior. Metrics endpoint
now reads cached atomic values — no blocking, consistent window.
Also: close master PTY fd in child pre_exec, add process.Start
request logging, bump version to 0.2.0.
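The CPU fix in bug 3 lives in the Rust envd (a persistent sysinfo::System polled by a background thread); the same shape sketched in Go, with illustrative names — one goroutine samples at a fixed 1-second interval, and the metrics endpoint reads a cached atomic value instead of sleeping between two reads:

```go
package main

import (
	"math"
	"sync/atomic"
	"time"
)

// cpuSampler caches the last CPU reading so request handlers never block
// on a measurement window or race on shared sampler state.
type cpuSampler struct {
	pct atomic.Uint64 // float64 bits of the last CPU percentage
}

func (s *cpuSampler) store(p float64) { s.pct.Store(math.Float64bits(p)) }
func (s *cpuSampler) load() float64   { return math.Float64frombits(s.pct.Load()) }

// run samples via the (hypothetical) readCPUPercent callback at a fixed
// 1-second interval until stop is closed. Handlers call load() instead.
func (s *cpuSampler) run(readCPUPercent func() float64, stop <-chan struct{}) {
	t := time.NewTicker(time.Second)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			s.store(readCPUPercent())
		case <-stop:
			return
		}
	}
}

func main() {
	var s cpuSampler
	s.store(12.5)
	_ = s.load()
}
```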
The Go envd guest agent (`envd/`) is fully replaced by the Rust
implementation (`envd-rs/`). This commit removes the Go module and
updates all references across the codebase.
Makefile: remove ENVD_DIR, VERSION_ENVD, build-envd-go, dev-envd-go,
and Go envd from proto/fmt/vet/tidy/clean targets. Add static-link
verification to build-envd.
Host agent: rewrite snapshot quiesce comments that referenced Go GC
and page allocator corruption — no longer applicable with Rust envd.
Tighten envdclient to expect HTTP 200 (not 204) from health and file
upload endpoints, and require JSON version response from FetchVersion.
Remove NOTICE (no e2b-derived code remains). Update CLAUDE.md and
README.md to reflect Rust envd architecture.
Three root causes addressed:
1. Go page allocator corruption: allocations between the pre-snapshot GC
and VM freeze leave the page allocator's summary tree inconsistent.
After restore, GC
reads corrupted metadata — either panicking (killing PID 1 → kernel
panic) or silently failing to collect, causing unbounded heap growth
until OOM. Fix: move GC to after all HTTP allocations in
PostSnapshotPrepare, then set GOMAXPROCS(1) so any remaining
allocations run sequentially with no concurrent page allocator access.
GOMAXPROCS is restored on first health check after restore.
2. PostInit timeout starvation: WaitUntilReady and PostInit shared a
single 30s context. If WaitUntilReady consumed most of it, PostInit
failed — RestoreAfterSnapshot never ran, leaving envd with keep-alives
disabled and zombie connections. Fix: separate timeout contexts.
3. CP HTTP server missing timeouts: no ReadHeaderTimeout or IdleTimeout
caused goroutine leaks from hung proxy connections. Fix: add both,
matching host agent values.
Also adds UFFD prefetch to proactively load all guest pages after restore,
eliminating on-demand page fault latency for subsequent RPC calls.
Disable HTTP/2 on both host agent server and CP→agent transport — multiplexing
caused head-of-line blocking when a slow sandbox RPC stalled the shared connection.
Add ResponseHeaderTimeout to envd HTTP clients. Merge SetDefaults into Resume's
PostInit call to eliminate an extra round-trip that could hang on a stale connection.
After Firecracker snapshot restore, zombie TCP sockets from the previous
session cause Go runtime corruption inside the guest VM, making envd
unresponsive. This manifests as infinite loading in the file browser and
terminal timeouts (524) in production (HTTP/2 + Cloudflare) but not locally.
Four-part fix:
- Add ServerConnTracker to envd that tracks connections via ConnState callback,
closes idle connections and disables keep-alives before snapshot, then closes
all pre-snapshot zombie connections on restore (while preserving post-restore
connections like the /init request)
- Split envdclient into timeout (2min) and streaming (no timeout) HTTP clients;
use streaming client for file transfers and process RPCs
- Close host-side idle envdclient connections before PrepareSnapshot so FIN
packets propagate during the 3s quiesce window
- Add StreamingHTTPClient() accessor; streaming file transfer handlers in
hostagent use it instead of the timeout client
Resume() was building VMConfig without TemplateID, so Firecracker MMDS
received an empty string. envd's PostInit then wrote that empty value to
/run/wrenn/.WRENN_TEMPLATE_ID. Fix by persisting the template ID in
snapshot metadata during Pause and reading it back during Resume.
Running port-binding applications (Jupyter, http.server, NextJS) inside
sandboxes caused severe PTY sluggishness and proxy navigation errors.
Root cause: the CP sandbox proxy and Connect RPC pool shared a single
HTTP transport. Heavy proxy traffic (Jupyter WebSocket, REST polling)
interfered with PTY RPC streams via HTTP/2 flow control contention.
Transport isolation (main fix):
- Add dedicated proxy transport on CP (NewProxyTransport) with HTTP/2
disabled, separate from the RPC pool transport
- Add dedicated proxy transport on host agent, replacing
http.DefaultTransport
- Add dedicated envdclient transport with tuned connection pooling
- Replace http.DefaultClient in file streaming RPCs with per-sandbox
envd client
Proxy path rewriting (navigation fix):
- Add ModifyResponse to rewrite Location headers with /proxy/{id}/{port}
prefix, handling both root-relative and absolute-URL redirects
- Strip prefix back out in CP subdomain proxy for correct browser
behavior
- Replace path.Join with string concat in CP Director to preserve
trailing slashes (prevents redirect loops on directory listings)
Proxy resilience:
- Add dial retry with linear backoff (3 attempts) to handle socat
startup delay when ports are first detected
- Cache ReverseProxy instances per sandbox+port+host in sync.Map
- Add EvictProxy callback wired into sandbox Manager.Destroy
Buffer and server hardening:
- Increase PTY and exec stream channel buffers from 16 to 256
- Add ReadHeaderTimeout (10s) and IdleTimeout (620s) to host agent
HTTP server
Network tuning:
- Set TAP device TxQueueLen to 5000 (up from default 1000)
- Add Firecracker tx_rate_limiter (200 MB/s sustained, 100 MB burst)
to prevent guest traffic from saturating the TAP
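The Location rewriting in ModifyResponse reduces to one decision per redirect: root-relative Locations get the proxy prefix prepended; absolute URLs pointing back at the sandbox host are folded into a prefixed root-relative path; anything else passes through. A sketch with illustrative names and prefix format:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rewriteLocation keeps redirects from the sandboxed app inside the
// /proxy/{id}/{port} namespace, so the browser's next request doesn't
// escape the proxy (the Jupyter/NextJS navigation failures).
func rewriteLocation(loc, prefix, sandboxHost string) string {
	if strings.HasPrefix(loc, "/") {
		return prefix + loc // root-relative redirect
	}
	u, err := url.Parse(loc)
	if err != nil || u.Host != sandboxHost {
		return loc // foreign or unparseable: leave untouched
	}
	return prefix + u.RequestURI() // absolute URL back into the sandbox
}

func main() {
	fmt.Println(rewriteLocation("/tree", "/proxy/abc/8888", "10.0.0.2:8888"))
}
```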
ConnectProvider computed the HMAC over the bare state, but Callback
always verified HMAC(state+":"+intent), so the account-linking flow
always failed with invalid_state.
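The invariant the fix restores — both sides MAC the same state+":"+intent message — sketched in Go (key handling and function names are illustrative):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signState MACs state+":"+intent — the message Callback verifies.
// ConnectProvider had been signing the bare state, so the callback's
// comparison could never succeed.
func signState(key []byte, state, intent string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(state + ":" + intent))
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyState uses constant-time comparison, as HMAC checks should.
func verifyState(key []byte, state, intent, sig string) bool {
	expected := signState(key, state, intent)
	return hmac.Equal([]byte(expected), []byte(sig))
}

func main() {
	key := []byte("demo-key")
	sig := signState(key, "abc123", "link")
	fmt.Println("valid:", verifyState(key, "abc123", "link", sig))
}
```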
- Scope WebSocket auth bypass to only WS endpoints by restructuring
routes into separate chi Groups. Non-WS routes no longer passthrough
unauthenticated requests with spoofed Upgrade headers. Added
optionalAPIKeyOrJWT middleware for WS routes (injects auth context
from API key/JWT if present, passes through otherwise) and
markAdminWS middleware for admin WS routes.
- Fix nil pointer dereference in envd Handler.Wait() — p.tty.Close()
was called unconditionally but p.tty is nil for non-PTY processes,
crashing every non-PTY process exit.
- Fix goroutine leak in sandbox Pause — stopSampler was never called,
leaking one sampler goroutine per successful pause operation.
- Decouple PTY WebSocket reads from RPC dispatch using a buffered
channel to prevent backpressure-induced connection drops under fast
typing. Includes input coalescing to reduce RPC call volume.
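The read/dispatch decoupling plus coalescing amounts to: the WebSocket read loop pushes into a buffered channel, and the dispatcher drains whatever has queued up into one RPC. A sketch of the coalescing side (names illustrative; buffer size 256 per the commit):

```go
package main

import "fmt"

// coalesce blocks for one chunk, then folds in everything else already
// queued, so a burst of fast keystrokes becomes a single RPC instead of
// one call per byte. The buffered channel is what decouples WebSocket
// reads from RPC dispatch: a slow RPC no longer backpressures the read
// loop into dropping the connection.
func coalesce(in chan []byte) []byte {
	buf := <-in // block for at least one chunk
	for {
		select {
		case next := <-in:
			buf = append(buf, next...) // fold queued chunks together
		default:
			return buf
		}
	}
}

func main() {
	in := make(chan []byte, 256)
	in <- []byte("he")
	in <- []byte("llo")
	fmt.Printf("%s\n", coalesce(in))
}
```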
## What's new
Compliance, audit, and account lifecycle improvements — admin actions are now fully auditable, user data is properly anonymized on deletion, and OAuth signup flow gives users control over their profile.
### Audit
- Added audit logging for all admin actions (user activate/deactivate, team BYOC toggle, team delete, template delete, build create/cancel)
- Added admin audit page with infinite scroll and hierarchical filters
- Fixed audit log team assignment — admin/host actions now correctly land under PlatformTeamID
- Anonymize audit logs on user hard-delete (actor name, IDs, emails stripped)
- Deduplicated audit logger internals (665 → 374 lines, no behavior change)
### Authentication
- Separated GitHub OAuth login/signup flows — login no longer auto-creates accounts
- Added name confirmation dialog for new GitHub signups
### Account Lifecycle
- Email notification sent when account is permanently deleted after grace period
- Audit log anonymization tied to user purge (per-user transactional)
### UX
- Removed accent gradient bars from admin host dialogs (border + shadow only)
- Frontend renders deleted users as styled badge in audit log view
### Others
- Version bump
- Bug fixes
Reviewed-on: wrenn/wrenn#36
Notify users via email when their account is permanently deleted after
the 15-day soft-delete grace period. Query now returns email alongside
user ID so the notification can be sent after deletion.
Email failure is logged as a warning but does not block cleanup.
Replace repetitive actorFields + write boilerplate across all 25+ typed
Log methods with shared helpers: newEntry (general), newAdminEntry
(platform-level), resolveHostTeamID, and logSystemHostEvent.
Reduces logger.go from 665 to 374 lines with no behavior change.
Log every admin-panel action (user activate/deactivate, team BYOC toggle,
team delete, template delete, build create/cancel) to the audit_logs table
under PlatformTeamID with scope "admin".
Add GET /v1/admin/audit-logs endpoint and /admin/audit frontend page with
infinite scroll and hierarchical filters. Expose audit.Entry + Log() for
cloud repo extensibility.
Fix seed_platform_team down-migration FK violation by deleting dependent
rows before the team row.
Normalize admin host page dialogs to match design system pattern:
border + shadow only, no colored gradient strips. Align animation
timing and shadow to reference components (DestroyDialog, etc).
Anonymize audit logs when soft-deleted users are purged after 15 days:
actor_name set to 'deleted-user', actor_id and resource_id nulled,
email stripped from member metadata. Per-user delete ensures no user
is removed without successful anonymization.
Frontend renders deleted-user as a styled red badge in audit log view.
Fix shared host create/delete audit logs landing in admin's personal
team — now correctly assigned to PlatformTeamID.
Block auto-account creation when signing in via GitHub from login mode.
Signup via GitHub now shows a name confirmation dialog before redirecting
to dashboard, letting users verify/edit their display name pulled from
GitHub.
- Add intent query param to OAuth redirect, persisted in HMAC-signed state cookie
- Block registration in callback when intent=login, return no_account error
- Set wrenn_oauth_new_signup cookie on new account creation
- Frontend callback shows name confirmation dialog for new signups
- Add no_account error message to login page
Separate summary cards with proper surface hierarchy, add staggered
entrance animations, tighten padding, and rewrite labels/descriptions
to be specific and actionable rather than generic.
Introduce pre-computed daily usage rollups from sandbox_metrics_snapshots.
An hourly background worker aggregates completed days, while today's
usage is computed live from snapshots at query time for freshness.
Backend: new daily_usage table, rollup worker, UsageService, and
GET /v1/capsules/usage endpoint with date range filtering (up to 92 days).
Frontend: replace Usage page placeholder with bar charts (Chart.js),
summary total cards, and preset/custom date range controls.
Extracts Mailer interface, EmailData, and Button to pkg/email/types.go
so the cloud repo can use them via ServerContext. internal/email re-exports
the types as aliases so existing callers are unchanged. Also fixes
pre-existing lint errors (unchecked rollback and deadline calls).
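The re-export relies on Go type aliases, which are identical to the aliased type rather than merely convertible — so existing callers keep compiling against the old import path. Sketched here in one file with illustrative names (in the real tree the definition and the alias live in two packages):

```go
package main

import "fmt"

// Mailer stands in for the interface that moved to pkg/email/types.go.
type Mailer interface {
	Send(to, subject, body string) error
}

// InternalMailer is the alias internal/email would declare as
// `type Mailer = email.Mailer`. With `=` it is the same type, not a new
// one, so values flow between both import paths with no conversion.
type InternalMailer = Mailer

// logMailer is a trivial illustrative implementation.
type logMailer struct{}

func (logMailer) Send(to, subject, body string) error {
	fmt.Println("send", to, subject)
	return nil
}

func main() {
	var m InternalMailer = logMailer{} // satisfies both names identically
	_ = m.Send("a@b.c", "hi", "body")
}
```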