1
0
forked from wrenn/wrenn
Commit Graph

9 Commits

Author SHA1 Message Date
aca43d51eb fix: resolve process stream hangs, pause race, and PTY signal loss
- Cache terminal EndEvent on ProcessHandle so connect() can detect
  already-exited processes instead of hanging forever on broadcast
  receivers that missed the event. Subscribe before checking cache
  to close the TOCTOU window.

- Protect sb.Status writes in Pause with m.mu to prevent data race
  with concurrent readers (AcquireProxyConn, Exec, etc.).

- Restart metrics sampler in restoreRunning so a failed pause attempt
  doesn't permanently kill sandbox metrics collection.

- Return dequeued non-input messages from coalescePtyInput instead of
  dropping them, preventing silent loss of kill/resize signals during
  typing bursts.
2026-05-09 18:11:15 +06:00
522e1c5e90 fix: subscribe to process channels before spawning threads to prevent event loss
Fast-exiting processes (e.g. echo) sent data/end events before
start() subscribed to the broadcast channels, causing the stream
to hang indefinitely and the exec RPC to time out with 502.

Move channel subscription into spawn_process, before reader/waiter
threads start, and return pre-subscribed receivers via SpawnedProcess.
2026-05-09 17:28:37 +06:00
d1d316f35c fix: resolve exec 502 by terminating process streams on exit
The start() and connect() streaming RPCs blocked forever in the data
event loop because ProcessHandle retains a broadcast sender (needed for
reconnection via connect()), preventing the channel from closing.

Race data_rx against end_rx with tokio::select! so the stream terminates
when the process exits. Remaining buffered data is drained before
yielding the end event.
2026-05-09 16:36:33 +06:00
2af8412cdc fix: use RwLock for envd Defaults to fix silent mutation loss
The /init handler's default_user mutation cloned the Defaults struct,
mutated the clone, then dropped it — the actual state was never updated.
This caused processes to always run as "root" regardless of the user
set via POST /init. Additionally, default_workdir was accepted in the
init request but never applied.

Wrap user and workdir fields in RwLock with accessor methods so mutations
propagate correctly through the shared AppState.
2026-05-09 15:28:09 +06:00
01819642cc fix: drop page cache before snapshot to reduce memory dump size
Linux keeps freed memory as page cache, which Firecracker snapshots
as non-zero blocks. A 16GB VM with 12GB stale cache would write all
12GB to disk. Dropping pagecache (not dentries/inodes) in
/snapshot/prepare before blocking the reclaimer shrinks snapshots
to actual working set size with minimal resume latency impact.
2026-05-03 14:27:49 +06:00
1178ab8b21 fix: accurate sandbox metrics and memory management
Three issues fixed:

1. Memory metrics read host-side VmRSS of the Firecracker process,
   which includes guest page cache and never decreases. Replaced
   readMemRSS(fcPID) with readEnvdMemUsed(client) that queries
   envd's /metrics endpoint for guest-side total - MemAvailable.
   This matches neofetch and reflects actual process memory.

2. Added Firecracker balloon device (deflate_on_oom, 5s stats) and
   envd-side periodic page cache reclaimer (drop_caches when >80%
   used). Reclaimer is gated by snapshot_in_progress flag with
   sync() before freeze to prevent memory corruption during pause.

3. Sampling interval 500ms → 1s, ring buffer capacities adjusted
   to maintain same time windows. Reduces per-host HTTP load from
   240 calls/sec to 120 calls/sec at 120 capsules.

Also: maxDiffGenerations 8 → 1 (merge every re-pause since UFFD
lazy-loads anyway), envd mem_used formula uses total - available.
2026-05-03 12:19:01 +06:00
31456fd169 fix: resolve PTY failure, MMDS file writes, and metrics instability in envd-rs
Three bugs fixed:

1. PTY connections failed because home directory was hardcoded as
   /home/{username} instead of reading from /etc/passwd. For root,
   this produced /home/root/ which doesn't exist — CWD validation
   rejected every PTY Start request without explicit cwd. Fixed all
   6 locations to use user.dir from nix::unistd::User.

2. MMDS polling silently failed to parse metadata because the
   logs_collector_address field lacked #[serde(default)]. The host
   agent only sends instanceID + envID — missing "address" field
   caused every deserialize attempt to fail, so .WRENN_SANDBOX_ID
   and .WRENN_TEMPLATE_ID were never written. Also added error
   logging and create_dir_all before file writes.

3. Metrics CPU values were non-deterministic because a fresh
   sysinfo::System was created per request with a 100ms sleep
   between reads. Replaced with a background thread that samples
   CPU at fixed 1-second intervals via a persistent System instance,
   matching gopsutil's internal caching behavior. Metrics endpoint
   now reads cached atomic values — no blocking, consistent window.

Also: close master PTY fd in child pre_exec, add process.Start
request logging, bump version to 0.2.0.
2026-05-03 04:28:10 +06:00
1143acd37a refactor: remove Go envd module, update host agent for Rust envd
The Go envd guest agent (`envd/`) is fully replaced by the Rust
implementation (`envd-rs/`). This commit removes the Go module and
updates all references across the codebase.

Makefile: remove ENVD_DIR, VERSION_ENVD, build-envd-go, dev-envd-go,
and Go envd from proto/fmt/vet/tidy/clean targets. Add static-link
verification to build-envd.

Host agent: rewrite snapshot quiesce comments that referenced Go GC
and page allocator corruption — no longer applicable with Rust envd.
Tighten envdclient to expect HTTP 200 (not 204) from health and file
upload endpoints, and require JSON version response from FetchVersion.

Remove NOTICE (no e2b-derived code remains). Update CLAUDE.md and
README.md to reflect Rust envd architecture.
2026-05-03 03:12:25 +06:00
0b53d34417 feat: rewrite envd guest agent in Rust (envd-rs)
Complete Rust rewrite of the Go envd guest daemon that runs as PID 1
inside Firecracker microVMs. Feature-complete across all 8 phases:

- Health, metrics, and env var endpoints
- Crypto (SHA-256/512, HMAC), auth (secure token, signing), init/snapshot
- Connect RPC via connectrpc + buffa (process + filesystem services)
- File transfer (GET/POST /files) with gzip, multipart, chown, ENOSPC
- Port subsystem (/proc/net/tcp scanner, socat forwarder)
- Cgroup2 manager with noop fallback
- Snapshot/restore lifecycle (conntracker, port subsystem stop/restart)
- SIGTERM graceful shutdown, --cmd initial process spawn
- MMDS metadata polling for Firecracker mode

42 source files, ~4200 LOC, 4.1MB stripped release binary.
Makefile updated: build-envd now targets Rust (musl static),
build-envd-go preserved for Go builds.
2026-05-03 02:47:15 +06:00