forked from wrenn/wrenn

291 Commits

62bede5dae fix: resolve bugs and DRY violations in sandbox manager and API handlers
- Fix createFromSnapshot discarding memoryMB param (balloon optimization was dead)
- Fix double dm-snapshot removal in Pause() cleanupPauseFailure path
- Fix DestroySandbox RPC mapping all errors to CodeNotFound
- Fix handleFailed event consumer missing pausing/resuming → error transitions
- Fix stream resource leak in StreamUpload on early-return paths
- Add envs/cwd fields to ExecRequest proto for foreground exec parity
- Extract createResources rollback helper to eliminate 4x duplicated teardown
- Remove unused chClient.ping method
- Add .mcp.json to gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-17 02:30:32 +06:00
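[editor's note] As a sketch of the early-return leak class fixed above, assuming a hypothetical handler shape rather than the actual StreamUpload code:

```go
package main

import (
	"io"
	"net/http"
	"os"
)

// streamUpload sketches the fix: defer the close immediately after a
// successful open, so every early return below still releases the stream.
func streamUpload(w http.ResponseWriter, r *http.Request) {
	f, err := os.Open("/var/tmp/upload.bin") // stand-in for the real stream source
	if err != nil {
		http.Error(w, "open failed", http.StatusInternalServerError)
		return
	}
	defer f.Close() // previously skipped on the early-return paths

	if r.ContentLength > 1<<30 {
		http.Error(w, "too large", http.StatusRequestEntityTooLarge)
		return // stream still closed via the defer above
	}
	io.Copy(w, f)
}
```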
74f85ce4e9 refactor: polish control plane and host agent code
- Decompose executeBuild (318 lines) into provisionBuildSandbox and
  finalizeBuild helpers for readability
- Extract cleanupPauseFailure in sandbox manager to unify 3 inconsistent
  inline teardown paths (also fixes CoW file leak on rename failure)
- Remove unused ctx parameter from startProcess/startProcessForRestore
- Add missing MASQUERADE rollback entry in CreateNetwork for symmetry
- Consolidate duplicate writeJSON for UTF-8/base64 exec response
2026-05-17 02:11:48 +06:00
124e097e23 refactor: eliminate DRY violations across control plane and host agent
Extract shared helpers to consolidate repeated patterns:
- requireRunningSandbox: sandbox lookup + running check (10 call sites)
- upgradeAndAuthenticate: WS upgrade + JWT/API-key auth (3 handlers)
- updateLastActive: last_active_at update with background context (5 sites)
- attachCowAndCreate: cow loop attach + dmsetup create (devicemapper)
- issueRegistrationToken: token gen + Redis + audit (host service)
- ErrNotFound sentinel: replaces string matching in hostagent server

Also merges duplicate wsProcessOut/wsOutMsg types into one.

Net: -208 lines, zero behavior change.
2026-05-17 02:03:06 +06:00
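[editor's note] A plausible shape for the requireRunningSandbox helper; only the lookup-plus-running-check pattern and the ErrNotFound sentinel are from the commit, the Manager/Sandbox types are illustrative:

```go
package sandbox

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotFound mirrors the sentinel the commit introduces.
var ErrNotFound = errors.New("sandbox not found")

type Sandbox struct {
	ID     string
	Status string
}

type Manager struct {
	mu        sync.RWMutex
	sandboxes map[string]*Sandbox
}

// requireRunningSandbox consolidates the lookup + running check repeated at
// the ten call sites into one helper.
func (m *Manager) requireRunningSandbox(id string) (*Sandbox, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	sb, ok := m.sandboxes[id]
	if !ok {
		return nil, ErrNotFound
	}
	if sb.Status != "running" {
		return nil, fmt.Errorf("sandbox %s is %s, not running", id, sb.Status)
	}
	return sb, nil
}
```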
a5425969ed fix: assorted bug fixes for CH migration
Fix resource leaks, race conditions, and error handling across host
agent and control plane: proper sparse file cleanup on close error,
connect error wrapping for MakeDir, CoW file cleanup on pause failure,
per-sandbox VM directories, deferred map deletion to avoid race in VM
destroy, and goroutine launch for extension background workers.
2026-05-17 01:47:56 +06:00
fb16bc9ed1 chore: update proto, scripts, and docs for CH migration
- Update hostagent proto: firecracker_version → vmm_version in metadata
- Regenerate hostagent.pb.go
- Update .env.example: WRENN_FIRECRACKER_BIN → WRENN_CH_BIN
- Update Makefile: remove --isnotfc from dev-envd target
- Update prepare-wrenn-user.sh: firecracker → cloud-hypervisor paths
  and capability assignments
- Update wrenn-init.sh: disable write_zeroes on rootfs for dm-snapshot
  compatibility with CH
- Update README.md and CLAUDE.md: Firecracker → Cloud Hypervisor
  throughout
2026-05-17 01:33:35 +06:00
dd8a940431 feat(envd): update guest agent for Cloud Hypervisor
Remove Firecracker-specific MMDS metadata fetching and the metrics host
module. CH communicates with the guest purely over TAP networking, so
MMDS (Firecracker's metadata service) is no longer needed.

- Remove src/host/ module (mmds.rs, metrics.rs)
- Remove reqwest dependency (was only used for MMDS HTTP calls)
- Remove --isnotfc CLI flag (no longer dual-mode)
- Simplify health endpoint and init handler
- Update state management for CH snapshot lifecycle
- Bump version to 0.3.0
2026-05-17 01:33:25 +06:00
eaa6b8576d feat(vm): replace Firecracker with Cloud Hypervisor
Migrate the entire VM layer from Firecracker to Cloud Hypervisor (CH).
CH provides native snapshot/restore via its HTTP API, eliminating the
need for custom UFFD handling, memfile processing, and snapshot header
management that Firecracker required.

Key changes:
- Remove fc.go, jailer.go (FC process management)
- Remove internal/uffd/ package (userfaultfd lazy page loading)
- Remove snapshot/header.go, mapping.go, memfile.go (FC snapshot format)
- Add ch.go (CH HTTP API client over Unix socket)
- Add process.go (CH process lifecycle with unshare+netns)
- Add chversion.go (CH version detection)
- Refactor sandbox manager: remove UFFD socket tracking, snapshot
  parent/diff chaining, FC-specific balloon logic; add crash watcher
- Simplify snapshot/local.go to CH's native snapshot format
- Update VM config: FirecrackerBin → VMMBin, new CH-specific fields
- Update envdclient, devicemapper, network for CH compatibility
2026-05-17 01:33:12 +06:00
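[editor's note] A minimal sketch of the ch.go pattern: plain net/http over CH's Unix API socket. vm.pause is part of Cloud Hypervisor's documented REST API; the names and error handling here are assumptions:

```go
package vm

import (
	"context"
	"fmt"
	"net"
	"net/http"
)

// newCHClient returns an HTTP client that dials CH's API socket; the host in
// request URLs is ignored because every connection goes to the socket.
func newCHClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// pauseVM shows the flavor of call the client makes over the socket.
func pauseVM(ctx context.Context, c *http.Client) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPut,
		"http://localhost/api/v1/vm.pause", nil)
	if err != nil {
		return err
	}
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("vm.pause: status %d", resp.StatusCode)
	}
	return nil
}
```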
c2dc382787 Updated openapi schema 2026-05-16 18:32:37 +06:00
3671af2498 feat: immediate sandbox reconciliation on host reconnect
When a host transitions from unreachable → online via heartbeat, trigger
ReconcileHost in a background goroutine so "missing" sandboxes are
resolved instantly instead of waiting up to 60s for the next monitor tick.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-16 16:15:49 +06:00
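[editor's note] A Go sketch of the edge trigger described above; field and method names are hypothetical:

```go
package monitor

import (
	"context"
	"time"
)

type Host struct {
	ID       string
	Status   string
	LastSeen time.Time
}

type HostMonitor struct {
	reconcile func(ctx context.Context, hostID string) // stand-in for ReconcileHost
}

// onHeartbeat fires reconciliation only on the unreachable→online edge, in a
// goroutine so the heartbeat handler itself stays fast.
func (m *HostMonitor) onHeartbeat(h *Host) {
	prev := h.Status
	h.Status, h.LastSeen = "online", time.Now()
	if prev != "unreachable" {
		return
	}
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
		defer cancel()
		m.reconcile(ctx, h.ID)
	}()
}
```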
e34bcedc31 Merge pull request 'fix/remove-sync-updates' (#47) from fix/remove-sync-updates into dev
Reviewed-on: wrenn/wrenn#47
2026-05-15 08:08:07 +00:00
ff91ef3edf Bump versions 2026-05-15 13:56:04 +06:00
ba3a3db98c Updated openapi specs 2026-05-15 12:39:06 +06:00
6faad45a28 feat: async sandbox lifecycle with Redis Stream events
Replace synchronous RPC-based CP-host communication for sandbox
lifecycle operations (Create, Pause, Resume, Destroy) with an async
pattern. CP handlers now return 202 Accepted immediately, fire agent
RPCs in background goroutines, and publish state events to a Redis
Stream. A background consumer processes events as a fallback writer.

Agent-side auto-pause events are pushed to the CP via HTTP callback
(POST /v1/hosts/sandbox-events), keeping Redis internal to the CP.

All DB status transitions use conditional updates
(UpdateSandboxStatusIf, UpdateSandboxRunningIf) to prevent race
conditions between concurrent operations and background goroutines.

The HostMonitor reconciler is kept at 60s as a safety net, extended
to handle transient statuses (starting, pausing, resuming, stopping).

Frontend updated to handle 202 responses with empty bodies and render
transient statuses with blue indicators.
2026-05-15 12:25:16 +06:00
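[editor's note] A sketch of the conditional-update idea behind UpdateSandboxStatusIf; the SQL and signature are assumptions, not the repo's generated queries:

```go
package db

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// The WHERE clause makes the update a compare-and-swap: a stale background
// event cannot clobber a newer transition.
const updateStatusIf = `
UPDATE sandboxes SET status = $2, updated_at = now()
WHERE id = $1 AND status = $3`

func SetStatusIf(ctx context.Context, pool *pgxpool.Pool, id, next, expect string) (bool, error) {
	tag, err := pool.Exec(ctx, updateStatusIf, id, next, expect)
	if err != nil {
		return false, err
	}
	return tag.RowsAffected() == 1, nil
}
```

Callers branch on the returned bool to learn whether their transition won the race.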
c08884fa2c Merge branch 'main' of git.omukk.dev:wrenn/wrenn into dev 2026-05-13 11:05:49 +06:00
6164d7cae3 version bump 2026-05-13 10:58:54 +06:00
dc6776cc8f fix(agent): register with CP before inflating rootfs images 2026-05-13 10:52:22 +06:00
0bfda08f47 Merge pull request 'test (envd): add 136 unit tests across 12 modules' (#44) from testing/envd into dev
Reviewed-on: wrenn/wrenn#44
2026-05-13 04:42:06 +00:00
485be22a16 test(envd): add 136 unit tests across 12 modules
Cover all pure-function modules with inline #[cfg(test)] blocks:
crypto (NIST/RFC 4231 known-answer vectors), auth (SecureToken ops,
signature generation/validation), conntracker (snapshot lifecycle),
execcontext, util (AtomicMax concurrent correctness), http/encoding
(RFC 7231 negotiation), port/conn (/proc/net/tcp parsing),
rpc/entry (format_permissions), and permissions/path (tilde expansion,
ensure_dirs). Add tempfile dev-dep for filesystem tests. Update
Makefile test target to include cargo test.
2026-05-13 10:39:54 +06:00
ead406bdac Merge pull request 'fix: resolve large operation reliability — stream hangs, pause races, and memory bloat' (#43) from fix/large-operations into dev
Reviewed-on: wrenn/wrenn#43
2026-05-13 03:44:41 +00:00
1472d77b52 Merge branch 'dev' into fix/large-operations 2026-05-13 03:44:19 +00:00
6a0fea30a6 Rootfs script updated 2026-05-13 09:35:06 +06:00
8c34388fc2 Changed commands to check if envd is statically linked or not 2026-05-12 23:19:30 +06:00
aca43d51eb fix: resolve process stream hangs, pause race, and PTY signal loss
- Cache terminal EndEvent on ProcessHandle so connect() can detect
  already-exited processes instead of hanging forever on broadcast
  receivers that missed the event. Subscribe before checking cache
  to close the TOCTOU window.

- Protect sb.Status writes in Pause with m.mu to prevent data race
  with concurrent readers (AcquireProxyConn, Exec, etc.).

- Restart metrics sampler in restoreRunning so a failed pause attempt
  doesn't permanently kill sandbox metrics collection.

- Return dequeued non-input messages from coalescePtyInput instead of
  dropping them, preventing silent loss of kill/resize signals during
  typing bursts.
2026-05-09 18:11:15 +06:00
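[editor's note] The subscribe-before-check ordering from the first fix, rendered in Go for illustration (the actual fix is in the Rust envd); all names are hypothetical:

```go
package process

import "sync"

type Event struct{ ExitCode int }

type ProcessHandle struct {
	mu        sync.Mutex
	cachedEnd *Event
	subs      []chan Event
}

func (h *ProcessHandle) subscribe() chan Event {
	h.mu.Lock()
	defer h.mu.Unlock()
	ch := make(chan Event, 1)
	h.subs = append(h.subs, ch)
	return ch
}

// connect subscribes first, then consults the cached end event. An exit that
// lands between the two steps is delivered on the new subscription instead of
// being lost, which is the TOCTOU window the commit closes.
func (h *ProcessHandle) connect() <-chan Event {
	sub := h.subscribe()
	h.mu.Lock()
	end := h.cachedEnd
	h.mu.Unlock()
	if end == nil {
		return sub
	}
	out := make(chan Event, 1)
	out <- *end // replay the cached event for an already-exited process
	close(out)
	return out
}
```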
522e1c5e90 fix: subscribe to process channels before spawning threads to prevent event loss
Fast-exiting processes (e.g. echo) sent data/end events before
start() subscribed to the broadcast channels, causing the stream
to hang indefinitely and the exec RPC to time out with 502.

Move channel subscription into spawn_process, before reader/waiter
threads start, and return pre-subscribed receivers via SpawnedProcess.
2026-05-09 17:28:37 +06:00
d1d316f35c fix: resolve exec 502 by terminating process streams on exit
The start() and connect() streaming RPCs blocked forever in the data
event loop because ProcessHandle retains a broadcast sender (needed for
reconnection via connect()), preventing the channel from closing.

Race data_rx against end_rx with tokio::select! so the stream terminates
when the process exits. Remaining buffered data is drained before
yielding the end event.
2026-05-09 16:36:33 +06:00
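[editor's note] The same race in Go channel terms (envd itself uses tokio::select!); a sketch, not the repo's code:

```go
package stream

// streamEvents races the data channel against the end channel, and on exit
// drains any buffered output before emitting the end event.
func streamEvents(dataCh <-chan []byte, endCh <-chan int, emit func([]byte), done func(int)) {
	for {
		select {
		case b := <-dataCh:
			emit(b)
		case code := <-endCh:
			for { // drain remaining buffered data
				select {
				case b := <-dataCh:
					emit(b)
				default:
					done(code)
					return
				}
			}
		}
	}
}
```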
2af8412cdc fix: use RwLock for envd Defaults to fix silent mutation loss
The /init handler's default_user mutation cloned the Defaults struct,
mutated the clone, then dropped it — the actual state was never updated.
This caused processes to always run as "root" regardless of the user
set via POST /init. Additionally, default_workdir was accepted in the
init request but never applied.

Wrap user and workdir fields in RwLock with accessor methods so mutations
propagate correctly through the shared AppState.
2026-05-09 15:28:09 +06:00
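[editor's note] The shape of the fix, transposed to Go for consistency with the other sketches here; field names are illustrative:

```go
package state

import "sync"

// Defaults: mutations go through accessors that lock the shared value,
// never through a copied struct that gets dropped.
type Defaults struct {
	mu      sync.RWMutex
	user    string
	workdir string
}

func (d *Defaults) SetUser(u string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.user = u // lands in shared state, not in a discarded clone
}

func (d *Defaults) User() string {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.user
}
```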
c93ad5e2db fix: harden pause flow with connection isolation and UFFD event handling
Restructure pause to: block new operations (StatusPausing), drain proxy
connections with 5s grace, force-close remaining via context cancellation,
drop page cache, inflate balloon, then freeze vCPUs. Previously connections
could arrive during the pause window and API operations weren't blocked.

Handle UFFD_EVENT_REMOVE/UNMAP/REMAP/FORK gracefully instead of crashing
the UFFD server. These events fire during balloon deflation on snapshot
restore, killing the page fault handler and preventing VM boot.

Also adds ConnTracker.ForceClose() with cancellable context propagated
through the proxy handler, so lingering proxy connections are actively
terminated rather than left dangling.
2026-05-09 14:51:19 +06:00
38799770db fix: inflate balloon before snapshot to reduce memfile size
Firecracker dumps the entire VM memory region regardless of guest
usage. A 20GB VM using 500MB still produces a ~20GB memfile because
freed pages retain stale data (non-zero blocks).

Inflate the balloon device before snapshot to reclaim free guest
memory. Balloon pages become zero from FC's perspective, allowing
ProcessMemfile to skip them. This reduces memfile size from ~20GB
to ~1-2GB for lightly-used VMs.

- Pause: read guest memory usage, inflate balloon to reclaim free
  pages, wait 2s for guest kernel to process, then proceed
- Resume: deflate balloon to 0 after PostInit so guest gets full
  memory back
- createFromSnapshot: same deflation since template snapshots
  inherit inflated balloon state
- All balloon ops are best-effort with debug logging on failure
2026-05-05 15:38:04 +06:00
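[editor's note] A hedged sketch of the best-effort inflate step; the fcClient interface and the MiB accounting are assumptions:

```go
package vm

import (
	"context"
	"log/slog"
	"time"
)

// fcClient is a stand-in for the Firecracker API wrapper assumed here.
type fcClient interface {
	UpdateBalloon(ctx context.Context, amountMiB int64) error
}

// inflateBalloonForSnapshot reclaims free guest memory before pause. It is
// deliberately best-effort: a failed inflate only means a larger memfile.
func inflateBalloonForSnapshot(ctx context.Context, fc fcClient, totalMiB, usedMiB int64) {
	target := totalMiB - usedMiB // free guest memory the balloon can absorb
	if target <= 0 {
		return
	}
	if err := fc.UpdateBalloon(ctx, target); err != nil {
		slog.Debug("balloon inflate failed", "err", err)
		return
	}
	time.Sleep(2 * time.Second) // let the guest kernel hand the pages back
}
```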
51b5d7b3ba fix: resolve pause/snapshot failures and CoW exhaustion on large VMs
Remove hard 10s timeout from Firecracker HTTP client — callers already
pass context.Context with appropriate deadlines, and 20GB+ memfile
writes easily exceed 10s.

Ensure CoW file is at least as large as the origin rootfs. Previously,
WRENN_DEFAULT_ROOTFS_SIZE=30Gi expanded the base image to 30GB but the
default 5GB CoW could not hold all writes, causing dm-snapshot
invalidation and EIO on all guest I/O.

Destroy frozen VMs in resumeOnError instead of leaving zombies that
report "running" but can't execute. Use fresh context for the resume
attempt so a cancelled caller context doesn't falsely trigger destroy.

Increase CP→Agent ResponseHeaderTimeout from 45s to 5min and
PrepareSnapshot timeout from 3s to 30s for large-memory VMs.

After failed pause, ping agent to detect destroyed sandboxes and mark
DB status as "error" instead of reverting to "running".
2026-05-04 01:46:57 +06:00
fd5fa28205 Merge pull request 'Enhanced frontend ux' (#42) from enhance/frontend into dev
Reviewed-on: wrenn/wrenn#42
2026-05-03 11:08:48 +00:00
1244c08e42 fix: fetch sandbox metrics immediately on page load
Metrics data was only fetched after Chart.js dynamic import completed,
leaving graphs empty until the first poll interval fired. Now
loadMetrics() runs in parallel with the Chart.js import, and
initCharts() resets the dedup key so pre-fetched data populates
newly created chart instances.
2026-05-03 16:43:26 +06:00
021d709de2 feat: show template owner and restrict delete in admin panel
Add Owner column to admin templates table, resolving team IDs to names
via admin teams API. Disable delete for non-platform templates and the
minimal template, with contextual tooltips explaining why.
2026-05-03 15:51:20 +06:00
cac6fcd626 feat: admin grant/revoke from admin panel
Add PUT /v1/admin/users/{id}/admin endpoint and frontend UI for
granting and revoking platform admin status. Uses atomic conditional
SQL (RevokeUserAdmin) to prevent race conditions that could remove
the last admin. Includes idempotency check, audit logging, and
confirmation dialog with self-demotion warning.
2026-05-03 15:24:34 +06:00
4954b19d7c fix: merge capsule data in-place to prevent visual refresh on poll
Replaces full array assignment with granular merge that reuses existing
Svelte proxy objects, so only rows with actual data changes re-render.
2026-05-03 15:09:21 +06:00
01819642cc fix: drop page cache before snapshot to reduce memory dump size
Linux retains reclaimable file data in the page cache, which Firecracker
snapshots as non-zero blocks: a 16GB VM with 12GB of stale cache writes
all 12GB to disk. Dropping the page cache (but not dentries/inodes) in
/snapshot/prepare, before blocking the reclaimer, shrinks snapshots to
the actual working set with minimal resume latency impact.
2026-05-03 14:27:49 +06:00
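[editor's note] The guest-side mechanism in miniature, as a Go sketch (the guest agent is Rust at this point in the history):

```go
package guest

import (
	"os"
	"syscall"
)

// dropPageCache writes "1" to drop_caches, which frees page cache only;
// "2" would drop dentries/inodes and "3" both. Sync first so dirty pages
// are written back rather than lost.
func dropPageCache() error {
	syscall.Sync()
	return os.WriteFile("/proc/sys/vm/drop_caches", []byte("1"), 0o644)
}
```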
cb28f7759d Merge pull request 'fix: accurate sandbox metrics and memory management' (#41) from bugfix/sandbox-metrics-calculations into dev
Reviewed-on: wrenn/wrenn#41
2026-05-03 06:41:41 +00:00
1178ab8b21 fix: accurate sandbox metrics and memory management
Three issues fixed:

1. Memory metrics read host-side VmRSS of the Firecracker process,
   which includes guest page cache and never decreases. Replaced
   readMemRSS(fcPID) with readEnvdMemUsed(client) that queries
   envd's /metrics endpoint for guest-side total - MemAvailable.
   This matches neofetch and reflects actual process memory.

2. Added Firecracker balloon device (deflate_on_oom, 5s stats) and
   envd-side periodic page cache reclaimer (drop_caches when >80%
   used). Reclaimer is gated by snapshot_in_progress flag with
   sync() before freeze to prevent memory corruption during pause.

3. Sampling interval 500ms → 1s, ring buffer capacities adjusted
   to maintain same time windows. Reduces per-host HTTP load from
   240 calls/sec to 120 calls/sec at 120 capsules.

Also: maxDiffGenerations 8 → 1 (merge every re-pause since UFFD
lazy-loads anyway), envd mem_used formula uses total - available.
2026-05-03 12:19:01 +06:00
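[editor's note] A sketch of the guest-side formula from fix 1, reading /proc/meminfo directly; the helper name is hypothetical:

```go
package metrics

import (
	"fmt"
	"os"
	"strings"
)

// guestMemUsedKiB computes used = MemTotal - MemAvailable. Unlike host-side
// VmRSS of the VMM process, this value drops again when the guest frees
// memory.
func guestMemUsedKiB() (uint64, error) {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		return 0, err
	}
	var total, avail uint64
	for _, line := range strings.Split(string(data), "\n") {
		var v uint64
		if _, err := fmt.Sscanf(line, "MemTotal: %d kB", &v); err == nil {
			total = v
		} else if _, err := fmt.Sscanf(line, "MemAvailable: %d kB", &v); err == nil {
			avail = v
		}
	}
	return total - avail, nil
}
```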
233e747d5d Merge branch 'main' of git.omukk.dev:wrenn/wrenn into dev 2026-05-03 04:56:14 +06:00
20a228eb8d Merge pull request 'Rewritten envd with rust to improve reliability during pause and resume operations' (#39) from feat/envd-rewrite into dev
Reviewed-on: wrenn/wrenn#39
2026-05-02 22:49:36 +00:00
ef5f223863 fix: improve error feedback for terminal disconnects and host unavailability
Show "[session disconnected]" in terminal when PTY websocket closes cleanly.
Map scheduler and agent unavailability errors to 503 with user-friendly
message instead of leaking internal details.
2026-05-03 04:47:10 +06:00
31456fd169 fix: resolve PTY failure, MMDS file writes, and metrics instability in envd-rs
Three bugs fixed:

1. PTY connections failed because home directory was hardcoded as
   /home/{username} instead of reading from /etc/passwd. For root,
   this produced /home/root/ which doesn't exist — CWD validation
   rejected every PTY Start request without explicit cwd. Fixed all
   6 locations to use user.dir from nix::unistd::User.

2. MMDS polling silently failed to parse metadata because the
   logs_collector_address field lacked #[serde(default)]. The host
   agent only sends instanceID + envID — missing "address" field
   caused every deserialize attempt to fail, so .WRENN_SANDBOX_ID
   and .WRENN_TEMPLATE_ID were never written. Also added error
   logging and create_dir_all before file writes.

3. Metrics CPU values were non-deterministic because a fresh
   sysinfo::System was created per request with a 100ms sleep
   between reads. Replaced with a background thread that samples
   CPU at fixed 1-second intervals via a persistent System instance,
   matching gopsutil's internal caching behavior. Metrics endpoint
   now reads cached atomic values — no blocking, consistent window.

Also: close master PTY fd in child pre_exec, add process.Start
request logging, bump version to 0.2.0.
2026-05-03 04:28:10 +06:00
bbcde17d49 Updated static link check for envd 2026-05-03 03:32:41 +06:00
f328113a2a rename guest hostname from "sandbox" to "capsule"
Terminal prompt inside VMs now shows root@capsule instead of
root@sandbox, aligning with user-facing "capsule" terminology.
2026-05-03 03:32:03 +06:00
1143acd37a refactor: remove Go envd module, update host agent for Rust envd
The Go envd guest agent (`envd/`) is fully replaced by the Rust
implementation (`envd-rs/`). This commit removes the Go module and
updates all references across the codebase.

Makefile: remove ENVD_DIR, VERSION_ENVD, build-envd-go, dev-envd-go,
and Go envd from proto/fmt/vet/tidy/clean targets. Add static-link
verification to build-envd.

Host agent: rewrite snapshot quiesce comments that referenced Go GC
and page allocator corruption — no longer applicable with Rust envd.
Tighten envdclient to expect HTTP 200 (not 204) from health and file
upload endpoints, and require JSON version response from FetchVersion.

Remove NOTICE (no e2b-derived code remains). Update CLAUDE.md and
README.md to reflect Rust envd architecture.
2026-05-03 03:12:25 +06:00
0b53d34417 feat: rewrite envd guest agent in Rust (envd-rs)
Complete Rust rewrite of the Go envd guest daemon that runs as PID 1
inside Firecracker microVMs. Feature-complete across all 8 phases:

- Health, metrics, and env var endpoints
- Crypto (SHA-256/512, HMAC), auth (secure token, signing), init/snapshot
- Connect RPC via connectrpc + buffa (process + filesystem services)
- File transfer (GET/POST /files) with gzip, multipart, chown, ENOSPC
- Port subsystem (/proc/net/tcp scanner, socat forwarder)
- Cgroup2 manager with noop fallback
- Snapshot/restore lifecycle (conntracker, port subsystem stop/restart)
- SIGTERM graceful shutdown, --cmd initial process spawn
- MMDS metadata polling for Firecracker mode

42 source files, ~4200 LOC, 4.1MB stripped release binary.
Makefile updated: build-envd now targets Rust (musl static),
build-envd-go preserved for Go builds.
2026-05-03 02:47:15 +06:00
3deecbff89 fix: prevent Go runtime memory corruption and sandbox halt after snapshot restore
Three root causes addressed:

1. Go page allocator corruption: allocations between the pre-snapshot GC
   and VM freeze leave the summary tree inconsistent. After restore, GC
   reads corrupted metadata — either panicking (killing PID 1 → kernel
   panic) or silently failing to collect, causing unbounded heap growth
   until OOM. Fix: move GC to after all HTTP allocations in
   PostSnapshotPrepare, then set GOMAXPROCS(1) so any remaining
   allocations run sequentially with no concurrent page allocator access.
   GOMAXPROCS is restored on first health check after restore.

2. PostInit timeout starvation: WaitUntilReady and PostInit shared a
   single 30s context. If WaitUntilReady consumed most of it, PostInit
   failed — RestoreAfterSnapshot never ran, leaving envd with keep-alives
   disabled and zombie connections. Fix: separate timeout contexts.

3. CP HTTP server missing timeouts: no ReadHeaderTimeout or IdleTimeout
   caused goroutine leaks from hung proxy connections. Fix: add both,
   matching host agent values.

Also adds UFFD prefetch to proactively load all guest pages after restore,
eliminating on-demand page fault latency for subsequent RPC calls.
2026-05-02 17:22:51 +06:00
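[editor's note] A rough Go sketch of the quiesce in fix 1; the handler shape and the restore hook are assumptions:

```go
package envd

import (
	"net/http"
	"runtime"
	"sync/atomic"
)

var savedProcs atomic.Int32

// postSnapshotPrepare finishes the handler's allocations (including writing
// the response), forces a GC so page allocator metadata is consistent, then
// drops to one P so any remaining allocations run sequentially until restore.
func postSnapshotPrepare(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusNoContent)
	runtime.GC()
	savedProcs.Store(int32(runtime.GOMAXPROCS(1))) // returns the previous value
}

// restoreProcs runs on the first health check after restore.
func restoreProcs() {
	if p := savedProcs.Swap(0); p > 0 {
		runtime.GOMAXPROCS(int(p))
	}
}
```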
bb582deefa fix: prevent sandbox halt after resume by fixing HTTP/2 HOL blocking and adding timeouts
Disable HTTP/2 on both host agent server and CP→agent transport — multiplexing
caused head-of-line blocking when a slow sandbox RPC stalled the shared connection.
Add ResponseHeaderTimeout to envd HTTP clients. Merge SetDefaults into Resume's
PostInit call to eliminate an extra round-trip that could hang on a stale connection.
2026-05-02 13:48:51 +06:00
7ef9a64613 fix: close stale TCP connections across snapshot/restore to prevent envd hangs
After Firecracker snapshot restore, zombie TCP sockets from the previous
session cause Go runtime corruption inside the guest VM, making envd
unresponsive. This manifests as infinite loading in the file browser and
terminal timeouts (524) in production (HTTP/2 + Cloudflare) but not locally.

Four-part fix:
- Add ServerConnTracker to envd that tracks connections via ConnState callback,
  closes idle connections and disables keep-alives before snapshot, then closes
  all pre-snapshot zombie connections on restore (while preserving post-restore
  connections like the /init request)
- Split envdclient into timeout (2min) and streaming (no timeout) HTTP clients;
  use streaming client for file transfers and process RPCs
- Close host-side idle envdclient connections before PrepareSnapshot so FIN
  packets propagate during the 3s quiesce window
- Add StreamingHTTPClient() accessor; streaming file transfer handlers in
  hostagent use it instead of the timeout client
2026-05-02 05:19:37 +06:00
f3572f7356 Fix empty WRENN_TEMPLATE_ID after resuming paused sandbox
Resume() was building VMConfig without TemplateID, so Firecracker MMDS
received an empty string. envd's PostInit then wrote that empty value to
/run/wrenn/.WRENN_TEMPLATE_ID. Fix by persisting the template ID in
snapshot metadata during Pause and reading it back during Resume.
2026-05-02 04:57:08 +06:00
2e998a26a2 Merge branch 'main' of git.omukk.dev:wrenn/wrenn into dev 2026-05-01 15:01:32 +06:00
f3ec626d58 Envd version bump 2026-05-01 14:59:37 +06:00
f4733e2f7a Version bump 2026-04-25 04:49:17 +06:00
cdacc12a48 Merge pull request 'Fixed network throttle when an application is running' (#37) from fix/network-throttle-on-load into dev
Reviewed-on: wrenn/wrenn#37
2026-04-24 22:43:31 +00:00
bd98610153 fix: sandbox network responsiveness under port-binding apps
Running port-binding applications (Jupyter, http.server, NextJS) inside
sandboxes caused severe PTY sluggishness and proxy navigation errors.

Root cause: the CP sandbox proxy and Connect RPC pool shared a single
HTTP transport. Heavy proxy traffic (Jupyter WebSocket, REST polling)
interfered with PTY RPC streams via HTTP/2 flow control contention.

Transport isolation (main fix):
- Add dedicated proxy transport on CP (NewProxyTransport) with HTTP/2
  disabled, separate from the RPC pool transport
- Add dedicated proxy transport on host agent, replacing
  http.DefaultTransport
- Add dedicated envdclient transport with tuned connection pooling
- Replace http.DefaultClient in file streaming RPCs with per-sandbox
  envd client

Proxy path rewriting (navigation fix):
- Add ModifyResponse to rewrite Location headers with /proxy/{id}/{port}
  prefix, handling both root-relative and absolute-URL redirects
- Strip prefix back out in CP subdomain proxy for correct browser
  behavior
- Replace path.Join with string concat in CP Director to preserve
  trailing slashes (prevents redirect loops on directory listings)

Proxy resilience:
- Add dial retry with linear backoff (3 attempts) to handle socat
  startup delay when ports are first detected
- Cache ReverseProxy instances per sandbox+port+host in sync.Map
- Add EvictProxy callback wired into sandbox Manager.Destroy

Buffer and server hardening:
- Increase PTY and exec stream channel buffers from 16 to 256
- Add ReadHeaderTimeout (10s) and IdleTimeout (620s) to host agent
  HTTP server

Network tuning:
- Set TAP device TxQueueLen to 5000 (up from default 1000)
- Add Firecracker tx_rate_limiter (200 MB/s sustained, 100 MB burst)
  to prevent guest traffic from saturating the TAP
2026-04-25 04:21:55 +06:00
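[editor's note] A plausible shape for NewProxyTransport: on a net/http client transport, a non-nil empty TLSNextProto map is the standard way to disable HTTP/2 over TLS, with ForceAttemptHTTP2 off as well. Pool sizes are assumptions, not the repo's tuning:

```go
package proxy

import (
	"crypto/tls"
	"net/http"
	"time"
)

// NewProxyTransport is a dedicated transport with HTTP/2 disabled, so a slow
// sandbox stream cannot stall multiplexed siblings on a shared connection.
func NewProxyTransport() *http.Transport {
	return &http.Transport{
		ForceAttemptHTTP2: false,
		// A non-nil empty map disables the h2 upgrade on TLS connections.
		TLSNextProto:        map[string]func(string, *tls.Conn) http.RoundTripper{},
		MaxIdleConns:        512,
		MaxIdleConnsPerHost: 64,
		IdleConnTimeout:     90 * time.Second,
	}
}
```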
5e13879954 fix: OAuth ConnectProvider state HMAC format mismatch
ConnectProvider computed HMAC over bare state, but Callback always
verifies HMAC(state+":"+intent). This caused the account-linking
flow to always fail with invalid_state.
2026-04-25 02:00:39 +06:00
339cd7bee1 fix: security and stability fixes from code review
- Scope WebSocket auth bypass to only WS endpoints by restructuring
  routes into separate chi Groups. Non-WS routes no longer passthrough
  unauthenticated requests with spoofed Upgrade headers. Added
  optionalAPIKeyOrJWT middleware for WS routes (injects auth context
  from API key/JWT if present, passes through otherwise) and
  markAdminWS middleware for admin WS routes.

- Fix nil pointer dereference in envd Handler.Wait() — p.tty.Close()
  was called unconditionally but p.tty is nil for non-PTY processes,
  crashing every non-PTY process exit.

- Fix goroutine leak in sandbox Pause — stopSampler was never called,
  leaking one sampler goroutine per successful pause operation.

- Decouple PTY WebSocket reads from RPC dispatch using a buffered
  channel to prevent backpressure-induced connection drops under fast
  typing. Includes input coalescing to reduce RPC call volume.
2026-04-24 15:48:38 +06:00
153a54fdcd Merge branch 'main' of git.omukk.dev:wrenn/wrenn into dev 2026-04-21 16:11:59 +06:00
c3afd0c8a0 Merge pull request 'Audit logging, Data anonymization, and OAuth flow improvements' (#35) from feat/compliance into dev
Reviewed-on: wrenn/wrenn#35
2026-04-21 10:09:37 +00:00
11928a172a feat: send email notification on account hard-delete
Notify users via email when their account is permanently deleted after
the 15-day soft-delete grace period. Query now returns email alongside
user ID so the notification can be sent after deletion.

Email failure is logged as a warning but does not block cleanup.
2026-04-21 16:01:56 +06:00
bb2146d838 refactor: deduplicate audit logger with shared entry builders
Replace repetitive actorFields + write boilerplate across all 25+ typed
Log methods with shared helpers: newEntry (general), newAdminEntry
(platform-level), resolveHostTeamID, and logSystemHostEvent.

Reduces logger.go from 665 to 374 lines with no behavior change.
2026-04-21 15:54:39 +06:00
d270ab7752 Version bump 2026-04-21 15:54:04 +06:00
7fd801c1eb feat: add audit logging for all admin actions and admin audit page
Log every admin-panel action (user activate/deactivate, team BYOC toggle,
team delete, template delete, build create/cancel) to the audit_logs table
under PlatformTeamID with scope "admin".

Add GET /v1/admin/audit-logs endpoint and /admin/audit frontend page with
infinite scroll and hierarchical filters. Expose audit.Entry + Log() for
cloud repo extensibility.

Fix seed_platform_team down-migration FK violation by deleting dependent
rows before the team row.
2026-04-21 15:41:45 +06:00
edec170652 fix: remove accent gradient bars from admin host dialogs
Normalize admin host page dialogs to match design system pattern:
border + shadow only, no colored gradient strips. Align animation
timing and shadow to reference components (DestroyDialog, etc).
2026-04-21 15:02:09 +06:00
684c98b0fa fix: admin capsule create audit log uses PlatformTeamID
POST /v1/admin/capsules was outside the injectPlatformTeam middleware
subrouter, so audit entries landed under the admin's personal team.
2026-04-21 14:54:52 +06:00
ebbbde9cd1 feat: anonymize audit logs on user hard-delete and fix host audit log team assignment
Anonymize audit logs when soft-deleted users are purged after 15 days:
actor_name set to 'deleted-user', actor_id and resource_id nulled,
email stripped from member metadata. Per-user delete ensures no user
is removed without successful anonymization.

Frontend renders deleted-user as a styled red badge in audit log view.

Fix shared host create/delete audit logs landing in admin's personal
team — now correctly assigned to PlatformTeamID.
2026-04-21 14:42:09 +06:00
6a6b489471 feat: separate GitHub OAuth login/signup flows with name confirmation
Block auto-account creation when signing in via GitHub from login mode.
Signup via GitHub now shows a name confirmation dialog before redirecting
to dashboard, letting users verify/edit their display name pulled from
GitHub.

- Add intent query param to OAuth redirect, persisted in HMAC-signed state cookie
- Block registration in callback when intent=login, return no_account error
- Set wrenn_oauth_new_signup cookie on new account creation
- Frontend callback shows name confirmation dialog for new signups
- Add no_account error message to login page
2026-04-21 11:03:12 +06:00
dbc6030c17 Merge branch 'main' of git.omukk.dev:wrenn/wrenn into dev 2026-04-21 10:09:36 +06:00
9ee6e3e1a8 Merge pull request 'Feat: Added daily usage page' (#34) from feat/usage into dev
Reviewed-on: wrenn/wrenn#34
2026-04-18 08:54:04 +00:00
aa96557d1c Clean up dashboard page headers for consistency
Remove unnecessary wrapper divs around h1/subtitle pairs in audit,
channels, settings, and templates pages. Drop inline count from
channels header.
2026-04-18 14:47:33 +06:00
47be1143fb Add MiddlewareProvider interface for extension middleware
Allows cloud extensions to inject middleware that wraps OSS routes
(e.g. billing enforcement) before they are registered.
2026-04-18 14:47:29 +06:00
8f8638e6db Bump version to 0.1.2 2026-04-18 14:47:25 +06:00
003453fa3c Normalize usage page layout and clarify copy
Separate summary cards with proper surface hierarchy, add staggered
entrance animations, tighten padding, and rewrite labels/descriptions
to be specific and actionable rather than generic.
2026-04-18 14:46:01 +06:00
92aab09104 Add daily usage metrics (CPU-minutes, RAM GB-minutes)
Introduce pre-computed daily usage rollups from sandbox_metrics_snapshots.
An hourly background worker aggregates completed days, while today's
usage is computed live from snapshots at query time for freshness.

Backend: new daily_usage table, rollup worker, UsageService, and
GET /v1/capsules/usage endpoint with date range filtering (up to 92 days).

Frontend: replace Usage page placeholder with bar charts (Chart.js),
summary total cards, and preset/custom date range controls.
2026-04-18 14:29:09 +06:00
e7670e4449 Merge branch 'main' of git.omukk.dev:wrenn/wrenn into dev 2026-04-17 16:41:08 +06:00
955aa09780 Merge branch 'main' of git.omukk.dev:wrenn/wrenn into dev 2026-04-17 01:24:52 +06:00
ce452c3d11 Merge pull request 'Improved codebase to prepare for production' (#32) from chore/hardening into dev
Reviewed-on: wrenn/wrenn#32
2026-04-16 13:00:06 +00:00
ab034062d3 Merge branch 'dev' into chore/hardening 2026-04-16 12:58:48 +00:00
24f904fa74 Add +page.js to disable prerendering for admin capsule detail page 2026-04-16 18:38:03 +06:00
cc63ed2197 Minor patch 2026-04-16 18:14:50 +06:00
9c4fea93bc Added host preparation script and updated claude md 2026-04-16 16:56:04 +06:00
977c3a466a Shrink minimal rootfs on graceful host agent shutdown
On startup EnsureImageSizes expands the minimal rootfs to the configured
disk size. This adds the inverse: ShrinkMinimalImage runs e2fsck + resize2fs -M
during graceful shutdown so the image is stored compactly on disk.
2026-04-16 16:26:50 +06:00
e6e3975426 Add unauthenticated /health endpoint to control plane
Returns JSON with status and build version for monitoring and
load balancer health checks.
2026-04-16 16:13:42 +06:00
bba5f80294 Add production file logging with logrotate support
Both control plane and host agent now write structured slog output to
$WRENN_DIR/logs/ in addition to stderr. Log level is configurable via
LOG_LEVEL env var (default: info). SIGHUP reopens the log file so
logrotate can rotate without copytruncate.
2026-04-16 15:09:26 +06:00
44c32587e3 Cap network slot allocator at 32767 to match veth IP space
The veth addressing uses 10.12.0.0/16 with 2 IPs per slot. At slot
index 32768, vethOffset=65536 overflows byte arithmetic and wraps back
to 10.12.0.0, causing silent IP collisions with existing sandboxes.
Cap the allocator at 32767, which is the actual addressable limit.
2026-04-16 14:57:44 +06:00
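[editor's note] The arithmetic behind the cap, as a sketch with an assumed address layout (the base and offset scheme are inferred from the commit's description):

```go
package network

import (
	"fmt"
	"net/netip"
)

// vethIPs: 2 IPs per slot inside 10.12.0.0/16 leaves 65536/2 offsets, so
// slot 32768 (offset 65536) would wrap back onto 10.12.0.0.
func vethIPs(slot uint32) (hostIP, guestIP netip.Addr, err error) {
	const maxSlot = 32767
	if slot > maxSlot {
		return netip.Addr{}, netip.Addr{}, fmt.Errorf("slot %d exceeds %d", slot, maxSlot)
	}
	base := uint32(10)<<24 | uint32(12)<<16 // 10.12.0.0
	off := slot * 2                         // full 32-bit math, no byte truncation
	to := func(v uint32) netip.Addr {
		return netip.AddrFrom4([4]byte{byte(v >> 24), byte(v >> 16), byte(v >> 8), byte(v)})
	}
	return to(base + off), to(base + off + 1), nil
}
```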
b9aa444472 Merge pull request 'Bug fixes and optimizations' (#31) from fix/optimizations into dev
Reviewed-on: wrenn/wrenn#31
2026-04-16 00:39:47 +00:00
fb4b67adb3 Destroy owned sandboxes on user disable and fix OAuth login resilience
When an admin disables a user, all active sandboxes (running, paused,
hibernated) for teams they own are now destroyed and their API keys
are deleted. User queries now filter by status column instead of
deleted_at, so re-enabling a user always works. OAuth login paths
use ensureDefaultTeam to auto-create a team if the user has none,
matching the email/password login behavior.
2026-04-16 06:37:51 +06:00
9ea847923c Fix concurrency, security, and correctness issues across backend and frontend
- C1: Add sync.RWMutex to vm.Manager to protect concurrent vms map access
- H1: Fix IP arithmetic overflow in network slot addressing (byte truncation)
- H5: Fix MultiplexedChannel.Fork() TOCTOU race (move exited check inside lock)
- H8: Remove snapshot overwrite — return template_name_taken conflict instead
- H9: Wrap DeleteAccount DB ops in a transaction, make team deletion fatal
- H10: Sanitize serviceErrToHTTP to stop leaking internal error messages
- H11: Add deleted_at IS NULL to GetUserByEmail/GetUserByID queries
- H12: Add id DESC to audit log composite index for cursor pagination
- H15: Delete dead AuthModal.svelte component
- H17: Move JWT from WebSocket URL query param to first WS message
- H18: Fix $derived to $derived.by in FilesTab breadcrumbs
2026-04-16 06:11:42 +06:00
ed2222c80c Move sidebar into layout files and fix timer cleanup across frontend
Sidebar and AdminSidebar were re-instantiated on every page navigation
(17 pages total), causing unnecessary DOM teardown/rebuild and redundant
localStorage reads. Now each lives in its respective +layout.svelte as a
single persistent instance.

Also adds onDestroy cleanup for leaked timers (settings, team, login RAF
loop) and CSS containment on <main> to isolate layout recalculations.
2026-04-16 05:34:47 +06:00
e91109d69c Fix API key cleanup on user deactivation and build archive race condition
Delete all API keys created by a user when their account is disabled,
deleted, or soft-deleted. Store build archives before enqueuing to Redis
so workers never dequeue a build with missing files.
2026-04-16 05:29:02 +06:00
451d0819cc Merge pull request 'Added settings for users and proper email flow for authentication' (#30) from feat/user-onboarding into dev
Reviewed-on: wrenn/wrenn#30
2026-04-15 22:45:30 +00:00
084c6caa7d Redirect authenticated users away from login page 2026-04-16 04:30:25 +06:00
43e838c55c Fix cascading deletion gaps for user and team cleanup
- Add ON DELETE CASCADE to users_teams, oauth_providers, admin_permissions
  and ON DELETE SET NULL (with nullable columns) to team_api_keys.created_by,
  hosts.created_by, host_tokens.created_by so HardDeleteExpiredUsers no longer
  fails with FK violations
- User account deletion now cascades to sole-owned teams via DeleteTeamInternal,
  preventing orphaned teams with live sandboxes after account removal
- ListActiveSandboxesByTeam now includes hibernated sandboxes so their disk
  snapshots are cleaned up during team deletion
- Team soft-delete now hard-deletes sandbox metric points, metric snapshots,
  API keys, and channels to prevent data accumulation on deleted teams
- Extract deleteTeamCore() to deduplicate shared logic across DeleteTeam,
  AdminDeleteTeam, and DeleteTeamInternal
- Fix ListAPIKeysByTeamWithCreator to use LEFT JOIN after created_by became
  nullable, and update handler to read pgtype.Text.String for creator_email
2026-04-16 04:26:48 +06:00
e1b23f3d79 Updated claude md with better design 2026-04-16 04:22:30 +06:00
a3f75300a9 Add email activation flow and replace is_active with status column
Email signup now creates inactive users who must activate via a 30-minute
email token before signing in. Team creation is deferred to first login
after activation, while OAuth users continue to get teams immediately.

- Replace boolean is_active with status column (inactive/active/disabled/deleted)
- Add POST /v1/auth/activate endpoint with Redis-backed token consumption
- Signup returns message instead of JWT, sends activation email
- Login differentiates error messages by user status
- Add confirm password field to signup form
- Add /activate frontend page that auto-logs in on success
- Handle inactive user cleanup on re-signup (30-min cooldown) and OAuth collision
2026-04-16 04:05:41 +06:00
e8a2217247 Add settings page, forgot/reset password flows, and me API client
Adds /dashboard/settings route with profile/password/OAuth/account-deletion
management. Adds /forgot-password and /reset-password routes. Enables sidebar
settings link. Adds typed me.ts API client.
2026-04-16 03:25:03 +06:00
93e6fe8160 Add Wrenn wordmark to email template and improve spacing 2026-04-16 03:24:59 +06:00
f69fa8cded Add /v1/me account management endpoints
Adds self-service endpoints: GET/PATCH/DELETE /v1/me, POST /v1/me/password,
POST /v1/me/password/reset{/confirm}, GET/DELETE /v1/me/providers/{provider}.
Includes OAuth account-linking flow via cookie, hard-delete cleanup goroutine
(24h ticker, 15-day grace period), and OpenAPI spec for all new routes.
2026-04-16 03:24:55 +06:00
bc8348b199 Add DB queries for account self-service
New queries: UpdateUserPassword, SoftDeleteUser, HardDeleteExpiredUsers,
CountUserOwnedTeamsWithOtherMembers, GetOAuthProvidersByUserID, DeleteOAuthProvider.
2026-04-16 03:24:42 +06:00
81715947bb Updated claude md 2026-04-16 02:08:03 +06:00
d705f83b68 Removed unnecessary files and renamed minimal update script 2026-04-16 02:06:39 +06:00
2f0e7fcdc2 Merge pull request 'Added transactional email sending' (#29) from feat/email-transaction into dev
Reviewed-on: wrenn/wrenn#29
2026-04-15 18:56:57 +00:00
970ae2b6b2 Updated email template for optional name 2026-04-16 00:54:38 +06:00
ded9c15f06 minor changes 2026-04-16 00:54:20 +06:00
9d68eb5f00 Add transactional email system via SMTP
Introduce internal/email package with SMTP sending, embedded HTML/text
templates, and multipart MIME assembly. Emails use a generic EmailData
struct (recipient name, message, optional button, optional closing) so
new email types can be added without code changes.

Wired into signup (welcome email), team creation, and team member
addition. No-op mailer when SMTP_HOST is not configured.
2026-04-16 00:46:08 +06:00
700512b627 Updated letter-spacing 2026-04-15 22:38:19 +06:00
d1975089f1 Merge pull request 'Added metadata tracking for binaries and refactored to maintain a separate cloud version' (#28) from feat/meta-versioning into dev
Reviewed-on: wrenn/wrenn#28
2026-04-15 15:44:20 +00:00
a5ad3731f2 Refactored to maintain a separate cloud version
Moves 12 packages from internal/ to pkg/ (config, id, validate, events, db,
auth, lifecycle, scheduler, channels, audit, service) so they can be imported
by the enterprise repo as a Go module dependency.

Introduces pkg/cpextension (shared Extension interface + ServerContext) and
pkg/cpserver (Run() entrypoint with functional options) so the enterprise
main.go can call cpserver.Run(cpserver.WithExtensions(...)) without duplicating
the 20-step server bootstrap. Adds db/migrations/embed.go for go:embed access
to OSS SQL migrations from the enterprise module.

cmd/control-plane/main.go is reduced to a 10-line wrapper around cpserver.Run.
2026-04-15 21:41:48 +06:00
11d746dcfc Merge pull request 'Fixed issues with code interpreter' (#27) from fix/code-interpreter into dev
Reviewed-on: wrenn/wrenn#27
2026-04-15 12:56:18 +00:00
5f877afb9e Remove PTY inactivity timeout to keep terminal sessions alive indefinitely
Sessions now only end on process exit or explicit kill, not idle time.
The keepalive ping every 30s remains to prevent network-level disconnects.
2026-04-15 18:31:48 +06:00
5b4fde055c Fix build recipe execution and flatten reliability
- Set HOME in bctx.EnvVars when USER switches so ~ expands correctly in
  subsequent RUN/WORKDIR steps instead of resolving to /root
- Run /bin/sync inside the guest before FlattenRootfs destroys the VM,
  preventing pip-installed files from being captured as 0-byte due to
  unflushed page cache
- Wrap healthcheck command with su <user> so it runs with the template's
  default user context (correct HOME, correct UID)
- Export Shellescape from the recipe package for use in build service
- Add code-runner-beta recipe (Jupyter server with ipykernel --sys-prefix)
  and replace old python-interpreter-v0-beta
2026-04-15 18:24:54 +06:00
59507d7553 Merge pull request 'Added teams and users pages to admin panel' (#26) from feat/admin-panel into dev
Reviewed-on: wrenn/wrenn#26
2026-04-14 22:00:40 +00:00
a265c15c4d Add admin user management with is_active enforcement
Admin users page at /admin/users with paginated user list showing name,
email, team counts, role, join date, and active status toggle. Inactive
users are blocked from all authenticated endpoints immediately via DB
check in JWT middleware. OAuth login errors now show human-readable
messages on the login page.
2026-04-15 03:58:44 +06:00
d332630267 Add admin teams management page
Admin panel now includes a Teams page with paginated listing of all teams
(including soft-deleted), BYOC enable with confirmation dialog, and team
deletion with active capsule warnings. Shows member count, owner info,
active capsules, and channel count per team.
2026-04-15 03:36:37 +06:00
587f6ed8ad Merge pull request 'Implemented least-loaded host scheduler with bottleneck-first strategy' (#25) from feat/host-scheduler into dev
Reviewed-on: wrenn/wrenn#25
2026-04-14 21:03:25 +00:00
82d281b5b5 Implement least-loaded host scheduler with bottleneck-first strategy
Replace round-robin scheduling with resource-aware host selection that
picks the host with the most headroom at its tightest resource. Extends
the HostScheduler interface with memory/disk params for admission control.
2026-04-15 03:02:29 +06:00
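[editor's note] A compact sketch of bottleneck-first scoring; types and admission inputs are illustrative:

```go
package scheduler

import "math"

// A host's score is its headroom at the *tightest* resource; the scheduler
// picks the best score among hosts that pass admission control.
type hostLoad struct {
	CPUFree, CPUTotal   float64
	MemFree, MemTotal   float64
	DiskFree, DiskTotal float64
}

func score(h hostLoad) float64 {
	return math.Min(h.CPUFree/h.CPUTotal,
		math.Min(h.MemFree/h.MemTotal, h.DiskFree/h.DiskTotal))
}

func pickHost(hosts []hostLoad, needMem, needDisk float64) (int, bool) {
	best, bestScore := -1, -1.0
	for i, h := range hosts {
		if h.MemFree < needMem || h.DiskFree < needDisk {
			continue // admission control from the extended interface
		}
		if s := score(h); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best, best >= 0
}
```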
17d5d07b3a Removed unused env vars from env example 2026-04-15 02:19:28 +06:00
71b87020c9 Remove redundant comments from login page glow animation 2026-04-14 04:32:17 +06:00
516890c49a Add background process execution API
Start long-running processes (web servers, daemons) without blocking the
HTTP request. Leverages envd's existing background process support
(context.Background(), List, Connect, SendSignal RPCs) and wires it
through the host agent and control plane layers.

New API surface:
- POST /v1/capsules/{id}/exec with background:true → 202 {pid, tag}
- GET /v1/capsules/{id}/processes → list running processes
- DELETE /v1/capsules/{id}/processes/{selector} → kill by PID or tag
- WS /v1/capsules/{id}/processes/{selector}/stream → reconnect to output

The {selector} param auto-detects: numeric = PID, string = tag.
Tags are auto-generated as "proc-" + 8 hex chars if not provided.
2026-04-14 03:57:01 +06:00
962860ba74 Pre-pause snapshot signal to prevent Go runtime crash on restore
envd crashes with "fatal error: bad summary data" after Firecracker
snapshot/restore because the page allocator radix tree is inconsistent
when vCPUs are frozen mid-allocation. The port scanner goroutine
allocates heavily every second, making it the primary trigger.

Add POST /snapshot/prepare to envd — the host agent calls it before
vm.Pause to quiesce continuous goroutines and force GC. On restore,
PostInit restarts the port subsystem via the existing /init endpoint.

- New PortSubsystem abstraction with Start/Stop/Restart lifecycle
- Context-based goroutine cancellation (replaces irreversible channel close)
- Context-aware Signal to prevent scanner/forwarder deadlock
- Fix forwarder goroutine leak (was spinning forever on closed channel)
- Kill socat children on stop to prevent orphans across snapshots
- Fix double cmd.Wait panic (exec.Command instead of CommandContext)
2026-04-13 05:21:10 +06:00
117c46a386 Fix: Auto-admin didn't work for oauth users 2026-04-13 05:00:37 +06:00
d828a6be08 Normalize dashboard page headers: add divider line and align button layout
Add consistent mt-6 border-b divider to Capsules, Metrics, and Templates
headers. Align Channels header to match Keys page pattern (items-center,
description inside the title group).
2026-04-13 04:59:40 +06:00
bbdb44afee Merge pull request 'Added manual template building' (#24) from feat/admin-panel into dev
Reviewed-on: wrenn/wrenn#24
2026-04-12 22:44:39 +00:00
784fe5c7a8 Polish admin capsule pages and improve shared components
- Admin list: remove redundant Open button, normalize with dashboard
  patterns (sorting, search highlight, auto-refresh, animations)
- Admin detail: breadcrumb header, status bar, visibility polling
- FilesTab: add treeOnly prop, compact mode uses 2/7 tree + 5/7 preview
  split, expand tree to full width when no file selected, improve copy
- MetricsPanel: hide Live badge in compact layout (redundant with status)
- DestroyDialog: accept destroyFn prop for admin capsule deletion
2026-04-13 04:41:51 +06:00
60c0de670c Extract MetricsPanel component and use it in admin capsule detail page
Moves all Chart.js metrics logic (polling, smoothing, chart init/update)
into a reusable MetricsPanel component with 'full' and 'compact' layout
modes. The admin capsule detail page now reuses MetricsPanel, TerminalTab,
and FilesTab — no duplicated code.
2026-04-13 04:16:53 +06:00
90bea52ccd Add admin capsule management, fix file browser for special files, normalize dialog styles
- Admin capsule CRUD: list, create (platform templates), get detail with
  terminal/files/metrics, snapshot, destroy
- First signup auto-promotes to platform admin
- JWT auth via query param for WebSocket connections
- File browser: handle non-regular files (devices, pipes, sockets) gracefully
  instead of showing raw backend errors
- Normalize admin template dialogs to match established dialog patterns:
  remove accent bars, unify animation/shadow/button styles
2026-04-13 04:12:36 +06:00
f920023ecf Block download for non-regular files in file browser
Disable the download button for symlinks and show a dedicated
preview pane explaining the symlink target and suggesting to
navigate to the target file instead. Guard handleDownload against
non-file types as a safety net.
2026-04-13 02:57:38 +06:00
19ddb1ab8b Normalize dialog styles across capsules and templates pages
Aligned all dialog boxes to a consistent pattern: same shadow
(--shadow-dialog), animation (fadeUp 0.2s ease), button sizing
(py-2, duration-150), and hover effects. Added template type
indicator dot to CreateCapsuleDialog combobox. Removed accent
gradient bars from templates page inline dialogs.
2026-04-13 02:48:58 +06:00
5633957b51 Explicit write when mounting rootfs for updates 2026-04-13 02:38:09 +06:00
eb47e22496 Merge pull request 'Fixed crash on non-regular files and connection leaks' (#23) from hotfix/file-browsing-error-for-dev into dev
Reviewed-on: wrenn/wrenn#23
2026-04-12 20:12:46 +00:00
b1595baa19 Updated env.example 2026-04-13 02:10:43 +06:00
da06ecb97b Fix file browser crash on non-regular files and connection leaks
- envd: reject non-regular files (devices, pipes, sockets) in GetFiles
  to prevent infinite reads from /dev/zero, /dev/urandom etc.
- host agent: add context cancellation check in ReadFileStream loop
  with proper Connect error codes
- frontend: abort in-flight file reads on file switch, directory
  navigation, and component teardown via AbortController
- frontend: guard against abort errors surfacing in UI, use try/finally
  for fileLoading state
2026-04-13 02:09:50 +06:00
0d5007089e Merge pull request 'Updated dependencies and fixed breaking changes' (#22) from fix/dependency-updates into dev
Reviewed-on: wrenn/wrenn#22
2026-04-12 18:26:57 +00:00
0e7b198768 Bump netlink v1.3.1 and netns v0.0.5
Fixes resource leaks in named namespace handlers, adds IFF_RUNNING
flag deserialization and RouteGetWithOptions.
2026-04-13 00:13:40 +06:00
9ad704c12b Update CP listen port to 9725 and public URL to app.wrenn.dev 2026-04-13 00:01:59 +06:00
0189d030bb Bump frontend and Go x/ dependencies
- vite 7→8, @sveltejs/vite-plugin-svelte 6→7, typescript 5→6
- golang.org/x/crypto v0.49→v0.50, golang.org/x/sys v0.42→v0.43 (both modules)
2026-04-13 00:01:53 +06:00
7b853a05ba Update pgx/v5 from v5.8.0 to v5.9.1
Picks up timestamp scan optimizations, ContextWatcher goroutine leak
fix, and stdlib ResetSession connection pool fix.
2026-04-12 22:50:28 +06:00
108b68c3fa Updated gitignore 2026-04-12 22:24:54 +06:00
565817273d Rename API routes /v1/sandboxes → /v1/capsules 2026-04-12 21:51:04 +06:00
ea65fb584c Merge pull request 'Completed template build for admins' (#21) from feat/admin-template-build into dev
Reviewed-on: wrenn/wrenn#21
2026-04-11 21:41:18 +00:00
25b5258841 COPY multi-source support, configurable rootfs size, build fixes
- COPY now supports multiple sources: COPY a.txt b.txt /dest/
  Last argument is always destination (matches Dockerfile semantics).
- COPY resolves relative destinations against current WORKDIR.
- WRENN_DEFAULT_ROOTFS_SIZE env var (e.g. 5G, 2Gi, 1000M, 512Mi)
  controls template rootfs expansion. Used both at agent startup
  (EnsureImageSizes) and after FlattenRootfs (shrink then re-expand).
- Pre-build now sets WORKDIR /home/wrenn-user after USER switch.
- Extracted archive files get chmod a+rX for readability.
- Path traversal validation on COPY sources.
2026-04-12 03:39:17 +06:00
46c43b95c2 Visual polish 2026-04-12 02:44:40 +06:00
000318f77e Fix runtime env leaking into templates, add hostname to /etc/hosts
- Filter out user-specific env vars (HOME, USER, LOGNAME, SHELL, etc.)
  from template default_env so they don't override envd's per-user
  resolution. Fixes bash sourcing /root/.bashrc as wrenn-user.
- Keep WRENN_SANDBOX (legitimate runtime flag), only filter per-sandbox
  IDs (WRENN_SANDBOX_ID, WRENN_TEMPLATE_ID).
- Add "127.0.0.1 sandbox" to /etc/hosts in wrenn-init.sh so sudo can
  resolve the hostname. Fixes "unable to resolve host sandbox" error.
- Move capsule lifecycle buttons (Pause/Resume/Snapshot/Destroy) to the
  same row as Stats/Files/Terminal tabs.
- Show vCPU/Memory for all template types with Required/Recommended
  tooltips on the user templates page.
2026-04-12 02:43:09 +06:00
f5eeb0ffcc Rename /dashboard/snapshots to /dashboard/templates, show specs for all template types
- Rename snapshots route to templates for consistency with sidebar label
- Show vCPU and Memory values for base templates (not just snapshots),
  with tooltip distinguishing "Required" vs "Recommended"
- Show recipe copy button in admin build logs
- Admin panel defaults to /admin/templates on entry
- WORKDIR creates directory if not present (mkdir -p)
- Use USER command in pre-build instead of raw adduser
- Fix Svelte whitespace stripping in step keyword display
2026-04-12 02:22:43 +06:00
75af2a4f66 Add USER, COPY, ENV persistence to template build system
Implement three new recipe commands for the admin template builder:

- USER <name>: creates the user (adduser + passwordless sudo), switches
  execution context so subsequent RUN/START commands run as that user
  via su wrapping. Last USER becomes the template's default_user.

- COPY <src> <dst>: copies files from an uploaded build archive
  (tar/tar.gz/zip) into the sandbox. Source paths validated against
  traversal. Ownership set to the current USER.

- ENV persistence: accumulated env vars stored in templates.default_env
  (JSONB) and injected via PostInit when sandboxes are created from the
  template, mirroring Docker's image metadata approach.

Supporting changes:
- Pre-build creates wrenn-user as default (via USER command)
- WORKDIR now creates the directory if it doesn't exist (mkdir -p)
- Per-step progress updates (ProgressFunc callback) for live UI
- Multipart form support on POST /v1/admin/builds for archive upload
- Proto: default_user/default_env fields on Create/ResumeSandboxRequest
- Host agent: SetDefaults calls PostInitWithDefaults on envd
- Control plane: reads template defaults, passes on sandbox create/resume
- Frontend: file upload widget, recipe copy button, keyword colors for
  USER/COPY, fixed Svelte whitespace stripping in step display
- Admin panel defaults to /admin/templates instead of /admin/hosts
- Migration adds default_user and default_env to templates and
  template_builds tables
2026-04-12 02:10:01 +06:00
f6c3dc0801 Merge pull request 'bugfix: preserve agent gRPC status codes and map AlreadyExists to 409 Conflict' (#20) from bugfix/mkdir-already-exists-409 into dev
Reviewed-on: wrenn/wrenn#20
2026-04-11 17:59:16 +00:00
f5a9a1209f fix: map CodeAlreadyExists to HTTP 409 Conflict
Updated the `agentErrToHTTP` switch statement to explicitly catch
`connect.CodeAlreadyExists` (as well as `connect.CodeFailedPrecondition`)
and return `http.StatusConflict` (409) instead of falling through to the
default 502 Bad Gateway.
2026-04-11 23:54:48 +06:00
8d0356e372 fix: stop overwriting agent gRPC errors with CodeInternal
Removed the `connect.NewError(connect.CodeInternal, ...)` wrapper in the
Server's MakeDir proxy handler. Previously, this wrapper was catching
specific agent errors (like CodeAlreadyExists) and casting them into
generic Code 13 (Internal) errors, stripping the gRPC metadata.

This change allows the control-plane to act as a transparent pipeline,
ensuring the API gateway can properly interpret and route specific
filesystem failures.
2026-04-11 23:54:23 +06:00
c3c9ced9dd Remove API key auth requirement for sandbox port proxy connections
Sandbox URLs ({port}-{sandbox_id}.{domain}) are now accessible without
authentication. The sandbox ID in the hostname is sufficient for routing.
2026-04-11 13:59:07 +06:00
7d0a21644f Merge pull request 'Visual optimizations for the web UI' (#19) from fix/optimizations into dev
Reviewed-on: wrenn/wrenn#19
2026-04-11 02:24:01 +00:00
26917d432d Add syntax highlighting to file browser, harden capsules list
File browser:
- Add shiki-based syntax highlighting (lazy-loaded, zero initial bundle
  impact) with support for 30+ languages
- Cap highlighting at 2000 lines to avoid freezing on large files
- Pre-compute preview lines as derived state instead of re-splitting
  on every render
- Add content-visibility: auto on code lines for off-screen skip
- Remove per-line CSS transitions (unnecessary paint on 5000 elements)
- Cap row entrance animations to first 30 entries

Capsules list:
- Pause auto-refresh polling when browser tab is hidden
- Add empty state for search with no results
- Fix error state not clearing on successful refresh
- Fix action menu positioning near viewport edges
- Disable create button when no template selected
2026-04-11 07:49:11 +06:00
430fb9e70e Add per-provider brand colors to channels page
Give each provider (Discord, Slack, Teams, Google Chat, Telegram,
Matrix, Webhook) its own distinctive color for badges, row hover
stripes, and dialog tags. Move channel count into the header as a
serif numeral for stronger typographic hierarchy.
2026-04-11 07:14:13 +06:00
0807946d45 Replace template text input with searchable combobox, lock specs for snapshots
Template field is now a filterable dropdown that fetches available
templates on dialog open. Selecting a snapshot auto-fills and disables
vCPU/memory inputs since they must match the original capsule config.
2026-04-11 07:00:59 +06:00
11ca6935a6 Skip row fly-transitions on template filter change to prevent visual flicker
After initial page load animations complete, subsequent filter switches
render instantly (duration: 0) instead of replaying staggered fly-in/out
transitions that caused all rows to flash before filtering took effect.
2026-04-11 06:48:50 +06:00
e2f869bfc2 Minor textual change 2026-04-11 06:23:31 +06:00
21b82c2283 Optimize frontend polling: visibility API, range-based intervals, skip redundant redraws
Adds Page Visibility API to StatsPanel, templates, and capsule detail
pages so polling pauses when the browser tab is hidden. Capsule metrics
now use range-appropriate poll intervals (10s for 5m/10m, up to 120s for
24h) instead of a flat 10s. Chart updates are skipped when the data
fingerprint hasn't changed, avoiding unnecessary Canvas redraws.
2026-04-11 06:20:29 +06:00
dbad418093 Harden channels page: deduplicate dropdowns, add missing provider logos
Consolidate three identical click-outside $effect blocks into a reusable
useClickOutside helper. Extract duplicated events checkbox list into an
eventsDropdownItems snippet shared by create and edit dialogs. Add brand
SVG icons for Teams, Google Chat, and Matrix providers.
2026-04-11 06:18:36 +06:00
2bad843069 Extract SnapshotDialog and DestroyDialog into reusable components
Add lifecycle buttons (pause, resume, snapshot, destroy) to the
individual capsule detail page and refactor both the list and detail
pages to share the new dialog components.
2026-04-11 06:08:19 +06:00
9332f4ac18 Merge pull request 'Terminal connection (PTY)' (#18) from feat/ssh-connection into dev
Reviewed-on: wrenn/wrenn#18
2026-04-10 23:45:10 +00:00
cf191ca821 Harden file browser: cap preview lines, fix race conditions, download UX
- Cap text preview at 5,000 lines with truncation footer and download link
  to prevent browser freeze on large files (300k+ DOM nodes)
- Add request generation counters to discard stale API responses from
  rapid directory/file clicking
- Guard initial $effect with hasInitiallyLoaded to prevent double-load
- Add download loading state with spinner and disabled button
- Delay URL.revokeObjectURL by 5s so browser can start download
2026-04-11 05:43:32 +06:00
d2202c4f49 Harden terminal: binary-safe base64, auto-reconnect, session limits
- Replace btoa/atob with TextEncoder/TextDecoder for binary-safe base64
  encoding — fixes crash on multi-byte UTF-8 input (emoji, CJK, accents)
- Auto-reconnect on abnormal WebSocket close while session is live
- Cap concurrent sessions at 8 with disabled "+" button at limit
- Guard all ws.send() calls with try/catch via wsSend() wrapper
- Clean up input flush timer on session close and component destroy
- Close all sessions when capsule stops running (isRunning → false)
- Clean up orphaned display entry if DOM container fails to render
2026-04-11 05:35:53 +06:00
1826af37a5 Increase multiplexer fork buffer to 4096 to prevent output drops
64-entry buffer was too small for high-throughput PTY output (e.g.
ls -laihR /). The consumer couldn't drain fast enough over the RPC
stream, causing the non-blocking send fallback to silently discard
data. 4096 entries (~64MB at 16KB/chunk) handles sustained output
without drops while still preventing deadlock on stuck consumers.
2026-04-11 05:16:43 +06:00
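A sketch of the buffered-fork pattern this commit tunes (names illustrative); the non-blocking fallback is what keeps a stuck consumer from deadlocking the producer, at the cost of dropping output once the buffer is exhausted:

```go
package mux

import "sync/atomic"

// fork is one consumer of multiplexed PTY output.
type fork struct {
	ch      chan []byte
	dropped atomic.Int64
}

// newFork sizes the buffer at 4096 chunks per the commit above
// (~64MB of headroom at 16KB per chunk).
func newFork() *fork {
	return &fork{ch: make(chan []byte, 4096)}
}

// send never blocks the producer: if the consumer has fallen 4096
// chunks behind, the chunk is dropped and the drop is counted.
func (f *fork) send(chunk []byte) {
	select {
	case f.ch <- chunk:
	default:
		f.dropped.Add(1)
	}
}
```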
acc721526d Polish terminal tab: merge status bar into tab strip, normalize sizing
- Merge separate status bar into unified tab bar (one row of chrome instead of two)
- Bump font/button/icon sizes to match rest of capsule page
- VS Code-style tab separators with intelligent hiding around active tab
- Hide tab bar when no sessions exist (empty state has its own CTA)
- Fix xterm background gaps by painting viewport/screen backgrounds
- Increase terminal font from 13px to 14px
2026-04-11 05:10:46 +06:00
4b2ff279f7 Add terminal tab to capsule detail page and fix envd process lookup bugs
- Add multi-session Terminal tab with xterm.js (session tabs, close, reconnect)
- Keep terminal mounted across tab switches to preserve sessions
- Persist active tab in URL (?tab=terminal) so refresh stays on terminal
- Buffer keystrokes (50ms) to reduce per-character RPC overhead
- Add WebSocket auth via ?token= query param for browser WS connections
- Enable ws:true in Vite dev proxy for WebSocket support

envd fixes (pre-existing bugs exposed by multi-session terminals):
- Fix getProcess tag Range: inverted return values caused early stop when
  multiple tagged processes existed, making SendInput fail with "not found"
- Fix multiplexer deadlock: blocking send to cancelled fork's unbuffered
  channel prevented process cleanup. Now uses buffered channels (cap 64)
  with non-blocking fallback
2026-04-11 04:27:16 +06:00
ab3fc4a807 Add interactive PTY terminal sessions for sandboxes
Wire envd's existing PTY process capabilities through the full stack:
hostagent proto (4 new RPCs: PtyAttach, PtySendInput, PtyResize, PtyKill),
envdclient, sandbox manager, and a new WebSocket endpoint at
GET /v1/sandboxes/{id}/pty with bidirectional JSON message protocol.

Sessions use tag-based identity for disconnect/reconnect support,
base64-encoded PTY data for binary safety, and a 120s inactivity timeout.
2026-04-11 02:42:59 +06:00
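The commit doesn't spell out the JSON frame, so this Go sketch of the bidirectional message is an assumption (field names hypothetical); base64 is what keeps raw terminal bytes JSON-safe:

```go
package api

// ptyMsg is one frame of the WebSocket protocol on
// GET /v1/sandboxes/{id}/pty; field names are illustrative.
type ptyMsg struct {
	Type string `json:"type"`           // e.g. "input", "output", "resize"
	Data string `json:"data,omitempty"` // base64-encoded PTY bytes
	Cols uint16 `json:"cols,omitempty"` // resize only
	Rows uint16 `json:"rows,omitempty"` // resize only
}
```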
09f030d202 Replace file browser not-running state with centered empty state
The small bordered card looked broken and misaligned — now uses a
full-width centered layout with floating icon, matching the app's
empty-state pattern.
2026-04-10 23:32:17 +06:00
43c15c86de Merge pull request 'Added browser based filesystem interactions' (#16) from feat/file-interactions into dev
Reviewed-on: wrenn/wrenn#16
2026-04-10 13:40:39 +00:00
851f54a9e1 Polish file browser: add up button, normalize design, improve UX
Add parent directory button in breadcrumb bar, remove redundant ..
row from file list. Normalize styles to use design system tokens
(accent glow, iconFloat, fadeUp). Improve empty states, add staggered
row entrance animation, file extension badge, and clearer UX copy.
2026-04-10 19:24:24 +06:00
4ed17b2776 Fix stale WRENN_SANDBOX_ID and WRENN_TEMPLATE_ID after snapshot restore
After restoring a VM from snapshot, envd had already completed its initial
MMDS poll, so the metadata files in /run/wrenn/ and env vars retained values
from the original sandbox. Call POST /init after WaitUntilReady on both
resume and create-from-template paths to trigger envd to re-read MMDS.
2026-04-10 19:23:48 +06:00
0e6daaabe0 Fix file browser: use ~ as default path, support tilde expansion
- Default to ~ instead of hardcoded /home/user — envd resolves it
  to the actual home dir of the configured user
- Pass ~ and ~/... paths through to envd for server-side expansion
- Resolve actual absolute path from response entries for breadcrumbs
- Fall back to / if home dir is empty or doesn't exist
- Fix leftover label prop on admin templates CopyButton
2026-04-10 19:10:20 +06:00
82531b735c Add Files tab to capsule detail page with file browser and preview
Implements a split-panel file browser: directory tree on the left with
path input and breadcrumb navigation, file preview on the right with
line numbers. Binary/large files (>10MB) show a download prompt instead.

Also adds CopyButton component across capsule, snapshot, and template
pages, and fixes pre-existing type errors in StatsPanel and admin
templates page.
2026-04-10 18:43:11 +06:00
c9283cac70 Add filesystem operations (list, mkdir, remove) across full stack
Plumb ListDir, MakeDir, and RemovePath through all layers:
REST API → host agent RPC → envdclient → envd. These endpoints
enable a web file browser for sandbox filesystem interaction.

New endpoints (all under requireAPIKeyOrJWT):
- POST /v1/sandboxes/{id}/files/list
- POST /v1/sandboxes/{id}/files/mkdir
- POST /v1/sandboxes/{id}/files/remove
2026-04-10 18:05:13 +06:00
c1987b0bda Merge branch 'main' of git.omukk.dev:wrenn/wrenn into dev 2026-04-10 03:03:04 +06:00
2b31af8fde Merge branch 'main' of git.omukk.dev:wrenn/wrenn into dev 2026-04-10 02:50:50 +06:00
831c898b71 Merge pull request 'Added channels for external notifications' (#13) from feat/channels into dev
Reviewed-on: wrenn/sandbox#13
2026-04-09 19:20:36 +00:00
0f78982186 feat: channel audit logging, name cleaning, message formatting, and dashboard UI
- Add audit log entries for channel create, update, rotate_config, delete
- Clean channel names on create/update (trim, lowercase, spaces → hyphens,
  SafeName validation)
- Format chat notifications with full event details (resource, actor, team,
  timestamp) instead of one-liners
- Fix Discord split-line embeds by setting splitLines=No on shoutrrr URL
- Add channels dashboard page and sidebar navigation
2026-04-10 01:17:03 +06:00
84dd15d22b feat: add notification channels with provider integrations and retry
Implement a channels system for notifying teams via external providers
(Discord, Slack, Teams, Google Chat, Telegram, Matrix, webhook) when
lifecycle events occur (capsule/template/host state changes).

- Channel CRUD API under /v1/channels (JWT-only auth)
- Test endpoint to verify config before saving (POST /v1/channels/test)
- Secret rotation endpoint (PUT /v1/channels/{id}/config)
- AES-256-GCM encryption for provider secrets (WRENN_ENCRYPTION_KEY)
- Redis stream event publishing from audit logger
- Background dispatcher with consumer group and retry (10s, 30s)
- Webhook delivery with HMAC-SHA256 signing (X-WRENN-SIGNATURE)
- shoutrrr integration for chat providers
- Secrets never exposed in API responses
2026-04-09 17:06:06 +06:00
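A minimal sketch of the webhook signing mentioned above, assuming the X-WRENN-SIGNATURE value is hex-encoded HMAC-SHA256 of the raw request body:

```go
package channels

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// signPayload computes the webhook signature header value. Receivers
// recompute this over the body they received and compare the two
// with hmac.Equal to verify authenticity.
func signPayload(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}
```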
5148b5dd64 Updated CLAUDE.md 2026-04-09 14:28:39 +06:00
37d85ec998 chore: relicense from BSL 1.1 to Apache 2.0
Replace Business Source License with Apache License Version 2.0 across
LICENSE, envd/LICENSE, and NOTICE. Update NOTICE to remove BSL-era
framing that singled out Apache-only portions.
2026-04-09 14:28:19 +06:00
e2beef817d Expose host up/down audit events to BYOC teams and refresh dashboard navigation
Change host marked_down/marked_up audit log scope from "admin" to "team" so
BYOC team members can see when their hosts go unreachable or recover. Rename
BYOC sidebar entry to Hosts, add placeholder billing/usage pages, disable
unimplemented notifications/settings links, and point docs to external site.
2026-04-09 14:24:20 +06:00
a9ca13b238 Changed Redis dependency to KeyDB 2026-04-09 00:47:19 +06:00
e3ffa576ce Fix review findings: IP collision, pause race, proxy path, ENV ordering, conn drain
- Fix IP address collision at slot 32768+ by using bitwise shifts instead of
  byte-truncating division in network slot addressing
- Add per-sandbox lifecycleMu to serialize concurrent Pause/Destroy calls
- Sanitize proxy forwarding path with path.Clean
- Sort ENV keys in recipe shell preamble for deterministic ordering
- Fix ConnTracker goroutine leak by adding cancel channel to Drain/Reset
- Update context_test to assert deterministic ENV ordering
2026-04-08 04:32:41 +06:00
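The IP-collision fix above is easiest to see in code. A sketch, assuming the 10.11.0.0/16 sandbox range that appears later in this log: deriving octets with shifts and masks keeps slots at 32768 and above distinct, whereas truncating a division result to a byte wrapped them onto earlier slots.

```go
package network

import "fmt"

// slotIP maps a sandbox slot index (0..65535) to a guest IP.
// The 10.11.0.0/16 base is an assumption taken from the host-agent
// proxy commit further down this log.
func slotIP(slot int) string {
	hi := (slot >> 8) & 0xff // upper 8 bits of the slot
	lo := slot & 0xff        // lower 8 bits
	return fmt.Sprintf("10.11.%d.%d", hi, lo)
}
```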
dd50cfdcb1 fix: security hardening from CSO audit
- Add auth failure logging (login, API key, JWT) with IP/email/prefix
- Move OAuth JWT from URL params to short-lived cookies to prevent
  token leakage via browser history, server logs, and Referer headers
- Pin Swagger UI to v5.18.2 with SRI integrity hashes
- Upgrade Go toolchain to 1.25.8 (fixes 5 called stdlib vulns)
- Fix unchecked error in host agent credential refresh
- Add .gstack to .gitignore for security report artifacts
2026-04-08 03:46:31 +06:00
3675ecba65 chore: add gstack skill routing rules to CLAUDE.md 2026-04-08 02:28:02 +06:00
c8615466be Enforce mandatory mTLS for CP↔agent communication
Both the control plane and host agent now refuse to start without valid
mTLS configuration, closing the unauthenticated proxy/RPC attack surface
that existed when running in plain HTTP fallback mode.
2026-04-08 02:25:43 +06:00
2737288a2b Merge pull request 'Changes for a python code interpreter' (#12) from feat/python-code-interpreter into dev
Reviewed-on: wrenn/sandbox#12
2026-04-07 20:18:06 +00:00
0ea0e7cc70 Fix expandEnv regex, init script crash, healthcheck deadline, and test issues
- Fix envRegex: remove spurious (\$)? group that swallowed $$$, handle ${}
- wrenn-init.sh: add || true to networking commands under set -e, remove dead code
- waitForHealthcheck: use context deadline for unlimited retries instead of implicit 100 cap
- Make parseSandboxEnv a package-level function (unused receiver)
- Fix WrappedCommand test: map iteration order dependency, pre-expand env values
- Fix error wrapping: %v → %w per project conventions
- test-jupyter-kernel.py: move import to top-level, fix misleading comment
2026-04-08 02:14:53 +06:00
11e08e5b96 Merge branch 'dev' into feat/python-code-interpreter 2026-04-07 19:35:55 +00:00
4dc8cc3867 Removed incorrect example cert format 2026-04-07 19:35:26 +00:00
9852f96127 Modified expandEnv to use regex.
Updated recipefile with test script to check code execution with state
management
2026-04-07 22:56:56 +06:00
bf05677bef Merge branch 'dev' into feat/python-code-interpreter 2026-04-06 20:45:54 +00:00
4f340b8847 feat: add env expansion, sandbox env fetching, and configurable healthchecks

Fix ENV instructions to expand $VAR references at set time using the
current env state, preventing self-referencing values like
PATH=/opt/venv/bin:$PATH from producing recursive expansions. Remove
expandEnv from shellPrefix to avoid double expansion.

Fetch sandbox environment variables via `env` before recipe execution
so ENV steps resolve against actual runtime values from the base
template image.

Replace hardcoded healthcheck timing with a Dockerfile-like flag parser
supporting --interval, --timeout, --start-period, and --retries. Add
start-period grace window and bounded retry counting to
waitForHealthcheck.

Add python-interpreter-v0-beta recipe and healthcheck files.
2026-04-07 01:15:43 +06:00
f57fe85492 Merge pull request 'Minor temporary fix for sitewide metrics' (#11) from patch/analytics into dev
Reviewed-on: wrenn/sandbox#11
2026-04-04 07:11:49 +00:00
9a52b47786 Minor temporary fix for sitewide metrics 2026-04-04 13:11:18 +06:00
ab38c8372c Merge pull request 'Feature: HTTP communication with sandbox' (#10) from code-interpreter into dev
Reviewed-on: wrenn/sandbox#10
2026-04-02 17:41:07 +00:00
8b5fa3438e Replace gopsutil port scanner with direct /proc/net/tcp reading
The envd port scanner used gopsutil's net.Connections() which walks
/proc/{pid}/fd to enumerate socket inodes. This corrupts Go runtime
semaphore state when the VM is paused mid-operation and restored from
a Firecracker snapshot.

Replace with a direct /proc/net/tcp + /proc/net/tcp6 parser that reads
a single file per address family — no /proc/{pid}/fd walk, no goroutines,
no WaitGroups. Also replace concurrent-map (smap) in the scanner with a
plain sync.RWMutex-protected map, since concurrent-map's Items() spawns
goroutines with a WaitGroup internally, which is equally unsafe across
snapshot boundaries.

Use socket inode instead of PID for the port forwarding map key, since
inode is available directly from /proc/net/tcp without the fd walk.
2026-04-01 15:47:28 +06:00
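A condensed sketch of the parser this commit describes: one sequential read per address family, no per-PID fd walk, and the socket inode comes straight from the table. Column layout follows proc(5); the LISTEN state is hex 0A.

```go
package portscan

import (
	"bufio"
	"os"
	"strconv"
	"strings"
)

// listenPorts returns listening local ports keyed by socket inode,
// parsed from a /proc/net/tcp or /proc/net/tcp6 file.
func listenPorts(path string) (map[uint64]uint16, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	ports := make(map[uint64]uint16)
	sc := bufio.NewScanner(f)
	sc.Scan() // skip header row
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 10 || fields[3] != "0A" { // 0A = TCP_LISTEN
			continue
		}
		// local_address is "hexIP:hexPort"; take the port after the colon.
		hexPort := fields[1][strings.LastIndexByte(fields[1], ':')+1:]
		port, err := strconv.ParseUint(hexPort, 16, 16)
		if err != nil {
			continue
		}
		inode, err := strconv.ParseUint(fields[9], 10, 64)
		if err != nil {
			continue
		}
		ports[inode] = uint16(port)
	}
	return ports, sc.Err()
}
```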
2b4c5e0176 Add pre-pause proxy connection drain and sandbox proxy caching
Introduce ConnTracker (atomic.Bool + WaitGroup) to track in-flight proxy
connections per sandbox. Before pausing a VM, the manager drains active
connections with a 2s grace period, preventing Go runtime corruption
inside the guest caused by stale TCP state surviving Firecracker
snapshot/restore.

Also add:
- AcquireProxyConn on Manager for atomic lookup + connection tracking
- Proxy cache (120s TTL) on CP SandboxProxyWrapper with single-query
  DB lookup (GetSandboxProxyTarget) to avoid two round-trips
- Reset() on ConnTracker to re-enable connections if pause fails
2026-04-01 15:09:44 +06:00
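A sketch of ConnTracker as described (atomic.Bool + WaitGroup); note that the review-fix commit earlier in this log later adds a cancel channel so the Drain goroutine can't leak:

```go
package proxy

import (
	"sync"
	"sync/atomic"
	"time"
)

// ConnTracker counts in-flight proxy connections for one sandbox.
type ConnTracker struct {
	draining atomic.Bool
	wg       sync.WaitGroup
}

// Acquire registers a connection; it fails once draining has begun.
func (t *ConnTracker) Acquire() bool {
	if t.draining.Load() {
		return false
	}
	t.wg.Add(1)
	return true
}

// Release marks one tracked connection as finished.
func (t *ConnTracker) Release() { t.wg.Done() }

// Drain blocks new connections, then waits up to grace (2s in the
// commit above) for in-flight ones; reports whether it fully drained.
func (t *ConnTracker) Drain(grace time.Duration) bool {
	t.draining.Store(true)
	done := make(chan struct{})
	go func() { t.wg.Wait(); close(done) }()
	select {
	case <-done:
		return true
	case <-time.After(grace):
		return false
	}
}

// Reset re-enables connections when a pause attempt fails.
func (t *ConnTracker) Reset() { t.draining.Store(false) }
```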
377e856c8f Fix lint warnings: drop deprecated Name field from snapshot response, check errcheck-flagged errors in benchmark
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 21:28:57 +06:00
948db13bed Add skip_pre_post build option, cancel endpoint, and recipe package
- skip_pre_post flag on builds bypasses apt update/clean pre/post steps for
  faster iteration when the recipe handles its own environment setup
- POST /v1/admin/builds/{id}/cancel endpoint marks an in-progress build as
  cancelled; UpdateBuildStatus now also sets completed_at for 'cancelled'
- internal/recipe: typed recipe parser and executor (RUN/ENV/COPY steps)
  replacing the raw string slice approach in the build worker
- pre/post build commands prefixed with RUN to match recipe step format
2026-03-30 21:24:52 +06:00
25ce0729d5 Add mTLS to CP→agent channel
- Internal ECDSA P-256 CA (WRENN_CA_CERT/WRENN_CA_KEY env vars); when absent
  the system falls back to plain HTTP so dev mode works without certificates
- Host leaf cert (7-day TTL, IP SAN) issued at registration and renewed on
  every JWT refresh; fingerprint + expiry stored in DB (cert_expires_at column
  replaces the removed mtls_enabled flag)
- CP ephemeral client cert (24-hour TTL) via CPCertStore with atomic hot-swap;
  background goroutine renews it every 12 hours without restarting the server
- Host agent uses tls.Listen + httpServer.Serve so GetCertificate callback is
  respected (ListenAndServeTLS always reads cert from disk)
- Sandbox reverse proxy now uses pool.Transport() so it shares the same TLS
  config as the Connect RPC clients instead of http.DefaultTransport
- Credentials file renamed host-credentials.json with cert_pem/key_pem/
  ca_cert_pem fields; duplicate register/refresh response structs collapsed
  to authResponse
2026-03-30 21:24:35 +06:00
88f919c4ca Rename sandbox prefix to cl-, add MMDS metadata, fix proxy port routing
- Change sandbox ID prefix from sb- to cl- (capsule) throughout
- Fix proxy URL regex character class: base36 uses 0-9a-z, not just hex
- Add MMDS V2 config and metadata to VM boot flow so envd can read
  WRENN_SANDBOX_ID and WRENN_TEMPLATE_ID from inside the guest
- Pass TemplateID through VMConfig into both fresh and snapshot boot paths
2026-03-30 17:12:05 +06:00
8f06fc554a Replace Full snapshot fallback with file-level diff merge
Always use Firecracker Diff snapshots (fast, only changed pages) and
merge diff files at the file level when the generation cap is reached.
The previous approach used Firecracker's Full snapshot type which dumps
all memory to disk and can timeout, losing all snapshot data on failure.

Add snapshot.MergeDiffs() which reads each block from the appropriate
generation's diff file via the header mapping and writes them into a
single consolidated file with a fresh generation-0 header.
2026-03-29 02:33:33 +06:00
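A heavily simplified sketch of the block-level merge: assume an owner table, recovered from the generation headers, that says which diff file last wrote each block. The real header mapping is more involved; this only shows the copy loop.

```go
package snapshot

import "os"

// mergeDiffs copies each block from the generation that last wrote it
// into one consolidated output file. owner[i] indexes into gens and is
// a simplified stand-in for the real per-generation header mapping.
func mergeDiffs(out *os.File, gens []*os.File, owner []int, blockSize int) error {
	buf := make([]byte, blockSize)
	for i, g := range owner {
		off := int64(i) * int64(blockSize)
		if _, err := gens[g].ReadAt(buf, off); err != nil {
			return err
		}
		if _, err := out.WriteAt(buf, off); err != nil {
			return err
		}
	}
	return nil
}
```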
1ca10230a9 Prefix network namespaces with wrenn-, add stale cleanup, lower diff cap
Rename ns-{idx} to wrenn-ns-{idx} and veth-{idx} to wrenn-veth-{idx}
to avoid collisions with other tools. Add CleanupStaleNamespaces() at
agent startup to remove orphaned namespaces, veths, iptables rules, and
routes from a previous crash. Lower maxDiffGenerations from 10 to 8 to
prevent Go runtime memory corruption from snapshot/restore drift.
2026-03-29 02:14:30 +06:00
46d60fc5a5 Seed minimal template in DB and protect it from deletion
Insert a minimal template row (all-zeros UUID) so it appears in both
team and admin template listings. Guard delete endpoints to prevent
removal of the minimal template.
2026-03-29 01:34:54 +06:00
906cc42d13 Rename AGENT_*/CP_LISTEN_ADDR env vars to WRENN_* prefix
AGENT_FILES_ROOTDIR → WRENN_DIR, AGENT_LISTEN_ADDR → WRENN_HOST_LISTEN_ADDR,
AGENT_CP_URL → WRENN_CP_URL, AGENT_HOST_INTERFACE → WRENN_HOST_INTERFACE,
CP_LISTEN_ADDR → WRENN_CP_LISTEN_ADDR. Consolidates all env vars under a
consistent WRENN_ namespace.
2026-03-29 00:30:20 +06:00
75b28ed899 Add UUID-based template IDs and team-scoped template directory layout
Introduces internal/layout package for centralized path construction,
migrates templates from name-based TEXT primary keys to UUID PKs with
team-scoped directories (WRENN_DIR/images/teams/{team_id}/{template_id}).
The built-in minimal template uses sentinel zero UUIDs. Proto messages
carry team_id + template_id alongside deprecated template name field.
Team deletion now cleans up template files across all hosts.
2026-03-29 00:30:10 +06:00
03e96629c7 Remove slug from team page UI 2026-03-28 20:45:57 +06:00
34af77e0d8 Fix snapshot race, delete auth, sparse dd, default disk to 5GB
Snapshot race fix:
- Pre-mark sandbox as "paused" in DB before issuing CreateSnapshot and
  PauseSandbox RPCs, preventing the reconciler from marking it "stopped"
  during the flatten window when the sandbox is gone from the host
  agent's in-memory map but DB still says "running"
- Revert status to "running" on RPC failure
- Check ctx.Err() before writing response to avoid writing to dead
  connections when client disconnects during long snapshot operations

Delete auth fix:
- Block non-admin deletion of platform templates (team_id = all-zeros)
  at DELETE /v1/snapshots/{name} with 403, preventing file deletion
  before the team ownership check fails

Sparse dd:
- Add conv=sparse to dd in FlattenSnapshot so flattened images preserve
  sparseness (~200MB actual vs 5GB logical)

Default disk size:
- Change default disk_size_mb from 20GB to 5GB across migration,
  manager, service, build, and EnsureImageSizes
- Disable split-button dropdown arrow for platform templates in
  dashboard snapshots page (teams cannot delete platform templates)
2026-03-28 14:30:18 +06:00
c89a664a37 Switch API ID format from UUID to base36 for compact, E2B-style IDs
DB stays native UUID; the format/parse layer now encodes 16 UUID bytes
as 25-char lowercase alphanumeric (base36) strings instead of the
standard 36-char hex-with-dashes format. e.g. sb-2e5glxi4g3qnhwci95qev0cg0
2026-03-27 00:53:51 +06:00
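The encoding is a straight radix conversion. A sketch, not the project's exact helper: 36^25 exceeds 2^128, so 25 lowercase base36 digits always hold a UUID, and zero-padding keeps the width fixed.

```go
package id

import (
	"fmt"
	"math/big"
)

// formatBase36 renders 16 UUID bytes as prefix + 25-char base36,
// e.g. sb-2e5glxi4g3qnhwci95qev0cg0 from the commit above.
func formatBase36(prefix string, uuid [16]byte) string {
	n := new(big.Int).SetBytes(uuid[:])
	return fmt.Sprintf("%s-%025s", prefix, n.Text(36)) // %025s left-pads with '0'
}
```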
3509ca90e8 Add pre/post build stages, fix exec timeout, expand guest PATH
Build phases:
- Pre-build (apt update) and post-build (apt clean, autoremove, rm lists)
  run with 10-minute timeout; user recipe commands keep 30s timeout
- Log entries include phase field for UI grouping
- Always send explicit TimeoutSec to host agent (0 defaulted to 30s)

Frontend:
- Pre-build/post-build steps show phase label without exposing commands
- Recipe steps numbered independently starting from 1

Guest PATH:
- Add /usr/games:/usr/local/games to wrenn-init.sh PATH export
  (standard Ubuntu paths, needed for packages like cowsay)
2026-03-27 00:28:32 +06:00
c8acac92cc Add pre/post build stages to template builds
Pre-build: apt update
Post-build: apt clean, apt autoremove, rm apt lists

Total steps count includes pre/post commands for accurate progress bars.
2026-03-27 00:00:48 +06:00
5cb37bf2a0 Add admin template deletion with broadcast to all hosts
- DELETE /v1/admin/templates/{name} endpoint (admin-only)
- Broadcasts DeleteSnapshot RPC to all online hosts before removing DB record
- Frontend admin templates page uses deleteAdminTemplate() instead of
  team-scoped deleteSnapshot()
- Delete button shown for all template types, not just snapshots
2026-03-26 23:53:08 +06:00
c0d6381bbe Add disk_size_mb, auto-expand base images, admin templates endpoint
Disk sizing:
- Add disk_size_mb column to sandboxes table (default 20480 = 20GB)
- Add disk_size_mb to CreateSandboxRequest proto, passed through the
  full chain: service → RPC → host agent → sandbox manager → devicemapper
- devicemapper.CreateSnapshot takes separate cowSizeBytes param so the
  sparse CoW file can be sized independently from the origin
- EnsureImageSizes() runs at host agent startup: expands any base image
  smaller than 20GB via truncate + resize2fs (sparse, no extra physical
  disk). Sandboxes then get the full 20GB via fast dm-snapshot path
- FlattenRootfs shrinks output images with resize2fs -M so stored
  templates are compact; EnsureImageSizes re-expands on next startup

Admin templates visibility:
- Add GET /v1/admin/templates endpoint listing all templates across teams
- Frontend admin templates page uses listAdminTemplates() instead of
  team-scoped listSnapshots()
- Platform templates (team_id = all-zeros UUID) now visible to all teams:
  GetTemplateByTeam, ListTemplatesByTeam, ListTemplatesByTeamAndType
  queries include platform team_id in WHERE clause
2026-03-26 23:45:41 +06:00
4ddd494160 Switch database IDs from TEXT to native UUID
Consolidate 16 migrations into one with UUID columns for all entity
IDs. TEXT is kept only for polymorphic fields (audit_logs.actor_id,
resource_id) and template names. The id package now generates UUIDs
via google/uuid, with Format*/Parse* helpers for the prefixed wire
format (sb-{uuid}, usr-{uuid}, etc.). Auth context, services, and
handlers pass pgtype.UUID internally; conversion to/from prefixed
strings happens at API and RPC boundaries. Adds PlatformTeamID
(all-zeros UUID) for shared resources.
2026-03-26 16:16:21 +06:00
cdd89a7cee Fix review issues: detached contexts, loop device leak, timer leak, size_bytes
- Use context.Background() with timeout in destroySandbox/failBuild so
  cleanup and DB writes survive parent context cancellation on shutdown
- Fix loop device refcount leak in FlattenRootfs when dmDevice is nil
- Replace time.After with time.NewTimer in healthcheck polling to avoid
  goroutine leak when healthcheck passes early
- Capture size_bytes from CreateSnapshot/FlattenRootfs RPC responses
  instead of hardcoding 0 in the templates table insert
- Avoid leaking internal error details to API clients in build handler
2026-03-26 15:31:38 +06:00
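The time.After to time.NewTimer change above, as a sketch: time.After allocates a timer that isn't collected until it fires, so a poll loop that exits early strands one timer per iteration, while a single reusable timer is stopped deterministically. `check` is a hypothetical callback.

```go
package build

import (
	"context"
	"time"
)

// pollUntil runs check every interval until it passes or ctx ends.
func pollUntil(ctx context.Context, interval time.Duration, check func() bool) bool {
	t := time.NewTimer(interval)
	defer t.Stop() // releases the timer even when check passes early
	for {
		if check() {
			return true
		}
		select {
		case <-ctx.Done():
			return false
		case <-t.C:
			t.Reset(interval) // safe: the timer just fired and was drained
		}
	}
}
```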
1ce62934b3 Add template build system with admin panel, async workers, and FlattenRootfs RPC
Introduces an end-to-end template building pipeline: admins submit a recipe
(list of shell commands) via the dashboard, a Redis-backed worker pool spins
up a sandbox, executes each command, and produces either a full snapshot
(with healthcheck) or an image-only template (rootfs flattened via a new
FlattenRootfs host-agent RPC). Build progress and per-step logs are persisted
to a new template_builds table and polled by the frontend.

Backend:
- New FlattenRootfs RPC (proto + host agent + sandbox manager)
- BuildService with Redis queue (BLPOP) and configurable worker pool (default 2)
- Admin-only REST endpoints: POST/GET /v1/admin/builds, GET /v1/admin/builds/{id}
- Migration for template_builds table with JSONB logs and recipe columns
- sqlc queries for build CRUD and progress updates

Frontend:
- /admin/templates page with Templates + Builds tabs
- Create Template dialog with recipe textarea, healthcheck, specs
- Build history with expandable per-step logs, status badges, progress bars
- Auto-polling every 3s for active builds
- AdminSidebar updated with Templates nav item
2026-03-26 15:27:21 +06:00
6898528096 Replace one-shot clock_settime with chrony for continuous guest time sync
Switch from the envd /init endpoint pushing host time via syscall to
chronyd reading the KVM PTP hardware clock (/dev/ptp0) continuously.
This fixes clock drift between init calls and handles snapshot resume
gracefully.

Changes:
- Add clocksource=kvm-clock kernel boot arg
- Start chronyd in wrenn-init.sh before tini (PHC /dev/ptp0, makestep 1.0 -1)
- Remove clock_settime logic from envd SetData and shouldSetSystemTime
- Remove client.Init() clock sync calls from sandbox manager (3 sites)
- Remove Init() method from envdclient (no longer needed)
- Simplify rootfs scripts: socat/chrony now come from apt in the container
  image, only envd/wrenn-init/tini are injected by build scripts
2026-03-26 04:47:44 +06:00
12d1e356fa Minor UI copy updates across capsules and templates pages 2026-03-26 03:58:12 +06:00
139f86bf9c Fix static build: disable prerender for dynamic capsule detail route
The [id] route cannot be prerendered at build time since IDs are unknown.
With adapter-static's index.html fallback, the route is handled client-side.
2026-03-26 02:13:12 +06:00
b0a8b498a8 WIP: Add Caddy reverse proxy for dev environment
Add Caddy to docker-compose as the single entry point on port 8000:
- localhost -> /api/* stripped and proxied to CP:8080, /* to frontend:5173
- *.localhost -> proxied to CP:8080 (sandbox proxy catch-all)
- Direct /v1/*, /auth/*, /docs routes proxied to CP

Move CP from :8000 to :8080 (its default). Caddy takes :8000.
Update .env.example, vite proxy target (kept as fallback), and Makefile
dev targets (pg_isready via docker exec, frontend binds 0.0.0.0).

This is an intermediate state — needs further work for the full code
interpreter feature.
2026-03-26 02:12:21 +06:00
4be65b0abb WIP: Add sandbox proxy catch-all to control plane
Add SandboxProxyWrapper that intercepts requests with Host headers
matching {port}-{sandbox_id}.{domain} and proxies them through the
owning host agent's /proxy endpoint.

Authentication is via X-API-Key only (no JWT). The API key's team must
own the sandbox. Export EnsureScheme from lifecycle package for reuse.

Request flow: SDK -> Caddy -> CP catch-all -> Host Agent -> sandbox VM.

This is an intermediate state — needs further work for the full code
interpreter feature.
2026-03-26 02:12:10 +06:00
f4675ebfc0 WIP: Add HTTP proxy endpoint to host agent
Add /proxy/{sandbox_id}/{port}/* handler that reverse-proxies HTTP
requests to services running inside sandbox VMs. The sandbox's host IP
(10.11.0.{idx}) is used as the upstream target.

Includes port validation (1-65535) and shared HTTP transport for
connection pooling. Supports WebSocket upgrades for protocols like
Jupyter's streaming API.

This is an intermediate state — needs further work for the full code
interpreter feature.
2026-03-26 02:12:01 +06:00
602ee470d9 WIP: Add socat injection to rootfs build scripts
Inject a statically-linked socat binary into rootfs images. envd's
port forwarder requires socat to bridge localhost-listening services
(e.g. Jupyter kernel) to the guest TAP interface.

Both scripts follow the same 3-step resolution: check rootfs, check
host, build from source (http://www.dest-unreach.org/socat/ v1.8.1.1).
Static linkage is verified before injection.

This is an intermediate state — needs further work for the full code
interpreter feature.
2026-03-26 02:11:54 +06:00
8cdf91d895 Merge pull request 'Added metrics' (#9) from metrics into dev
Reviewed-on: wrenn/sandbox#9
2026-03-25 16:40:06 +00:00
ed7880bc6c Add per-capsule stats detail page with live CPU/RAM charts
- New detail page at /dashboard/capsules/[id] with Stats and Files tabs
- Stats tab shows capsule info card (status, template, CPU, memory, disk,
  started, idle timeout) and two stacked Chart.js charts with live values
- Metrics API client with 10s polling and moving-average smoothing
- Capsule ID in list table is now a clickable link to the detail page
- Layout breadcrumb header (Capsules > sb-xxx) with back navigation
- Fix metrics sampler: use v.PID() directly as Firecracker PID since
  unshare -m execs (not forks) through the bash/ip-netns-exec/firecracker
  chain, so all share the same PID. Removes unused findChildPID.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 22:31:05 +06:00
27ff828e60 Push GetSandboxMetricPoints time filter into SQL
The query was fetching all rows for a (sandbox_id, tier) pair and
filtering by timestamp in Go. For repeatedly-paused sandboxes the
24h tier can accumulate up to 30 days of data, causing up to 120x
over-fetching for a 6h range request.

Add AND ts >= $3 to the query so Postgres filters on the primary key
(sandbox_id, tier, ts) directly. Drop the redundant Go-side loop.
2026-03-25 21:53:19 +06:00
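The shape of the fix, sketched with an assumed schema: moving the cutoff into the WHERE clause lets Postgres walk the (sandbox_id, tier, ts) primary key instead of shipping whole tiers up for Go-side filtering.

```go
package store

// getSandboxMetricPoints is illustrative; column names are assumptions.
// The AND ts >= $3 predicate is the change described above.
const getSandboxMetricPoints = `
SELECT ts, cpu_pct, mem_bytes
FROM sandbox_metric_points
WHERE sandbox_id = $1
  AND tier = $2
  AND ts >= $3
ORDER BY ts`
```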
6eacf0f735 Fix LIKE pattern injection in user email search
Escape LIKE metacharacters (% and _) in the email prefix before passing
to the SQL query, and enforce the documented '@' requirement to prevent
broad user enumeration. Move search logic out of TeamService into
usersHandler since it is a site-wide lookup, not team-scoped.
2026-03-25 21:53:09 +06:00
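A sketch of the escaping step (helper name hypothetical); the backslash must be handled first so later replacements aren't double-escaped, and the caller appends the trailing % itself:

```go
package api

import "strings"

// escapeLike neutralizes LIKE metacharacters in a user-supplied prefix.
func escapeLike(s string) string {
	s = strings.ReplaceAll(s, `\`, `\\`)
	s = strings.ReplaceAll(s, `%`, `\%`)
	s = strings.ReplaceAll(s, `_`, `\_`)
	return s
}
```

Usage would be something like `pattern := escapeLike(prefix) + "%"`, passed as a bind parameter.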
88cb24bb86 Minor improvement 2026-03-25 21:27:11 +06:00
49b0b646a8 Add 5m, 1h, 6h, 12h range filters to metrics endpoint
Maps each user-facing range to the appropriate underlying ring buffer
tier and applies a time cutoff filter. No new ring buffers needed —
5m/10m read from the 10m tier, 1h/2h from the 2h tier, 6h/12h/24h
from the 24h tier.
2026-03-25 20:44:28 +06:00
9acdbb5ae9 Add per-sandbox CPU/memory/disk metrics collection
Samples /proc/{fc_pid}/stat (CPU%), /proc/{fc_pid}/status (VmRSS), and
stat() on CoW files at 500ms intervals per running sandbox. Three tiered
ring buffers downsample into 30s and 5min averages for 10min/2h/24h
retention. Metrics are flushed to DB on pause (all tiers) and destroy
(24h only). New GetSandboxMetrics and FlushSandboxMetrics RPCs on the
host agent, proxied through GET /v1/sandboxes/{id}/metrics?range= on
the control plane. Returns live data for running sandboxes, DB data for
paused, and 404 for stopped.
2026-03-25 20:10:33 +06:00
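A minimal sketch of one tier's ring buffer; sizes and cadence are configuration per the commit above, and a coarser tier is fed by averaging the finer one:

```go
package metrics

// ringBuffer keeps a fixed window of samples for one tier.
type ringBuffer struct {
	buf  []float64
	head int  // next write position
	full bool // true once the buffer has wrapped
}

func newRingBuffer(n int) *ringBuffer { return &ringBuffer{buf: make([]float64, n)} }

func (r *ringBuffer) push(v float64) {
	r.buf[r.head] = v
	r.head = (r.head + 1) % len(r.buf)
	if r.head == 0 {
		r.full = true
	}
}

// avg over the samples currently held; used when downsampling a
// fine tier (500ms samples) into a coarse one (30s / 5min averages).
func (r *ringBuffer) avg() float64 {
	n := r.head
	if r.full {
		n = len(r.buf)
	}
	if n == 0 {
		return 0
	}
	var sum float64
	for _, v := range r.buf[:n] {
		sum += v
	}
	return sum / float64(n)
}
```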
7473c15f52 Bugfix: cgroup2-related error inside the sandbox 2026-03-25 19:45:57 +06:00
8d5ba3873a Fix capsules table blink on background poll refresh
Poll fetches now silently update data without triggering loading
states, spinner animations, or row fadeUp re-animations. Only manual
refresh shows the spin indicator.
2026-03-25 19:44:13 +06:00
b0e6f5ffb3 Bolder stats page layout with stronger visual hierarchy
- Accent stripes: 3px → 5px; indicator dots: 6px → 8px
- Peak values step down to text-[1.714rem]/text-secondary so Now values read as the clear hero
- Now labels: semibold + uppercase for weight parity with the metric
- Cell padding py-5 → py-6; outer gap-7/pt-4 → gap-8/pt-6 for breathing room
- Chart fills: 7-8% → 11-13% opacity; lines: 1.5 → 2px
- Tick labels brighter (#635f5c), grid lines slightly more visible
- Running capsules chart: min-height 220 → 260px
2026-03-25 18:18:04 +06:00
a69b0f579c Split CPU and RAM into separate side-by-side charts
CPU (vCPUs) and RAM (GB) use different units and scales, so combining
them on a dual-axis chart was misleading. Each now has its own chart
card, laid out side-by-side.
2026-03-25 16:39:25 +06:00
45793e181c Move metrics to after templates in sidebar nav 2026-03-25 16:08:38 +06:00
e3750f79f9 Fix metrics sampler to record zero-value snapshots when idle
SampleSandboxMetrics previously filtered WHERE status IN ('running',
'starting', 'paused'), which returned no rows when all capsules were
stopped. This caused zero-value snapshots to be skipped, leaving the
time-series charts with no trailing data points instead of showing
the expected zero values.

Remove the WHERE filter so the query groups by all teams that have
any sandbox row. The per-status FILTER clauses on the aggregates
already produce correct zero counts for stopped capsules.

Also includes the per-VM RAM ceiling formula change (sum(ceil(each/2))
instead of ceil(sum/2)).
2026-03-25 15:50:19 +06:00
930da8a578 Move metrics to dedicated nav item, simplify capsules page
- Add Metrics nav item to sidebar with bar chart icon
- Create /dashboard/metrics page wrapping StatsPanel
- Remove tabs from capsules page (list is now the only view)
- Flatten capsules route: /capsules directly shows the list,
  removing the /list and /stats sub-routes
- Strip redundant title/subtitle from StatsPanel (page header
  provides context)
2026-03-25 15:24:21 +06:00
47b0ed5b52 Fix metrics correctness, redesign stats page
- Replace stale snapshot read (GetCurrentMetrics) with live query
  (GetLiveMetrics) against sandboxes table — always returns correct
  zeros when no capsules are running
- Fix CPU reserved formula: running + starting only; paused VMs no
  longer contribute vCPUs (RAM reservation for paused unchanged)
- Merge top cards into 3 paired Now/Peak cards with colored accent
  borders (green/blue/amber matching chart colors)
- Move Live badge from Running Capsules card to page-level header
- Add colored category dots to card and chart headers
- Charts stacked vertically, flex-1 to fill remaining page height
- vCPUs chart color changed to blue (#5a9fd4), RAM stays amber
2026-03-25 15:11:46 +06:00
fee66bda50 Add live stats page with metrics sampling and route split
- New sandbox_metrics_snapshots table sampled every 10s (60-day retention)
- Background MetricsSampler goroutine wired into control plane startup
- GET /v1/sandboxes/stats?range=5m|1h|6h|24h|30d endpoint with adaptive
  polling intervals; reserved CPU/RAM uses ceil(paused/2) formula
- StatsPanel component: 4 stat cards + 2 Chart.js line charts (straight
  lines, integer y-axis for running count, dual-axis for CPU/RAM)
- Range filter persisted in URL query param; polls update data silently
  (no blink — loading state only shown on initial mount)
- Split /dashboard/capsules into /list and /stats sub-routes with shared
  layout; capsuleRunningCount store syncs badge across routes
- CreateCapsuleDialog extracted as reusable component
2026-03-25 14:41:05 +06:00
2349f585ae Bolder, more delightful frontend across all pages
- app.css: replace flat --shadow-sm token with real shadows; add
  --shadow-card and --shadow-dialog tokens; add @keyframes status-ping
  and .animate-status-ping utility (outward ring ripple, GPU-composited
  via will-change) for live running status dots
- login: headline 5rem → 6.5rem with tighter leading/tracking; expand
  container to 460px; add sage-green dot grid texture layer beneath the
  mouse-reactive glow for industrial depth
- capsules: upgrade all running dots (header chip + row indicators +
  status bar) from opacity-fade to ring ripple; apply --shadow-dialog
  to Launch and Snapshot dialogs
- keys: apply --shadow-dialog to all three dialogs
- audit: remove duplicate @keyframes fadeUp and iconFloat (redundant
  with app.css definitions, audit's fadeUp also subtly diverged)
- sidebar: active indicator bar taller and thicker (h-5 w-[3px] → h-6
  w-1); active bg more vivid (accent/12%); label font-medium →
  font-semibold; team dialog gets --shadow-dialog
2026-03-25 12:55:23 +06:00
d4eb24be7e Added snapshot name dialog in the UI 2026-03-25 05:30:31 +06:00
0414fbe733 Merge pull request 'Added audit logs for users' (#7) from audit-logs into dev
Reviewed-on: wrenn/sandbox#7
2026-03-24 23:21:09 +00:00
6b76abe38e Remove expandable metadata from audit log rows
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 05:19:32 +06:00
3ce8fdcb02 Add audit logs frontend page
Infinite-scroll table with hierarchical filter dropdown, expandable
metadata rows, and status-coded visual signals per event severity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 05:18:04 +06:00
1be30034bd Add audit log infrastructure and GET /v1/audit-logs endpoint
Introduces an append-only audit trail for all user and system actions:
sandbox lifecycle (create/pause/resume/destroy/auto-pause), snapshots,
team rename, API key create/revoke, member add/remove/leave/role_update,
and BYOC host add/delete/marked_down/marked_up.

- New audit_logs table (migration) with team_id, actor, resource,
  action, scope (team|admin), status (success|info|warning|error),
  metadata, and created_at
- AuditLogger (internal/audit) with named fire-and-forget methods per
  event; system actor used for background events (HostMonitor, TTL reaper)
- GET /v1/audit-logs: JWT-only, cursor pagination (max 200), multi-value
  filters for resource_type and action (comma-sep or repeated params);
  members see team-scoped events only, admins/owners see all
- AuthContext extended with APIKeyID + APIKeyName so API key requests
  record meaningful actor identity
- HostMonitor wired with AuditLogger for auto-pause and host marked_down
2026-03-25 05:15:16 +06:00
9878156798 Merge pull request 'Set up working host registration (including BYOC) with the CP' (#6) from host-registration into dev
Reviewed-on: wrenn/sandbox#6
2026-03-24 21:19:12 +00:00
e069b3e679 Add BYOC page, admin section, and is_byoc team visibility gating
- Frontend: BYOC hosts page (/dashboard/byoc) with register/delete flows,
  shimmer loading, pulsing online status, animated token reveal checkmark
- Frontend: Admin section (/admin/hosts) with platform + BYOC tabs, stat
  pills, skeleton loading, slide-in animations for new rows
- Frontend: AdminSidebar component with accent top bar and admin pill badge
- Frontend: BYOC nav item shown only when team.is_byoc is true (derived
  from teams store, not JWT); disabled for members
- Frontend: Admin shield button in Sidebar, visible only to platform admins
- Backend: is_admin in JWT claims + requireAdmin middleware (DB-validated)
- Backend: is_byoc added to teamResponse so frontend derives visibility
  from fresh team data rather than stale JWT fields
- Backend: SetBYOC admin endpoint (PUT /v1/admin/teams/{id}/byoc)
- Backend: Admin hosts list enriches BYOC entries with team_name
- Host agent: load .env file via godotenv on startup
2026-03-25 03:10:41 +06:00
9bf67aa7f7 Implement host registration, JWT refresh tokens, and multi-host scheduling
Replaces the hardcoded CP_HOST_AGENT_ADDR single-agent setup with a
DB-driven registration system supporting multiple host agents (BYOC).

Key changes:
- Host agents register via one-time token, receive a 7-day JWT + 60-day
  refresh token; heartbeat loop auto-refreshes on 401/403 and pauses all
  sandboxes if refresh fails
- HostClientPool: lazy Connect RPC client cache keyed by host ID, replacing
  the single static agent client throughout the API and service layers
- RoundRobinScheduler: picks an online host for each new sandbox via
  ListActiveHosts; extensible for future scheduling strategies
- HostMonitor (replaces Reconciler): passive heartbeat staleness check marks
  hosts unreachable and sandboxes missing after 90s; active reconciliation
  per online host restores missing-but-alive sandboxes and stops orphans
- Graceful host delete: returns 409 with affected sandbox list without
  ?force=true; force-delete destroys sandboxes then evicts pool client
- Snapshot delete broadcasts to all online hosts (templates have no host_id)
- sandbox.Manager.PauseAll: pauses all running VMs on CP connectivity loss
- New migration: host_refresh_tokens table with token rotation (issue-then-
  revoke ordering to prevent lockout on mid-rotation crash)
- New sandbox status 'missing' (reversible, unlike 'stopped') and host
  status 'unreachable'; both reflected in OpenAPI spec
- Fix: refresh token auth failure now returns 401 (was 400 via generic
  'invalid' substring match in serviceErrToHTTP)
2026-03-24 18:32:05 +06:00
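The scheduler itself is small. A sketch under the plain reading of the commit, with types simplified: an atomic counter rotates across whatever hosts ListActiveHosts returned.

```go
package scheduler

import (
	"errors"
	"sync/atomic"
)

// Host is simplified; the real record carries specs and status.
type Host struct{ ID string }

var ErrNoHosts = errors.New("no online hosts")

// RoundRobinScheduler picks the next online host for a new sandbox.
type RoundRobinScheduler struct {
	next atomic.Uint64
}

// Pick rotates across the online hosts; callers refresh the slice
// from ListActiveHosts before each call.
func (s *RoundRobinScheduler) Pick(online []Host) (Host, error) {
	if len(online) == 0 {
		return Host{}, ErrNoHosts
	}
	n := s.next.Add(1) - 1
	return online[n%uint64(len(online))], nil
}
```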
f968da9768 Minor frontend enhancements 2026-03-24 17:25:00 +06:00
3932bc056e Add user names, team-scoped sandbox guard, and login robustness fixes
- Add name column to users (migration + sqlc regen); propagate through JWT
  claims, auth context, all auth/OAuth handlers, service layer, and frontend
- Sidebar and team page show name instead of email; team page splits Name/Email
  into separate columns
- Block sandbox creation in UI and API when user has no active team context
- loginTeam helper falls back to first active team when no default is set,
  fixing login for invited users with no is_default membership
- Exclude soft-deleted teams from GetDefaultTeamForUser, GetBYOCTeams queries
- Guard host creation against soft-deleted teams in service/host.go
- SwitchTeam re-fetches name from DB instead of trusting stale JWT claim
- Reset teams store on login so stale data from a previous session never persists
- Update openapi.yaml: add name to SignupRequest and AuthResponse schemas
2026-03-24 16:56:10 +06:00
aaeccd32ce Merge pull request 'Frontend consistency and improvements' (#5) from frontend-enhancement into dev
Reviewed-on: wrenn/sandbox#5
2026-03-24 10:00:27 +00:00
915d934c26 Frontend consistency pass: delight, audit, and normalization
Delight (keys page):
- Animated checkmark draw + circle pop on key reveal dialog open
- Key display area pulses accent glow on open to draw eye to "copy this"
- Copy button spring-bounces on successful copy (re-triggers on repeat)
- Empty state key icon floats (iconFloat, now global)
- Row hover uses scaleY left-accent stripe (matches capsules pattern)
- New key row flashes accent on reveal dialog dismiss (matches capsule-born)

Audit fixes (all dashboard pages):
- Page titles standardized to em dash: "Wrenn — X" across all four pages
- formatDate/timeAgo extracted to src/lib/utils/format.ts (string | undefined
  signatures); keys and snapshots now import from there instead of duplicating
- team formatDate gains undefined guard (kept local, date-only format differs)
- spin-once and iconFloat keyframes moved to app.css as globals; scoped copies
  removed from capsules and keys
- Snapshots empty state icon was referencing undefined @keyframes float; fixed
  to iconFloat

Normalization:
- Snapshots table rows: replaced ::before pseudo-element accent (opacity-only,
  single color) with DOM row-stripe element using scaleY transition, type-keyed
  color (green for snapshots, blue for images) — matches capsules pattern
- Create Key dialog: max-w-[400px] → max-w-[420px] to align with form dialogs
- Snapshots count and empty-state heading are now terminology-aware: shows
  "templates/snapshots/images" based on active filter; empty heading for all
  filter reads "No templates yet" instead of "No snapshots yet"

Not done (documented in audit, deferred):
- Sidebar nav items pointing to unimplemented routes (audit, usage, billing,
  notifications, settings) — left as-is, needs product decision
- Dialog max-widths fully normalized beyond Create Key — minor, deferred
- capsules timeAgo not imported from shared util (formatTime differs intentionally)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 15:51:11 +06:00
336080bb6d Merge pull request 'Added team related functionalities' (#4) from team-management into dev
Reviewed-on: wrenn/sandbox#4
2026-03-24 08:58:32 +00:00
90c296f5e1 Polish team page: delight micro-interactions and layout improvements
- Slug + Team ID rows collapsed into a 2-column grid for better density
- "you" badge moved inline with email instead of stacked below it
- Copy checkmark draws itself via SVG stroke-dashoffset animation
- New member row flashes accent-green on entry
- Removed member row slides out smoothly (fly transition)
- Member rows use staggered fly-in on page load
- Team name briefly highlights accent color after a successful rename
- Search result avatars get colorized initials based on email character
2026-03-24 14:56:19 +06:00
bf494f73fc Fix team name blink on navigation by lifting teams into a singleton store
Teams list was fetched on every Sidebar mount (each page navigation),
causing a flash from '…' to the real name on every tab switch. Move teams
into a module-level reactive store (teams.svelte.ts) that fetches once per
session and is shared between Sidebar and the team page.
2026-03-24 14:44:09 +06:00
71a7fdb76f Fix user search to trigger on 3 characters without requiring @
The anti-enumeration guard required @ in the email prefix, causing the
typeahead to silently return nothing until the user typed @. Replace with
a minimum 3-character length check to match the frontend trigger condition.
2026-03-24 14:41:01 +06:00
b3e8bdd171 Refine team management: name chars, danger zone, no-team state
- Allow hyphens, @, and apostrophes in team names (backend regex)
- After delete/leave, switch to next available team instead of logging
  out; if no teams remain, show a toast prompting to create one
- Disable delete/leave button when user has only one team, with
  explanatory hint to create another team first
- Show empty state on /dashboard/team when auth has no team context,
  pointing user to the sidebar to create a team
- Fetch all teams in parallel with team detail on page load to power
  the isLastTeam guard
2026-03-24 14:34:20 +06:00
1e681da738 Add team management frontend
- New /dashboard/team page with inline team name editing, slug/ID copy,
  members table with split-button (remove + make admin/member), add member
  typeahead, and danger zone (delete/leave) with confirmation dialogs
- Sidebar now fetches real teams from API, supports team switching and
  team creation via dialog
- Rename nav item Members → Team, route /dashboard/members → /dashboard/team
- New src/lib/api/team.ts with typed functions for all team endpoints
2026-03-24 14:21:53 +06:00
8e5d426638 Add team management endpoints
- Three-role model (owner/admin/member) with owner protection invariants
- Team CRUD: create, rename (admin+), soft-delete with VM cleanup (owner only)
- Member management: add by email, remove, role updates (admin+), leave
- Switch-team endpoint re-issues JWT after DB membership verification
- User email prefix search for add-member UI autocomplete
- JWT carries role as a hint; all authorization decisions verified from DB
- Team slug: immutable 12-char hex (e.g. a1b2c3-d1e2f3), reserved on soft-delete
- Migration adds slug + deleted_at to teams; backfills existing rows
2026-03-24 13:29:54 +06:00
4e26d7a292 Merge pull request 'Minor frontend enhancement' (#3) from frontend into dev
Reviewed-on: wrenn/sandbox#3
2026-03-24 06:36:17 +00:00
79eba782fb Updated design docs 2026-03-24 12:34:58 +06:00
b786a825d4 Polish dashboard frontend: spacing, copy, resilience
- Increase content padding (p-7→p-8) and table cell padding (px-4→px-5,
  py-3→py-4 for data rows) across capsules, keys, and snapshots pages
- Improve animation performance: wrenn-glow uses opacity instead of
  box-shadow (compositor-only, no paint cost)
- Add prefers-reduced-motion media query covering inline style animations
- Fix OAuth error display on login page (read ?error= param on mount)
- Harden clipboard copy with try-catch and toast fallback
- Improve empty state copy, dialog microcopy, and error messages
- Add retry button to error banners on keys page
- Replace "All systems operational" footer bar with a clean 1px divider
- Fix text truncation on long capsule/snapshot names (min-w-0 + truncate)
2026-03-24 12:33:18 +06:00
71564b202e Merge branch 'main' of git.omukk.dev:wrenn/sandbox into dev 2026-03-24 01:11:43 +06:00
5f0dbadea6 Fix snapshot and sandbox delete consistency
- Snapshot delete: make agent RPC failure a hard error so DB record is
  not removed when files cannot be deleted from disk
- Snapshot overwrite: call agent to delete old files before removing the
  DB record, preventing stale memfile.{uuid} generations from accumulating
  on disk across repeated overwrites
- Sandbox destroy: only swallow CodeNotFound from the agent (sandbox
  already gone / TTL-reaped); any other error now propagates to the caller
  instead of being silently ignored
2026-03-23 02:59:30 +06:00
36782e1b4f Add tini as PID 1, guest clock sync, and fix PATH in guest VMs
- Use tini as PID 1 in wrenn-init.sh so zombie processes are reaped
  and signals are forwarded correctly to envd
- Set standard PATH in wrenn-init.sh so child processes spawned by envd
  can find common binaries (fixes "nice: ls command not found")
- Add envdclient.Init() to POST /init on envd after every boot/resume,
  syncing the guest clock via unix.ClockSettime — critical after snapshot
  resume where the guest clock is frozen
- Run Init in a background goroutine so it doesn't block the CreateSandbox
  RPC response; a slow Init (vCPU busy with envd startup) was causing the
  RPC context to be canceled before the response reached the control plane
- Update rootfs-from-container.sh and update-debug-rootfs.sh to inject
  tini into the rootfs, checking the container image and host first,
  downloading from GitHub releases as fallback
2026-03-23 02:45:27 +06:00
97292ba0bf Added basic frontend (#1)
Reviewed-on: wrenn/sandbox#1
Co-authored-by: pptx704 <rafeed@omukk.dev>
Co-committed-by: pptx704 <rafeed@omukk.dev>
2026-03-22 19:01:38 +00:00
866f3ac012 Consolidate host agent path env vars into single AGENT_FILES_ROOTDIR
Replace AGENT_KERNEL_PATH, AGENT_IMAGES_PATH, AGENT_SANDBOXES_PATH,
AGENT_SNAPSHOTS_PATH, and AGENT_TOKEN_FILE with a single
AGENT_FILES_ROOTDIR (default /var/lib/wrenn) that derives all
subdirectory paths automatically.
2026-03-17 05:59:26 +06:00
2c66959b92 Add host registration, heartbeat, and multi-host management
Implements the full host ↔ control plane connection flow:

- Host CRUD endpoints (POST/GET/DELETE /v1/hosts) with role-based access:
  regular hosts admin-only, BYOC hosts for admins and team owners
- One-time registration token flow: admin creates host → gets token (1hr TTL
  in Redis + Postgres audit trail) → host agent registers with specs → gets
  long-lived JWT (1yr)
- Host agent registration client with automatic spec detection (arch, CPU,
  memory, disk) and token persistence to disk
- Periodic heartbeat (30s) via POST /v1/hosts/{id}/heartbeat with X-Host-Token
  auth and host ID cross-check
- Token regeneration endpoint (POST /v1/hosts/{id}/token) for retry after
  failed registration
- Tag management (add/remove/list) with team-scoped access control
- Host JWT with typ:"host" claim, cross-use prevention in both VerifyJWT and
  VerifyHostJWT
- requireHostToken middleware for host agent authentication
- DB-level race protection: RegisterHost uses AND status='pending' with
  rows-affected check; Redis GetDel for atomic token consume
- Migration for future mTLS support (cert_fingerprint, mtls_enabled columns)
- Host agent flags: --register (one-time token), --address (required ip:port)
- serviceErrToHTTP extended with "forbidden" → 403 mapping
- OpenAPI spec, .env.example, and README updated
2026-03-17 05:51:28 +06:00
e4ead076e3 Add admin users, BYOC teams, hosts schema, and Redis for host registration
Introduce three migrations: admin permissions (is_admin + permissions table),
BYOC team tracking, and multi-host support (hosts, host_tokens, host_tags).
Add Redis to dev infra and wire up client in control plane for ephemeral
host registration tokens. Add go-redis dependency.
2026-03-17 03:26:42 +06:00
1d59b50e49 Remove empty admin UI stubs
The internal/admin/ package was never imported or mounted — just
placeholder files. Removing to avoid confusion before the real
dashboard is built.
2026-03-16 05:39:43 +06:00
f38d5812d1 Extract shared service layer for sandbox, API key, and template operations
Moves business logic from API handlers into internal/service/ so that
both the REST API and the upcoming dashboard can share the same operations
without duplicating code. API handlers now delegate to the service layer
and only handle HTTP-specific concerns (request parsing, response formatting).
2026-03-16 05:39:30 +06:00
931b7d54b3 Add GitHub OAuth login with provider registry
Implement OAuth 2.0 login via GitHub as an alternative to email/password.
Uses a provider registry pattern (internal/auth/oauth/) so adding Google
or other providers later requires only a new Provider implementation.

Flow: GET /v1/auth/oauth/github redirects to GitHub, callback exchanges
the code for a user profile, upserts the user + team atomically, and
redirects to the frontend with a JWT token.

Key changes:
- Migration: make password_hash nullable, add oauth_providers table
- Provider registry with GitHubProvider (profile + email fallback)
- CSRF state cookie with HMAC-SHA256 validation
- Race-safe registration (23505 collision retries as login)
- Startup validation: CP_PUBLIC_URL required when OAuth is configured

Not fully tested — needs integration tests with a real GitHub OAuth app
and end-to-end testing with the frontend callback page.
2026-03-15 06:31:58 +06:00
477d4f8cf6 Add auto-pause TTL and ping endpoint for sandbox inactivity management
Replace the existing auto-destroy TTL behavior with auto-pause: when a
sandbox exceeds its timeout_sec of inactivity, the TTL reaper now pauses
it (snapshot + teardown) instead of destroying it, preserving the ability
to resume later.

Key changes:
- TTL reaper calls Pause instead of Destroy, with fallback to Destroy if
  pause fails (e.g. Firecracker process already gone)
- New PingSandbox RPC resets the in-memory LastActiveAt timer
- New POST /v1/sandboxes/{id}/ping REST endpoint resets both agent memory
  and DB last_active_at
- ListSandboxes RPC now includes auto_paused_sandbox_ids so the reconciler
  can distinguish auto-paused sandboxes from crashed ones in a single call
- Reconciler polls every 5s (was 30s) and marks auto-paused as "paused"
  vs orphaned as "stopped"
- Resume RPC accepts timeout_sec from DB so TTL survives pause/resume cycles
- Reaper checks every 2s (was 10s) and uses a detached context to avoid
  incomplete pauses on app shutdown
- Default timeout_sec changed from 300 to 0 (no auto-pause unless requested)
2026-03-15 05:15:18 +06:00
88246fac2b Fix sandbox lifecycle cleanup and dmsetup remove reliability
- Add retry with backoff to dmsetupRemove for transient "device busy"
  errors caused by kernel not releasing the device immediately after
  Firecracker exits. Only retries on "Device or resource busy"; other
  errors (not found, permission denied) return immediately.

- Thread context.Context through RemoveSnapshot/RestoreSnapshot so
  retries respect cancellation. Use context.Background() in all error
  cleanup paths to prevent cancelled contexts from skipping cleanup
  and leaking dm devices on the host.

- Resume vCPUs on pause failure: if snapshot creation or memfile
  processing fails after freezing the VM, unfreeze vCPUs so the
  sandbox stays usable instead of becoming a frozen zombie.

- Fix resource leaks in Pause when CoW rename or metadata write fails:
  properly clean up network, slot, loop device, and remove from boxes
  map instead of leaving a dead sandbox with leaked host resources.

- Fix Resume WaitUntilReady failure: roll back CoW file to the snapshot
  directory instead of deleting it, preserving the paused state so the
  user can retry.

- Skip m.loops.Release when RemoveSnapshot fails during pause since
  the stale dm device still references the origin loop device.

- Fix incorrect VCPUs placeholder in Resume VMConfig that used memory
  size instead of a sensible default.
2026-03-14 06:42:34 +06:00
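A minimal sketch of the retry described in the first bullet, assuming dmsetup is shelled out to; the attempt count and backoff values are illustrative:

```go
package devicemapper

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// dmsetupRemove retries only the transient "busy" failure; anything
// else (not found, permission denied) is returned immediately. The
// context keeps retries cancellable, per the commit message.
func dmsetupRemove(ctx context.Context, name string) error {
	backoff := 50 * time.Millisecond
	for attempt := 0; attempt < 6; attempt++ {
		out, err := exec.CommandContext(ctx, "dmsetup", "remove", name).CombinedOutput()
		if err == nil {
			return nil
		}
		if !strings.Contains(string(out), "Device or resource busy") {
			return fmt.Errorf("dmsetup remove %s: %w: %s", name, err, out)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
			backoff *= 2
		}
	}
	return fmt.Errorf("dmsetup remove %s: still busy after retries", name)
}
```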
1846168736 Fix device-mapper "Device or resource busy" error on sandbox resume
Pause was logging RemoveSnapshot failures as warnings and continuing,
which left stale dm devices behind. Resume then failed trying to create
a device with the same name.

- Make RemoveSnapshot failure a hard error in Pause (clean up remaining
  resources and return error instead of silently proceeding)
- Add defensive stale device cleanup in RestoreSnapshot before creating
  the new dm device
2026-03-14 03:57:14 +06:00
c92cc29b88 Add authentication, authorization, and team-scoped access control
Implement email/password auth with JWT sessions and API key auth for
sandbox lifecycle. Users get a default team on signup; sandboxes,
snapshots, and API keys are scoped to teams.

- Add user, team, users_teams, and team_api_keys tables (goose migrations)
- Add JWT middleware (Bearer token) for user management endpoints
- Add API key middleware (X-API-Key header, SHA-256 hashed) for sandbox ops
- Add signup/login handlers with transactional user+team creation
- Add API key CRUD endpoints (create/list/delete)
- Replace owner_id with team_id on sandboxes and templates
- Update all handlers to use team-scoped queries
- Add godotenv for .env file loading
- Update OpenAPI spec and test UI with auth flows
2026-03-14 03:57:06 +06:00
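The API-key path can be pictured as the small sketch below (standard library only; `lookupTeam` stands in for the sqlc-generated query against `team_api_keys`):

```go
package api

import (
	"crypto/sha256"
	"encoding/hex"
	"net/http"
)

// apiKeyMiddleware hashes the presented key and looks the digest up,
// so plaintext keys are never stored or compared directly.
func apiKeyMiddleware(lookupTeam func(hash string) (teamID string, ok bool)) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			key := r.Header.Get("X-API-Key")
			if key == "" {
				http.Error(w, "missing API key", http.StatusUnauthorized)
				return
			}
			sum := sha256.Sum256([]byte(key))
			teamID, ok := lookupTeam(hex.EncodeToString(sum[:]))
			if !ok {
				http.Error(w, "invalid API key", http.StatusUnauthorized)
				return
			}
			r.Header.Set("X-Team-ID", teamID) // illustrative propagation to handlers
			next.ServeHTTP(w, r)
		})
	}
}
```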
712b77b01c Add script to create rootfs from Docker container 2026-03-13 09:41:58 +06:00
80a99eec87 Add diff snapshots for re-pause to avoid UFFD fault-in storm
Use Firecracker's Diff snapshot type when re-pausing a previously
resumed sandbox, capturing only dirty pages instead of a full memory
dump. Chains up to 10 incremental generations before collapsing back
to a Full snapshot. Multi-generation diff files (memfile.{buildID})
are supported alongside the legacy single-file format in resume,
template creation, and snapshot existence checks.
2026-03-13 09:41:58 +06:00
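A sketch of the Full-vs-Diff decision, assuming a per-sandbox generation counter; only the 10-generation cap comes from the commit, the rest is illustrative:

```go
package sandbox

// maxDiffGenerations caps the memfile.{buildID} chain before collapsing
// back to a Full snapshot, so restores never replay unbounded layers.
const maxDiffGenerations = 10

type snapshotType string

const (
	snapFull snapshotType = "Full"
	snapDiff snapshotType = "Diff"
)

// nextSnapshotType picks Diff while the chain is short (each Diff holds
// only pages dirtied since the last resume) and Full on first pause or
// once the chain would grow past the cap.
func nextSnapshotType(generation int) snapshotType {
	if generation > 0 && generation < maxDiffGenerations {
		return snapDiff
	}
	return snapFull
}
```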
a0d635ae5e Fix path traversal in template/snapshot names and network cleanup leaks
Add SafeName validator (allowlist regex) to reject directory traversal
in user-supplied template and snapshot names. Validated at both API
handlers (400 response) and sandbox manager (defense in depth).

Refactor CreateNetwork with rollback slice so partially created
resources (namespace, veth, routes, iptables rules) are cleaned up
on any error. Refactor RemoveNetwork to collect and return errors
instead of silently ignoring them.
2026-03-13 08:40:36 +06:00
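A sketch of what an allowlist validator like SafeName can look like; the exact pattern and length cap are assumptions. Because no `/` can ever match, `../`-style traversal is impossible by construction:

```go
package validate

import (
	"fmt"
	"regexp"
)

// safeName allowlists template/snapshot names; anything containing a
// path separator or other metacharacter simply fails the match.
var safeName = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$`)

func SafeName(name string) error {
	if !safeName.MatchString(name) {
		return fmt.Errorf("invalid name %q: must match %s", name, safeName)
	}
	return nil
}
```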
63e9132d38 Add device-mapper snapshots, test UI, fix pause ordering and lint errors
- Replace reflink rootfs copy with device-mapper snapshots (shared
  read-only loop device per base template, per-sandbox sparse CoW file)
- Add devicemapper package with create/restore/remove/flatten operations
  and refcounted LoopRegistry for base image loop devices
- Fix pause ordering: destroy VM before removing dm-snapshot to avoid
  "device busy" error (FC must release the dm device first)
- Add test UI at GET /test for sandbox lifecycle management (create,
  pause, resume, destroy, exec, snapshot create/list/delete)
- Fix DirSize to report actual disk usage (stat.Blocks * 512) instead
  of apparent size, so sparse CoW files report correctly
- Add timing logs to pause flow for performance diagnostics
- Fix all lint errors across api, network, vm, uffd, and sandbox packages
- Remove obsolete internal/filesystem package (replaced by devicemapper)
- Update CLAUDE.md with device-mapper architecture documentation
2026-03-13 08:25:40 +06:00
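The dm-snapshot wiring can be sketched as follows, assuming losetup/dmsetup are shelled out to; the 4 KiB chunk size (8 sectors) and error handling are illustrative:

```go
package devicemapper

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// createSnapshot creates a sparse per-sandbox CoW file, attaches it to
// its own loop device, and layers a dm "snapshot" target over the
// shared read-only base loop device. The VM then boots from
// /dev/mapper/<name> while the base image stays untouched.
func createSnapshot(name, baseLoop, cowPath string, sizeBytes int64) (string, error) {
	f, err := os.Create(cowPath)
	if err != nil {
		return "", err
	}
	if err := f.Truncate(sizeBytes); err != nil { // sparse: allocates nothing yet
		f.Close()
		return "", err
	}
	f.Close()

	out, err := exec.Command("losetup", "--find", "--show", cowPath).Output()
	if err != nil {
		return "", fmt.Errorf("losetup: %w", err)
	}
	cowLoop := string(bytes.TrimSpace(out))

	// dm-snapshot table: start length snapshot <origin> <cow> P <chunk-sectors>
	sectors := sizeBytes / 512
	table := fmt.Sprintf("0 %d snapshot %s %s P 8", sectors, baseLoop, cowLoop)
	if err := exec.Command("dmsetup", "create", name, "--table", table).Run(); err != nil {
		return "", fmt.Errorf("dmsetup create: %w", err)
	}
	return "/dev/mapper/" + name, nil
}
```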
778894b488 Made license related changes 2026-03-13 05:42:10 +06:00
a1bd439c75 Add sandbox snapshot and restore with UFFD lazy memory loading
Implement full snapshot lifecycle: pause (snapshot + free resources),
resume (UFFD-based lazy restore), and named snapshot templates that
can spawn new sandboxes from frozen VM state.

Key changes:
- Snapshot header system with generational diff mapping (inspired by e2b)
- UFFD server for lazy page fault handling during snapshot restore
- Stable rootfs symlink path (/tmp/fc-vm/) for snapshot compatibility
- Templates DB table and CRUD API endpoints (POST/GET/DELETE /v1/snapshots)
- CreateSnapshot/DeleteSnapshot RPCs in hostagent proto
- Reconciler excludes paused sandboxes (expected absent from host agent)
- Snapshot templates lock vcpus/memory to baked-in values
- Proper cleanup of uffd sockets and pause snapshot files on destroy
2026-03-12 09:19:37 +06:00
9b94df7f56 Rewrite CLAUDE.md and README.md
CLAUDE.md: replace bloated 850-line version with focused 230-line
guide. Fix inaccuracies (module path, build dir, Connect RPC vs gRPC,
buf vs protoc). Add detailed architecture with request flows, code
generation workflow, rootfs update process, and two-module gotchas.

README.md: add core deployment instructions (prerequisites, build,
host setup, configuration, running, rootfs workflow).
2026-03-11 06:37:11 +06:00
0c245e9e1c Fix guest VM outbound networking and DNS resolution
Add resolv.conf to wrenn-init so guests can resolve DNS, and fix the
host MASQUERADE rule to match vpeerIP (the actual source after namespace
SNAT) instead of hostIP.
2026-03-11 06:02:31 +06:00
b4d8edb65b Add streaming exec and file transfer endpoints
Add WebSocket-based streaming exec endpoint and streaming file
upload/download endpoints to the control plane API. Includes new
host agent RPC methods (ExecStream, StreamWriteFile, StreamReadFile),
envd client streaming support, and OpenAPI spec updates.
2026-03-11 05:42:42 +06:00
ec3360d9ad Add minimal control plane with REST API, database, and reconciler
- REST API (chi router): sandbox CRUD, exec, pause/resume, file write/read
- PostgreSQL persistence via pgx/v5 + sqlc (sandboxes table with goose migration)
- Connect RPC client to host agent for all VM operations
- Reconciler syncs host agent state with DB every 30s (detects TTL-reaped sandboxes)
- OpenAPI 3.1 spec served at /openapi.yaml, Swagger UI at /docs
- Added WriteFile/ReadFile RPCs to hostagent proto and implementations
- File upload via multipart form, download via JSON body POST
- sandbox_id propagated from control plane to host agent on create
2026-03-10 16:50:12 +06:00
d7b25b0891 updated license structure 2026-03-10 04:32:29 +06:00
34c89e814d Added basic license information 2026-03-10 04:28:51 +06:00
6f0c365d44 Add host agent RPC server with sandbox lifecycle management
Implement the host agent as a Connect RPC server that orchestrates
sandbox creation, destruction, pause/resume, and command execution.
Includes sandbox manager with TTL-based reaper, network slot allocator,
rootfs cloning, hostagent proto definition with generated stubs, and
test/debug scripts. Fix Firecracker process lifetime bug where VM was
tied to HTTP request context instead of background context.
2026-03-10 03:54:53 +06:00
c31ce90306 Centralize envd proto source of truth to proto/envd/
Remove duplicate proto files from envd/spec/ and update envd's
buf.gen.yaml to generate stubs from the canonical proto/envd/ location.
Both modules now generate their own Connect RPC stubs from the same
source protos.
2026-03-10 02:49:31 +06:00
7753938044 Add host agent with VM lifecycle, TAP networking, and envd client
Implements Phase 1: boot a Firecracker microVM, execute a command inside
it via envd, and get the output back. Uses raw Firecracker HTTP API via
Unix socket (not the Go SDK) for full control over the VM lifecycle.

- internal/vm: VM manager with create/pause/resume/destroy, Firecracker
  HTTP client, process launcher with unshare + ip netns exec isolation
- internal/network: per-sandbox network namespace with veth pair, TAP
  device, NAT rules, and IP forwarding
- internal/envdclient: Connect RPC client for envd process/filesystem
  services with health check retry
- cmd/host-agent: demo binary that boots a VM, runs "echo hello", prints
  output, and cleans up
- proto/envd: canonical proto files with buf + protoc-gen-connect-go
  code generation
- images/wrenn-init.sh: minimal PID 1 init script for guest VMs
- CLAUDE.md: updated architecture to reflect TAP networking (not vsock)
  and Firecracker HTTP API (not Go SDK)
2026-03-10 00:06:47 +06:00
a3898d68fb Port envd from e2b with internalized shared packages and Connect RPC
- Copy envd source from e2b-dev/infra, internalize shared dependencies
  into envd/internal/shared/ (keys, filesystem, id, smap, utils)
- Switch from gRPC to Connect RPC for all envd services
- Update module paths to git.omukk.dev/wrenn/{sandbox,sandbox/envd}
- Add proto specs (process, filesystem) with buf-based code generation
- Implement full envd: process exec, filesystem ops, port forwarding,
  cgroup management, MMDS integration, and HTTP API
- Update main module dependencies (firecracker SDK, pgx, goose, etc.)
- Remove placeholder .gitkeep files replaced by real implementations
2026-03-09 21:03:19 +06:00
76 changed files with 2930 additions and 5468 deletions

.env.example

@@ -16,7 +16,7 @@ WRENN_HOST_LISTEN_ADDR=:50051
 WRENN_HOST_INTERFACE=eth0
 WRENN_CP_URL=http://localhost:9725
 WRENN_DEFAULT_ROOTFS_SIZE=5Gi
-WRENN_FIRECRACKER_BIN=/usr/local/bin/firecracker
+WRENN_CH_BIN=/usr/local/bin/cloud-hypervisor
 # Auth
 JWT_SECRET=

.gitignore

@@ -55,3 +55,4 @@ internal/dashboard/static/*
 .dual-graph/
 # Added by code-review-graph
 .code-review-graph/
+.mcp.json

CLAUDE.md

@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project Overview
-Wrenn Sandbox is a microVM-based code execution platform. Users create isolated sandboxes (Firecracker microVMs), run code inside them, and get output back via SDKs. Think E2B but with persistent sandboxes, pool-based pricing, and a single-binary deployment story.
+Wrenn Sandbox is a microVM-based code execution platform. Users create isolated sandboxes (Cloud Hypervisor microVMs), run code inside them, and get output back via SDKs. Think E2B but with persistent sandboxes, pool-based pricing, and a single-binary deployment story.
 ## Build & Development Commands
@@ -23,11 +23,11 @@ make dev-down # Stop dev infra
 make dev-cp # Control plane with hot reload (if air installed)
 make dev-frontend # Vite dev server with HMR (port 5173)
 make dev-agent # Host agent (sudo required)
-make dev-envd # envd in debug mode (--isnotfc, port 49983)
+make dev-envd # envd in debug mode (port 49983)
 make check # fmt + vet + lint + test (CI order)
 make test # Unit tests: go test -race -v ./internal/...
-make test-integration # Integration tests (require host agent + Firecracker)
+make test-integration # Integration tests (require host agent + Cloud Hypervisor)
 make fmt # gofmt
 make vet # go vet
 make lint # golangci-lint
@@ -92,7 +92,7 @@ Startup (`cmd/host-agent/main.go`) wires: root/capabilities check → enable IP
 - **RPC Server** (`internal/hostagent/server.go`): implements `hostagentv1connect.HostAgentServiceHandler`. Thin wrapper — every method delegates to `sandbox.Manager`. Maps Connect error codes on return.
 - **Sandbox Manager** (`internal/sandbox/manager.go`): the core orchestration layer. Maintains in-memory state in `boxes map[string]*sandboxState` (protected by `sync.RWMutex`). Each `sandboxState` holds a `models.Sandbox`, a `*network.Slot`, and an `*envdclient.Client`. Runs a TTL reaper (every 10s) that auto-destroys timed-out sandboxes.
-- **VM Manager** (`internal/vm/manager.go`, `fc.go`, `config.go`): manages Firecracker processes. Uses raw HTTP API over Unix socket (`/tmp/fc-{sandboxID}.sock`), not the firecracker-go-sdk Machine type. Launches Firecracker via `unshare -m` + `ip netns exec`. Configures VM via PUT to `/boot-source`, `/drives/rootfs`, `/network-interfaces/eth0`, `/machine-config`, then starts with PUT `/actions`.
+- **VM Manager** (`internal/vm/manager.go`, `ch.go`, `config.go`): manages Cloud Hypervisor processes. Uses raw HTTP API over Unix socket (`/tmp/ch-{sandboxID}.sock`). Launches Cloud Hypervisor via `unshare -m` + `ip netns exec` with `--api-socket path=...`. Configures and boots VM via `PUT /vm.create` + `PUT /vm.boot`. Snapshot restore uses `--restore source_url=file://...`.
 - **Network** (`internal/network/setup.go`, `allocator.go`): per-sandbox network namespace with veth pair + TAP device. See Networking section below.
 - **Device Mapper** (`internal/devicemapper/devicemapper.go`): CoW rootfs via device-mapper snapshots. Shared read-only loop devices per base template (refcounted `LoopRegistry`), per-sandbox sparse CoW files, dm-snapshot create/restore/remove/flatten operations.
 - **envd Client** (`internal/envdclient/client.go`, `health.go`): dual interface to the guest agent. Connect RPC for streaming process exec (`process.Start()` bidirectional stream). Plain HTTP for file operations (POST/GET `/files?path=...&username=root`). Health check polls `GET /health` every 100ms until ready (30s timeout).
@@ -109,14 +109,14 @@ Runs as PID 1 inside the microVM via `wrenn-init.sh` (mounts procfs/sysfs/dev, s
 - **HTTP endpoints**: GET `/health`, GET `/metrics`, POST `/init`, POST `/snapshot/prepare`, GET/POST `/files`
 - **Proto codegen**: `connectrpc-build` compiles `proto/envd/*.proto` at `cargo build` time via `build.rs` — no committed stubs
 - **Build**: `make build-envd` → static musl binary in `builds/envd`
-- **Dev**: `make dev-envd` → `cargo run -- --isnotfc --port 49983`
+- **Dev**: `make dev-envd` → `cargo run -- --port 49983`
 ### Dashboard (Frontend)
 **Directory:** `frontend/` — standalone SvelteKit app (Svelte 5, runes mode)
 - **Stack**: SvelteKit + `adapter-static` + Tailwind CSS v4 + Bits UI (headless accessible components)
-- **Package manager**: pnpm
+- **Package manager**: Bun
 - **Routing**: SvelteKit file-based routing under `frontend/src/routes/`
 - **Routing layout**: `/login` and `/signup` at root, authenticated pages under `/dashboard/*` (e.g. `/dashboard/capsules`, `/dashboard/keys`)
 - **Build output**: `frontend/build/` — static files served by Caddy
@@ -164,7 +164,7 @@ HIBERNATED → RUNNING (cold snapshot resume, slower)
 **Sandbox creation** (`POST /v1/capsules`):
 1. API handler generates sandbox ID, inserts into DB as "pending"
 2. RPC `CreateSandbox` → host agent → `sandbox.Manager.Create()`
-3. Manager: resolve base rootfs → acquire shared loop device → create dm-snapshot (sparse CoW file) → allocate network slot → `CreateNetwork()` (netns + veth + tap + NAT) → `vm.Create()` (start Firecracker with `/dev/mapper/wrenn-{id}`, configure via HTTP API, boot) → `envdclient.WaitUntilReady()` (poll /health) → store in-memory state
+3. Manager: resolve base rootfs → acquire shared loop device → create dm-snapshot (sparse CoW file) → allocate network slot → `CreateNetwork()` (netns + veth + tap + NAT) → `vm.Create()` (start Cloud Hypervisor with `/dev/mapper/wrenn-{id}`, configure via `PUT /vm.create` + `PUT /vm.boot`) → `envdclient.WaitUntilReady()` (poll /health) → store in-memory state
 4. API handler updates DB to "running" with host_ip
 **Command execution** (`POST /v1/capsules/{id}/exec`):
@@ -210,9 +210,9 @@ To add a new query: add it to the appropriate `.sql` file in `db/queries/` → `
 - **Connect RPC** (not gRPC) for all RPC communication between components
 - **Buf + protoc-gen-connect-go** for Go code generation; **connectrpc-build** for Rust code generation in envd
-- **Raw Firecracker HTTP API** via Unix socket (not firecracker-go-sdk Machine type)
+- **Raw Cloud Hypervisor HTTP API** via Unix socket (`PUT /vm.create` + `PUT /vm.boot`)
 - **TAP networking** (not vsock) for host-to-envd communication
-- **Device-mapper snapshots** for rootfs CoW — shared read-only loop device per base template, per-sandbox sparse CoW file, Firecracker gets `/dev/mapper/wrenn-{id}`
+- **Device-mapper snapshots** for rootfs CoW — shared read-only loop device per base template, per-sandbox sparse CoW file, Cloud Hypervisor gets `/dev/mapper/wrenn-{id}`
 - **PostgreSQL** via pgx/v5 + sqlc (type-safe query generation). Goose for migrations (plain SQL, up/down)
 - **Dashboard**: SvelteKit (Svelte 5, adapter-static) + Tailwind CSS v4 + Bits UI. Built to static files in `frontend/build/`, served by Caddy (not embedded in the Go binary)
 - **Lago** for billing (external service, not in this codebase)
@@ -237,19 +237,19 @@ To add a new query: add it to the appropriate `.sql` file in `db/queries/` → `
 - Kernel: `/var/lib/wrenn/kernels/vmlinux`
 - Base rootfs images: `/var/lib/wrenn/images/{template}.ext4`
 - Sandbox clones: `/var/lib/wrenn/sandboxes/`
-- Firecracker: `/usr/local/bin/firecracker` (e2b's fork of firecracker)
+- Cloud Hypervisor: `/usr/local/bin/cloud-hypervisor`
 ## Design Context
 ### Users
-Developers across the full spectrum — solo engineers building side projects, startup teams integrating sandboxed execution into products, and platform/infra engineers at larger organizations running production workloads on Firecracker microVMs. They arrive with context: they know what a process is, what a rootfs is, what a TTY means. The interface must feel at home for all three: approachable enough not to intimidate a hacker, precise enough to earn the trust of a production ops team. Never condescend, never oversimplify. Trust the user to understand what they're looking at.
+Developers across the full spectrum — solo engineers building side projects, startup teams integrating sandboxed execution into products, and platform/infra engineers at larger organizations running production workloads on Cloud Hypervisor microVMs. They arrive with context: they know what a process is, what a rootfs is, what a TTY means. The interface must feel at home for all three: approachable enough not to intimidate a hacker, precise enough to earn the trust of a production ops team. Never condescend, never oversimplify. Trust the user to understand what they're looking at.
 **Primary job to be done:** Understand what's running, act on it confidently, and get back to code.
 ### Brand Personality
 **Precise. Warm. Uncompromising.**
-Wrenn is an engineer's favorite tool — built with visible care, not assembled from defaults. It runs real infrastructure (Firecracker microVMs), so the UI should reflect that seriousness without becoming cold or corporate. The warmth comes from the typography and color palette; the precision comes from hierarchy, density, and data fidelity.
+Wrenn is an engineer's favorite tool — built with visible care, not assembled from defaults. It runs real infrastructure (Cloud Hypervisor microVMs), so the UI should reflect that seriousness without becoming cold or corporate. The warmth comes from the typography and color palette; the precision comes from hierarchy, density, and data fidelity.
 Emotional goal: **in control.** Users leave a session with full confidence in what's running, what happened, and what comes next. Nothing is hidden, nothing is ambiguous.

Makefile

@@ -16,7 +16,7 @@ LDFLAGS := -s -w
 build: build-cp build-agent build-envd
 build-frontend:
-	cd frontend && pnpm install --frozen-lockfile && pnpm build
+	cd frontend && bun install --frozen-lockfile && bun run build
 build-cp:
 	go build -v -ldflags="$(LDFLAGS) -X main.version=$(VERSION_CP) -X main.commit=$(COMMIT)" -o $(BIN_DIR)/wrenn-cp ./cmd/control-plane
@@ -59,10 +59,10 @@ dev-agent:
 	sudo go run ./cmd/host-agent
 dev-frontend:
-	cd frontend && pnpm dev --port 5173 --host 0.0.0.0
+	cd frontend && bun run dev --port 5173 --host 0.0.0.0
 dev-envd:
-	cd envd-rs && cargo run -- --isnotfc --port 49983
+	cd envd-rs && cargo run -- --port 49983
 # ═══════════════════════════════════════════════════
 # Database (goose)
@@ -181,7 +181,7 @@ help:
 	@echo " make dev-cp Control plane (hot reload if air installed)"
 	@echo " make dev-frontend Vite dev server with HMR (port 5173)"
 	@echo " make dev-agent Host agent (sudo required)"
-	@echo " make dev-envd envd in debug mode (--isnotfc, port 49983)"
+	@echo " make dev-envd envd in debug mode (port 49983)"
 	@echo ""
 	@echo " make build Build all binaries → builds/"
 	@echo " make build-frontend Build SvelteKit dashboard → frontend/build/"

README.md

@@ -5,11 +5,11 @@ Secure infrastructure for AI
 ## Prerequisites
 - Linux host with `/dev/kvm` access (bare metal or nested virt)
-- Firecracker binary at `/usr/local/bin/firecracker`
+- Cloud Hypervisor binary at `/usr/local/bin/cloud-hypervisor`
 - PostgreSQL
 - Go 1.25+
 - Rust 1.88+ with `x86_64-unknown-linux-musl` target (`rustup target add x86_64-unknown-linux-musl`)
-- pnpm (for frontend)
+- Bun (for frontend)
 - Docker (for dev infra and rootfs builds)
 ## Build

@@ -1 +1 @@
-0.1.3
+0.2.0

@@ -1 +1 @@
-0.1.6
+0.2.0

cmd/host-agent/main.go

@@ -126,27 +126,32 @@ func main() {
 	}
 	slog.Info("resolved kernel", "version", kernelVersion, "path", kernelPath)
-	// Detect firecracker version.
-	fcBin := envOrDefault("WRENN_FIRECRACKER_BIN", "/usr/local/bin/firecracker")
-	fcVersion, err := sandbox.DetectFirecrackerVersion(fcBin)
+	// Detect cloud-hypervisor version.
+	chBin := envOrDefault("WRENN_CH_BIN", "/usr/local/bin/cloud-hypervisor")
+	chVersion, err := sandbox.DetectCHVersion(chBin)
 	if err != nil {
-		slog.Error("failed to detect firecracker version", "error", err)
+		slog.Error("failed to detect cloud-hypervisor version", "error", err)
 		os.Exit(1)
 	}
-	slog.Info("resolved firecracker", "version", fcVersion, "path", fcBin)
+	slog.Info("resolved cloud-hypervisor", "version", chVersion, "path", chBin)
 	cfg := sandbox.Config{
 		WrennDir:            rootDir,
 		DefaultRootfsSizeMB: defaultRootfsSizeMB,
 		KernelPath:          kernelPath,
 		KernelVersion:       kernelVersion,
-		FirecrackerBin:      fcBin,
-		FirecrackerVersion:  fcVersion,
+		VMMBin:              chBin,
+		VMMVersion:          chVersion,
 		AgentVersion:        version,
 	}
 	mgr := sandbox.New(cfg)
+	// Set up lifecycle event callback sender so autonomous events
+	// (auto-pause, auto-destroy) are pushed to the CP proactively.
+	cb := hostagent.NewCallbackSender(cpURL, credsFile, creds.HostID)
+	mgr.SetEventSender(hostagent.NewEventSender(cb))
 	mgr.StartTTLReaper(ctx)
 	// httpServer is declared here so the shutdown func can reference it.
@@ -226,8 +231,9 @@ func main() {
 		func() {
 			doShutdown("host deleted from CP")
 		},
-		// onCredsRefreshed: hot-swap the TLS certificate after a JWT refresh.
+		// onCredsRefreshed: hot-swap the TLS certificate and update callback JWT.
 		func(tf *hostagent.TokenFile) {
+			cb.UpdateJWT(tf.JWT)
 			if tf.CertPEM == "" || tf.KeyPEM == "" {
 				return
 			}
@@ -239,12 +245,16 @@ func main() {
 		},
 	)
-	// Graceful shutdown on SIGINT/SIGTERM.
+	// Graceful shutdown on SIGINT/SIGTERM. A second signal force-exits
+	// so the operator can always kill the process if shutdown hangs.
 	sigCh := make(chan os.Signal, 1)
 	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
 	go func() {
 		sig := <-sigCh
-		doShutdown("signal: " + sig.String())
+		go doShutdown("signal: " + sig.String())
+		sig = <-sigCh
+		slog.Error("received second signal, force exiting", "signal", sig.String())
+		os.Exit(1)
 	}()
 	slog.Info("host agent starting", "addr", listenAddr, "host_id", creds.HostID, "version", version, "commit", commit)
@@ -286,7 +296,7 @@ func checkPrivileges() error {
 		name string
 	}{
 		{1, "CAP_DAC_OVERRIDE"}, // /dev/loop*, /dev/mapper/*, /dev/net/tun
-		{5, "CAP_KILL"}, // SIGTERM/SIGKILL to Firecracker processes
+		{5, "CAP_KILL"}, // SIGTERM/SIGKILL to cloud-hypervisor processes
 		{12, "CAP_NET_ADMIN"}, // netlink, iptables, routing, TAP/veth
 		{13, "CAP_NET_RAW"}, // raw sockets (iptables)
 		{19, "CAP_SYS_PTRACE"}, // reading /proc/self/ns/net (netns.Get)

@@ -72,7 +72,7 @@ ORDER BY created_at DESC;
 UPDATE sandboxes
 SET status = 'missing',
     last_updated = NOW()
-WHERE host_id = $1 AND status IN ('running', 'starting', 'pending');
+WHERE host_id = $1 AND status IN ('running', 'starting', 'pending', 'pausing', 'resuming', 'stopping');
 -- name: UpdateSandboxMetadata :exec
 UPDATE sandboxes
@@ -80,6 +80,30 @@ SET metadata = $2,
     last_updated = NOW()
 WHERE id = $1;
+-- name: UpdateSandboxRunningIf :one
+-- Conditionally transition a sandbox to running only if the current status
+-- matches the expected value. Prevents races where a user destroys a sandbox
+-- while the create/resume goroutine is still in-flight.
+UPDATE sandboxes
+SET status = 'running',
+    host_ip = $3,
+    guest_ip = $4,
+    started_at = $5,
+    last_active_at = $5,
+    last_updated = NOW()
+WHERE id = $1 AND status = $2
+RETURNING *;
+-- name: UpdateSandboxStatusIf :one
+-- Atomically update status only when the current status matches the expected value.
+-- Prevents background goroutines from overwriting a status that has since changed
+-- (e.g. user destroyed a sandbox while Create was in-flight).
+UPDATE sandboxes
+SET status = $3,
+    last_updated = NOW()
+WHERE id = $1 AND status = $2
+RETURNING *;
 -- name: BulkRestoreRunning :exec
 -- Called by the reconciler when a host comes back online and its sandboxes are
 -- confirmed alive. Restores only sandboxes that are in 'missing' state.
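For context, a hypothetical caller of a conditional update like the ones above, shown with a raw pgx Exec rather than the generated sqlc API; names and the rollback step are illustrative:

```go
package service

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5/pgxpool"
)

// markRunning is a compare-and-swap: only a still-'pending' row may
// move to 'running'. Zero rows affected means the sandbox was destroyed
// while create was in-flight, so the caller should tear down host-side
// resources instead of resurrecting the row.
func markRunning(ctx context.Context, pool *pgxpool.Pool, id, hostIP string) error {
	tag, err := pool.Exec(ctx,
		`UPDATE sandboxes SET status = 'running', host_ip = $2, last_updated = NOW()
		 WHERE id = $1 AND status = 'pending'`, id, hostIP)
	if err != nil {
		return err
	}
	if tag.RowsAffected() == 0 {
		return fmt.Errorf("sandbox %s is no longer pending", id)
	}
	return nil
}
```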

envd-rs/Cargo.lock (generated)

@@ -241,12 +241,6 @@ dependencies = [
  "serde",
 ]
-[[package]]
-name = "bumpalo"
-version = "3.20.2"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "5d20789868f4b01b2f2caec9f5c4e0213b41e3e5702a50157d699ae31ced2fcb"
 [[package]]
 name = "bytes"
 version = "1.11.1"
@@ -486,17 +480,6 @@ dependencies = [
  "subtle",
 ]
-[[package]]
-name = "displaydoc"
-version = "0.2.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "97369cbbc041bc366949bc74d34658d6cda5621039731c6310521892a3a20ae0"
-dependencies = [
- "proc-macro2",
- "quote",
- "syn",
-]
 [[package]]
 name = "either"
 version = "1.15.0"
@@ -514,7 +497,7 @@ dependencies = [
 [[package]]
 name = "envd"
-version = "0.2.1"
+version = "0.3.0"
 dependencies = [
  "async-stream",
  "axum",
@@ -537,7 +520,6 @@ dependencies = [
  "mime_guess",
  "nix",
  "notify",
- "reqwest",
  "serde",
  "serde_json",
  "sha2",
@@ -889,7 +871,6 @@ dependencies = [
  "pin-project-lite",
  "smallvec",
  "tokio",
- "want",
 ]
 [[package]]
@@ -898,103 +879,13 @@ version = "0.1.20"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "96547c2556ec9d12fb1578c4eaf448b04993e7fb79cbaad930a656880a6bdfa0"
 dependencies = [
- "base64",
  "bytes",
- "futures-channel",
- "futures-util",
  "http",
  "http-body",
  "hyper",
- "ipnet",
- "libc",
- "percent-encoding",
  "pin-project-lite",
- "socket2",
  "tokio",
  "tower-service",
- "tracing",
-]
-[[package]]
-name = "icu_collections"
-version = "2.2.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "2984d1cd16c883d7935b9e07e44071dca8d917fd52ecc02c04d5fa0b5a3f191c"
-dependencies = [
- "displaydoc",
- "potential_utf",
- "utf8_iter",
- "yoke",
- "zerofrom",
- "zerovec",
-]
-[[package]]
-name = "icu_locale_core"
-version = "2.2.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "92219b62b3e2b4d88ac5119f8904c10f8f61bf7e95b640d25ba3075e6cac2c29"
-dependencies = [
- "displaydoc",
- "litemap",
- "tinystr",
- "writeable",
- "zerovec",
-]
-[[package]]
-name = "icu_normalizer"
-version = "2.2.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "c56e5ee99d6e3d33bd91c5d85458b6005a22140021cc324cea84dd0e72cff3b4"
-dependencies = [
- "icu_collections",
- "icu_normalizer_data",
- "icu_properties",
- "icu_provider",
- "smallvec",
- "zerovec",
-]
-[[package]]
-name = "icu_normalizer_data"
-version = "2.2.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "da3be0ae77ea334f4da67c12f149704f19f81d1adf7c51cf482943e84a2bad38"
-[[package]]
-name = "icu_properties"
-version = "2.2.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "bee3b67d0ea5c2cca5003417989af8996f8604e34fb9ddf96208a033901e70de"
-dependencies = [
- "icu_collections",
- "icu_locale_core",
- "icu_properties_data",
- "icu_provider",
- "zerotrie",
- "zerovec",
-]
-[[package]]
-name = "icu_properties_data"
-version = "2.2.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "8e2bbb201e0c04f7b4b3e14382af113e17ba4f63e2c9d2ee626b720cbce54a14"
-[[package]]
-name = "icu_provider"
-version = "2.2.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "139c4cf31c8b5f33d7e199446eff9c1e02decfc2f0eec2c8d71f65befa45b421"
-dependencies = [
- "displaydoc",
- "icu_locale_core",
- "writeable",
- "yoke",
- "zerofrom",
- "zerotrie",
- "zerovec",
 ]
 [[package]]
@@ -1003,27 +894,6 @@ version = "2.3.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "3d3067d79b975e8844ca9eb072e16b31c3c1c36928edf9c6789548c524d0d954"
-[[package]]
-name = "idna"
-version = "1.1.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3b0875f23caa03898994f6ddc501886a45c7d3d62d04d2d90788d47be1b1e4de"
-dependencies = [
- "idna_adapter",
- "smallvec",
- "utf8_iter",
-]
-[[package]]
-name = "idna_adapter"
-version = "1.2.2"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "cb68373c0d6620ef8105e855e7745e18b0d00d3bdb07fb532e434244cdb9a714"
-dependencies = [
- "icu_normalizer",
- "icu_properties",
-]
 [[package]]
 name = "indexmap"
 version = "2.14.0"
@@ -1065,22 +935,6 @@ dependencies = [
  "cfg-if",
 ]
-[[package]]
-name = "ipnet"
-version = "2.12.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "d98f6fed1fde3f8c21bc40a1abb88dd75e67924f9cffc3ef95607bad8017f8e2"
-[[package]]
-name = "iri-string"
-version = "0.7.12"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "25e659a4bb38e810ebc252e53b5814ff908a8c58c2a9ce2fae1bbec24cbf4e20"
-dependencies = [
- "memchr",
- "serde",
-]
 [[package]]
 name = "is_terminal_polyfill"
 version = "1.70.2"
@@ -1103,18 +957,6 @@ dependencies = [
  "libc",
 ]
-[[package]]
-name = "js-sys"
-version = "0.3.97"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "a1840c94c045fbcf8ba2812c95db44499f7c64910a912551aaaa541decebcacf"
-dependencies = [
- "cfg-if",
- "futures-util",
- "once_cell",
- "wasm-bindgen",
-]
 [[package]]
 name = "kqueue"
 version = "1.1.1"
@@ -1171,12 +1013,6 @@ version = "0.12.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "32a66949e030da00e8c7d4434b251670a91556f4144941d37452769c25d58a53"
-[[package]]
-name = "litemap"
-version = "0.8.2"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "92daf443525c4cce67b150400bc2316076100ce0b3686209eb8cf3c31612e6f0"
 [[package]]
 name = "lock_api"
 version = "0.4.14"
@@ -1405,15 +1241,6 @@ version = "0.2.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "b4596b6d070b27117e987119b4dac604f3c58cfb0b191112e24771b2faeac1a6"
-[[package]]
-name = "potential_utf"
-version = "0.1.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "0103b1cef7ec0cf76490e969665504990193874ea05c85ff9bab8b911d0a0564"
-dependencies = [
- "zerovec",
-]
 [[package]]
 name = "prettyplease"
 version = "0.2.37"
@@ -1509,38 +1336,6 @@ version = "0.8.10"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "dc897dd8d9e8bd1ed8cdad82b5966c3e0ecae09fb1907d58efaa013543185d0a"
-[[package]]
-name = "reqwest"
-version = "0.12.28"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "eddd3ca559203180a307f12d114c268abf583f59b03cb906fd0b3ff8646c1147"
-dependencies = [
- "base64",
- "bytes",
- "futures-core",
- "http",
- "http-body",
- "http-body-util",
- "hyper",
- "hyper-util",
- "js-sys",
- "log",
- "percent-encoding",
- "pin-project-lite",
- "serde",
- "serde_json",
- "serde_urlencoded",
- "sync_wrapper",
- "tokio",
- "tower",
- "tower-http",
- "tower-service",
- "url",
- "wasm-bindgen",
- "wasm-bindgen-futures",
- "web-sys",
-]
 [[package]]
 name = "rustix"
 version = "1.1.4"
@@ -1554,12 +1349,6 @@ dependencies = [
  "windows-sys 0.61.2",
 ]
-[[package]]
-name = "rustversion"
-version = "1.0.22"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b39cdef0fa800fc44525c84ccb54a029961a8215f9619753635a9c0d2538d46d"
 [[package]]
 name = "ryu"
 version = "1.0.23"
@@ -1723,12 +1512,6 @@ version = "0.9.8"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "6980e8d7511241f8acf4aebddbb1ff938df5eebe98691418c4468d0b72a96a67"
-[[package]]
-name = "stable_deref_trait"
-version = "1.2.1"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "6ce2be8dc25455e1f91df71bfa12ad37d7af1092ae736f3a6cd0e37bc7810596"
 [[package]]
 name = "strsim"
 version = "0.11.1"
@@ -1757,20 +1540,6 @@ name = "sync_wrapper"
 version = "1.0.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0bf256ce5efdfa370213c1dabab5935a12e49f2c58d15e9eac2870d3b4f27263"
-dependencies = [
- "futures-core",
-]
-[[package]]
-name = "synstructure"
-version = "0.13.2"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "728a70f3dbaf5bab7f0c4b1ac8d7ae5ea60a4b5549c8a5914361c99147a709d2"
-dependencies = [
- "proc-macro2",
- "quote",
- "syn",
-]
 [[package]]
 name = "sysinfo"
@@ -1828,16 +1597,6 @@ dependencies = [
  "cfg-if",
 ]
-[[package]]
-name = "tinystr"
-version = "0.8.3"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "c8323304221c2a851516f22236c5722a72eaa19749016521d6dff0824447d96d"
-dependencies = [
- "displaydoc",
- "zerovec",
-]
 [[package]]
 name = "tokio"
 version = "1.52.1"
@@ -1911,14 +1670,12 @@ dependencies = [
  "http-body-util",
  "http-range-header",
  "httpdate",
- "iri-string",
  "mime",
  "mime_guess",
  "percent-encoding",
  "pin-project-lite",
  "tokio",
  "tokio-util",
- "tower",
  "tower-layer",
  "tower-service",
  "tracing",
@@ -2011,12 +1768,6 @@ dependencies = [
  "tracing-serde",
 ]
-[[package]]
-name = "try-lock"
-version = "0.2.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "e421abadd41a4225275504ea4d6566923418b7f05506fbc9c0fe86ba7396114b"
 [[package]]
 name = "typenum"
 version = "1.20.0"
@@ -2041,24 +1792,6 @@ version = "0.2.6"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "ebc1c04c71510c7f702b52b7c350734c9ff1295c464a03335b00bb84fc54f853"
-[[package]]
-name = "url"
-version = "2.5.8"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "ff67a8a4397373c3ef660812acab3268222035010ab8680ec4215f38ba3d0eed"
-dependencies = [
- "form_urlencoded",
- "idna",
- "percent-encoding",
- "serde",
-]
-[[package]]
-name = "utf8_iter"
-version = "1.0.4"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b6c140620e7ffbb22c2dee59cafe6084a59b5ffc27a8859a5f0d494b5d52b6be"
 [[package]]
 name = "utf8parse"
 version = "0.2.2"
@@ -2087,15 +1820,6 @@ dependencies = [
  "winapi-util",
 ]
-[[package]]
-name = "want"
-version = "0.3.1"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "bfa7760aed19e106de2c7c0b581b509f2f25d3dacaf737cb82ac61bc6d760b0e"
-dependencies = [
- "try-lock",
-]
 [[package]]
 name = "wasi"
 version = "0.11.1+wasi-snapshot-preview1"
@@ -2120,61 +1844,6 @@ dependencies = [
  "wit-bindgen 0.51.0",
 ]
-[[package]]
-name = "wasm-bindgen"
-version = "0.2.120"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "df52b6d9b87e0c74c9edfa1eb2d9bf85e5d63515474513aa50fa181b3c4f5db1"
-dependencies = [
- "cfg-if",
- "once_cell",
- "rustversion",
- "wasm-bindgen-macro",
- "wasm-bindgen-shared",
-]
-[[package]]
-name = "wasm-bindgen-futures"
-version = "0.4.70"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "af934872acec734c2d80e6617bbb5ff4f12b052dd8e6332b0817bce889516084"
-dependencies = [
- "js-sys",
- "wasm-bindgen",
-]
-[[package]]
-name = "wasm-bindgen-macro"
-version = "0.2.120"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "78b1041f495fb322e64aca85f5756b2172e35cd459376e67f2a6c9dffcedb103"
-dependencies = [
- "quote",
- "wasm-bindgen-macro-support",
-]
-[[package]]
-name = "wasm-bindgen-macro-support"
-version = "0.2.120"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "9dcd0ff20416988a18ac686d4d4d0f6aae9ebf08a389ff5d29012b05af2a1b41"
-dependencies = [
- "bumpalo",
- "proc-macro2",
- "quote",
- "syn",
- "wasm-bindgen-shared",
-]
-[[package]]
-name = "wasm-bindgen-shared"
-version = "0.2.120"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "49757b3c82ebf16c57d69365a142940b384176c24df52a087fb748e2085359ea"
-dependencies = [
- "unicode-ident",
-]
 [[package]]
 name = "wasm-encoder"
 version = "0.244.0"
@@ -2209,16 +1878,6 @@ dependencies = [
  "semver",
 ]
-[[package]]
-name = "web-sys"
-version = "0.3.97"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "2eadbac71025cd7b0834f20d1fe8472e8495821b4e9801eb0a60bd1f19827602"
-dependencies = [
- "js-sys",
- "wasm-bindgen",
-]
 [[package]]
 name = "winapi"
 version = "0.3.9"
@@ -2485,56 +2144,6 @@ dependencies = [
  "wasmparser",
 ]
-[[package]]
-name = "writeable"
-version = "0.6.3"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "1ffae5123b2d3fc086436f8834ae3ab053a283cfac8fe0a0b8eaae044768a4c4"
-[[package]]
-name = "yoke"
-version = "0.8.2"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "abe8c5fda708d9ca3df187cae8bfb9ceda00dd96231bed36e445a1a48e66f9ca"
-dependencies = [
- "stable_deref_trait",
- "yoke-derive",
- "zerofrom",
-]
-[[package]]
-name = "yoke-derive"
-version = "0.8.2"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "de844c262c8848816172cef550288e7dc6c7b7814b4ee56b3e1553f275f1858e"
-dependencies = [
- "proc-macro2",
- "quote",
- "syn",
- "synstructure",
-]
-[[package]]
-name = "zerofrom"
-version = "0.1.7"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "69faa1f2a1ea75661980b013019ed6687ed0e83d069bc1114e2cc74c6c04c4df"
-dependencies = [
- "zerofrom-derive",
-]
-[[package]]
-name = "zerofrom-derive"
-version = "0.1.7"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "11532158c46691caf0f2593ea8358fed6bbf68a0315e80aae9bd41fbade684a1"
-dependencies = [
- "proc-macro2",
- "quote",
- "syn",
- "synstructure",
-]
 [[package]]
 name = "zeroize"
 version = "1.8.2"
@@ -2555,39 +2164,6 @@ dependencies = [
  "syn",
 ]
-[[package]]
-name = "zerotrie"
-version = "0.2.4"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "0f9152d31db0792fa83f70fb2f83148effb5c1f5b8c7686c3459e361d9bc20bf"
-dependencies = [
- "displaydoc",
- "yoke",
- "zerofrom",
-]
-[[package]]
-name = "zerovec"
-version = "0.11.6"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "90f911cbc359ab6af17377d242225f4d75119aec87ea711a880987b18cd7b239"
-dependencies = [
- "yoke",
- "zerofrom",
- "zerovec-derive",
-]
-[[package]]
-name = "zerovec-derive"
-version = "0.11.3"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "625dc425cab0dca6dc3c3319506e6593dcb08a9f387ea3b284dbd52a92c40555"
-dependencies = [
- "proc-macro2",
- "quote",
- "syn",
-]
 [[package]]
 name = "zmij"
 version = "1.0.21"

envd-rs/Cargo.toml

@@ -1,6 +1,6 @@
 [package]
 name = "envd"
-version = "0.2.1"
+version = "0.3.0"
 edition = "2024"
 rust-version = "1.88"
@@ -53,9 +53,6 @@ notify = "7"
 # Compression
 flate2 = "1"
-# HTTP client (MMDS polling)
-reqwest = { version = "0.12", default-features = false, features = ["json"] }
 # Directory walking
 walkdir = "2"

envd-rs/README.md

@@ -1,6 +1,6 @@
 # envd (Rust)
-Wrenn guest agent daemon — runs as PID 1 inside Firecracker microVMs. Provides process management, filesystem operations, file transfer, port forwarding, and VM lifecycle control over Connect RPC and HTTP.
+Wrenn guest agent daemon — runs as PID 1 inside Cloud Hypervisor microVMs. Provides process management, filesystem operations, file transfer, port forwarding, and VM lifecycle control over Connect RPC and HTTP.
 Rust rewrite of `envd/` (Go). Drop-in replacement — same wire protocol, same endpoints, same CLI flags.
@@ -50,7 +50,7 @@ cargo build
 Run locally (outside a VM):
 ```bash
-./target/debug/envd --isnotfc --port 49983
+./target/debug/envd --port 49983
 ```
 ### Via Makefile (from repo root)
@@ -64,7 +64,6 @@ make build-envd-go # Go version (for comparison)
 ```
 --port <PORT> Listen port [default: 49983]
---isnotfc Not running inside Firecracker (disables MMDS, cgroups)
 --version Print version and exit
 --commit Print git commit and exit
 --cmd <CMD> Spawn a process at startup (e.g. --cmd "/bin/bash")
@@ -81,7 +80,7 @@ make build-envd-go # Go version (for comparison)
 | GET | `/metrics` | System metrics (CPU, memory, disk) |
 | GET | `/envs` | Current environment variables |
 | POST | `/init` | Host agent init (token, env, mounts) |
-| POST | `/snapshot/prepare` | Quiesce before Firecracker snapshot |
+| POST | `/snapshot/prepare` | Quiesce before Cloud Hypervisor snapshot |
 | GET | `/files` | Download file (gzip, range support) |
 | POST | `/files` | Upload file(s) via multipart |
@@ -108,7 +107,7 @@ src/
 ├── util.rs # AtomicMax
 ├── auth/ # Token, signing, middleware
 ├── crypto/ # SHA-256, SHA-512, HMAC
-├── host/ # MMDS polling, system metrics
+├── host/ # System metrics
 ├── http/ # Axum handlers (health, init, snapshot, files, encoding)
 ├── permissions/ # Path resolution, user lookup, chown
 ├── rpc/ # Connect RPC services

@@ -9,8 +9,3 @@ pub const WRENN_RUN_DIR: &str = "/run/wrenn";
 pub const KILOBYTE: u64 = 1024;
 pub const MEGABYTE: u64 = 1024 * KILOBYTE;
-pub const MMDS_ADDRESS: &str = "169.254.169.254";
-pub const MMDS_POLL_INTERVAL: Duration = Duration::from_millis(50);
-pub const MMDS_TOKEN_EXPIRATION_SECS: u64 = 60;
-pub const MMDS_ACCESS_TOKEN_CLIENT_TIMEOUT: Duration = Duration::from_secs(10);

envd-rs/src/host/metrics.rs (deleted)

@@ -1,73 +0,0 @@
-use std::ffi::CString;
-use std::time::{SystemTime, UNIX_EPOCH};
-use serde::Serialize;
-#[derive(Serialize)]
-pub struct Metrics {
-    pub ts: i64,
-    pub cpu_count: u32,
-    pub cpu_used_pct: f32,
-    pub mem_total_mib: u64,
-    pub mem_used_mib: u64,
-    pub mem_total: u64,
-    pub mem_used: u64,
-    pub disk_used: u64,
-    pub disk_total: u64,
-}
-pub fn get_metrics() -> Result<Metrics, String> {
-    use sysinfo::System;
-    let mut sys = System::new();
-    sys.refresh_memory();
-    sys.refresh_cpu_all();
-    std::thread::sleep(std::time::Duration::from_millis(100));
-    sys.refresh_cpu_all();
-    let cpu_count = sys.cpus().len() as u32;
-    let cpu_used_pct = sys.global_cpu_usage();
-    let cpu_used_pct_rounded = if cpu_used_pct > 0.0 {
-        (cpu_used_pct * 100.0).round() / 100.0
-    } else {
-        0.0
-    };
-    let mem_total = sys.total_memory();
-    let mem_used = sys.used_memory();
-    let (disk_total, disk_used) = disk_stats("/")?;
-    let ts = SystemTime::now()
-        .duration_since(UNIX_EPOCH)
-        .unwrap()
-        .as_secs() as i64;
-    Ok(Metrics {
-        ts,
-        cpu_count,
-        cpu_used_pct: cpu_used_pct_rounded,
-        mem_total_mib: mem_total / 1024 / 1024,
-        mem_used_mib: mem_used / 1024 / 1024,
-        mem_total,
-        mem_used,
-        disk_used,
-        disk_total,
-    })
-}
-fn disk_stats(path: &str) -> Result<(u64, u64), String> {
-    let c_path = CString::new(path).unwrap();
-    let mut stat: libc::statfs = unsafe { std::mem::zeroed() };
-    let ret = unsafe { libc::statfs(c_path.as_ptr(), &mut stat) };
-    if ret != 0 {
-        return Err(format!("statfs failed: {}", std::io::Error::last_os_error()));
-    }
-    let block = stat.f_bsize as u64;
-    let total = stat.f_blocks * block;
-    let available = stat.f_bavail * block;
-    Ok((total, total - available))
-}

envd-rs/src/host/mmds.rs (deleted)

@@ -1,120 +0,0 @@
-use std::sync::Arc;
-use std::time::Duration;
-use dashmap::DashMap;
-use serde::Deserialize;
-use tokio_util::sync::CancellationToken;
-use crate::config::{MMDS_ADDRESS, MMDS_POLL_INTERVAL, MMDS_TOKEN_EXPIRATION_SECS, WRENN_RUN_DIR};
-#[derive(Debug, Clone, Deserialize)]
-pub struct MMDSOpts {
-    #[serde(rename = "instanceID")]
-    pub sandbox_id: String,
-    #[serde(rename = "envID")]
-    pub template_id: String,
-    #[serde(rename = "address", default)]
-    pub logs_collector_address: String,
-    #[serde(rename = "accessTokenHash", default)]
-    pub access_token_hash: String,
-}
-async fn get_mmds_token(client: &reqwest::Client) -> Result<String, String> {
-    let resp = client
-        .put(format!("http://{MMDS_ADDRESS}/latest/api/token"))
-        .header(
-            "X-metadata-token-ttl-seconds",
-            MMDS_TOKEN_EXPIRATION_SECS.to_string(),
-        )
-        .send()
-        .await
-        .map_err(|e| format!("mmds token request failed: {e}"))?;
-    let token = resp.text().await.map_err(|e| format!("mmds token read: {e}"))?;
-    if token.is_empty() {
-        return Err("mmds token is an empty string".into());
-    }
-    Ok(token)
-}
-async fn get_mmds_opts(client: &reqwest::Client, token: &str) -> Result<MMDSOpts, String> {
-    let resp = client
-        .get(format!("http://{MMDS_ADDRESS}"))
-        .header("X-metadata-token", token)
-        .header("Accept", "application/json")
-        .send()
-        .await
-        .map_err(|e| format!("mmds opts request failed: {e}"))?;
-    resp.json::<MMDSOpts>()
-        .await
-        .map_err(|e| format!("mmds opts parse: {e}"))
-}
-pub async fn get_access_token_hash() -> Result<String, String> {
-    let client = reqwest::Client::builder()
-        .timeout(Duration::from_secs(10))
-        .no_proxy()
-        .build()
-        .map_err(|e| format!("http client: {e}"))?;
-    let token = get_mmds_token(&client).await?;
-    let opts = get_mmds_opts(&client, &token).await?;
-    Ok(opts.access_token_hash)
-}
-/// Polls MMDS every 50ms until metadata is available.
-/// Stores sandbox_id and template_id in env_vars and writes to /run/wrenn/ files.
-pub async fn poll_for_opts(
-    env_vars: Arc<DashMap<String, String>>,
-    cancel: CancellationToken,
-) -> Option<MMDSOpts> {
-    let client = reqwest::Client::builder()
-        .no_proxy()
-        .build()
-        .ok()?;
-    let mut interval = tokio::time::interval(MMDS_POLL_INTERVAL);
-    loop {
-        tokio::select! {
-            _ = cancel.cancelled() => {
-                tracing::warn!("context cancelled while waiting for mmds opts");
-                return None;
-            }
-            _ = interval.tick() => {
-                let token = match get_mmds_token(&client).await {
-                    Ok(t) => t,
-                    Err(e) => {
-                        tracing::debug!(error = %e, "mmds token poll");
-                        continue;
-                    }
-                };
-                let opts = match get_mmds_opts(&client, &token).await {
-                    Ok(o) => o,
-                    Err(e) => {
-                        tracing::debug!(error = %e, "mmds opts poll");
-                        continue;
-                    }
-                };
-                env_vars.insert("WRENN_SANDBOX_ID".into(), opts.sandbox_id.clone());
-                env_vars.insert("WRENN_TEMPLATE_ID".into(), opts.template_id.clone());
-                let run_dir = std::path::Path::new(WRENN_RUN_DIR);
-                if let Err(e) = std::fs::create_dir_all(run_dir) {
-                    tracing::error!(error = %e, "mmds: failed to create run dir");
-                }
-                if let Err(e) = std::fs::write(run_dir.join(".WRENN_SANDBOX_ID"), &opts.sandbox_id) {
-                    tracing::error!(error = %e, "mmds: failed to write .WRENN_SANDBOX_ID");
-                }
-                if let Err(e) = std::fs::write(run_dir.join(".WRENN_TEMPLATE_ID"), &opts.template_id) {
-                    tracing::error!(error = %e, "mmds: failed to write .WRENN_TEMPLATE_ID");
-                }
-                return Some(opts);
-            }
-        }
-    }
-}

envd-rs/src/host/mod.rs (deleted)

@@ -1,2 +0,0 @@
-pub mod metrics;
-pub mod mmds;

@@ -1,5 +1,4 @@
 use std::sync::Arc;
-use std::sync::atomic::Ordering;
 use axum::Json;
 use axum::extract::State;
@@ -10,13 +9,7 @@ use serde_json::json;
 use crate::state::AppState;
 pub async fn get_health(State(state): State<Arc<AppState>>) -> impl IntoResponse {
-    if state
-        .needs_restore
-        .compare_exchange(true, false, Ordering::AcqRel, Ordering::Relaxed)
-        .is_ok()
-    {
-        post_restore_recovery(&state);
-    }
+    state.try_restore_recovery();
     tracing::trace!("health check");
@@ -25,17 +18,3 @@ pub async fn get_health(State(state): State<Arc<AppState>>) -> impl IntoResponse
         Json(json!({ "version": state.version })),
     )
 }
-fn post_restore_recovery(state: &AppState) {
-    tracing::info!("restore: post-restore recovery (no GC needed in Rust)");
-    state.snapshot_in_progress.store(false, std::sync::atomic::Ordering::Release);
-    state.conn_tracker.restore_after_snapshot();
-    tracing::info!("restore: zombie connections closed");
-    if let Some(ref ps) = state.port_subsystem {
-        ps.restart();
-        tracing::info!("restore: port subsystem restarted");
-    }
-}

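Both the old inline block and its replacement hinge on the same one-shot gate: a compare_exchange from true to false succeeds for exactly one caller, so racing health and init requests cannot run recovery twice. A self-contained sketch of the idiom, using only std:

use std::sync::atomic::{AtomicBool, Ordering};

struct Gate {
    needs_recovery: AtomicBool,
}

impl Gate {
    // Returns true for exactly one caller: the CAS flips true -> false
    // atomically, so concurrent callers cannot both observe `true`.
    fn claim(&self) -> bool {
        self.needs_recovery
            .compare_exchange(true, false, Ordering::AcqRel, Ordering::Relaxed)
            .is_ok()
    }
}

fn main() {
    let gate = Gate { needs_recovery: AtomicBool::new(true) };
    assert!(gate.claim());   // first caller wins and runs recovery
    assert!(!gate.claim());  // every later caller is a no-op
}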

@@ -1,6 +1,5 @@
 use std::collections::HashMap;
 use std::sync::Arc;
-use std::sync::atomic::Ordering;
 use axum::Json;
 use axum::extract::State;
@@ -8,20 +7,25 @@ use axum::http::{StatusCode, header};
 use axum::response::IntoResponse;
 use serde::Deserialize;
-use crate::crypto;
-use crate::host::mmds;
 use crate::state::AppState;
 
 #[derive(Deserialize, Default)]
+#[serde(rename_all = "camelCase")]
 pub struct InitRequest {
+    #[serde(rename = "access_token")]
     pub access_token: Option<String>,
-    #[serde(rename = "defaultUser")]
     pub default_user: Option<String>,
-    #[serde(rename = "defaultWorkdir")]
     pub default_workdir: Option<String>,
-    #[serde(rename = "envVars")]
     pub env_vars: Option<HashMap<String, String>>,
+    #[serde(rename = "hyperloop_ip")]
     pub hyperloop_ip: Option<String>,
     pub timestamp: Option<String>,
+    #[serde(rename = "volume_mounts")]
    pub volume_mounts: Option<Vec<VolumeMount>>,
+    pub sandbox_id: Option<String>,
+    pub template_id: Option<String>,
 }
 
 #[derive(Deserialize)]
@@ -110,37 +114,27 @@ pub async fn post_init(
         }
     }
 
-    // Re-poll MMDS in background
-    if state.is_fc {
-        let env_vars = Arc::clone(&state.defaults.env_vars);
-        let cancel = tokio_util::sync::CancellationToken::new();
-        let cancel_clone = cancel.clone();
-        tokio::spawn(async move {
-            tokio::time::timeout(std::time::Duration::from_secs(60), async {
-                mmds::poll_for_opts(env_vars, cancel_clone).await;
-            })
-            .await
-            .ok();
-        });
+    // Set sandbox/template metadata from request body.
+    if let Some(ref id) = init_req.sandbox_id {
+        tracing::debug!(sandbox_id = %id, "setting sandbox ID from init request");
+        // SAFETY: envd is single-threaded at init time; no concurrent env reads.
+        unsafe { std::env::set_var("WRENN_SANDBOX_ID", id) };
+        write_run_file(".WRENN_SANDBOX_ID", id);
+        state.defaults.env_vars.insert("WRENN_SANDBOX_ID".into(), id.clone());
+    }
+    if let Some(ref id) = init_req.template_id {
+        tracing::debug!(template_id = %id, "setting template ID from init request");
+        // SAFETY: envd is single-threaded at init time; no concurrent env reads.
+        unsafe { std::env::set_var("WRENN_TEMPLATE_ID", id) };
+        write_run_file(".WRENN_TEMPLATE_ID", id);
+        state.defaults.env_vars.insert("WRENN_TEMPLATE_ID".into(), id.clone());
     }
 
     trigger_restore_and_respond(&state).await
 }
 
 async fn trigger_restore_and_respond(state: &AppState) -> axum::response::Response {
-    // Safety net: if health check's postRestoreRecovery hasn't run yet
-    if state
-        .needs_restore
-        .compare_exchange(true, false, Ordering::AcqRel, Ordering::Relaxed)
-        .is_ok()
-    {
-        post_restore_recovery(state);
-    }
-    state.conn_tracker.restore_after_snapshot();
-    if let Some(ref ps) = state.port_subsystem {
-        ps.restart();
-    }
+    state.try_restore_recovery();
     (
         StatusCode::NO_CONTENT,
@@ -149,46 +143,13 @@ async fn trigger_restore_and_respond(state: &AppState) -> axum::response::Response
     .into_response()
 }
 
-fn post_restore_recovery(state: &AppState) {
-    tracing::info!("restore: post-restore recovery (no GC needed in Rust)");
-    state.snapshot_in_progress.store(false, std::sync::atomic::Ordering::Release);
-    state.conn_tracker.restore_after_snapshot();
-    if let Some(ref ps) = state.port_subsystem {
-        ps.restart();
-        tracing::info!("restore: port subsystem restarted");
-    }
-}
 
 async fn validate_init_access_token(state: &AppState, request_token: &str) -> Result<(), String> {
     // Fast path: matches existing token
     if state.access_token.is_set() && !request_token.is_empty() && state.access_token.equals(request_token) {
         return Ok(());
     }
-    // Check MMDS hash
-    if state.is_fc {
-        if let Ok(mmds_hash) = mmds::get_access_token_hash().await {
-            if !mmds_hash.is_empty() {
-                if request_token.is_empty() {
-                    let empty_hash = crypto::sha512::hash_access_token("");
-                    if mmds_hash == empty_hash {
-                        return Ok(());
-                    }
-                } else {
-                    let token_hash = crypto::sha512::hash_access_token(request_token);
-                    if mmds_hash == token_hash {
-                        return Ok(());
-                    }
-                }
-                return Err("access token validation failed".into());
-            }
-        }
-    }
-    // First-time setup: no existing token and no MMDS
+    // First-time setup: no existing token
     if !state.access_token.is_set() {
         return Ok(());
     }
@@ -268,14 +229,21 @@ async fn setup_nfs(nfs_target: &str, path: &str) {
     }
 }
 
+fn write_run_file(name: &str, value: &str) {
+    let dir = std::path::Path::new("/run/wrenn");
+    if let Err(e) = std::fs::create_dir_all(dir) {
+        tracing::warn!(error = %e, "failed to create /run/wrenn");
+        return;
+    }
+    if let Err(e) = std::fs::write(dir.join(name), value) {
+        tracing::warn!(error = %e, name, "failed to write run file");
+    }
+}
 
 fn chrono_parse_to_nanos(ts: &str) -> Result<i64, ()> {
-    // Parse RFC3339 timestamp to nanoseconds since epoch
-    // Simple approach: parse as seconds + fractional
     let secs = ts.parse::<f64>().ok();
     if let Some(s) = secs {
         return Ok((s * 1_000_000_000.0) as i64);
     }
-    // Try RFC3339 format
-    // For now, fall back to allowing the update
     Err(())
 }

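A subtlety in the reworked InitRequest: #[serde(rename_all = "camelCase")] applies at the container level, but a field-level #[serde(rename = ...)] overrides it, which is how access_token, hyperloop_ip, and volume_mounts keep their snake_case wire names while the rest go camelCase. A round-trip sketch of that interaction (toy struct; assumes serde and serde_json as dependencies):

use serde::Deserialize;

#[derive(Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
struct Req {
    // rename_all maps this field from the wire name "defaultUser".
    default_user: Option<String>,
    // The explicit rename wins over rename_all, keeping the
    // legacy snake_case wire name.
    #[serde(rename = "access_token")]
    access_token: Option<String>,
}

fn main() {
    let json = r#"{"defaultUser":"wrenn","access_token":"s3cret"}"#;
    let req: Req = serde_json::from_str(json).unwrap();
    assert_eq!(req.default_user.as_deref(), Some("wrenn"));
    assert_eq!(req.access_token.as_deref(), Some("s3cret"));
}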

@@ -7,7 +7,7 @@ use axum::response::IntoResponse;
 use crate::state::AppState;
 
-/// POST /snapshot/prepare — quiesce subsystems before Firecracker snapshot.
+/// POST /snapshot/prepare — quiesce subsystems before VM snapshot.
 ///
 /// In Rust there is no GC dance. We just:
 /// 1. Drop page cache to shrink snapshot size

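Step 1 of the prepare sequence is a single procfs write: the kernel frees clean page cache for "1", reclaimable slab (dentries and inodes) for "2", and both for "3". A hedged sketch of what that step can look like; the handler body itself is outside this hunk:

use std::io;

// Ask the kernel to drop clean page cache plus reclaimable slab.
// Requires root (or CAP_SYS_ADMIN), which envd has inside the guest.
fn drop_page_cache() -> io::Result<()> {
    std::fs::write("/proc/sys/vm/drop_caches", "3")
}

fn main() {
    if let Err(e) = drop_page_cache() {
        eprintln!("drop_caches failed (needs root): {e}");
    }
}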

@@ -6,7 +6,6 @@ mod config;
 mod conntracker;
 mod crypto;
 mod execcontext;
-mod host;
 mod http;
 mod logging;
 mod permissions;
@@ -22,7 +21,6 @@ use std::sync::Arc;
 use clap::Parser;
 use tokio::net::TcpListener;
-use tokio_util::sync::CancellationToken;
 use config::{DEFAULT_PORT, DEFAULT_USER, WRENN_RUN_DIR};
 use execcontext::Defaults;
@@ -44,9 +42,6 @@ struct Cli {
     #[arg(long, default_value_t = DEFAULT_PORT)]
     port: u16,
-    #[arg(long = "isnotfc", default_value_t = false)]
-    is_not_fc: bool,
     #[arg(long)]
     version: bool,
@@ -73,35 +68,22 @@ async fn main() {
         return;
     }
-    let use_json = !cli.is_not_fc;
-    logging::init(use_json);
+    logging::init(true);
     if let Err(e) = fs::create_dir_all(WRENN_RUN_DIR) {
         tracing::error!(error = %e, "failed to create wrenn run directory");
     }
     let defaults = Defaults::new(DEFAULT_USER);
-    let is_fc_str = if cli.is_not_fc { "false" } else { "true" };
     defaults
         .env_vars
-        .insert("WRENN_SANDBOX".into(), is_fc_str.into());
+        .insert("WRENN_SANDBOX".into(), "true".into());
     let wrenn_sandbox_path = Path::new(WRENN_RUN_DIR).join(".WRENN_SANDBOX");
-    if let Err(e) = fs::write(&wrenn_sandbox_path, is_fc_str.as_bytes()) {
+    if let Err(e) = fs::write(&wrenn_sandbox_path, b"true") {
         tracing::error!(error = %e, "failed to write sandbox file");
     }
-    let cancel = CancellationToken::new();
-    // MMDS polling (only in FC mode)
-    if !cli.is_not_fc {
-        let env_vars = Arc::clone(&defaults.env_vars);
-        let cancel_clone = cancel.clone();
-        tokio::spawn(async move {
-            host::mmds::poll_for_opts(env_vars, cancel_clone).await;
-        });
-    }
     // Cgroup manager
     let cgroup_manager: Arc<dyn cgroups::CgroupManager> =
         match cgroups::Cgroup2Manager::new(
@@ -143,14 +125,13 @@ async fn main() {
         defaults,
         VERSION.to_string(),
         COMMIT.to_string(),
-        !cli.is_not_fc,
         Some(Arc::clone(&port_subsystem)),
     );
     // Memory reclaimer — drop page cache when available memory is low.
-    // Firecracker balloon device can only reclaim pages the guest kernel freed.
+    // The balloon device can only reclaim pages the guest kernel freed.
     // Pauses during snapshot/prepare to avoid corrupting kernel page table state.
-    if !cli.is_not_fc {
+    {
         let state_for_reclaimer = Arc::clone(&state);
         std::thread::spawn(move || memory_reclaimer(state_for_reclaimer));
     }
@@ -188,7 +169,6 @@ async fn main() {
     }
     port_subsystem.stop();
-    cancel.cancel();
 }
 
 fn spawn_initial_command(cmd: &str, state: &AppState) {
@@ -233,9 +213,11 @@ fn spawn_initial_command(cmd: &str, state: &AppState) {
 fn memory_reclaimer(state: Arc<AppState>) {
     use std::sync::atomic::Ordering;
+    use std::time::{Duration, SystemTime, UNIX_EPOCH};
 
-    const CHECK_INTERVAL: std::time::Duration = std::time::Duration::from_secs(10);
+    const CHECK_INTERVAL: Duration = Duration::from_secs(10);
     const DROP_THRESHOLD_PCT: u64 = 80;
+    const RESTORE_GRACE_SECS: u64 = 30;
 
     loop {
         std::thread::sleep(CHECK_INTERVAL);
@@ -244,6 +226,20 @@ fn memory_reclaimer(state: Arc<AppState>) {
             continue;
         }
 
+        // Skip during post-restore grace period. Balloon deflation causes
+        // transient high memory that resolves on its own — triggering
+        // drop_caches during UFFD page fault storms makes the guest unresponsive.
+        let restore_epoch = state.restore_epoch.load(Ordering::Acquire);
+        if restore_epoch > 0 {
+            let now = SystemTime::now()
+                .duration_since(UNIX_EPOCH)
+                .unwrap_or_default()
+                .as_secs();
+            if now.saturating_sub(restore_epoch) < RESTORE_GRACE_SECS {
+                continue;
+            }
+        }
         let mut sys = sysinfo::System::new();
         sys.refresh_memory();
         let total = sys.total_memory();

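The hunk cuts off just before the comparison against DROP_THRESHOLD_PCT. Given the constants and the total/available readings that sysinfo provides, the remaining check plausibly reduces to integer percentage math like the following; this is a sketch under that assumption, not the file's actual tail:

// Sketch: decide whether to drop caches, given the constants above.
fn should_drop(total: u64, available: u64, threshold_pct: u64) -> bool {
    if total == 0 {
        return false; // avoid divide-by-zero on a bogus reading
    }
    let used_pct = total.saturating_sub(available) * 100 / total;
    used_pct >= threshold_pct
}

fn main() {
    // 80% threshold; 8 GiB total with 1 GiB available is ~87% used => drop.
    assert!(should_drop(8 << 30, 1 << 30, 80));
    // 4 GiB available is 50% used => leave the page cache alone.
    assert!(!should_drop(8 << 30, 4 << 30, 80));
}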
View File

@@ -57,7 +57,9 @@ impl Scanner {
     pub async fn scan_and_broadcast(&self, cancel: CancellationToken) {
         loop {
-            let conns = read_tcp_connections();
+            let conns = tokio::task::spawn_blocking(read_tcp_connections)
+                .await
+                .unwrap_or_default();
             {
                 let subs = self.subs.read().unwrap();

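read_tcp_connections does synchronous procfs I/O; run directly on the async executor it would pin a worker thread for the duration of every scan, so the loop now shunts it to the blocking pool. The shape of the fix reduced to a runnable sketch (read_proc_net_tcp is a stand-in for the repo's function):

// Synchronous /proc read: fine on the blocking pool, harmful on the
// async executor where it would stall a worker thread.
fn read_proc_net_tcp() -> Vec<String> {
    std::fs::read_to_string("/proc/net/tcp")
        .map(|s| s.lines().skip(1).map(str::to_owned).collect())
        .unwrap_or_default()
}

#[tokio::main]
async fn main() {
    // spawn_blocking yields Err only if the closure panicked or the
    // runtime is shutting down; unwrap_or_default matches the diff.
    let conns = tokio::task::spawn_blocking(read_proc_net_tcp)
        .await
        .unwrap_or_default();
    println!("{} tcp sockets", conns.len());
}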

@@ -1,5 +1,6 @@
-use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
+use std::sync::atomic::{AtomicBool, AtomicU32, AtomicU64, Ordering};
 use std::sync::Arc;
+use std::time::{SystemTime, UNIX_EPOCH};
 
 use crate::auth::token::SecureToken;
 use crate::conntracker::ConnTracker;
@@ -11,7 +12,6 @@ pub struct AppState {
     pub defaults: Defaults,
     pub version: String,
     pub commit: String,
-    pub is_fc: bool,
     pub needs_restore: AtomicBool,
     pub last_set_time: AtomicMax,
     pub access_token: SecureToken,
@@ -20,6 +20,8 @@ pub struct AppState {
     pub cpu_used_pct: AtomicU32,
     pub cpu_count: AtomicU32,
     pub snapshot_in_progress: AtomicBool,
+    pub last_health_epoch: AtomicU64,
+    pub restore_epoch: AtomicU64,
 }
 
 impl AppState {
@@ -27,14 +29,12 @@ impl AppState {
         defaults: Defaults,
         version: String,
         commit: String,
-        is_fc: bool,
         port_subsystem: Option<Arc<PortSubsystem>>,
     ) -> Arc<Self> {
         let state = Arc::new(Self {
             defaults,
             version,
             commit,
-            is_fc,
             needs_restore: AtomicBool::new(false),
             last_set_time: AtomicMax::new(),
             access_token: SecureToken::new(),
@@ -43,6 +43,8 @@ impl AppState {
             cpu_used_pct: AtomicU32::new(0),
             cpu_count: AtomicU32::new(0),
             snapshot_in_progress: AtomicBool::new(false),
+            last_health_epoch: AtomicU64::new(0),
+            restore_epoch: AtomicU64::new(0),
         });
 
         let state_clone = Arc::clone(&state);
@@ -60,6 +62,47 @@ impl AppState {
     pub fn cpu_count(&self) -> u32 {
         self.cpu_count.load(Ordering::Relaxed)
     }
 
+    /// Runs post-restore recovery if `needs_restore` is set OR a wall-clock
+    /// gap is detected (catches restores where snapshot/prepare never ran).
+    pub fn try_restore_recovery(&self) {
+        let now_epoch = SystemTime::now()
+            .duration_since(UNIX_EPOCH)
+            .unwrap_or_default()
+            .as_secs();
+        let prev_epoch = self.last_health_epoch.swap(now_epoch, Ordering::AcqRel);
+
+        // Detect restore via wall-clock gap: if >3s passed since last health
+        // check, the VM was frozen and restored. Catches the case where
+        // snapshot/prepare timed out and needs_restore was never set.
+        let gap_detected = prev_epoch > 0 && now_epoch.saturating_sub(prev_epoch) > 3;
+
+        let flag_set = self
+            .needs_restore
+            .compare_exchange(true, false, Ordering::AcqRel, Ordering::Relaxed)
+            .is_ok();
+
+        if !flag_set && !gap_detected {
+            return;
+        }
+
+        if gap_detected && !flag_set {
+            tracing::info!(
+                gap_secs = now_epoch.saturating_sub(prev_epoch),
+                "restore: detected via wall-clock gap (needs_restore was not set)"
+            );
+        }
+
+        tracing::info!("restore: post-restore recovery");
+        self.snapshot_in_progress.store(false, Ordering::Release);
+        self.restore_epoch.store(now_epoch, Ordering::Release);
+        self.conn_tracker.restore_after_snapshot();
+        if let Some(ref ps) = self.port_subsystem {
+            ps.restart();
+            tracing::info!("restore: port subsystem restarted");
+        }
+    }
 }
 
 fn cpu_sampler(state: Arc<AppState>) {
@@ -70,6 +113,15 @@ fn cpu_sampler(state: Arc<AppState>) {
     loop {
         std::thread::sleep(std::time::Duration::from_secs(1));
 
+        if state.needs_restore.load(Ordering::Acquire) {
+            // After snapshot restore, sysinfo's internal CPU counters are stale.
+            // Reinitialize to get a fresh baseline.
+            sys = System::new();
+            sys.refresh_cpu_all();
+            continue;
+        }
         sys.refresh_cpu_all();
         let pct = sys.global_cpu_usage();

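try_restore_recovery deliberately reads wall-clock time (SystemTime) rather than Instant: the guest's monotonic clock stays frozen while the VM is paused, whereas the wall clock jumps forward once it is corrected after restore, so only CLOCK_REALTIME exposes the gap. A toy version of the same detection logic:

use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

const GAP_SECS: u64 = 3;

fn now_epoch() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .as_secs()
}

// Returns true when more than GAP_SECS of wall time elapsed since the
// previous observation, which normally arrives every second or so via
// health checks; a larger gap is the signature of a freeze/restore cycle.
fn gap_detected(last_seen: &AtomicU64) -> bool {
    let now = now_epoch();
    let prev = last_seen.swap(now, Ordering::AcqRel);
    prev > 0 && now.saturating_sub(prev) > GAP_SECS
}

fn main() {
    let last_seen = AtomicU64::new(now_epoch() - 10); // pretend we froze for 10s
    assert!(gap_detected(&last_seen));  // first call sees the gap
    assert!(!gap_detected(&last_seen)); // next call: clock is continuous again
}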
frontend/bun.lock (new file, 379 lines)

@ -0,0 +1,379 @@
{
"lockfileVersion": 1,
"configVersion": 1,
"workspaces": {
"": {
"name": "frontend",
"dependencies": {
"@xterm/addon-fit": "^0.11.0",
"@xterm/addon-web-links": "^0.12.0",
"@xterm/xterm": "^6.0.0",
"chart.js": "^4.5.1",
"shiki": "^4.0.2",
},
"devDependencies": {
"@fontsource-variable/jetbrains-mono": "^5.2.8",
"@fontsource-variable/manrope": "^5.2.8",
"@fontsource/alice": "^5.2.8",
"@fontsource/instrument-serif": "^5.2.8",
"@sveltejs/adapter-static": "^3.0.10",
"@sveltejs/kit": "^2.50.2",
"@sveltejs/vite-plugin-svelte": "^7.0.0",
"@tailwindcss/vite": "^4.2.1",
"bits-ui": "^2.16.3",
"svelte": "^5.51.0",
"svelte-check": "^4.4.2",
"tailwindcss": "^4.2.1",
"typescript": "^6.0.2",
"vite": "^8.0.8",
},
},
},
"packages": {
"@emnapi/core": ["@emnapi/core@1.9.2", "", { "dependencies": { "@emnapi/wasi-threads": "1.2.1", "tslib": "2.8.1" } }, "sha512-UC+ZhH3XtczQYfOlu3lNEkdW/p4dsJ1r/bP7H8+rhao3TTTMO1ATq/4DdIi23XuGoFY+Cz0JmCbdVl0hz9jZcA=="],
"@emnapi/runtime": ["@emnapi/runtime@1.9.2", "", { "dependencies": { "tslib": "2.8.1" } }, "sha512-3U4+MIWHImeyu1wnmVygh5WlgfYDtyf0k8AbLhMFxOipihf6nrWC4syIm/SwEeec0mNSafiiNnMJwbza/Is6Lw=="],
"@emnapi/wasi-threads": ["@emnapi/wasi-threads@1.2.1", "", { "dependencies": { "tslib": "2.8.1" } }, "sha512-uTII7OYF+/Mes/MrcIOYp5yOtSMLBWSIoLPpcgwipoiKbli6k322tcoFsxoIIxPDqW01SQGAgko4EzZi2BNv2w=="],
"@floating-ui/core": ["@floating-ui/core@1.7.5", "", { "dependencies": { "@floating-ui/utils": "0.2.11" } }, "sha512-1Ih4WTWyw0+lKyFMcBHGbb5U5FtuHJuujoyyr5zTaWS5EYMeT6Jb2AuDeftsCsEuchO+mM2ij5+q9crhydzLhQ=="],
"@floating-ui/dom": ["@floating-ui/dom@1.7.6", "", { "dependencies": { "@floating-ui/core": "1.7.5", "@floating-ui/utils": "0.2.11" } }, "sha512-9gZSAI5XM36880PPMm//9dfiEngYoC6Am2izES1FF406YFsjvyBMmeJ2g4SAju3xWwtuynNRFL2s9hgxpLI5SQ=="],
"@floating-ui/utils": ["@floating-ui/utils@0.2.11", "", {}, "sha512-RiB/yIh78pcIxl6lLMG0CgBXAZ2Y0eVHqMPYugu+9U0AeT6YBeiJpf7lbdJNIugFP5SIjwNRgo4DhR1Qxi26Gg=="],
"@fontsource-variable/jetbrains-mono": ["@fontsource-variable/jetbrains-mono@5.2.8", "", {}, "sha512-WBA9elru6Jdp5df2mES55wuOO0WIrn3kpXnI4+W2ek5u3ZgLS9XS4gmIlcQhiZOWEKl95meYdvK7xI+ETLCq/Q=="],
"@fontsource-variable/manrope": ["@fontsource-variable/manrope@5.2.8", "", {}, "sha512-nc9lOuCRz73UHnovDE2bwXUdghE2SEOc7Aii0qGe3CLyE03W1a7VnY5Z6euRiapiKbCkGS+eXbY3s/kvWeGeSw=="],
"@fontsource/alice": ["@fontsource/alice@5.2.8", "", {}, "sha512-EDpK9aFXsaRKdyZpgFu8d5+zmE07yIaFxqVeKrYQJjdQpEhWDZA+naLflHwQQmMbLMJK3a4X/RAm5MCScT93NA=="],
"@fontsource/instrument-serif": ["@fontsource/instrument-serif@5.2.8", "", {}, "sha512-s+bkz+syj2rO00Rmq9g0P+PwuLig33DR1xDR8pTWmovH1pUjwnncrFk++q9mmOex8fUQ7oW80gPpPDaw7V1MMw=="],
"@internationalized/date": ["@internationalized/date@3.12.0", "", { "dependencies": { "@swc/helpers": "0.5.21" } }, "sha512-/PyIMzK29jtXaGU23qTvNZxvBXRtKbNnGDFD+PY6CZw/Y8Ex8pFUzkuCJCG9aOqmShjqhS9mPqP6Dk5onQY8rQ=="],
"@jridgewell/gen-mapping": ["@jridgewell/gen-mapping@0.3.13", "", { "dependencies": { "@jridgewell/sourcemap-codec": "1.5.5", "@jridgewell/trace-mapping": "0.3.31" } }, "sha512-2kkt/7niJ6MgEPxF0bYdQ6etZaA+fQvDcLKckhy1yIQOzaoKjBBjSj63/aLVjYE3qhRt5dvM+uUyfCg6UKCBbA=="],
"@jridgewell/remapping": ["@jridgewell/remapping@2.3.5", "", { "dependencies": { "@jridgewell/gen-mapping": "0.3.13", "@jridgewell/trace-mapping": "0.3.31" } }, "sha512-LI9u/+laYG4Ds1TDKSJW2YPrIlcVYOwi2fUC6xB43lueCjgxV4lffOCZCtYFiH6TNOX+tQKXx97T4IKHbhyHEQ=="],
"@jridgewell/resolve-uri": ["@jridgewell/resolve-uri@3.1.2", "", {}, "sha512-bRISgCIjP20/tbWSPWMEi54QVPRZExkuD9lJL+UIxUKtwVJA8wW1Trb1jMs1RFXo1CBTNZ/5hpC9QvmKWdopKw=="],
"@jridgewell/sourcemap-codec": ["@jridgewell/sourcemap-codec@1.5.5", "", {}, "sha512-cYQ9310grqxueWbl+WuIUIaiUaDcj7WOq5fVhEljNVgRfOUhY9fy2zTvfoqWsnebh8Sl70VScFbICvJnLKB0Og=="],
"@jridgewell/trace-mapping": ["@jridgewell/trace-mapping@0.3.31", "", { "dependencies": { "@jridgewell/resolve-uri": "3.1.2", "@jridgewell/sourcemap-codec": "1.5.5" } }, "sha512-zzNR+SdQSDJzc8joaeP8QQoCQr8NuYx2dIIytl1QeBEZHJ9uW6hebsrYgbz8hJwUQao3TWCMtmfV8Nu1twOLAw=="],
"@kurkle/color": ["@kurkle/color@0.3.4", "", {}, "sha512-M5UknZPHRu3DEDWoipU6sE8PdkZ6Z/S+v4dD+Ke8IaNlpdSQah50lz1KtcFBa2vsdOnwbbnxJwVM4wty6udA5w=="],
"@napi-rs/wasm-runtime": ["@napi-rs/wasm-runtime@1.1.3", "", { "dependencies": { "@tybys/wasm-util": "0.10.1" }, "peerDependencies": { "@emnapi/core": "1.9.2", "@emnapi/runtime": "1.9.2" } }, "sha512-xK9sGVbJWYb08+mTJt3/YV24WxvxpXcXtP6B172paPZ+Ts69Re9dAr7lKwJoeIx8OoeuimEiRZ7umkiUVClmmQ=="],
"@oxc-project/types": ["@oxc-project/types@0.124.0", "", {}, "sha512-VBFWMTBvHxS11Z5Lvlr3IWgrwhMTXV+Md+EQF0Xf60+wAdsGFTBx7X7K/hP4pi8N7dcm1RvcHwDxZ16Qx8keUg=="],
"@polka/url": ["@polka/url@1.0.0-next.29", "", {}, "sha512-wwQAWhWSuHaag8c4q/KN/vCoeOJYshAIvMQwD4GpSb3OiZklFfvAgmj0VCBBImRpuF/aFgIRzllXlVX93Jevww=="],
"@rolldown/binding-android-arm64": ["@rolldown/binding-android-arm64@1.0.0-rc.15", "", { "os": "android", "cpu": "arm64" }, "sha512-YYe6aWruPZDtHNpwu7+qAHEMbQ/yRl6atqb/AhznLTnD3UY99Q1jE7ihLSahNWkF4EqRPVC4SiR4O0UkLK02tA=="],
"@rolldown/binding-darwin-arm64": ["@rolldown/binding-darwin-arm64@1.0.0-rc.15", "", { "os": "darwin", "cpu": "arm64" }, "sha512-oArR/ig8wNTPYsXL+Mzhs0oxhxfuHRfG7Ikw7jXsw8mYOtk71W0OkF2VEVh699pdmzjPQsTjlD1JIOoHkLP1Fg=="],
"@rolldown/binding-darwin-x64": ["@rolldown/binding-darwin-x64@1.0.0-rc.15", "", { "os": "darwin", "cpu": "x64" }, "sha512-YzeVqOqjPYvUbJSWJ4EDL8ahbmsIXQpgL3JVipmN+MX0XnXMeWomLN3Fb+nwCmP/jfyqte5I3XRSm7OfQrbyxw=="],
"@rolldown/binding-freebsd-x64": ["@rolldown/binding-freebsd-x64@1.0.0-rc.15", "", { "os": "freebsd", "cpu": "x64" }, "sha512-9Erhx956jeQ0nNTyif1+QWAXDRD38ZNjr//bSHrt6wDwB+QkAfl2q6Mn1k6OBPerznjRmbM10lgRb1Pli4xZPw=="],
"@rolldown/binding-linux-arm-gnueabihf": ["@rolldown/binding-linux-arm-gnueabihf@1.0.0-rc.15", "", { "os": "linux", "cpu": "arm" }, "sha512-cVwk0w8QbZJGTnP/AHQBs5yNwmpgGYStL88t4UIaqcvYJWBfS0s3oqVLZPwsPU6M0zlW4GqjP0Zq5MnAGwFeGA=="],
"@rolldown/binding-linux-arm64-gnu": ["@rolldown/binding-linux-arm64-gnu@1.0.0-rc.15", "", { "os": "linux", "cpu": "arm64" }, "sha512-eBZ/u8iAK9SoHGanqe/jrPnY0JvBN6iXbVOsbO38mbz+ZJsaobExAm1Iu+rxa4S1l2FjG0qEZn4Rc6X8n+9M+w=="],
"@rolldown/binding-linux-arm64-musl": ["@rolldown/binding-linux-arm64-musl@1.0.0-rc.15", "", { "os": "linux", "cpu": "arm64" }, "sha512-ZvRYMGrAklV9PEkgt4LQM6MjQX2P58HPAuecwYObY2DhS2t35R0I810bKi0wmaYORt6m/2Sm+Z+nFgb0WhXNcQ=="],
"@rolldown/binding-linux-ppc64-gnu": ["@rolldown/binding-linux-ppc64-gnu@1.0.0-rc.15", "", { "os": "linux", "cpu": "ppc64" }, "sha512-VDpgGBzgfg5hLg+uBpCLoFG5kVvEyafmfxGUV0UHLcL5irxAK7PKNeC2MwClgk6ZAiNhmo9FLhRYgvMmedLtnQ=="],
"@rolldown/binding-linux-s390x-gnu": ["@rolldown/binding-linux-s390x-gnu@1.0.0-rc.15", "", { "os": "linux", "cpu": "s390x" }, "sha512-y1uXY3qQWCzcPgRJATPSOUP4tCemh4uBdY7e3EZbVwCJTY3gLJWnQABgeUetvED+bt1FQ01OeZwvhLS2bpNrAQ=="],
"@rolldown/binding-linux-x64-gnu": ["@rolldown/binding-linux-x64-gnu@1.0.0-rc.15", "", { "os": "linux", "cpu": "x64" }, "sha512-023bTPBod7J3Y/4fzAN6QtpkSABR0rigtrwaP+qSEabUh5zf6ELr9Nc7GujaROuPY3uwdSIXWrvhn1KxOvurWA=="],
"@rolldown/binding-linux-x64-musl": ["@rolldown/binding-linux-x64-musl@1.0.0-rc.15", "", { "os": "linux", "cpu": "x64" }, "sha512-witB2O0/hU4CgfOOKUoeFgQ4GktPi1eEbAhaLAIpgD6+ZnhcPkUtPsoKKHRzmOoWPZue46IThdSgdo4XneOLYw=="],
"@rolldown/binding-openharmony-arm64": ["@rolldown/binding-openharmony-arm64@1.0.0-rc.15", "", { "os": "none", "cpu": "arm64" }, "sha512-UCL68NJ0Ud5zRipXZE9dF5PmirzJE4E4BCIOOssEnM7wLDsxjc6Qb0sGDxTNRTP53I6MZpygyCpY8Aa8sPfKPg=="],
"@rolldown/binding-wasm32-wasi": ["@rolldown/binding-wasm32-wasi@1.0.0-rc.15", "", { "dependencies": { "@emnapi/core": "1.9.2", "@emnapi/runtime": "1.9.2", "@napi-rs/wasm-runtime": "1.1.3" }, "cpu": "none" }, "sha512-ApLruZq/ig+nhaE7OJm4lDjayUnOHVUa77zGeqnqZ9pn0ovdVbbNPerVibLXDmWeUZXjIYIT8V3xkT58Rm9u5Q=="],
"@rolldown/binding-win32-arm64-msvc": ["@rolldown/binding-win32-arm64-msvc@1.0.0-rc.15", "", { "os": "win32", "cpu": "arm64" }, "sha512-KmoUoU7HnN+Si5YWJigfTws1jz1bKBYDQKdbLspz0UaqjjFkddHsqorgiW1mxcAj88lYUE6NC/zJNwT+SloqtA=="],
"@rolldown/binding-win32-x64-msvc": ["@rolldown/binding-win32-x64-msvc@1.0.0-rc.15", "", { "os": "win32", "cpu": "x64" }, "sha512-3P2A8L+x75qavWLe/Dll3EYBJLQmtkJN8rfh+U/eR3MqMgL/h98PhYI+JFfXuDPgPeCB7iZAKiqii5vqOvnA0g=="],
"@rolldown/pluginutils": ["@rolldown/pluginutils@1.0.0-rc.15", "", {}, "sha512-UromN0peaE53IaBRe9W7CjrZgXl90fqGpK+mIZbA3qSTeYqg3pqpROBdIPvOG3F5ereDHNwoHBI2e50n1BDr1g=="],
"@shikijs/core": ["@shikijs/core@4.0.2", "", { "dependencies": { "@shikijs/primitive": "4.0.2", "@shikijs/types": "4.0.2", "@shikijs/vscode-textmate": "10.0.2", "@types/hast": "3.0.4", "hast-util-to-html": "9.0.5" } }, "sha512-hxT0YF4ExEqB8G/qFdtJvpmHXBYJ2lWW7qTHDarVkIudPFE6iCIrqdgWxGn5s+ppkGXI0aEGlibI0PAyzP3zlw=="],
"@shikijs/engine-javascript": ["@shikijs/engine-javascript@4.0.2", "", { "dependencies": { "@shikijs/types": "4.0.2", "@shikijs/vscode-textmate": "10.0.2", "oniguruma-to-es": "4.3.5" } }, "sha512-7PW0Nm49DcoUIQEXlJhNNBHyoGMjalRETTCcjMqEaMoJRLljy1Bi/EGV3/qLBgLKQejdspiiYuHGQW6dX94Nag=="],
"@shikijs/engine-oniguruma": ["@shikijs/engine-oniguruma@4.0.2", "", { "dependencies": { "@shikijs/types": "4.0.2", "@shikijs/vscode-textmate": "10.0.2" } }, "sha512-UpCB9Y2sUKlS9z8juFSKz7ZtysmeXCgnRF0dlhXBkmQnek7lAToPte8DkxmEYGNTMii72zU/lyXiCB6StuZeJg=="],
"@shikijs/langs": ["@shikijs/langs@4.0.2", "", { "dependencies": { "@shikijs/types": "4.0.2" } }, "sha512-KaXby5dvoeuZzN0rYQiPMjFoUrz4hgwIE+D6Du9owcHcl6/g16/yT5BQxSW5cGt2MZBz6Hl0YuRqf12omRfUUg=="],
"@shikijs/primitive": ["@shikijs/primitive@4.0.2", "", { "dependencies": { "@shikijs/types": "4.0.2", "@shikijs/vscode-textmate": "10.0.2", "@types/hast": "3.0.4" } }, "sha512-M6UMPrSa3fN5ayeJwFVl9qWofl273wtK1VG8ySDZ1mQBfhCpdd8nEx7nPZ/tk7k+TYcpqBZzj/AnwxT9lO+HJw=="],
"@shikijs/themes": ["@shikijs/themes@4.0.2", "", { "dependencies": { "@shikijs/types": "4.0.2" } }, "sha512-mjCafwt8lJJaVSsQvNVrJumbnnj1RI8jbUKrPKgE6E3OvQKxnuRoBaYC51H4IGHePsGN/QtALglWBU7DoKDFnA=="],
"@shikijs/types": ["@shikijs/types@4.0.2", "", { "dependencies": { "@shikijs/vscode-textmate": "10.0.2", "@types/hast": "3.0.4" } }, "sha512-qzbeRooUTPnLE+sHD/Z8DStmaDgnbbc/pMrU203950aRqjX/6AFHeDYT+j00y2lPdz0ywJKx7o/7qnqTivtlXg=="],
"@shikijs/vscode-textmate": ["@shikijs/vscode-textmate@10.0.2", "", {}, "sha512-83yeghZ2xxin3Nj8z1NMd/NCuca+gsYXswywDy5bHvwlWL8tpTQmzGeUuHd9FC3E/SBEMvzJRwWEOz5gGes9Qg=="],
"@standard-schema/spec": ["@standard-schema/spec@1.1.0", "", {}, "sha512-l2aFy5jALhniG5HgqrD6jXLi/rUWrKvqN/qJx6yoJsgKhblVd+iqqU4RCXavm/jPityDo5TCvKMnpjKnOriy0w=="],
"@sveltejs/acorn-typescript": ["@sveltejs/acorn-typescript@1.0.9", "", { "peerDependencies": { "acorn": "8.16.0" } }, "sha512-lVJX6qEgs/4DOcRTpo56tmKzVPtoWAaVbL4hfO7t7NVwl9AAXzQR6cihesW1BmNMPl+bK6dreu2sOKBP2Q9CIA=="],
"@sveltejs/adapter-static": ["@sveltejs/adapter-static@3.0.10", "", { "peerDependencies": { "@sveltejs/kit": "2.57.1" } }, "sha512-7D9lYFWJmB7zxZyTE/qxjksvMqzMuYrrsyh1f4AlZqeZeACPRySjbC3aFiY55wb1tWUaKOQG9PVbm74JcN2Iew=="],
"@sveltejs/kit": ["@sveltejs/kit@2.57.1", "", { "dependencies": { "@standard-schema/spec": "1.1.0", "@sveltejs/acorn-typescript": "1.0.9", "@types/cookie": "0.6.0", "acorn": "8.16.0", "cookie": "0.6.0", "devalue": "5.7.1", "esm-env": "1.2.2", "kleur": "4.1.5", "magic-string": "0.30.21", "mrmime": "2.0.1", "set-cookie-parser": "3.1.0", "sirv": "3.0.2" }, "optionalDependencies": { "typescript": "6.0.2" }, "peerDependencies": { "@sveltejs/vite-plugin-svelte": "7.0.0", "svelte": "5.55.3", "vite": "8.0.8" }, "bin": { "svelte-kit": "svelte-kit.js" } }, "sha512-VRdSbB96cI1EnRh09CqmnQqP/YJvET5buj8S6k7CxaJqBJD4bw4fRKDjcarAj/eX9k2eHifQfDH8NtOh+ZxxPw=="],
"@sveltejs/vite-plugin-svelte": ["@sveltejs/vite-plugin-svelte@7.0.0", "", { "dependencies": { "deepmerge": "4.3.1", "magic-string": "0.30.21", "obug": "2.1.1", "vitefu": "1.1.3" }, "peerDependencies": { "svelte": "5.55.3", "vite": "8.0.8" } }, "sha512-ILXmxC7HAsnkK2eslgPetrqqW1BKSL7LktsFgqzNj83MaivMGZzluWq32m25j2mDOjmSKX7GGWahePhuEs7P/g=="],
"@swc/helpers": ["@swc/helpers@0.5.21", "", { "dependencies": { "tslib": "2.8.1" } }, "sha512-jI/VAmtdjB/RnI8GTnokyX7Ug8c+g+ffD6QRLa6XQewtnGyukKkKSk3wLTM3b5cjt1jNh9x0jfVlagdN2gDKQg=="],
"@tailwindcss/node": ["@tailwindcss/node@4.2.2", "", { "dependencies": { "@jridgewell/remapping": "2.3.5", "enhanced-resolve": "5.20.1", "jiti": "2.6.1", "lightningcss": "1.32.0", "magic-string": "0.30.21", "source-map-js": "1.2.1", "tailwindcss": "4.2.2" } }, "sha512-pXS+wJ2gZpVXqFaUEjojq7jzMpTGf8rU6ipJz5ovJV6PUGmlJ+jvIwGrzdHdQ80Sg+wmQxUFuoW1UAAwHNEdFA=="],
"@tailwindcss/oxide": ["@tailwindcss/oxide@4.2.2", "", { "optionalDependencies": { "@tailwindcss/oxide-android-arm64": "4.2.2", "@tailwindcss/oxide-darwin-arm64": "4.2.2", "@tailwindcss/oxide-darwin-x64": "4.2.2", "@tailwindcss/oxide-freebsd-x64": "4.2.2", "@tailwindcss/oxide-linux-arm-gnueabihf": "4.2.2", "@tailwindcss/oxide-linux-arm64-gnu": "4.2.2", "@tailwindcss/oxide-linux-arm64-musl": "4.2.2", "@tailwindcss/oxide-linux-x64-gnu": "4.2.2", "@tailwindcss/oxide-linux-x64-musl": "4.2.2", "@tailwindcss/oxide-wasm32-wasi": "4.2.2", "@tailwindcss/oxide-win32-arm64-msvc": "4.2.2", "@tailwindcss/oxide-win32-x64-msvc": "4.2.2" } }, "sha512-qEUA07+E5kehxYp9BVMpq9E8vnJuBHfJEC0vPC5e7iL/hw7HR61aDKoVoKzrG+QKp56vhNZe4qwkRmMC0zDLvg=="],
"@tailwindcss/oxide-android-arm64": ["@tailwindcss/oxide-android-arm64@4.2.2", "", { "os": "android", "cpu": "arm64" }, "sha512-dXGR1n+P3B6748jZO/SvHZq7qBOqqzQ+yFrXpoOWWALWndF9MoSKAT3Q0fYgAzYzGhxNYOoysRvYlpixRBBoDg=="],
"@tailwindcss/oxide-darwin-arm64": ["@tailwindcss/oxide-darwin-arm64@4.2.2", "", { "os": "darwin", "cpu": "arm64" }, "sha512-iq9Qjr6knfMpZHj55/37ouZeykwbDqF21gPFtfnhCCKGDcPI/21FKC9XdMO/XyBM7qKORx6UIhGgg6jLl7BZlg=="],
"@tailwindcss/oxide-darwin-x64": ["@tailwindcss/oxide-darwin-x64@4.2.2", "", { "os": "darwin", "cpu": "x64" }, "sha512-BlR+2c3nzc8f2G639LpL89YY4bdcIdUmiOOkv2GQv4/4M0vJlpXEa0JXNHhCHU7VWOKWT/CjqHdTP8aUuDJkuw=="],
"@tailwindcss/oxide-freebsd-x64": ["@tailwindcss/oxide-freebsd-x64@4.2.2", "", { "os": "freebsd", "cpu": "x64" }, "sha512-YUqUgrGMSu2CDO82hzlQ5qSb5xmx3RUrke/QgnoEx7KvmRJHQuZHZmZTLSuuHwFf0DJPybFMXMYf+WJdxHy/nQ=="],
"@tailwindcss/oxide-linux-arm-gnueabihf": ["@tailwindcss/oxide-linux-arm-gnueabihf@4.2.2", "", { "os": "linux", "cpu": "arm" }, "sha512-FPdhvsW6g06T9BWT0qTwiVZYE2WIFo2dY5aCSpjG/S/u1tby+wXoslXS0kl3/KXnULlLr1E3NPRRw0g7t2kgaQ=="],
"@tailwindcss/oxide-linux-arm64-gnu": ["@tailwindcss/oxide-linux-arm64-gnu@4.2.2", "", { "os": "linux", "cpu": "arm64" }, "sha512-4og1V+ftEPXGttOO7eCmW7VICmzzJWgMx+QXAJRAhjrSjumCwWqMfkDrNu1LXEQzNAwz28NCUpucgQPrR4S2yw=="],
"@tailwindcss/oxide-linux-arm64-musl": ["@tailwindcss/oxide-linux-arm64-musl@4.2.2", "", { "os": "linux", "cpu": "arm64" }, "sha512-oCfG/mS+/+XRlwNjnsNLVwnMWYH7tn/kYPsNPh+JSOMlnt93mYNCKHYzylRhI51X+TbR+ufNhhKKzm6QkqX8ag=="],
"@tailwindcss/oxide-linux-x64-gnu": ["@tailwindcss/oxide-linux-x64-gnu@4.2.2", "", { "os": "linux", "cpu": "x64" }, "sha512-rTAGAkDgqbXHNp/xW0iugLVmX62wOp2PoE39BTCGKjv3Iocf6AFbRP/wZT/kuCxC9QBh9Pu8XPkv/zCZB2mcMg=="],
"@tailwindcss/oxide-linux-x64-musl": ["@tailwindcss/oxide-linux-x64-musl@4.2.2", "", { "os": "linux", "cpu": "x64" }, "sha512-XW3t3qwbIwiSyRCggeO2zxe3KWaEbM0/kW9e8+0XpBgyKU4ATYzcVSMKteZJ1iukJ3HgHBjbg9P5YPRCVUxlnQ=="],
"@tailwindcss/oxide-wasm32-wasi": ["@tailwindcss/oxide-wasm32-wasi@4.2.2", "", { "cpu": "none" }, "sha512-eKSztKsmEsn1O5lJ4ZAfyn41NfG7vzCg496YiGtMDV86jz1q/irhms5O0VrY6ZwTUkFy/EKG3RfWgxSI3VbZ8Q=="],
"@tailwindcss/oxide-win32-arm64-msvc": ["@tailwindcss/oxide-win32-arm64-msvc@4.2.2", "", { "os": "win32", "cpu": "arm64" }, "sha512-qPmaQM4iKu5mxpsrWZMOZRgZv1tOZpUm+zdhhQP0VhJfyGGO3aUKdbh3gDZc/dPLQwW4eSqWGrrcWNBZWUWaXQ=="],
"@tailwindcss/oxide-win32-x64-msvc": ["@tailwindcss/oxide-win32-x64-msvc@4.2.2", "", { "os": "win32", "cpu": "x64" }, "sha512-1T/37VvI7WyH66b+vqHj/cLwnCxt7Qt3WFu5Q8hk65aOvlwAhs7rAp1VkulBJw/N4tMirXjVnylTR72uI0HGcA=="],
"@tailwindcss/vite": ["@tailwindcss/vite@4.2.2", "", { "dependencies": { "@tailwindcss/node": "4.2.2", "@tailwindcss/oxide": "4.2.2", "tailwindcss": "4.2.2" }, "peerDependencies": { "vite": "8.0.8" } }, "sha512-mEiF5HO1QqCLXoNEfXVA1Tzo+cYsrqV7w9Juj2wdUFyW07JRenqMG225MvPwr3ZD9N1bFQj46X7r33iHxLUW0w=="],
"@tybys/wasm-util": ["@tybys/wasm-util@0.10.1", "", { "dependencies": { "tslib": "2.8.1" } }, "sha512-9tTaPJLSiejZKx+Bmog4uSubteqTvFrVrURwkmHixBo0G4seD0zUxp98E1DzUBJxLQ3NPwXrGKDiVjwx/DpPsg=="],
"@types/cookie": ["@types/cookie@0.6.0", "", {}, "sha512-4Kh9a6B2bQciAhf7FSuMRRkUWecJgJu9nPnx3yzpsfXX/c50REIqpHY4C82bXP90qrLtXtkDxTZosYO3UpOwlA=="],
"@types/estree": ["@types/estree@1.0.8", "", {}, "sha512-dWHzHa2WqEXI/O1E9OjrocMTKJl2mSrEolh1Iomrv6U+JuNwaHXsXx9bLu5gG7BUWFIN0skIQJQ/L1rIex4X6w=="],
"@types/hast": ["@types/hast@3.0.4", "", { "dependencies": { "@types/unist": "3.0.3" } }, "sha512-WPs+bbQw5aCj+x6laNGWLH3wviHtoCv/P3+otBhbOhJgG8qtpdAMlTCxLtsTWA7LH1Oh/bFCHsBn0TPS5m30EQ=="],
"@types/mdast": ["@types/mdast@4.0.4", "", { "dependencies": { "@types/unist": "3.0.3" } }, "sha512-kGaNbPh1k7AFzgpud/gMdvIm5xuECykRR+JnWKQno9TAXVa6WIVCGTPvYGekIDL4uwCZQSYbUxNBSb1aUo79oA=="],
"@types/trusted-types": ["@types/trusted-types@2.0.7", "", {}, "sha512-ScaPdn1dQczgbl0QFTeTOmVHFULt394XJgOQNoyVhZ6r2vLnMLJfBPd53SB52T/3G36VI1/g2MZaX0cwDuXsfw=="],
"@types/unist": ["@types/unist@3.0.3", "", {}, "sha512-ko/gIFJRv177XgZsZcBwnqJN5x/Gien8qNOn0D5bQU/zAzVf9Zt3BlcUiLqhV9y4ARk0GbT3tnUiPNgnTXzc/Q=="],
"@ungap/structured-clone": ["@ungap/structured-clone@1.3.0", "", {}, "sha512-WmoN8qaIAo7WTYWbAZuG8PYEhn5fkz7dZrqTBZ7dtt//lL2Gwms1IcnQ5yHqjDfX8Ft5j4YzDM23f87zBfDe9g=="],
"@xterm/addon-fit": ["@xterm/addon-fit@0.11.0", "", {}, "sha512-jYcgT6xtVYhnhgxh3QgYDnnNMYTcf8ElbxxFzX0IZo+vabQqSPAjC3c1wJrKB5E19VwQei89QCiZZP86DCPF7g=="],
"@xterm/addon-web-links": ["@xterm/addon-web-links@0.12.0", "", {}, "sha512-4Smom3RPyVp7ZMYOYDoC/9eGJJJqYhnPLGGqJ6wOBfB8VxPViJNSKdgRYb8NpaM6YSelEKbA2SStD7lGyqaobw=="],
"@xterm/xterm": ["@xterm/xterm@6.0.0", "", {}, "sha512-TQwDdQGtwwDt+2cgKDLn0IRaSxYu1tSUjgKarSDkUM0ZNiSRXFpjxEsvc/Zgc5kq5omJ+V0a8/kIM2WD3sMOYg=="],
"acorn": ["acorn@8.16.0", "", { "bin": { "acorn": "bin/acorn" } }, "sha512-UVJyE9MttOsBQIDKw1skb9nAwQuR5wuGD3+82K6JgJlm/Y+KI92oNsMNGZCYdDsVtRHSak0pcV5Dno5+4jh9sw=="],
"aria-query": ["aria-query@5.3.1", "", {}, "sha512-Z/ZeOgVl7bcSYZ/u/rh0fOpvEpq//LZmdbkXyc7syVzjPAhfOa9ebsdTSjEBDU4vs5nC98Kfduj1uFo0qyET3g=="],
"axobject-query": ["axobject-query@4.1.0", "", {}, "sha512-qIj0G9wZbMGNLjLmg1PT6v2mE9AH2zlnADJD/2tC6E00hgmhUOfEB6greHPAfLRSufHqROIUTkw6E+M3lH0PTQ=="],
"bits-ui": ["bits-ui@2.17.3", "", { "dependencies": { "@floating-ui/core": "1.7.5", "@floating-ui/dom": "1.7.6", "esm-env": "1.2.2", "runed": "0.35.1", "svelte-toolbelt": "0.10.6", "tabbable": "6.4.0" }, "peerDependencies": { "@internationalized/date": "3.12.0", "svelte": "5.55.3" } }, "sha512-Bef41uY9U2jaBJHPhcPvmBNkGec5Wx2z6eioDsTmsaR2vH4QoaOcPi75gzCG3+/2TNr6v/qBwzgWNPYCxNtrEA=="],
"ccount": ["ccount@2.0.1", "", {}, "sha512-eyrF0jiFpY+3drT6383f1qhkbGsLSifNAjA61IUjZjmLCWjItY6LB9ft9YhoDgwfmclB2zhu51Lc7+95b8NRAg=="],
"character-entities-html4": ["character-entities-html4@2.1.0", "", {}, "sha512-1v7fgQRj6hnSwFpq1Eu0ynr/CDEw0rXo2B61qXrLNdHZmPKgb7fqS1a2JwF0rISo9q77jDI8VMEHoApn8qDoZA=="],
"character-entities-legacy": ["character-entities-legacy@3.0.0", "", {}, "sha512-RpPp0asT/6ufRm//AJVwpViZbGM/MkjQFxJccQRHmISF/22NBtsHqAWmL+/pmkPWoIUJdWyeVleTl1wydHATVQ=="],
"chart.js": ["chart.js@4.5.1", "", { "dependencies": { "@kurkle/color": "0.3.4" } }, "sha512-GIjfiT9dbmHRiYi6Nl2yFCq7kkwdkp1W/lp2J99rX0yo9tgJGn3lKQATztIjb5tVtevcBtIdICNWqlq5+E8/Pw=="],
"chokidar": ["chokidar@4.0.3", "", { "dependencies": { "readdirp": "4.1.2" } }, "sha512-Qgzu8kfBvo+cA4962jnP1KkS6Dop5NS6g7R5LFYJr4b8Ub94PPQXUksCw9PvXoeXPRRddRNC5C1JQUR2SMGtnA=="],
"clsx": ["clsx@2.1.1", "", {}, "sha512-eYm0QWBtUrBWZWG0d386OGAw16Z995PiOVo2B7bjWSbHedGl5e0ZWaq65kOGgUSNesEIDkB9ISbTg/JK9dhCZA=="],
"comma-separated-tokens": ["comma-separated-tokens@2.0.3", "", {}, "sha512-Fu4hJdvzeylCfQPp9SGWidpzrMs7tTrlu6Vb8XGaRGck8QSNZJJp538Wrb60Lax4fPwR64ViY468OIUTbRlGZg=="],
"cookie": ["cookie@0.6.0", "", {}, "sha512-U71cyTamuh1CRNCfpGY6to28lxvNwPG4Guz/EVjgf3Jmzv0vlDp1atT9eS5dDjMYHucpHbWns6Lwf3BKz6svdw=="],
"deepmerge": ["deepmerge@4.3.1", "", {}, "sha512-3sUqbMEc77XqpdNO7FRyRog+eW3ph+GYCbj+rK+uYyRMuwsVy0rMiVtPn+QJlKFvWP/1PYpapqYn0Me2knFn+A=="],
"dequal": ["dequal@2.0.3", "", {}, "sha512-0je+qPKHEMohvfRTCEo3CrPG6cAzAYgmzKyxRiYSSDkS6eGJdyVJm7WaYA5ECaAD9wLB2T4EEeymA5aFVcYXCA=="],
"detect-libc": ["detect-libc@2.1.2", "", {}, "sha512-Btj2BOOO83o3WyH59e8MgXsxEQVcarkUOpEYrubB0urwnN10yQ364rsiByU11nZlqWYZm05i/of7io4mzihBtQ=="],
"devalue": ["devalue@5.7.1", "", {}, "sha512-MUbZ586EgQqdRnC4yDrlod3BEdyvE4TapGYHMW2CiaW+KkkFmWEFqBUaLltEZCGi0iFXCEjRF0OjF0DV2QHjOA=="],
"devlop": ["devlop@1.1.0", "", { "dependencies": { "dequal": "2.0.3" } }, "sha512-RWmIqhcFf1lRYBvNmr7qTNuyCt/7/ns2jbpp1+PalgE/rDQcBT0fioSMUpJ93irlUhC5hrg4cYqe6U+0ImW0rA=="],
"enhanced-resolve": ["enhanced-resolve@5.20.1", "", { "dependencies": { "graceful-fs": "4.2.11", "tapable": "2.3.2" } }, "sha512-Qohcme7V1inbAfvjItgw0EaxVX5q2rdVEZHRBrEQdRZTssLDGsL8Lwrznl8oQ/6kuTJONLaDcGjkNP247XEhcA=="],
"esm-env": ["esm-env@1.2.2", "", {}, "sha512-Epxrv+Nr/CaL4ZcFGPJIYLWFom+YeV1DqMLHJoEd9SYRxNbaFruBwfEX/kkHUJf55j2+TUbmDcmuilbP1TmXHA=="],
"esrap": ["esrap@2.2.5", "", { "dependencies": { "@jridgewell/sourcemap-codec": "1.5.5" } }, "sha512-/yLB1538mag+dn0wsePTe8C0rDIjUOaJpMs2McodSzmM2msWcZsBSdRtg6HOBt0A/r82BN+Md3pgwSc/uWt2Ig=="],
"fdir": ["fdir@6.5.0", "", { "optionalDependencies": { "picomatch": "4.0.4" } }, "sha512-tIbYtZbucOs0BRGqPJkshJUYdL+SDH7dVM8gjy+ERp3WAUjLEFJE+02kanyHtwjWOnwrKYBiwAmM0p4kLJAnXg=="],
"fsevents": ["fsevents@2.3.3", "", { "os": "darwin" }, "sha512-5xoDfX+fL7faATnagmWPpbFtwh/R77WmMMqqHGS65C3vvB0YHrgF+B1YmZ3441tMj5n63k0212XNoJwzlhffQw=="],
"graceful-fs": ["graceful-fs@4.2.11", "", {}, "sha512-RbJ5/jmFcNNCcDV5o9eTnBLJ/HszWV0P73bc+Ff4nS/rJj+YaS6IGyiOL0VoBYX+l1Wrl3k63h/KrH+nhJ0XvQ=="],
"hast-util-to-html": ["hast-util-to-html@9.0.5", "", { "dependencies": { "@types/hast": "3.0.4", "@types/unist": "3.0.3", "ccount": "2.0.1", "comma-separated-tokens": "2.0.3", "hast-util-whitespace": "3.0.0", "html-void-elements": "3.0.0", "mdast-util-to-hast": "13.2.1", "property-information": "7.1.0", "space-separated-tokens": "2.0.2", "stringify-entities": "4.0.4", "zwitch": "2.0.4" } }, "sha512-OguPdidb+fbHQSU4Q4ZiLKnzWo8Wwsf5bZfbvu7//a9oTYoqD/fWpe96NuHkoS9h0ccGOTe0C4NGXdtS0iObOw=="],
"hast-util-whitespace": ["hast-util-whitespace@3.0.0", "", { "dependencies": { "@types/hast": "3.0.4" } }, "sha512-88JUN06ipLwsnv+dVn+OIYOvAuvBMy/Qoi6O7mQHxdPXpjy+Cd6xRkWwux7DKO+4sYILtLBRIKgsdpS2gQc7qw=="],
"html-void-elements": ["html-void-elements@3.0.0", "", {}, "sha512-bEqo66MRXsUGxWHV5IP0PUiAWwoEjba4VCzg0LjFJBpchPaTfyfCKTG6bc5F8ucKec3q5y6qOdGyYTSBEvhCrg=="],
"inline-style-parser": ["inline-style-parser@0.2.7", "", {}, "sha512-Nb2ctOyNR8DqQoR0OwRG95uNWIC0C1lCgf5Naz5H6Ji72KZ8OcFZLz2P5sNgwlyoJ8Yif11oMuYs5pBQa86csA=="],
"is-reference": ["is-reference@3.0.3", "", { "dependencies": { "@types/estree": "1.0.8" } }, "sha512-ixkJoqQvAP88E6wLydLGGqCJsrFUnqoH6HnaczB8XmDH1oaWU+xxdptvikTgaEhtZ53Ky6YXiBuUI2WXLMCwjw=="],
"jiti": ["jiti@2.6.1", "", { "bin": { "jiti": "lib/jiti-cli.mjs" } }, "sha512-ekilCSN1jwRvIbgeg/57YFh8qQDNbwDb9xT/qu2DAHbFFZUicIl4ygVaAvzveMhMVr3LnpSKTNnwt8PoOfmKhQ=="],
"kleur": ["kleur@4.1.5", "", {}, "sha512-o+NO+8WrRiQEE4/7nwRJhN1HWpVmJm511pBHUxPLtp0BUISzlBplORYSmTclCnJvQq2tKu/sgl3xVpkc7ZWuQQ=="],
"lightningcss": ["lightningcss@1.32.0", "", { "dependencies": { "detect-libc": "2.1.2" }, "optionalDependencies": { "lightningcss-android-arm64": "1.32.0", "lightningcss-darwin-arm64": "1.32.0", "lightningcss-darwin-x64": "1.32.0", "lightningcss-freebsd-x64": "1.32.0", "lightningcss-linux-arm-gnueabihf": "1.32.0", "lightningcss-linux-arm64-gnu": "1.32.0", "lightningcss-linux-arm64-musl": "1.32.0", "lightningcss-linux-x64-gnu": "1.32.0", "lightningcss-linux-x64-musl": "1.32.0", "lightningcss-win32-arm64-msvc": "1.32.0", "lightningcss-win32-x64-msvc": "1.32.0" } }, "sha512-NXYBzinNrblfraPGyrbPoD19C1h9lfI/1mzgWYvXUTe414Gz/X1FD2XBZSZM7rRTrMA8JL3OtAaGifrIKhQ5yQ=="],
"lightningcss-android-arm64": ["lightningcss-android-arm64@1.32.0", "", { "os": "android", "cpu": "arm64" }, "sha512-YK7/ClTt4kAK0vo6w3X+Pnm0D2cf2vPHbhOXdoNti1Ga0al1P4TBZhwjATvjNwLEBCnKvjJc2jQgHXH0NEwlAg=="],
"lightningcss-darwin-arm64": ["lightningcss-darwin-arm64@1.32.0", "", { "os": "darwin", "cpu": "arm64" }, "sha512-RzeG9Ju5bag2Bv1/lwlVJvBE3q6TtXskdZLLCyfg5pt+HLz9BqlICO7LZM7VHNTTn/5PRhHFBSjk5lc4cmscPQ=="],
"lightningcss-darwin-x64": ["lightningcss-darwin-x64@1.32.0", "", { "os": "darwin", "cpu": "x64" }, "sha512-U+QsBp2m/s2wqpUYT/6wnlagdZbtZdndSmut/NJqlCcMLTWp5muCrID+K5UJ6jqD2BFshejCYXniPDbNh73V8w=="],
"lightningcss-freebsd-x64": ["lightningcss-freebsd-x64@1.32.0", "", { "os": "freebsd", "cpu": "x64" }, "sha512-JCTigedEksZk3tHTTthnMdVfGf61Fky8Ji2E4YjUTEQX14xiy/lTzXnu1vwiZe3bYe0q+SpsSH/CTeDXK6WHig=="],
"lightningcss-linux-arm-gnueabihf": ["lightningcss-linux-arm-gnueabihf@1.32.0", "", { "os": "linux", "cpu": "arm" }, "sha512-x6rnnpRa2GL0zQOkt6rts3YDPzduLpWvwAF6EMhXFVZXD4tPrBkEFqzGowzCsIWsPjqSK+tyNEODUBXeeVHSkw=="],
"lightningcss-linux-arm64-gnu": ["lightningcss-linux-arm64-gnu@1.32.0", "", { "os": "linux", "cpu": "arm64" }, "sha512-0nnMyoyOLRJXfbMOilaSRcLH3Jw5z9HDNGfT/gwCPgaDjnx0i8w7vBzFLFR1f6CMLKF8gVbebmkUN3fa/kQJpQ=="],
"lightningcss-linux-arm64-musl": ["lightningcss-linux-arm64-musl@1.32.0", "", { "os": "linux", "cpu": "arm64" }, "sha512-UpQkoenr4UJEzgVIYpI80lDFvRmPVg6oqboNHfoH4CQIfNA+HOrZ7Mo7KZP02dC6LjghPQJeBsvXhJod/wnIBg=="],
"lightningcss-linux-x64-gnu": ["lightningcss-linux-x64-gnu@1.32.0", "", { "os": "linux", "cpu": "x64" }, "sha512-V7Qr52IhZmdKPVr+Vtw8o+WLsQJYCTd8loIfpDaMRWGUZfBOYEJeyJIkqGIDMZPwPx24pUMfwSxxI8phr/MbOA=="],
"lightningcss-linux-x64-musl": ["lightningcss-linux-x64-musl@1.32.0", "", { "os": "linux", "cpu": "x64" }, "sha512-bYcLp+Vb0awsiXg/80uCRezCYHNg1/l3mt0gzHnWV9XP1W5sKa5/TCdGWaR/zBM2PeF/HbsQv/j2URNOiVuxWg=="],
"lightningcss-win32-arm64-msvc": ["lightningcss-win32-arm64-msvc@1.32.0", "", { "os": "win32", "cpu": "arm64" }, "sha512-8SbC8BR40pS6baCM8sbtYDSwEVQd4JlFTOlaD3gWGHfThTcABnNDBda6eTZeqbofalIJhFx0qKzgHJmcPTnGdw=="],
"lightningcss-win32-x64-msvc": ["lightningcss-win32-x64-msvc@1.32.0", "", { "os": "win32", "cpu": "x64" }, "sha512-Amq9B/SoZYdDi1kFrojnoqPLxYhQ4Wo5XiL8EVJrVsB8ARoC1PWW6VGtT0WKCemjy8aC+louJnjS7U18x3b06Q=="],
"locate-character": ["locate-character@3.0.0", "", {}, "sha512-SW13ws7BjaeJ6p7Q6CO2nchbYEc3X3J6WrmTTDto7yMPqVSZTUyY5Tjbid+Ab8gLnATtygYtiDIJGQRRn2ZOiA=="],
"lz-string": ["lz-string@1.5.0", "", { "bin": { "lz-string": "bin/bin.js" } }, "sha512-h5bgJWpxJNswbU7qCrV0tIKQCaS3blPDrqKWx+QxzuzL1zGUzij9XCWLrSLsJPu5t+eWA/ycetzYAO5IOMcWAQ=="],
"magic-string": ["magic-string@0.30.21", "", { "dependencies": { "@jridgewell/sourcemap-codec": "1.5.5" } }, "sha512-vd2F4YUyEXKGcLHoq+TEyCjxueSeHnFxyyjNp80yg0XV4vUhnDer/lvvlqM/arB5bXQN5K2/3oinyCRyx8T2CQ=="],
"mdast-util-to-hast": ["mdast-util-to-hast@13.2.1", "", { "dependencies": { "@types/hast": "3.0.4", "@types/mdast": "4.0.4", "@ungap/structured-clone": "1.3.0", "devlop": "1.1.0", "micromark-util-sanitize-uri": "2.0.1", "trim-lines": "3.0.1", "unist-util-position": "5.0.0", "unist-util-visit": "5.1.0", "vfile": "6.0.3" } }, "sha512-cctsq2wp5vTsLIcaymblUriiTcZd0CwWtCbLvrOzYCDZoWyMNV8sZ7krj09FSnsiJi3WVsHLM4k6Dq/yaPyCXA=="],
"micromark-util-character": ["micromark-util-character@2.1.1", "", { "dependencies": { "micromark-util-symbol": "2.0.1", "micromark-util-types": "2.0.2" } }, "sha512-wv8tdUTJ3thSFFFJKtpYKOYiGP2+v96Hvk4Tu8KpCAsTMs6yi+nVmGh1syvSCsaxz45J6Jbw+9DD6g97+NV67Q=="],
"micromark-util-encode": ["micromark-util-encode@2.0.1", "", {}, "sha512-c3cVx2y4KqUnwopcO9b/SCdo2O67LwJJ/UyqGfbigahfegL9myoEFoDYZgkT7f36T0bLrM9hZTAaAyH+PCAXjw=="],
"micromark-util-sanitize-uri": ["micromark-util-sanitize-uri@2.0.1", "", { "dependencies": { "micromark-util-character": "2.1.1", "micromark-util-encode": "2.0.1", "micromark-util-symbol": "2.0.1" } }, "sha512-9N9IomZ/YuGGZZmQec1MbgxtlgougxTodVwDzzEouPKo3qFWvymFHWcnDi2vzV1ff6kas9ucW+o3yzJK9YB1AQ=="],
"micromark-util-symbol": ["micromark-util-symbol@2.0.1", "", {}, "sha512-vs5t8Apaud9N28kgCrRUdEed4UJ+wWNvicHLPxCa9ENlYuAY31M0ETy5y1vA33YoNPDFTghEbnh6efaE8h4x0Q=="],
"micromark-util-types": ["micromark-util-types@2.0.2", "", {}, "sha512-Yw0ECSpJoViF1qTU4DC6NwtC4aWGt1EkzaQB8KPPyCRR8z9TWeV0HbEFGTO+ZY1wB22zmxnJqhPyTpOVCpeHTA=="],
"mri": ["mri@1.2.0", "", {}, "sha512-tzzskb3bG8LvYGFF/mDTpq3jpI6Q9wc3LEmBaghu+DdCssd1FakN7Bc0hVNmEyGq1bq3RgfkCb3cmQLpNPOroA=="],
"mrmime": ["mrmime@2.0.1", "", {}, "sha512-Y3wQdFg2Va6etvQ5I82yUhGdsKrcYox6p7FfL1LbK2J4V01F9TGlepTIhnK24t7koZibmg82KGglhA1XK5IsLQ=="],
"nanoid": ["nanoid@3.3.11", "", { "bin": { "nanoid": "bin/nanoid.cjs" } }, "sha512-N8SpfPUnUp1bK+PMYW8qSWdl9U+wwNWI4QKxOYDy9JAro3WMX7p2OeVRF9v+347pnakNevPmiHhNmZ2HbFA76w=="],
"obug": ["obug@2.1.1", "", {}, "sha512-uTqF9MuPraAQ+IsnPf366RG4cP9RtUi7MLO1N3KEc+wb0a6yKpeL0lmk2IB1jY5KHPAlTc6T/JRdC/YqxHNwkQ=="],
"oniguruma-parser": ["oniguruma-parser@0.12.1", "", {}, "sha512-8Unqkvk1RYc6yq2WBYRj4hdnsAxVze8i7iPfQr8e4uSP3tRv0rpZcbGUDvxfQQcdwHt/e9PrMvGCsa8OqG9X3w=="],
"oniguruma-to-es": ["oniguruma-to-es@4.3.5", "", { "dependencies": { "oniguruma-parser": "0.12.1", "regex": "6.1.0", "regex-recursion": "6.0.2" } }, "sha512-Zjygswjpsewa0NLTsiizVuMQZbp0MDyM6lIt66OxsF21npUDlzpHi1Mgb/qhQdkb+dWFTzJmFbEWdvZgRho8eQ=="],
"picocolors": ["picocolors@1.1.1", "", {}, "sha512-xceH2snhtb5M9liqDsmEw56le376mTZkEX/jEb/RxNFyegNul7eNslCXP9FDj/Lcu0X8KEyMceP2ntpaHrDEVA=="],
"picomatch": ["picomatch@4.0.4", "", {}, "sha512-QP88BAKvMam/3NxH6vj2o21R6MjxZUAd6nlwAS/pnGvN9IVLocLHxGYIzFhg6fUQ+5th6P4dv4eW9jX3DSIj7A=="],
"postcss": ["postcss@8.5.9", "", { "dependencies": { "nanoid": "3.3.11", "picocolors": "1.1.1", "source-map-js": "1.2.1" } }, "sha512-7a70Nsot+EMX9fFU3064K/kdHWZqGVY+BADLyXc8Dfv+mTLLVl6JzJpPaCZ2kQL9gIJvKXSLMHhqdRRjwQeFtw=="],
"property-information": ["property-information@7.1.0", "", {}, "sha512-TwEZ+X+yCJmYfL7TPUOcvBZ4QfoT5YenQiJuX//0th53DE6w0xxLEtfK3iyryQFddXuvkIk51EEgrJQ0WJkOmQ=="],
"readdirp": ["readdirp@4.1.2", "", {}, "sha512-GDhwkLfywWL2s6vEjyhri+eXmfH6j1L7JE27WhqLeYzoh/A3DBaYGEj2H/HFZCn/kMfim73FXxEJTw06WtxQwg=="],
"regex": ["regex@6.1.0", "", { "dependencies": { "regex-utilities": "2.3.0" } }, "sha512-6VwtthbV4o/7+OaAF9I5L5V3llLEsoPyq9P1JVXkedTP33c7MfCG0/5NOPcSJn0TzXcG9YUrR0gQSWioew3LDg=="],
"regex-recursion": ["regex-recursion@6.0.2", "", { "dependencies": { "regex-utilities": "2.3.0" } }, "sha512-0YCaSCq2VRIebiaUviZNs0cBz1kg5kVS2UKUfNIx8YVs1cN3AV7NTctO5FOKBA+UT2BPJIWZauYHPqJODG50cg=="],
"regex-utilities": ["regex-utilities@2.3.0", "", {}, "sha512-8VhliFJAWRaUiVvREIiW2NXXTmHs4vMNnSzuJVhscgmGav3g9VDxLrQndI3dZZVVdp0ZO/5v0xmX516/7M9cng=="],
"rolldown": ["rolldown@1.0.0-rc.15", "", { "dependencies": { "@oxc-project/types": "0.124.0", "@rolldown/pluginutils": "1.0.0-rc.15" }, "optionalDependencies": { "@rolldown/binding-android-arm64": "1.0.0-rc.15", "@rolldown/binding-darwin-arm64": "1.0.0-rc.15", "@rolldown/binding-darwin-x64": "1.0.0-rc.15", "@rolldown/binding-freebsd-x64": "1.0.0-rc.15", "@rolldown/binding-linux-arm-gnueabihf": "1.0.0-rc.15", "@rolldown/binding-linux-arm64-gnu": "1.0.0-rc.15", "@rolldown/binding-linux-arm64-musl": "1.0.0-rc.15", "@rolldown/binding-linux-ppc64-gnu": "1.0.0-rc.15", "@rolldown/binding-linux-s390x-gnu": "1.0.0-rc.15", "@rolldown/binding-linux-x64-gnu": "1.0.0-rc.15", "@rolldown/binding-linux-x64-musl": "1.0.0-rc.15", "@rolldown/binding-openharmony-arm64": "1.0.0-rc.15", "@rolldown/binding-wasm32-wasi": "1.0.0-rc.15", "@rolldown/binding-win32-arm64-msvc": "1.0.0-rc.15", "@rolldown/binding-win32-x64-msvc": "1.0.0-rc.15" }, "bin": { "rolldown": "bin/cli.mjs" } }, "sha512-Ff31guA5zT6WjnGp0SXw76X6hzGRk/OQq2hE+1lcDe+lJdHSgnSX6nK3erbONHyCbpSj9a9E+uX/OvytZoWp2g=="],
"runed": ["runed@0.35.1", "", { "dependencies": { "dequal": "2.0.3", "esm-env": "1.2.2", "lz-string": "1.5.0" }, "optionalDependencies": { "@sveltejs/kit": "2.57.1" }, "peerDependencies": { "svelte": "5.55.3" } }, "sha512-2F4Q/FZzbeJTFdIS/PuOoPRSm92sA2LhzTnv6FXhCoENb3huf5+fDuNOg1LNvGOouy3u/225qxmuJvcV3IZK5Q=="],
"sade": ["sade@1.8.1", "", { "dependencies": { "mri": "1.2.0" } }, "sha512-xal3CZX1Xlo/k4ApwCFrHVACi9fBqJ7V+mwhBsuf/1IOKbBy098Fex+Wa/5QMubw09pSZ/u8EY8PWgevJsXp1A=="],
"set-cookie-parser": ["set-cookie-parser@3.1.0", "", {}, "sha512-kjnC1DXBHcxaOaOXBHBeRtltsDG2nUiUni+jP92M9gYdW12rsmx92UsfpH7o5tDRs7I1ZZPSQJQGv3UaRfCiuw=="],
"shiki": ["shiki@4.0.2", "", { "dependencies": { "@shikijs/core": "4.0.2", "@shikijs/engine-javascript": "4.0.2", "@shikijs/engine-oniguruma": "4.0.2", "@shikijs/langs": "4.0.2", "@shikijs/themes": "4.0.2", "@shikijs/types": "4.0.2", "@shikijs/vscode-textmate": "10.0.2", "@types/hast": "3.0.4" } }, "sha512-eAVKTMedR5ckPo4xne/PjYQYrU3qx78gtJZ+sHlXEg5IHhhoQhMfZVzetTYuaJS0L2Ef3AcCRzCHV8T0WI6nIQ=="],
"sirv": ["sirv@3.0.2", "", { "dependencies": { "@polka/url": "1.0.0-next.29", "mrmime": "2.0.1", "totalist": "3.0.1" } }, "sha512-2wcC/oGxHis/BoHkkPwldgiPSYcpZK3JU28WoMVv55yHJgcZ8rlXvuG9iZggz+sU1d4bRgIGASwyWqjxu3FM0g=="],
"source-map-js": ["source-map-js@1.2.1", "", {}, "sha512-UXWMKhLOwVKb728IUtQPXxfYU+usdybtUrK/8uGE8CQMvrhOpwvzDBwj0QhSL7MQc7vIsISBG8VQ8+IDQxpfQA=="],
"space-separated-tokens": ["space-separated-tokens@2.0.2", "", {}, "sha512-PEGlAwrG8yXGXRjW32fGbg66JAlOAwbObuqVoJpv/mRgoWDQfgH1wDPvtzWyUSNAXBGSk8h755YDbbcEy3SH2Q=="],
"stringify-entities": ["stringify-entities@4.0.4", "", { "dependencies": { "character-entities-html4": "2.1.0", "character-entities-legacy": "3.0.0" } }, "sha512-IwfBptatlO+QCJUo19AqvrPNqlVMpW9YEL2LIVY+Rpv2qsjCGxaDLNRgeGsQWJhfItebuJhsGSLjaBbNSQ+ieg=="],
"style-to-object": ["style-to-object@1.0.14", "", { "dependencies": { "inline-style-parser": "0.2.7" } }, "sha512-LIN7rULI0jBscWQYaSswptyderlarFkjQ+t79nzty8tcIAceVomEVlLzH5VP4Cmsv6MtKhs7qaAiwlcp+Mgaxw=="],
"svelte": ["svelte@5.55.3", "", { "dependencies": { "@jridgewell/remapping": "2.3.5", "@jridgewell/sourcemap-codec": "1.5.5", "@sveltejs/acorn-typescript": "1.0.9", "@types/estree": "1.0.8", "@types/trusted-types": "2.0.7", "acorn": "8.16.0", "aria-query": "5.3.1", "axobject-query": "4.1.0", "clsx": "2.1.1", "devalue": "5.7.1", "esm-env": "1.2.2", "esrap": "2.2.5", "is-reference": "3.0.3", "locate-character": "3.0.0", "magic-string": "0.30.21", "zimmerframe": "1.1.4" } }, "sha512-dS1N+i3bA1v+c4UDb750MlN5vCO82G6vxh8HeTsPsTdJ1BLsN1zxSyDlIdBBqUjqZ/BxEwM8UrFf98aaoVnZFQ=="],
"svelte-check": ["svelte-check@4.4.6", "", { "dependencies": { "@jridgewell/trace-mapping": "0.3.31", "chokidar": "4.0.3", "fdir": "6.5.0", "picocolors": "1.1.1", "sade": "1.8.1" }, "peerDependencies": { "svelte": "5.55.3", "typescript": "6.0.2" }, "bin": { "svelte-check": "bin/svelte-check" } }, "sha512-kP1zG81EWaFe9ZyTv4ZXv44Csi6Pkdpb7S3oj6m+K2ec/IcDg/a8LsFsnVLqm2nxtkSwsd5xPj/qFkTBgXHXjg=="],
"svelte-toolbelt": ["svelte-toolbelt@0.10.6", "", { "dependencies": { "clsx": "2.1.1", "runed": "0.35.1", "style-to-object": "1.0.14" }, "peerDependencies": { "svelte": "5.55.3" } }, "sha512-YWuX+RE+CnWYx09yseAe4ZVMM7e7GRFZM6OYWpBKOb++s+SQ8RBIMMe+Bs/CznBMc0QPLjr+vDBxTAkozXsFXQ=="],
"tabbable": ["tabbable@6.4.0", "", {}, "sha512-05PUHKSNE8ou2dwIxTngl4EzcnsCDZGJ/iCLtDflR/SHB/ny14rXc+qU5P4mG9JkusiV7EivzY9Mhm55AzAvCg=="],
"tailwindcss": ["tailwindcss@4.2.2", "", {}, "sha512-KWBIxs1Xb6NoLdMVqhbhgwZf2PGBpPEiwOqgI4pFIYbNTfBXiKYyWoTsXgBQ9WFg/OlhnvHaY+AEpW7wSmFo2Q=="],
"tapable": ["tapable@2.3.2", "", {}, "sha512-1MOpMXuhGzGL5TTCZFItxCc0AARf1EZFQkGqMm7ERKj8+Hgr5oLvJOVFcC+lRmR8hCe2S3jC4T5D7Vg/d7/fhA=="],
"tinyglobby": ["tinyglobby@0.2.16", "", { "dependencies": { "fdir": "6.5.0", "picomatch": "4.0.4" } }, "sha512-pn99VhoACYR8nFHhxqix+uvsbXineAasWm5ojXoN8xEwK5Kd3/TrhNn1wByuD52UxWRLy8pu+kRMniEi6Eq9Zg=="],
"totalist": ["totalist@3.0.1", "", {}, "sha512-sf4i37nQ2LBx4m3wB74y+ubopq6W/dIzXg0FDGjsYnZHVa1Da8FH853wlL2gtUhg+xJXjfk3kUZS3BRoQeoQBQ=="],
"trim-lines": ["trim-lines@3.0.1", "", {}, "sha512-kRj8B+YHZCc9kQYdWfJB2/oUl9rA99qbowYYBtr4ui4mZyAQ2JpvVBd/6U2YloATfqBhBTSMhTpgBHtU0Mf3Rg=="],
"tslib": ["tslib@2.8.1", "", {}, "sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w=="],
"typescript": ["typescript@6.0.2", "", { "bin": { "tsc": "bin/tsc", "tsserver": "bin/tsserver" } }, "sha512-bGdAIrZ0wiGDo5l8c++HWtbaNCWTS4UTv7RaTH/ThVIgjkveJt83m74bBHMJkuCbslY8ixgLBVZJIOiQlQTjfQ=="],
"unist-util-is": ["unist-util-is@6.0.1", "", { "dependencies": { "@types/unist": "3.0.3" } }, "sha512-LsiILbtBETkDz8I9p1dQ0uyRUWuaQzd/cuEeS1hoRSyW5E5XGmTzlwY1OrNzzakGowI9Dr/I8HVaw4hTtnxy8g=="],
"unist-util-position": ["unist-util-position@5.0.0", "", { "dependencies": { "@types/unist": "3.0.3" } }, "sha512-fucsC7HjXvkB5R3kTCO7kUjRdrS0BJt3M/FPxmHMBOm8JQi2BsHAHFsy27E0EolP8rp0NzXsJ+jNPyDWvOJZPA=="],
"unist-util-stringify-position": ["unist-util-stringify-position@4.0.0", "", { "dependencies": { "@types/unist": "3.0.3" } }, "sha512-0ASV06AAoKCDkS2+xw5RXJywruurpbC4JZSm7nr7MOt1ojAzvyyaO+UxZf18j8FCF6kmzCZKcAgN/yu2gm2XgQ=="],
"unist-util-visit": ["unist-util-visit@5.1.0", "", { "dependencies": { "@types/unist": "3.0.3", "unist-util-is": "6.0.1", "unist-util-visit-parents": "6.0.2" } }, "sha512-m+vIdyeCOpdr/QeQCu2EzxX/ohgS8KbnPDgFni4dQsfSCtpz8UqDyY5GjRru8PDKuYn7Fq19j1CQ+nJSsGKOzg=="],
"unist-util-visit-parents": ["unist-util-visit-parents@6.0.2", "", { "dependencies": { "@types/unist": "3.0.3", "unist-util-is": "6.0.1" } }, "sha512-goh1s1TBrqSqukSc8wrjwWhL0hiJxgA8m4kFxGlQ+8FYQ3C/m11FcTs4YYem7V664AhHVvgoQLk890Ssdsr2IQ=="],
"vfile": ["vfile@6.0.3", "", { "dependencies": { "@types/unist": "3.0.3", "vfile-message": "4.0.3" } }, "sha512-KzIbH/9tXat2u30jf+smMwFCsno4wHVdNmzFyL+T/L3UGqqk6JKfVqOFOZEpZSHADH1k40ab6NUIXZq422ov3Q=="],
"vfile-message": ["vfile-message@4.0.3", "", { "dependencies": { "@types/unist": "3.0.3", "unist-util-stringify-position": "4.0.0" } }, "sha512-QTHzsGd1EhbZs4AsQ20JX1rC3cOlt/IWJruk893DfLRr57lcnOeMaWG4K0JrRta4mIJZKth2Au3mM3u03/JWKw=="],
"vite": ["vite@8.0.8", "", { "dependencies": { "lightningcss": "1.32.0", "picomatch": "4.0.4", "postcss": "8.5.9", "rolldown": "1.0.0-rc.15", "tinyglobby": "0.2.16" }, "optionalDependencies": { "fsevents": "2.3.3", "jiti": "2.6.1" }, "bin": { "vite": "bin/vite.js" } }, "sha512-dbU7/iLVa8KZALJyLOBOQ88nOXtNG8vxKuOT4I2mD+Ya70KPceF4IAmDsmU0h1Qsn5bPrvsY9HJstCRh3hG6Uw=="],
"vitefu": ["vitefu@1.1.3", "", { "optionalDependencies": { "vite": "8.0.8" } }, "sha512-ub4okH7Z5KLjb6hDyjqrGXqWtWvoYdU3IGm/NorpgHncKoLTCfRIbvlhBm7r0YstIaQRYlp4yEbFqDcKSzXSSg=="],
"zimmerframe": ["zimmerframe@1.1.4", "", {}, "sha512-B58NGBEoc8Y9MWWCQGl/gq9xBCe4IiKM0a2x7GZdQKOW5Exr8S1W24J6OgM1njK8xCRGvAJIL/MxXHf6SkmQKQ=="],
"zwitch": ["zwitch@2.0.4", "", {}, "sha512-bXE4cR/kVZhKZX/RjPEflHaKVhUVl85noU3v6b8apfQEc1x4A+zBxjZ4lN8LqGd6WZ3dl98pY4o717VFmoPp+A=="],
}
}

frontend/pnpm-lock.yaml (generated; 1564-line diff suppressed because it is too large)

@@ -2,6 +2,19 @@ import { auth } from '$lib/auth.svelte';

 export type ApiResult<T> = { ok: true; data: T } | { ok: false; error: string };

+async function parseResponse<T>(res: Response): Promise<ApiResult<T>> {
+    if (res.status === 204 || res.status === 202) {
+        const text = await res.text();
+        if (!text) return { ok: true, data: undefined as T };
+        const data = JSON.parse(text);
+        return { ok: true, data: data as T };
+    }
+    const data = await res.json();
+    if (!res.ok) return { ok: false, error: data?.error?.message ?? 'Something went wrong' };
+    return { ok: true, data: data as T };
+}
+
 export async function apiFetch<T>(method: string, path: string, body?: unknown): Promise<ApiResult<T>> {
     try {
         const headers: Record<string, string> = { 'Content-Type': 'application/json' };
@@ -13,11 +26,7 @@ export async function apiFetch<T>(method: string, path: string, body?: unknown):
             body: body ? JSON.stringify(body) : undefined
         });
-        if (res.status === 204) return { ok: true, data: undefined as T };
-        const data = await res.json();
-        if (!res.ok) return { ok: false, error: data?.error?.message ?? 'Something went wrong' };
-        return { ok: true, data: data as T };
+        return await parseResponse<T>(res);
     } catch {
         return { ok: false, error: 'Unable to connect to the server' };
     }
@@ -34,11 +43,7 @@ export async function apiFetchMultipart<T>(method: string, path: string, formData
             body: formData
         });
-        if (res.status === 204) return { ok: true, data: undefined as T };
-        const data = await res.json();
-        if (!res.ok) return { ok: false, error: data?.error?.message ?? 'Something went wrong' };
-        return { ok: true, data: data as T };
+        return await parseResponse<T>(res);
     } catch {
         return { ok: false, error: 'Unable to connect to the server' };
     }
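
Aside: the parseResponse change matters beyond the frontend. The lifecycle endpoints below now return 202 Accepted, sometimes with a JSON body and sometimes (204) without, so any client has to tolerate both. A minimal Go sketch of the same parsing rule — the helper name and error shape are illustrative, not shipped code:

package client

import (
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

// parseResponse mirrors the frontend helper: 202 and 204 are success
// codes that may arrive with an empty body, so read the raw bytes and
// only unmarshal when something was actually sent.
func parseResponse[T any](res *http.Response) (T, error) {
    var out T
    defer res.Body.Close()
    body, err := io.ReadAll(res.Body)
    if err != nil {
        return out, err
    }
    if len(body) == 0 {
        if res.StatusCode == http.StatusNoContent || res.StatusCode == http.StatusAccepted {
            return out, nil // success with no payload
        }
        return out, fmt.Errorf("unexpected empty response: %s", res.Status)
    }
    if res.StatusCode >= 400 {
        return out, fmt.Errorf("api error %d: %s", res.StatusCode, body)
    }
    return out, json.Unmarshal(body, &out)
}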


@@ -149,6 +149,8 @@
         case 'running': return 'var(--color-accent)';
         case 'paused': return 'var(--color-amber)';
         case 'error': return 'var(--color-red)';
+        case 'starting': case 'resuming': case 'pausing': case 'stopping':
+            return 'var(--color-blue)';
         default: return 'var(--color-text-muted)';
     }
 }
@@ -158,6 +160,8 @@
         case 'running': return 'rgba(94,140,88,0.12)';
         case 'paused': return 'rgba(212,167,60,0.12)';
         case 'error': return 'rgba(207,129,114,0.12)';
+        case 'starting': case 'resuming': case 'pausing': case 'stopping':
+            return 'rgba(90,159,212,0.12)';
         default: return 'rgba(255,255,255,0.05)';
     }
 }
@@ -167,6 +171,8 @@
         case 'running': return 'rgba(94,140,88,0.3)';
         case 'paused': return 'rgba(212,167,60,0.3)';
         case 'error': return 'rgba(207,129,114,0.3)';
+        case 'starting': case 'resuming': case 'pausing': case 'stopping':
+            return 'rgba(90,159,212,0.3)';
         default: return 'rgba(255,255,255,0.08)';
     }
 }
@@ -418,7 +424,8 @@
                 </div>
             {:else}
                 {#each filteredCapsules as capsule, i (capsule.id)}
-                    {@const stripeColor = capsule.status === 'running' ? 'bg-[var(--color-accent)]' : capsule.status === 'paused' ? 'bg-[var(--color-amber)]' : capsule.status === 'error' ? 'bg-[var(--color-red)]' : 'bg-[var(--color-text-muted)]'}
+                    {@const isTransient = ['starting', 'resuming', 'pausing', 'stopping'].includes(capsule.status)}
+                    {@const stripeColor = capsule.status === 'running' ? 'bg-[var(--color-accent)]' : capsule.status === 'paused' ? 'bg-[var(--color-amber)]' : capsule.status === 'error' ? 'bg-[var(--color-red)]' : isTransient ? 'bg-[var(--color-blue)]' : 'bg-[var(--color-text-muted)]'}
                     <div
                         class="capsule-row relative grid grid-cols-[1.6fr_0.9fr_0.5fr_0.5fr_1fr_0.7fr_0.8fr] items-center overflow-hidden border-b border-[var(--color-border)] transition-colors duration-150 hover:bg-[var(--color-bg-3)] last:border-b-0 {newCapsuleId === capsule.id ? 'capsule-born' : ''}"
                         style={initialAnimationDone ? '' : `animation: fadeUp 0.35s ease both; animation-delay: ${i * 40}ms`}
@@ -437,6 +444,11 @@
                             <span class="inline-flex h-[6px] w-[6px] shrink-0 rounded-full bg-[var(--color-amber)]"></span>
                         {:else if capsule.status === 'error'}
                             <span class="inline-flex h-[6px] w-[6px] shrink-0 rounded-full bg-[var(--color-red)]"></span>
+                        {:else if isTransient}
+                            <span class="relative flex h-[6px] w-[6px] shrink-0">
+                                <span class="animate-status-ping absolute inline-flex h-full w-full rounded-full bg-[var(--color-blue)]"></span>
+                                <span class="relative inline-flex h-[6px] w-[6px] rounded-full bg-[var(--color-blue)]"></span>
+                            </span>
                         {:else}
                             <span class="inline-flex h-[6px] w-[6px] shrink-0 rounded-full bg-[var(--color-text-muted)]"></span>
                         {/if}


@@ -470,7 +470,8 @@
                 </div>
             {:else}
                 {#each filteredCapsules as capsule, i (capsule.id)}
-                    {@const stripeColor = capsule.status === 'running' ? 'bg-[var(--color-accent)]' : capsule.status === 'paused' ? 'bg-[var(--color-amber)]' : 'bg-[var(--color-text-muted)]'}
+                    {@const isTransient = ['starting', 'resuming', 'pausing', 'stopping'].includes(capsule.status)}
+                    {@const stripeColor = capsule.status === 'running' ? 'bg-[var(--color-accent)]' : capsule.status === 'paused' ? 'bg-[var(--color-amber)]' : isTransient ? 'bg-[var(--color-blue)]' : 'bg-[var(--color-text-muted)]'}
                     <div
                         class="capsule-row relative grid grid-cols-[1.6fr_0.8fr_0.5fr_0.5fr_0.6fr_1fr_0.9fr] items-center overflow-hidden border-b border-[var(--color-border)] transition-colors duration-150 hover:bg-[var(--color-bg-3)] last:border-b-0 {newCapsuleId === capsule.id ? 'capsule-born' : ''}"
                         style={initialAnimationDone ? '' : `animation: fadeUp 0.35s ease both; animation-delay: ${i * 40}ms`}
@@ -487,6 +488,11 @@
                         </span>
                     {:else if capsule.status === 'paused'}
                         <span class="inline-flex h-[6px] w-[6px] shrink-0 rounded-full bg-[var(--color-amber)]"></span>
+                    {:else if isTransient}
+                        <span class="relative flex h-[6px] w-[6px] shrink-0">
+                            <span class="animate-status-ping absolute inline-flex h-full w-full rounded-full bg-[var(--color-blue)]"></span>
+                            <span class="relative inline-flex h-[6px] w-[6px] rounded-full bg-[var(--color-blue)]"></span>
+                        </span>
                     {:else}
                         <span class="inline-flex h-[6px] w-[6px] shrink-0 rounded-full bg-[var(--color-text-muted)]"></span>
                     {/if}
@@ -556,7 +562,7 @@
                             openMenuId = capsule.id;
                         }
                     }}
-                    class="inline-flex items-center gap-1.5 rounded-[var(--radius-button)] border px-2.5 py-1 text-label font-semibold uppercase tracking-[0.04em] transition-colors duration-150 {capsule.status === 'running' ? 'border-[var(--color-accent)]/40 bg-[var(--color-accent-glow)] text-[var(--color-accent-mid)] hover:border-[var(--color-accent)]/70 hover:text-[var(--color-accent-bright)]' : capsule.status === 'paused' ? 'border-[var(--color-amber)]/30 bg-[var(--color-amber)]/5 text-[var(--color-amber)] hover:border-[var(--color-amber)]/60' : 'border-[var(--color-border)] bg-[var(--color-bg-2)] text-[var(--color-text-secondary)] hover:border-[var(--color-border-mid)] hover:text-[var(--color-text-primary)]'}"
+                    class="inline-flex items-center gap-1.5 rounded-[var(--radius-button)] border px-2.5 py-1 text-label font-semibold uppercase tracking-[0.04em] transition-colors duration-150 {capsule.status === 'running' ? 'border-[var(--color-accent)]/40 bg-[var(--color-accent-glow)] text-[var(--color-accent-mid)] hover:border-[var(--color-accent)]/70 hover:text-[var(--color-accent-bright)]' : capsule.status === 'paused' ? 'border-[var(--color-amber)]/30 bg-[var(--color-amber)]/5 text-[var(--color-amber)] hover:border-[var(--color-amber)]/60' : isTransient ? 'border-[var(--color-blue)]/30 bg-[var(--color-blue)]/5 text-[var(--color-blue)]' : 'border-[var(--color-border)] bg-[var(--color-bg-2)] text-[var(--color-text-secondary)] hover:border-[var(--color-border-mid)] hover:text-[var(--color-text-primary)]'}"
                 >
                     {capsule.status}
                     <svg


@@ -404,6 +404,8 @@
         case 'running': return 'var(--color-accent)';
         case 'paused': return 'var(--color-amber)';
         case 'error': return 'var(--color-red)';
+        case 'starting': case 'resuming': case 'pausing': case 'stopping':
+            return 'var(--color-blue)';
         default: return 'var(--color-text-muted)';
     }
 }
@@ -413,6 +415,8 @@
         case 'running': return 'rgba(94,140,88,0.12)';
         case 'paused': return 'rgba(212,167,60,0.12)';
         case 'error': return 'rgba(207,129,114,0.12)';
+        case 'starting': case 'resuming': case 'pausing': case 'stopping':
+            return 'rgba(90,159,212,0.12)';
         default: return 'rgba(255,255,255,0.05)';
     }
 }
@@ -422,6 +426,8 @@
         case 'running': return 'rgba(94,140,88,0.3)';
         case 'paused': return 'rgba(212,167,60,0.3)';
         case 'error': return 'rgba(207,129,114,0.3)';
+        case 'starting': case 'resuming': case 'pausing': case 'stopping':
+            return 'rgba(90,159,212,0.3)';
         default: return 'rgba(255,255,255,0.08)';
     }
 }


@@ -1,5 +1,5 @@
 #!/bin/sh
-# wrenn-init: minimal PID 1 init for Firecracker microVMs.
+# wrenn-init: minimal PID 1 init for Cloud Hypervisor microVMs.
 # Mounts virtual filesystems, starts chronyd for time sync, then execs tini + envd.

 set -e
@@ -17,6 +17,11 @@
 mkdir -p /sys/fs/cgroup
 mount -t cgroup2 cgroup2 /sys/fs/cgroup 2>/dev/null || true
 echo "+cpu +memory +io" > /sys/fs/cgroup/cgroup.subtree_control 2>/dev/null || true

+# Disable write_zeroes on rootfs — dm-snapshot doesn't support BLKZEROOUT,
+# and CH advertises the feature anyway. Without this, every zeroing IO
+# hits EOPNOTSUPP and CH spams warnings. Only writable on kernel 6.6+.
+echo 0 > /sys/block/vda/queue/write_zeroes_max_bytes 2>/dev/null || true
+
 # Set hostname and make it resolvable (sudo requires this).
 hostname capsule
 echo "127.0.0.1 capsule" >> /etc/hosts


@@ -3,10 +3,17 @@ package api

 import (
     "context"
     "fmt"
+    "log/slog"
+    "net/http"
+    "time"

+    "github.com/go-chi/chi/v5"
+    "github.com/gorilla/websocket"
     "github.com/jackc/pgx/v5/pgtype"

+    "git.omukk.dev/wrenn/wrenn/pkg/auth"
     "git.omukk.dev/wrenn/wrenn/pkg/db"
+    "git.omukk.dev/wrenn/wrenn/pkg/id"
     "git.omukk.dev/wrenn/wrenn/pkg/lifecycle"
     "git.omukk.dev/wrenn/wrenn/proto/hostagent/gen/hostagentv1connect"
 )
@@ -20,3 +27,82 @@ func agentForHost(ctx context.Context, queries *db.Queries, pool *lifecycle.HostClientPool,
     }
     return pool.GetForHost(host)
 }
+
+// requireRunningSandbox parses the sandbox ID from the URL, looks it up by team,
+// and verifies it is running. On failure it writes the appropriate HTTP error and
+// returns false.
+func requireRunningSandbox(w http.ResponseWriter, r *http.Request, queries *db.Queries, teamID pgtype.UUID) (db.Sandbox, pgtype.UUID, string, bool) {
+    sandboxIDStr := chi.URLParam(r, "id")
+    ctx := r.Context()
+
+    sandboxID, err := id.ParseSandboxID(sandboxIDStr)
+    if err != nil {
+        writeError(w, http.StatusBadRequest, "invalid_request", "invalid sandbox ID")
+        return db.Sandbox{}, pgtype.UUID{}, "", false
+    }
+
+    sb, err := queries.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: teamID})
+    if err != nil {
+        writeError(w, http.StatusNotFound, "not_found", "sandbox not found")
+        return db.Sandbox{}, pgtype.UUID{}, "", false
+    }
+
+    if sb.Status != "running" {
+        writeError(w, http.StatusConflict, "invalid_state", "sandbox is not running (status: "+sb.Status+")")
+        return db.Sandbox{}, pgtype.UUID{}, "", false
+    }
+
+    return sb, sandboxID, sandboxIDStr, true
+}
+
+// upgradeAndAuthenticate upgrades the HTTP connection to WebSocket and resolves
+// the auth context — either from middleware (API key) or from the first WS message (JWT).
+// Returns the connection and auth context, or an error if authentication fails.
+// The caller is responsible for closing the returned connection.
+func upgradeAndAuthenticate(w http.ResponseWriter, r *http.Request, jwtSecret []byte, queries *db.Queries) (*websocket.Conn, auth.AuthContext, error) {
+    ctx := r.Context()
+
+    ac, hasAuth := auth.FromContext(ctx)
+    if hasAuth {
+        conn, err := upgrader.Upgrade(w, r, nil)
+        if err != nil {
+            return nil, auth.AuthContext{}, fmt.Errorf("websocket upgrade: %w", err)
+        }
+        return conn, ac, nil
+    }
+
+    conn, err := upgrader.Upgrade(w, r, nil)
+    if err != nil {
+        return nil, auth.AuthContext{}, fmt.Errorf("websocket upgrade: %w", err)
+    }
+
+    var wsAC auth.AuthContext
+    var authErr error
+    if isAdminWSRoute(ctx) {
+        wsAC, authErr = wsAuthenticateAdmin(ctx, conn, jwtSecret, queries)
+    } else {
+        wsAC, authErr = wsAuthenticate(ctx, conn, jwtSecret, queries)
+    }
+    if authErr != nil {
+        conn.Close()
+        return nil, auth.AuthContext{}, fmt.Errorf("authentication failed")
+    }
+
+    return conn, wsAC, nil
+}
+
+// updateLastActive updates the sandbox last_active_at timestamp.
+// Uses a background context with timeout for streaming handlers where
+// the request context may already be cancelled.
+func updateLastActive(queries *db.Queries, sandboxID pgtype.UUID, sandboxIDStr string) {
+    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+    defer cancel()
+
+    if err := queries.UpdateLastActive(ctx, db.UpdateLastActiveParams{
+        ID: sandboxID,
+        LastActiveAt: pgtype.Timestamptz{
+            Time:  time.Now(),
+            Valid: true,
+        },
+    }); err != nil {
+        slog.Warn("failed to update last_active_at", "id", sandboxIDStr, "error", err)
+    }
+}
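
With these three helpers in place, every handler diff that follows collapses to the same shape. A minimal sketch of a hypothetical handler using them (statusHandler and its endpoint are illustrative, not part of the commit; requireRunningSandbox, updateLastActive, and writeJSON are the package helpers above):

package api

import (
    "net/http"

    "git.omukk.dev/wrenn/wrenn/pkg/auth"
    "git.omukk.dev/wrenn/wrenn/pkg/db"
)

// statusHandler is illustrative only.
type statusHandler struct {
    db *db.Queries
}

// Status sketches the consolidated handler shape: auth from context,
// one helper call for parse/lookup/state-check, response, then the
// fire-and-forget activity bump.
func (h *statusHandler) Status(w http.ResponseWriter, r *http.Request) {
    ac := auth.MustFromContext(r.Context())

    sb, sandboxID, sandboxIDStr, ok := requireRunningSandbox(w, r, h.db, ac.TeamID)
    if !ok {
        return // the helper already wrote the HTTP error
    }

    writeJSON(w, http.StatusOK, map[string]string{"status": sb.Status})

    // Safe even when the request context is already cancelled.
    updateLastActive(h.db, sandboxID, sandboxIDStr)
}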


@@ -57,7 +57,7 @@ func (h *adminCapsuleHandler) Create(w http.ResponseWriter, r *http.Request) {
     ac.TeamID = id.PlatformTeamID
     h.audit.LogSandboxCreate(r.Context(), ac, sb.ID, sb.Template)

-    writeJSON(w, http.StatusCreated, sandboxToResponse(sb))
+    writeJSON(w, http.StatusAccepted, sandboxToResponse(sb))
 }

 // List handles GET /v1/admin/capsules.
@@ -113,7 +113,7 @@ func (h *adminCapsuleHandler) Destroy(w http.ResponseWriter, r *http.Request) {
     }
     h.audit.LogSandboxDestroy(r.Context(), ac, sandboxID)

-    w.WriteHeader(http.StatusNoContent)
+    w.WriteHeader(http.StatusAccepted)
 }

 type adminSnapshotRequest struct {


@@ -3,14 +3,11 @@ package api

 import (
     "encoding/base64"
     "encoding/json"
-    "log/slog"
     "net/http"
     "time"
     "unicode/utf8"

     "connectrpc.com/connect"
-    "github.com/go-chi/chi/v5"
-    "github.com/jackc/pgx/v5/pgtype"

     "git.omukk.dev/wrenn/wrenn/pkg/auth"
     "git.omukk.dev/wrenn/wrenn/pkg/db"
@@ -58,23 +55,11 @@ type backgroundExecResponse struct {

 // Exec handles POST /v1/capsules/{id}/exec.
 func (h *execHandler) Exec(w http.ResponseWriter, r *http.Request) {
-    sandboxIDStr := chi.URLParam(r, "id")
     ctx := r.Context()
     ac := auth.MustFromContext(ctx)

-    sandboxID, err := id.ParseSandboxID(sandboxIDStr)
-    if err != nil {
-        writeError(w, http.StatusBadRequest, "invalid_request", "invalid sandbox ID")
-        return
-    }
-
-    sb, err := h.db.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: ac.TeamID})
-    if err != nil {
-        writeError(w, http.StatusNotFound, "not_found", "sandbox not found")
-        return
-    }
-
-    if sb.Status != "running" {
-        writeError(w, http.StatusConflict, "invalid_state", "sandbox is not running (status: "+sb.Status+")")
+    sb, sandboxID, sandboxIDStr, ok := requireRunningSandbox(w, r, h.db, ac.TeamID)
+    if !ok {
         return
     }
@@ -116,15 +101,7 @@ func (h *execHandler) Exec(w http.ResponseWriter, r *http.Request) {
         return
     }

-    if err := h.db.UpdateLastActive(ctx, db.UpdateLastActiveParams{
-        ID: sandboxID,
-        LastActiveAt: pgtype.Timestamptz{
-            Time: time.Now(),
-            Valid: true,
-        },
-    }); err != nil {
-        slog.Warn("failed to update last_active_at", "id", sandboxIDStr, "error", err)
-    }
+    updateLastActive(h.db, sandboxID, sandboxIDStr)

     writeJSON(w, http.StatusAccepted, backgroundExecResponse{
         SandboxID: sandboxIDStr,
@@ -142,6 +119,8 @@ func (h *execHandler) Exec(w http.ResponseWriter, r *http.Request) {
         Cmd:        req.Cmd,
         Args:       req.Args,
         TimeoutSec: req.TimeoutSec,
+        Envs:       req.Envs,
+        Cwd:        req.Cwd,
     }))
     if err != nil {
         status, code, msg := agentErrToHTTP(err)
@@ -151,41 +130,24 @@ func (h *execHandler) Exec(w http.ResponseWriter, r *http.Request) {

     duration := time.Since(start)

-    // Update last active.
-    if err := h.db.UpdateLastActive(ctx, db.UpdateLastActiveParams{
-        ID: sandboxID,
-        LastActiveAt: pgtype.Timestamptz{
-            Time: time.Now(),
-            Valid: true,
-        },
-    }); err != nil {
-        slog.Warn("failed to update last_active_at", "id", sandboxIDStr, "error", err)
-    }
+    updateLastActive(h.db, sandboxID, sandboxIDStr)

-    // Use base64 encoding if output contains non-UTF-8 bytes.
     stdout := resp.Msg.Stdout
     stderr := resp.Msg.Stderr

     encoding := "utf-8"
+    stdoutStr, stderrStr := string(stdout), string(stderr)
     if !utf8.Valid(stdout) || !utf8.Valid(stderr) {
         encoding = "base64"
-        writeJSON(w, http.StatusOK, execResponse{
-            SandboxID:  sandboxIDStr,
-            Cmd:        req.Cmd,
-            Stdout:     base64.StdEncoding.EncodeToString(stdout),
-            Stderr:     base64.StdEncoding.EncodeToString(stderr),
-            ExitCode:   resp.Msg.ExitCode,
-            DurationMs: duration.Milliseconds(),
-            Encoding:   encoding,
-        })
-        return
+        stdoutStr = base64.StdEncoding.EncodeToString(stdout)
+        stderrStr = base64.StdEncoding.EncodeToString(stderr)
     }

     writeJSON(w, http.StatusOK, execResponse{
         SandboxID:  sandboxIDStr,
         Cmd:        req.Cmd,
-        Stdout:     string(stdout),
-        Stderr:     string(stderr),
+        Stdout:     stdoutStr,
+        Stderr:     stderrStr,
         ExitCode:   resp.Msg.ExitCode,
         DurationMs: duration.Milliseconds(),
         Encoding:   encoding,
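
Since stdout/stderr are base64-encoded only when they contain invalid UTF-8, clients must branch on the encoding field. A hedged Go sketch of the decode side (JSON field names are inferred from the handler above, not confirmed by the wire spec):

package client

import (
    "encoding/base64"
    "encoding/json"
)

// execResult mirrors the execResponse shape used by the handler;
// treat the tags as a sketch.
type execResult struct {
    Stdout   string `json:"stdout"`
    Stderr   string `json:"stderr"`
    Encoding string `json:"encoding"` // "utf-8" or "base64"
}

// decodeOutput returns raw stdout/stderr bytes regardless of encoding.
func decodeOutput(payload []byte) (stdout, stderr []byte, err error) {
    var res execResult
    if err = json.Unmarshal(payload, &res); err != nil {
        return nil, nil, err
    }
    if res.Encoding == "base64" {
        if stdout, err = base64.StdEncoding.DecodeString(res.Stdout); err != nil {
            return nil, nil, err
        }
        stderr, err = base64.StdEncoding.DecodeString(res.Stderr)
        return stdout, stderr, err
    }
    return []byte(res.Stdout), []byte(res.Stderr), nil
}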


@@ -5,7 +5,6 @@ import (
     "encoding/json"
     "log/slog"
     "net/http"
-    "time"

     "connectrpc.com/connect"
     "github.com/go-chi/chi/v5"
@@ -59,37 +58,9 @@ func (h *execStreamHandler) ExecStream(w http.ResponseWriter, r *http.Request) {
         return
     }

-    // Authenticate: use context from middleware (API key) or WS first message (JWT).
-    ac, hasAuth := auth.FromContext(ctx)
-    if !hasAuth {
-        conn, err := upgrader.Upgrade(w, r, nil)
-        if err != nil {
-            slog.Error("websocket upgrade failed", "error", err)
-            return
-        }
-        defer conn.Close()
-
-        var wsAC auth.AuthContext
-        var authErr error
-        if isAdminWSRoute(ctx) {
-            wsAC, authErr = wsAuthenticateAdmin(ctx, conn, h.jwtSecret, h.db)
-        } else {
-            wsAC, authErr = wsAuthenticate(ctx, conn, h.jwtSecret, h.db)
-        }
-        if authErr != nil {
-            sendWSError(conn, "authentication failed")
-            return
-        }
-        ac = wsAC
-
-        h.runExecStream(ctx, conn, ac, sandboxID, sandboxIDStr)
-        return
-    }
-
-    conn, err := upgrader.Upgrade(w, r, nil)
+    conn, ac, err := upgradeAndAuthenticate(w, r, h.jwtSecret, h.db)
     if err != nil {
-        slog.Error("websocket upgrade failed", "error", err)
+        slog.Error("websocket upgrade/auth failed", "error", err)
         return
     }
     defer conn.Close()
@@ -186,18 +157,7 @@ func (h *execStreamHandler) runExecStream(ctx context.Context, conn *websocket.Conn,
         }
     }

-    // Update last active using a fresh context (the request context may be cancelled).
-    updateCtx, updateCancel := context.WithTimeout(context.Background(), 5*time.Second)
-    defer updateCancel()
-    if err := h.db.UpdateLastActive(updateCtx, db.UpdateLastActiveParams{
-        ID: sandboxID,
-        LastActiveAt: pgtype.Timestamptz{
-            Time: time.Now(),
-            Valid: true,
-        },
-    }); err != nil {
-        slog.Warn("failed to update last active after stream exec", "sandbox_id", sandboxIDStr, "error", err)
-    }
+    updateLastActive(h.db, sandboxID, sandboxIDStr)
 }

 func sendWSError(conn *websocket.Conn, msg string) {


@@ -7,11 +7,9 @@ import (
     "net/http"

     "connectrpc.com/connect"
-    "github.com/go-chi/chi/v5"

     "git.omukk.dev/wrenn/wrenn/pkg/auth"
     "git.omukk.dev/wrenn/wrenn/pkg/db"
-    "git.omukk.dev/wrenn/wrenn/pkg/id"
     "git.omukk.dev/wrenn/wrenn/pkg/lifecycle"
     pb "git.omukk.dev/wrenn/wrenn/proto/hostagent/gen"
 )
@@ -30,23 +28,11 @@ func newFilesHandler(db *db.Queries, pool *lifecycle.HostClientPool) *filesHandler {
 // - "path" text field: absolute destination path inside the sandbox
 // - "file" file field: binary content to write
 func (h *filesHandler) Upload(w http.ResponseWriter, r *http.Request) {
-    sandboxIDStr := chi.URLParam(r, "id")
     ctx := r.Context()
     ac := auth.MustFromContext(ctx)

-    sandboxID, err := id.ParseSandboxID(sandboxIDStr)
-    if err != nil {
-        writeError(w, http.StatusBadRequest, "invalid_request", "invalid sandbox ID")
-        return
-    }
-
-    sb, err := h.db.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: ac.TeamID})
-    if err != nil {
-        writeError(w, http.StatusNotFound, "not_found", "sandbox not found")
-        return
-    }
-
-    if sb.Status != "running" {
-        writeError(w, http.StatusConflict, "invalid_state", "sandbox is not running")
+    sb, _, sandboxIDStr, ok := requireRunningSandbox(w, r, h.db, ac.TeamID)
+    if !ok {
         return
     }
@@ -108,23 +94,11 @@ type readFileRequest struct {

 // Download handles POST /v1/capsules/{id}/files/read.
 // Accepts JSON body with path, returns raw file content with Content-Disposition.
 func (h *filesHandler) Download(w http.ResponseWriter, r *http.Request) {
-    sandboxIDStr := chi.URLParam(r, "id")
     ctx := r.Context()
     ac := auth.MustFromContext(ctx)

-    sandboxID, err := id.ParseSandboxID(sandboxIDStr)
-    if err != nil {
-        writeError(w, http.StatusBadRequest, "invalid_request", "invalid sandbox ID")
-        return
-    }
-
-    sb, err := h.db.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: ac.TeamID})
-    if err != nil {
-        writeError(w, http.StatusNotFound, "not_found", "sandbox not found")
-        return
-    }
-
-    if sb.Status != "running" {
-        writeError(w, http.StatusConflict, "invalid_state", "sandbox is not running")
+    sb, _, sandboxIDStr, ok := requireRunningSandbox(w, r, h.db, ac.TeamID)
+    if !ok {
         return
     }


@@ -8,11 +8,9 @@ import (
     "net/http"

     "connectrpc.com/connect"
-    "github.com/go-chi/chi/v5"

     "git.omukk.dev/wrenn/wrenn/pkg/auth"
     "git.omukk.dev/wrenn/wrenn/pkg/db"
-    "git.omukk.dev/wrenn/wrenn/pkg/id"
     "git.omukk.dev/wrenn/wrenn/pkg/lifecycle"
     pb "git.omukk.dev/wrenn/wrenn/proto/hostagent/gen"
 )
@@ -30,23 +28,11 @@ func newFilesStreamHandler(db *db.Queries, pool *lifecycle.HostClientPool) *filesStreamHandler {
 // Expects multipart/form-data with "path" text field and "file" file field.
 // Streams file content directly from the request body to the host agent without buffering.
 func (h *filesStreamHandler) StreamUpload(w http.ResponseWriter, r *http.Request) {
-    sandboxIDStr := chi.URLParam(r, "id")
     ctx := r.Context()
     ac := auth.MustFromContext(ctx)

-    sandboxID, err := id.ParseSandboxID(sandboxIDStr)
-    if err != nil {
-        writeError(w, http.StatusBadRequest, "invalid_request", "invalid sandbox ID")
-        return
-    }
-
-    sb, err := h.db.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: ac.TeamID})
-    if err != nil {
-        writeError(w, http.StatusNotFound, "not_found", "sandbox not found")
-        return
-    }
-
-    if sb.Status != "running" {
-        writeError(w, http.StatusConflict, "invalid_state", "sandbox is not running")
+    sb, _, sandboxIDStr, ok := requireRunningSandbox(w, r, h.db, ac.TeamID)
+    if !ok {
         return
     }
@@ -103,6 +89,12 @@ func (h *filesStreamHandler) StreamUpload(w http.ResponseWriter, r *http.Request) {
     // Open client-streaming RPC to host agent.
     stream := agent.WriteFileStream(ctx)
+    var streamClosed bool
+    defer func() {
+        if !streamClosed {
+            stream.CloseAndReceive()
+        }
+    }()

     // Send metadata first.
     if err := stream.Send(&pb.WriteFileStreamRequest{
@@ -141,6 +133,7 @@ func (h *filesStreamHandler) StreamUpload(w http.ResponseWriter, r *http.Request) {
     }

     // Close and receive response.
+    streamClosed = true
     if _, err := stream.CloseAndReceive(); err != nil {
         status, code, msg := agentErrToHTTP(err)
         writeError(w, status, code, msg)
@@ -153,23 +146,11 @@ func (h *filesStreamHandler) StreamUpload(w http.ResponseWriter, r *http.Request) {

 // StreamDownload handles POST /v1/capsules/{id}/files/stream/read.
 // Accepts JSON body with path, streams file content back without buffering.
 func (h *filesStreamHandler) StreamDownload(w http.ResponseWriter, r *http.Request) {
-    sandboxIDStr := chi.URLParam(r, "id")
     ctx := r.Context()
     ac := auth.MustFromContext(ctx)

-    sandboxID, err := id.ParseSandboxID(sandboxIDStr)
-    if err != nil {
-        writeError(w, http.StatusBadRequest, "invalid_request", "invalid sandbox ID")
-        return
-    }
-
-    sb, err := h.db.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: ac.TeamID})
-    if err != nil {
-        writeError(w, http.StatusNotFound, "not_found", "sandbox not found")
-        return
-    }
-
-    if sb.Status != "running" {
-        writeError(w, http.StatusConflict, "invalid_state", "sandbox is not running")
+    sb, _, sandboxIDStr, ok := requireRunningSandbox(w, r, h.db, ac.TeamID)
+    if !ok {
         return
     }
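
The streamClosed guard above is the whole leak fix: previously, every early return between WriteFileStream and the final CloseAndReceive abandoned the stream. The pattern, generalized under an illustrative interface (connect's ClientStreamForClient has a compatible shape, but this wrapper is not from the commit):

package api

// clientStream abstracts the one method that must not be leaked.
type clientStream interface {
    Send(chunk []byte) error
    CloseAndReceive() error
}

// sendAll shows the shape of the fix: a boolean guard plus defer
// guarantees CloseAndReceive runs exactly once on every early-return
// path, so the underlying stream is never leaked.
func sendAll(stream clientStream, chunks [][]byte) error {
    var closed bool
    defer func() {
        if !closed {
            _ = stream.CloseAndReceive() // best-effort cleanup on error paths
        }
    }()

    for _, c := range chunks {
        if err := stream.Send(c); err != nil {
            return err // defer still closes the stream
        }
    }

    closed = true
    return stream.CloseAndReceive()
}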


@@ -4,11 +4,9 @@ import (
     "net/http"

     "connectrpc.com/connect"
-    "github.com/go-chi/chi/v5"

     "git.omukk.dev/wrenn/wrenn/pkg/auth"
     "git.omukk.dev/wrenn/wrenn/pkg/db"
-    "git.omukk.dev/wrenn/wrenn/pkg/id"
     "git.omukk.dev/wrenn/wrenn/pkg/lifecycle"
     pb "git.omukk.dev/wrenn/wrenn/proto/hostagent/gen"
 )
@@ -58,23 +56,11 @@ type removeRequest struct {

 // ListDir handles POST /v1/capsules/{id}/files/list.
 func (h *fsHandler) ListDir(w http.ResponseWriter, r *http.Request) {
-    sandboxIDStr := chi.URLParam(r, "id")
     ctx := r.Context()
     ac := auth.MustFromContext(ctx)

-    sandboxID, err := id.ParseSandboxID(sandboxIDStr)
-    if err != nil {
-        writeError(w, http.StatusBadRequest, "invalid_request", "invalid sandbox ID")
-        return
-    }
-
-    sb, err := h.db.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: ac.TeamID})
-    if err != nil {
-        writeError(w, http.StatusNotFound, "not_found", "sandbox not found")
-        return
-    }
-
-    if sb.Status != "running" {
-        writeError(w, http.StatusConflict, "invalid_state", "sandbox is not running")
+    sb, _, sandboxIDStr, ok := requireRunningSandbox(w, r, h.db, ac.TeamID)
+    if !ok {
         return
     }
@@ -115,23 +101,11 @@ func (h *fsHandler) ListDir(w http.ResponseWriter, r *http.Request) {

 // MakeDir handles POST /v1/capsules/{id}/files/mkdir.
 func (h *fsHandler) MakeDir(w http.ResponseWriter, r *http.Request) {
-    sandboxIDStr := chi.URLParam(r, "id")
     ctx := r.Context()
     ac := auth.MustFromContext(ctx)

-    sandboxID, err := id.ParseSandboxID(sandboxIDStr)
-    if err != nil {
-        writeError(w, http.StatusBadRequest, "invalid_request", "invalid sandbox ID")
-        return
-    }
-
-    sb, err := h.db.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: ac.TeamID})
-    if err != nil {
-        writeError(w, http.StatusNotFound, "not_found", "sandbox not found")
-        return
-    }
-
-    if sb.Status != "running" {
-        writeError(w, http.StatusConflict, "invalid_state", "sandbox is not running")
+    sb, _, sandboxIDStr, ok := requireRunningSandbox(w, r, h.db, ac.TeamID)
+    if !ok {
         return
     }
@@ -166,23 +140,11 @@ func (h *fsHandler) MakeDir(w http.ResponseWriter, r *http.Request) {

 // Remove handles POST /v1/capsules/{id}/files/remove.
 func (h *fsHandler) Remove(w http.ResponseWriter, r *http.Request) {
-    sandboxIDStr := chi.URLParam(r, "id")
     ctx := r.Context()
     ac := auth.MustFromContext(ctx)

-    sandboxID, err := id.ParseSandboxID(sandboxIDStr)
-    if err != nil {
-        writeError(w, http.StatusBadRequest, "invalid_request", "invalid sandbox ID")
-        return
-    }
-
-    sb, err := h.db.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: ac.TeamID})
-    if err != nil {
-        writeError(w, http.StatusNotFound, "not_found", "sandbox not found")
-        return
-    }
-
-    if sb.Status != "running" {
-        writeError(w, http.StatusConflict, "invalid_state", "sandbox is not running")
+    sb, _, sandboxIDStr, ok := requireRunningSandbox(w, r, h.db, ac.TeamID)
+    if !ok {
         return
     }


@@ -1,6 +1,7 @@
 package api

 import (
+    "context"
     "errors"
     "log/slog"
     "net/http"
@@ -21,10 +22,11 @@ type hostHandler struct {
     svc     *service.HostService
     queries *db.Queries
     audit   *audit.AuditLogger
+    monitor *HostMonitor
 }

-func newHostHandler(svc *service.HostService, queries *db.Queries, al *audit.AuditLogger) *hostHandler {
-    return &hostHandler{svc: svc, queries: queries, audit: al}
+func newHostHandler(svc *service.HostService, queries *db.Queries, al *audit.AuditLogger, monitor *HostMonitor) *hostHandler {
+    return &hostHandler{svc: svc, queries: queries, audit: al, monitor: monitor}
 }

 // Request/response types.
@@ -426,9 +428,12 @@ func (h *hostHandler) Heartbeat(w http.ResponseWriter, r *http.Request) {
         return
     }

-    // Log marked_up if the host just recovered from unreachable.
+    // If the host just recovered from unreachable, log it and trigger immediate
+    // reconciliation so "missing" sandboxes are resolved without waiting for the
+    // next monitor tick.
     if prevHost.Status == "unreachable" {
         h.audit.LogHostMarkedUp(r.Context(), prevHost.TeamID, hc.HostID)
+        go h.monitor.ReconcileHost(context.Background(), hc.HostID)
     }

     w.WriteHeader(http.StatusNoContent)


@@ -5,7 +5,6 @@ import (
     "log/slog"
     "net/http"
     "strconv"
-    "time"

     "connectrpc.com/connect"
     "github.com/go-chi/chi/v5"
@@ -44,23 +43,11 @@ type processListResponse struct {

 // ListProcesses handles GET /v1/capsules/{id}/processes.
 func (h *processHandler) ListProcesses(w http.ResponseWriter, r *http.Request) {
-    sandboxIDStr := chi.URLParam(r, "id")
     ctx := r.Context()
     ac := auth.MustFromContext(ctx)

-    sandboxID, err := id.ParseSandboxID(sandboxIDStr)
-    if err != nil {
-        writeError(w, http.StatusBadRequest, "invalid_request", "invalid sandbox ID")
-        return
-    }
-
-    sb, err := h.db.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: ac.TeamID})
-    if err != nil {
-        writeError(w, http.StatusNotFound, "not_found", "sandbox not found")
-        return
-    }
-
-    if sb.Status != "running" {
-        writeError(w, http.StatusConflict, "invalid_state", "sandbox is not running (status: "+sb.Status+")")
+    sb, _, sandboxIDStr, ok := requireRunningSandbox(w, r, h.db, ac.TeamID)
+    if !ok {
         return
     }
@@ -95,24 +82,12 @@ func (h *processHandler) ListProcesses(w http.ResponseWriter, r *http.Request) {

 // KillProcess handles DELETE /v1/capsules/{id}/processes/{selector}.
 // The selector can be a numeric PID or a string tag.
 func (h *processHandler) KillProcess(w http.ResponseWriter, r *http.Request) {
-    sandboxIDStr := chi.URLParam(r, "id")
     selectorStr := chi.URLParam(r, "selector")
     ctx := r.Context()
     ac := auth.MustFromContext(ctx)

-    sandboxID, err := id.ParseSandboxID(sandboxIDStr)
-    if err != nil {
-        writeError(w, http.StatusBadRequest, "invalid_request", "invalid sandbox ID")
-        return
-    }
-
-    sb, err := h.db.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: ac.TeamID})
-    if err != nil {
-        writeError(w, http.StatusNotFound, "not_found", "sandbox not found")
-        return
-    }
-
-    if sb.Status != "running" {
-        writeError(w, http.StatusConflict, "invalid_state", "sandbox is not running (status: "+sb.Status+")")
+    sb, _, sandboxIDStr, ok := requireRunningSandbox(w, r, h.db, ac.TeamID)
+    if !ok {
         return
     }
@@ -146,14 +121,6 @@ func (h *processHandler) KillProcess(w http.ResponseWriter, r *http.Request) {
     w.WriteHeader(http.StatusNoContent)
 }

-// wsProcessOut is the JSON message sent to the WebSocket client.
-type wsProcessOut struct {
-    Type     string `json:"type"`                 // "start", "stdout", "stderr", "exit", "error"
-    PID      uint32 `json:"pid,omitempty"`        // only for "start"
-    Data     string `json:"data,omitempty"`       // only for "stdout", "stderr", "error"
-    ExitCode *int32 `json:"exit_code,omitempty"`  // only for "exit"
-}
-
 // ConnectProcess handles WS /v1/capsules/{id}/processes/{selector}/stream.
 func (h *processHandler) ConnectProcess(w http.ResponseWriter, r *http.Request) {
     sandboxIDStr := chi.URLParam(r, "id")
@@ -166,37 +133,9 @@ func (h *processHandler) ConnectProcess(w http.ResponseWriter, r *http.Request) {
         return
     }

-    // Authenticate: use context from middleware (API key) or WS first message (JWT).
-    ac, hasAuth := auth.FromContext(ctx)
-    if !hasAuth {
-        conn, err := upgrader.Upgrade(w, r, nil)
-        if err != nil {
-            slog.Error("process stream websocket upgrade failed", "error", err)
-            return
-        }
-        defer conn.Close()
-
-        var wsAC auth.AuthContext
-        var authErr error
-        if isAdminWSRoute(ctx) {
-            wsAC, authErr = wsAuthenticateAdmin(ctx, conn, h.jwtSecret, h.db)
-        } else {
-            wsAC, authErr = wsAuthenticate(ctx, conn, h.jwtSecret, h.db)
-        }
-        if authErr != nil {
-            sendProcessWSError(conn, "authentication failed")
-            return
-        }
-        ac = wsAC
-
-        h.runConnectProcess(ctx, conn, ac, sandboxID, sandboxIDStr, selectorStr)
-        return
-    }
-
-    conn, err := upgrader.Upgrade(w, r, nil)
+    conn, ac, err := upgradeAndAuthenticate(w, r, h.jwtSecret, h.db)
     if err != nil {
-        slog.Error("process stream websocket upgrade failed", "error", err)
+        slog.Error("process stream websocket upgrade/auth failed", "error", err)
         return
     }
     defer conn.Close()
@@ -207,17 +146,17 @@ func (h *processHandler) ConnectProcess(w http.ResponseWriter, r *http.Request) {
 func (h *processHandler) runConnectProcess(ctx context.Context, conn *websocket.Conn, ac auth.AuthContext, sandboxID pgtype.UUID, sandboxIDStr, selectorStr string) {
     sb, err := h.db.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: ac.TeamID})
     if err != nil {
-        sendProcessWSError(conn, "sandbox not found")
+        sendWSError(conn, "sandbox not found")
         return
     }
     if sb.Status != "running" {
-        sendProcessWSError(conn, "sandbox is not running (status: "+sb.Status+")")
+        sendWSError(conn, "sandbox is not running (status: "+sb.Status+")")
         return
     }

     agent, err := agentForHost(ctx, h.db, h.pool, sb.HostID)
     if err != nil {
-        sendProcessWSError(conn, "sandbox host is not reachable")
+        sendWSError(conn, "sandbox host is not reachable")
         return
     }
@@ -236,7 +175,7 @@ func (h *processHandler) runConnectProcess(ctx context.Context, conn *websocket.Conn,
     stream, err := agent.ConnectProcess(streamCtx, connect.NewRequest(connectReq))
     if err != nil {
-        sendProcessWSError(conn, "failed to connect to process: "+err.Error())
+        sendWSError(conn, "failed to connect to process: "+err.Error())
         return
     }
     defer stream.Close()
@@ -257,42 +196,27 @@ func (h *processHandler) runConnectProcess(ctx context.Context, conn *websocket.Conn,
         resp := stream.Msg()
         switch ev := resp.Event.(type) {
         case *pb.ConnectProcessResponse_Start:
-            writeWSJSON(conn, wsProcessOut{Type: "start", PID: ev.Start.Pid})
+            writeWSJSON(conn, wsOutMsg{Type: "start", PID: ev.Start.Pid})
         case *pb.ConnectProcessResponse_Data:
             switch o := ev.Data.Output.(type) {
             case *pb.ExecStreamData_Stdout:
-                writeWSJSON(conn, wsProcessOut{Type: "stdout", Data: string(o.Stdout)})
+                writeWSJSON(conn, wsOutMsg{Type: "stdout", Data: string(o.Stdout)})
             case *pb.ExecStreamData_Stderr:
-                writeWSJSON(conn, wsProcessOut{Type: "stderr", Data: string(o.Stderr)})
+                writeWSJSON(conn, wsOutMsg{Type: "stderr", Data: string(o.Stderr)})
             }
         case *pb.ConnectProcessResponse_End:
             exitCode := ev.End.ExitCode
-            writeWSJSON(conn, wsProcessOut{Type: "exit", ExitCode: &exitCode})
+            writeWSJSON(conn, wsOutMsg{Type: "exit", ExitCode: &exitCode})
         }
     }

     if err := stream.Err(); err != nil {
         if streamCtx.Err() == nil {
-            sendProcessWSError(conn, err.Error())
+            sendWSError(conn, err.Error())
         }
     }

-    // Update last active using a fresh context.
-    updateCtx, updateCancel := context.WithTimeout(context.Background(), 5*time.Second)
-    defer updateCancel()
-    if err := h.db.UpdateLastActive(updateCtx, db.UpdateLastActiveParams{
-        ID: sandboxID,
-        LastActiveAt: pgtype.Timestamptz{
-            Time: time.Now(),
-            Valid: true,
-        },
-    }); err != nil {
-        slog.Warn("failed to update last active after process stream", "sandbox_id", sandboxIDStr, "error", err)
-    }
-}
-
-func sendProcessWSError(conn *websocket.Conn, msg string) {
-    writeWSJSON(conn, wsProcessOut{Type: "error", Data: msg})
+    updateLastActive(h.db, sandboxID, sandboxIDStr)
 }
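
With wsProcessOut merged into the shared wsOutMsg, process streams and exec streams now speak one wire format ("start", "stdout", "stderr", "exit", "error"). A sketch of a Go consumer of that format (the struct name and callback are illustrative; only the JSON tags come from the removed type above):

package client

import (
    "fmt"

    "github.com/gorilla/websocket"
)

// wsMsg mirrors the consolidated wsOutMsg wire shape.
type wsMsg struct {
    Type     string `json:"type"`
    PID      uint32 `json:"pid,omitempty"`
    Data     string `json:"data,omitempty"`
    ExitCode *int32 `json:"exit_code,omitempty"`
}

// drainProcessStream reads events until the process exits or the
// socket errors; onOutput receives interleaved stdout/stderr chunks.
func drainProcessStream(conn *websocket.Conn, onOutput func(kind, data string)) (int32, error) {
    for {
        var m wsMsg
        if err := conn.ReadJSON(&m); err != nil {
            return 0, err
        }
        switch m.Type {
        case "stdout", "stderr":
            onOutput(m.Type, m.Data)
        case "exit":
            if m.ExitCode != nil {
                return *m.ExitCode, nil
            }
            return 0, nil
        case "error":
            return 0, fmt.Errorf("stream error: %s", m.Data)
        }
    }
}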


@@ -90,40 +90,9 @@ func (h *ptyHandler) PtySession(w http.ResponseWriter, r *http.Request) {
         return
     }

-    // API key auth is handled by middleware (sets context).
-    // For browser JWT auth, we authenticate after upgrade via first WS message.
-    ac, hasAuth := auth.FromContext(ctx)
-    if !hasAuth {
-        // No pre-upgrade auth — upgrade first, then authenticate via WS message.
-        conn, err := upgrader.Upgrade(w, r, nil)
-        if err != nil {
-            slog.Error("pty websocket upgrade failed", "error", err)
-            return
-        }
-        defer conn.Close()
-        ws := &wsWriter{conn: conn}
-
-        var wsAC auth.AuthContext
-        if isAdminWSRoute(ctx) {
-            wsAC, err = wsAuthenticateAdmin(ctx, conn, h.jwtSecret, h.db)
-        } else {
-            wsAC, err = wsAuthenticate(ctx, conn, h.jwtSecret, h.db)
-        }
-        if err != nil {
-            ws.writeJSON(wsPtyOut{Type: "error", Data: "authentication failed", Fatal: true})
-            return
-        }
-        ac = wsAC
-
-        h.runPtySession(ctx, ws, conn, ac, sandboxID, sandboxIDStr)
-        return
-    }
-
-    conn, err := upgrader.Upgrade(w, r, nil)
+    conn, ac, err := upgradeAndAuthenticate(w, r, h.jwtSecret, h.db)
     if err != nil {
-        slog.Error("pty websocket upgrade failed", "error", err)
+        slog.Error("pty websocket upgrade/auth failed", "error", err)
         return
     }
     defer conn.Close()
@@ -168,18 +137,7 @@ func (h *ptyHandler) runPtySession(ctx context.Context, ws *wsWriter, conn *websocket.Conn,
         ws.writeJSON(wsPtyOut{Type: "error", Data: "first message must be type 'start' or 'connect'", Fatal: true})
     }

-    // Update last active using a fresh context.
-    updateCtx, updateCancel := context.WithTimeout(context.Background(), 5*time.Second)
-    defer updateCancel()
-    if err := h.db.UpdateLastActive(updateCtx, db.UpdateLastActiveParams{
-        ID: sandboxID,
-        LastActiveAt: pgtype.Timestamptz{
-            Time: time.Now(),
-            Valid: true,
-        },
-    }); err != nil {
-        slog.Warn("failed to update last active after pty session", "sandbox_id", sandboxIDStr, "error", err)
-    }
+    updateLastActive(h.db, sandboxID, sandboxIDStr)
 }

 func (h *ptyHandler) handleStart(


@@ -108,7 +108,7 @@ func (h *sandboxHandler) Create(w http.ResponseWriter, r *http.Request) {
     }
     h.audit.LogSandboxCreate(r.Context(), ac, sb.ID, sb.Template)

-    writeJSON(w, http.StatusCreated, sandboxToResponse(sb))
+    writeJSON(w, http.StatusAccepted, sandboxToResponse(sb))
 }

 // List handles GET /v1/capsules.
@@ -167,7 +167,7 @@ func (h *sandboxHandler) Pause(w http.ResponseWriter, r *http.Request) {
     }
     h.audit.LogSandboxPause(r.Context(), ac, sandboxID)

-    writeJSON(w, http.StatusOK, sandboxToResponse(sb))
+    writeJSON(w, http.StatusAccepted, sandboxToResponse(sb))
 }

 // Resume handles POST /v1/capsules/{id}/resume.
@@ -189,7 +189,7 @@ func (h *sandboxHandler) Resume(w http.ResponseWriter, r *http.Request) {
     }
     h.audit.LogSandboxResume(r.Context(), ac, sandboxID)

-    writeJSON(w, http.StatusOK, sandboxToResponse(sb))
+    writeJSON(w, http.StatusAccepted, sandboxToResponse(sb))
 }

 // Ping handles POST /v1/capsules/{id}/ping.
@@ -230,5 +230,5 @@ func (h *sandboxHandler) Destroy(w http.ResponseWriter, r *http.Request) {
     }
     h.audit.LogSandboxDestroy(r.Context(), ac, sandboxID)

-    w.WriteHeader(http.StatusNoContent)
+    w.WriteHeader(http.StatusAccepted)
 }
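
All four lifecycle endpoints are now asynchronous: 202 Accepted plus a transient status instead of a terminal 200/201/204. Callers that used to treat the response as completion need to poll until the status settles. A sketch under the assumption that capsule status can be re-fetched — the getStatus callback stands in for whatever lookup the caller has (e.g. a capsule GET):

package client

import (
    "context"
    "fmt"
    "time"
)

// Transient statuses introduced by the async lifecycle API.
var transient = map[string]bool{
    "starting": true, "resuming": true, "pausing": true, "stopping": true,
}

// waitForSettle polls until the capsule leaves a transient state or
// the context expires, returning the settled status.
func waitForSettle(ctx context.Context, getStatus func(context.Context) (string, error)) (string, error) {
    ticker := time.NewTicker(500 * time.Millisecond)
    defer ticker.Stop()
    for {
        status, err := getStatus(ctx)
        if err != nil {
            return "", err
        }
        if !transient[status] {
            return status, nil
        }
        select {
        case <-ctx.Done():
            return "", fmt.Errorf("timed out while %q: %w", status, ctx.Err())
        case <-ticker.C:
        }
    }
}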


@@ -0,0 +1,65 @@
package api
import (
"encoding/json"
"net/http"
"time"
"github.com/redis/go-redis/v9"
"git.omukk.dev/wrenn/wrenn/pkg/auth"
"git.omukk.dev/wrenn/wrenn/pkg/db"
"git.omukk.dev/wrenn/wrenn/pkg/id"
)
type sandboxEventHandler struct {
db *db.Queries
rdb *redis.Client
}
func newSandboxEventHandler(queries *db.Queries, rdb *redis.Client) *sandboxEventHandler {
return &sandboxEventHandler{db: queries, rdb: rdb}
}
type sandboxEventRequest struct {
Event string `json:"event"`
SandboxID string `json:"sandbox_id"`
HostID string `json:"host_id"`
Timestamp int64 `json:"timestamp"`
}
// Handle receives lifecycle event callbacks from host agents and publishes
// them to the internal Redis stream for the SandboxEventConsumer to process.
func (h *sandboxEventHandler) Handle(w http.ResponseWriter, r *http.Request) {
var req sandboxEventRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeError(w, http.StatusBadRequest, "invalid_request", "invalid JSON body")
return
}
if req.Event == "" || req.SandboxID == "" || req.HostID == "" {
writeError(w, http.StatusBadRequest, "invalid_request", "event, sandbox_id, and host_id are required")
return
}
// Validate that the calling host matches the host_id in the payload.
hc := auth.MustHostFromContext(r.Context())
callerHostID := id.FormatHostID(hc.HostID)
if callerHostID != req.HostID {
writeError(w, http.StatusForbidden, "forbidden", "host_id does not match authenticated host")
return
}
if req.Timestamp == 0 {
req.Timestamp = time.Now().Unix()
}
PublishSandboxEvent(r.Context(), h.rdb, SandboxEvent{
Event: req.Event,
SandboxID: req.SandboxID,
HostID: req.HostID,
Timestamp: req.Timestamp,
})
w.WriteHeader(http.StatusNoContent)
}
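
PublishSandboxEvent itself is not part of this diff. Given that the consumer below reads a single "payload" field via XREADGROUP, the publisher plausibly reduces to one XAdd; this is a sketch only, and the real helper may differ:

package api

import (
    "context"
    "encoding/json"
    "log/slog"

    "github.com/redis/go-redis/v9"
)

// publishSandboxEventSketch appends an entry whose "payload" field
// holds the JSON-encoded SandboxEvent — the shape the consumer's
// handleMessage expects. Illustrative, not the shipped helper.
func publishSandboxEventSketch(ctx context.Context, rdb *redis.Client, ev SandboxEvent) {
    payload, err := json.Marshal(ev)
    if err != nil {
        slog.Warn("sandbox event: marshal failed", "error", err)
        return
    }
    if err := rdb.XAdd(ctx, &redis.XAddArgs{
        Stream: "wrenn:sandbox-events", // sandboxEventStream in the consumer
        Values: map[string]any{"payload": payload},
    }).Err(); err != nil {
        slog.Warn("sandbox event: xadd failed", "error", err)
    }
}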


@@ -19,6 +19,12 @@ import (
 // it is considered unreachable (3 missed 30-second heartbeats).
 const unreachableThreshold = 90 * time.Second

+// transientGracePeriod is how long a sandbox is allowed to stay in a transient
+// status (starting, resuming, pausing, stopping) before the monitor infers a
+// final state. This prevents the monitor from racing against in-flight RPCs
+// that may not have registered the sandbox on the host agent yet.
+const transientGracePeriod = 2 * time.Minute
+
 // HostMonitor runs on a fixed interval and performs two duties:
 //
 //  1. Passive check: marks hosts whose last_heartbeat_at is stale as
@@ -77,6 +83,21 @@ func (m *HostMonitor) run(ctx context.Context) {
     }
 }

+// ReconcileHost triggers immediate active reconciliation for a single host.
+// Called when a host transitions from unreachable → online so sandboxes marked
+// "missing" are resolved without waiting for the next monitor tick.
+func (m *HostMonitor) ReconcileHost(ctx context.Context, hostID pgtype.UUID) {
+    host, err := m.db.GetHost(ctx, hostID)
+    if err != nil {
+        slog.Warn("host monitor: reconcile-on-connect: failed to get host", "error", err)
+        return
+    }
+    if host.Status != "online" {
+        return
+    }
+    m.checkHost(ctx, host)
+}
+
 func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
     // --- Passive phase: check heartbeat staleness ---
@@ -213,4 +234,58 @@ func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
             slog.Warn("host monitor: failed to mark stopped", "host_id", id.FormatHostID(host.ID), "error", err)
         }
     }
+
+    // --- Reconcile transient statuses (starting, resuming, pausing, stopping) ---
+    // These represent in-flight operations. If the sandbox is no longer alive on
+    // the host, infer the final state based on the transient status.
+    transientSandboxes, err := m.db.ListSandboxesByHostAndStatus(ctx, db.ListSandboxesByHostAndStatusParams{
+        HostID:  host.ID,
+        Column2: []string{"starting", "resuming", "pausing", "stopping"},
+    })
+    if err != nil {
+        slog.Warn("host monitor: failed to list transient sandboxes", "host_id", id.FormatHostID(host.ID), "error", err)
+        return
+    }
+    for _, sb := range transientSandboxes {
+        sbIDStr := id.FormatSandboxID(sb.ID)
+        if _, ok := alive[sbIDStr]; ok {
+            // Sandbox is alive on host — the background goroutine should
+            // finalize the transition. For starting/resuming, if the sandbox
+            // is alive it means creation/resume succeeded.
+            if sb.Status == "starting" || sb.Status == "resuming" {
+                if _, err := m.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
+                    ID: sb.ID, Status: sb.Status, Status_2: "running",
+                }); err == nil {
+                    slog.Info("host monitor: promoted transient sandbox to running", "sandbox_id", sbIDStr, "from", sb.Status)
+                }
+            }
+            continue
+        }
+
+        // Sandbox is not alive on host. If the transition is recent, give the
+        // in-flight RPC time to finish before declaring a final state.
+        if sb.LastUpdated.Valid && time.Since(sb.LastUpdated.Time) < transientGracePeriod {
+            slog.Debug("host monitor: transient sandbox still within grace period",
+                "sandbox_id", sbIDStr, "status", sb.Status,
+                "age", time.Since(sb.LastUpdated.Time).Round(time.Second))
+            continue
+        }
+
+        // Grace period expired — infer final state.
+        var finalStatus string
+        switch sb.Status {
+        case "starting", "resuming":
+            finalStatus = "error"
+        case "pausing":
+            finalStatus = "paused"
+        case "stopping":
+            finalStatus = "stopped"
+        }
+        if _, err := m.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
+            ID: sb.ID, Status: sb.Status, Status_2: finalStatus,
+        }); err == nil {
+            slog.Info("host monitor: resolved transient sandbox", "sandbox_id", sbIDStr, "from", sb.Status, "to", finalStatus)
+        }
+    }
 }
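
Both UpdateSandboxStatusIf calls are effectively compare-and-swap writes: the UPDATE only applies when the row still holds the status the monitor observed, so a concurrent lifecycle RPC that already advanced the sandbox wins the race. A sketch of the guarded call (sqlc-generated names taken from the diff; the ErrNoRows detail is an assumption about the generated query):

package api

import (
    "context"
    "errors"

    "github.com/jackc/pgx/v5"

    "git.omukk.dev/wrenn/wrenn/pkg/db"
)

// resolveTransient shows the CAS shape: Status is the guard (expected
// current value), Status_2 the replacement. If another writer already
// moved the row, the conditional UPDATE matches nothing — presumably
// surfacing as pgx.ErrNoRows for a sqlc :one query — which we treat as
// "lost the race", not as a failure.
func resolveTransient(ctx context.Context, q *db.Queries, sb db.Sandbox, final string) (bool, error) {
    _, err := q.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
        ID:       sb.ID,
        Status:   sb.Status, // guard: only update if still in this status
        Status_2: final,     // value to set when the guard holds
    })
    if errors.Is(err, pgx.ErrNoRows) {
        return false, nil // someone else won the race
    }
    return err == nil, err
}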


@@ -1,8 +1,8 @@
 openapi: "3.1.0"
 info:
   title: Wrenn API
-  description: MicroVM-based code execution platform API.
-  version: "0.1.4"
+  description: AI agent execution platform API.
+  version: "0.2.0"

 servers:
   - url: http://localhost:8080
@@ -866,8 +866,8 @@ paths:
             schema:
               $ref: "#/components/schemas/CreateCapsuleRequest"
       responses:
-        "201":
-          description: Capsule created
+        "202":
+          description: Capsule creation initiated (status will be "starting")
           content:
             application/json:
               schema:
@@ -988,8 +988,8 @@ paths:
       security:
         - apiKeyAuth: []
       responses:
-        "204":
-          description: Capsule destroyed
+        "202":
+          description: Capsule destruction initiated

   /v1/capsules/{id}/exec:
     parameters:
@@ -1260,8 +1260,8 @@ paths:
         destroys all running resources. The capsule exists only as files on
         disk and can be resumed later.
       responses:
-        "200":
-          description: Capsule paused (snapshot taken, resources released)
+        "202":
+          description: Capsule pause initiated (status will be "pausing")
           content:
             application/json:
               schema:
@@ -1289,11 +1289,11 @@ paths:
         - apiKeyAuth: []
       description: |
         Restores a paused capsule from its snapshot using UFFD for lazy
-        memory loading. Boots a fresh Firecracker process, sets up a new
+        memory loading. Boots a fresh Cloud Hypervisor process, sets up a new
         network slot, and waits for envd to become ready.
       responses:
-        "200":
-          description: Capsule resumed (new VM booted from snapshot)
+        "202":
+          description: Capsule resume initiated (status will be "resuming")
           content:
             application/json:
               schema:
@@ -2035,6 +2035,51 @@ paths:
               schema:
                 $ref: "#/components/schemas/Error"

+  /v1/hosts/sandbox-events:
+    post:
+      summary: Sandbox lifecycle event callback
+      operationId: sandboxEventCallback
+      tags: [hosts]
+      security:
+        - hostTokenAuth: []
+      description: |
+        Receives autonomous lifecycle events from host agents (e.g. auto-pause
+        from the TTL reaper). The event is published to an internal Redis stream
+        for the control plane's event consumer to process.
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema:
+              type: object
+              required: [event, sandbox_id, host_id]
+              properties:
+                event:
+                  type: string
+                  enum: [sandbox.auto_paused]
+                sandbox_id:
+                  type: string
+                host_id:
+                  type: string
+                timestamp:
+                  type: integer
+                  format: int64
+      responses:
+        "204":
+          description: Event accepted
+        "400":
+          description: Invalid request
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/Error"
+        "403":
+          description: Host ID mismatch
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/Error"
+
   /v1/hosts/auth/refresh:
     post:
       summary: Refresh host JWT
@@ -2395,6 +2440,14 @@ paths:
                 $ref: "#/components/schemas/Error"

 components:
+  responses:
+    BadRequest:
+      description: Invalid request parameters
+      content:
+        application/json:
+          schema:
+            $ref: "#/components/schemas/Error"
+
   securitySchemes:
     apiKeyAuth:
       type: apiKey
@@ -2592,7 +2645,7 @@ components:
           type: string
         status:
           type: string
-          enum: [pending, starting, running, paused, hibernated, stopped, missing, error]
+          enum: [pending, starting, running, pausing, paused, resuming, stopping, hibernated, stopped, missing, error]
         template:
           type: string
         vcpus:
@@ -3059,7 +3112,7 @@ components:
         mem_bytes:
           type: integer
           format: int64
-          description: "Resident memory in bytes (VmRSS of Firecracker process)"
+          description: "Resident memory in bytes (VmRSS of Cloud Hypervisor process)"
         disk_bytes:
           type: integer
           format: int64
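
From the host agent's side, the new callback is a single authenticated POST. A hedged sketch — only the path, payload shape, and the 204 response come from the spec above; the bearer-token header and client plumbing are assumptions:

package hostagent

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "time"
)

// reportEvent posts a lifecycle event to the control plane's callback
// endpoint. The auth header format is an assumption; hostTokenAuth's
// exact mechanism is defined elsewhere in the spec.
func reportEvent(ctx context.Context, cpBaseURL, hostJWT, sandboxID, hostID string) error {
    body, err := json.Marshal(map[string]any{
        "event":      "sandbox.auto_paused",
        "sandbox_id": sandboxID,
        "host_id":    hostID,
        "timestamp":  time.Now().Unix(),
    })
    if err != nil {
        return err
    }

    req, err := http.NewRequestWithContext(ctx, http.MethodPost,
        cpBaseURL+"/v1/hosts/sandbox-events", bytes.NewReader(body))
    if err != nil {
        return err
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("Authorization", "Bearer "+hostJWT)

    res, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer res.Body.Close()
    if res.StatusCode != http.StatusNoContent {
        return fmt.Errorf("event callback rejected: %s", res.Status)
    }
    return nil
}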


@@ -0,0 +1,254 @@
package api
import (
"context"
"encoding/json"
"errors"
"log/slog"
"time"
"github.com/jackc/pgx/v5"
"github.com/jackc/pgx/v5/pgtype"
"github.com/redis/go-redis/v9"
"git.omukk.dev/wrenn/wrenn/pkg/audit"
"git.omukk.dev/wrenn/wrenn/pkg/db"
"git.omukk.dev/wrenn/wrenn/pkg/id"
)
const (
sandboxEventStream = "wrenn:sandbox-events"
sandboxEventGroup = "wrenn-sandbox-events-v1"
sandboxEventConsumer = "cp-0"
)
// SandboxEvent is the canonical event payload published to the Redis stream
// by both the CP background goroutines (for explicit lifecycle ops) and
// the agent callback endpoint (for autonomous events like auto-pause).
type SandboxEvent struct {
Event string `json:"event"`
SandboxID string `json:"sandbox_id"`
HostID string `json:"host_id"`
HostIP string `json:"host_ip,omitempty"`
Metadata map[string]string `json:"metadata,omitempty"`
Error string `json:"error,omitempty"`
Timestamp int64 `json:"timestamp"`
}
// Sandbox event type constants.
const (
SandboxEventStarted = "sandbox.started"
SandboxEventPaused = "sandbox.paused"
SandboxEventResumed = "sandbox.resumed"
SandboxEventStopped = "sandbox.stopped"
SandboxEventFailed = "sandbox.failed"
SandboxEventError = "sandbox.error"
SandboxEventAutoPaused = "sandbox.auto_paused"
)
// SandboxEventConsumer reads sandbox lifecycle events from the Redis stream
// and updates database state accordingly. It follows the same XREADGROUP
// pattern as pkg/channels/dispatcher.go.
type SandboxEventConsumer struct {
rdb *redis.Client
db *db.Queries
audit *audit.AuditLogger
}
// NewSandboxEventConsumer creates a consumer.
func NewSandboxEventConsumer(rdb *redis.Client, queries *db.Queries, al *audit.AuditLogger) *SandboxEventConsumer {
return &SandboxEventConsumer{rdb: rdb, db: queries, audit: al}
}
// Start launches the consumer goroutine.
func (c *SandboxEventConsumer) Start(ctx context.Context) {
go c.run(ctx)
}
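A minimal wiring sketch for the CP startup path, assuming the Redis client, sqlc queries, and audit logger are already constructed (variable names illustrative):

consumer := NewSandboxEventConsumer(rdb, queries, auditLogger)
consumer.Start(ctx) // non-blocking; run() loops until ctx is cancelled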
func (c *SandboxEventConsumer) run(ctx context.Context) {
err := c.rdb.XGroupCreateMkStream(ctx, sandboxEventStream, sandboxEventGroup, "$").Err()
if err != nil && err.Error() != "BUSYGROUP Consumer Group name already exists" {
slog.Error("sandbox event consumer: failed to create consumer group", "error", err)
return
}
for {
select {
case <-ctx.Done():
return
default:
}
streams, err := c.rdb.XReadGroup(ctx, &redis.XReadGroupArgs{
Group: sandboxEventGroup,
Consumer: sandboxEventConsumer,
Streams: []string{sandboxEventStream, ">"},
Count: 10,
Block: 5 * time.Second,
}).Result()
if err != nil {
if err == redis.Nil || ctx.Err() != nil {
continue
}
slog.Warn("sandbox event consumer: xreadgroup error", "error", err)
time.Sleep(1 * time.Second)
continue
}
for _, stream := range streams {
for _, msg := range stream.Messages {
c.handleMessage(ctx, msg)
}
}
}
}
func (c *SandboxEventConsumer) handleMessage(ctx context.Context, msg redis.XMessage) {
// Use a non-cancellable context for XAck so shutdown doesn't leave
// messages permanently stuck in the pending entries list.
defer func() {
ackCtx, ackCancel := context.WithTimeout(context.Background(), 5*time.Second)
defer ackCancel()
if err := c.rdb.XAck(ackCtx, sandboxEventStream, sandboxEventGroup, msg.ID).Err(); err != nil {
slog.Warn("sandbox event consumer: xack failed", "id", msg.ID, "error", err)
}
}()
payload, ok := msg.Values["payload"].(string)
if !ok {
slog.Warn("sandbox event consumer: message missing payload", "id", msg.ID)
return
}
var event SandboxEvent
if err := json.Unmarshal([]byte(payload), &event); err != nil {
slog.Warn("sandbox event consumer: failed to unmarshal event", "id", msg.ID, "error", err)
return
}
sandboxID, err := id.ParseSandboxID(event.SandboxID)
if err != nil {
slog.Warn("sandbox event consumer: invalid sandbox ID", "sandbox_id", event.SandboxID, "error", err)
return
}
switch event.Event {
case SandboxEventStarted:
c.handleStarted(ctx, sandboxID, event, "starting")
case SandboxEventResumed:
c.handleStarted(ctx, sandboxID, event, "resuming")
case SandboxEventPaused:
c.handlePaused(ctx, sandboxID, event)
case SandboxEventStopped:
c.handleStopped(ctx, sandboxID, event)
case SandboxEventFailed, SandboxEventError:
c.handleFailed(ctx, sandboxID)
case SandboxEventAutoPaused:
c.handleAutoPaused(ctx, sandboxID, event)
default:
slog.Warn("sandbox event consumer: unknown event type", "event", event.Event)
}
}
// handleStarted is a fallback writer for sandbox.started and sandbox.resumed
// events. The background goroutine in SandboxService is the primary writer;
// this conditional update only takes effect if the goroutine's own update
// never landed.
func (c *SandboxEventConsumer) handleStarted(ctx context.Context, sandboxID pgtype.UUID, event SandboxEvent, fromStatus string) {
now := time.Now()
if _, err := c.db.UpdateSandboxRunningIf(ctx, db.UpdateSandboxRunningIfParams{
ID: sandboxID,
Status: fromStatus,
HostIp: event.HostIP,
StartedAt: pgtype.Timestamptz{
Time: now,
Valid: true,
},
}); err != nil {
return
}
if len(event.Metadata) > 0 {
metaJSON, _ := json.Marshal(event.Metadata)
_ = c.db.UpdateSandboxMetadata(ctx, db.UpdateSandboxMetadataParams{
ID: sandboxID,
Metadata: metaJSON,
})
}
}
func (c *SandboxEventConsumer) handlePaused(ctx context.Context, sandboxID pgtype.UUID, event SandboxEvent) {
if _, err := c.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
ID: sandboxID,
Status: "pausing",
Status_2: "paused",
}); err != nil && !errors.Is(err, pgx.ErrNoRows) {
slog.Warn("sandbox event consumer: failed to update sandbox to paused", "sandbox_id", event.SandboxID, "error", err)
}
}
func (c *SandboxEventConsumer) handleStopped(ctx context.Context, sandboxID pgtype.UUID, event SandboxEvent) {
// Try stopping → stopped (CP-initiated destroy completed).
if _, err := c.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
ID: sandboxID,
Status: "stopping",
Status_2: "stopped",
}); err == nil {
return
}
// Try running → stopped (autonomous destroy, e.g. TTL auto-destroy).
if _, err := c.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
ID: sandboxID,
Status: "running",
Status_2: "stopped",
}); err != nil && !errors.Is(err, pgx.ErrNoRows) {
slog.Warn("sandbox event consumer: failed to update sandbox to stopped", "sandbox_id", event.SandboxID, "error", err)
}
}
// handleFailed marks a sandbox as "error" when the host agent reports a crash
// or the CP's background goroutine publishes a failure. Uses conditional update
// to avoid clobbering concurrent operations.
func (c *SandboxEventConsumer) handleFailed(ctx context.Context, sandboxID pgtype.UUID) {
// Try each possible pre-failure state until one matches.
for _, fromStatus := range []string{"running", "starting", "pausing", "resuming"} {
if _, err := c.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: fromStatus, Status_2: "error",
}); err == nil {
return
}
}
}
func (c *SandboxEventConsumer) handleAutoPaused(ctx context.Context, sandboxID pgtype.UUID, _ SandboxEvent) {
sb, err := c.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
ID: sandboxID,
Status: "running",
Status_2: "paused",
})
if err != nil {
return
}
c.audit.LogSandboxAutoPause(ctx, sb.TeamID, sandboxID)
}
// PublishSandboxEvent writes a sandbox lifecycle event to the Redis stream.
// Used by both the SandboxService background goroutines and the callback endpoint.
func PublishSandboxEvent(ctx context.Context, rdb *redis.Client, event SandboxEvent) {
payload, err := json.Marshal(event)
if err != nil {
slog.Warn("sandbox event: failed to marshal", "event", event.Event, "error", err)
return
}
if err := rdb.XAdd(ctx, &redis.XAddArgs{
Stream: sandboxEventStream,
MaxLen: 50000,
Approx: true,
Values: map[string]any{
"payload": string(payload),
},
}).Err(); err != nil {
slog.Warn("sandbox event: failed to publish", "event", event.Event, "error", err)
}
}
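As a usage sketch, the pause goroutine's success path might publish (field values illustrative):

PublishSandboxEvent(ctx, rdb, SandboxEvent{
    Event:     SandboxEventPaused,
    SandboxID: sandboxID, // illustrative variable holding the external ID
    HostID:    hostID,    // illustrative
    Timestamp: time.Now().Unix(),
})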


@ -1,6 +1,7 @@
package api package api
import ( import (
"context"
_ "embed" _ "embed"
"fmt" "fmt"
"net/http" "net/http"
@ -49,6 +50,7 @@ func New(
mailer email.Mailer, mailer email.Mailer,
extensions []cpextension.Extension, extensions []cpextension.Extension,
sctx cpextension.ServerContext, sctx cpextension.ServerContext,
monitor *HostMonitor,
version string, version string,
) *Server { ) *Server {
r := chi.NewRouter() r := chi.NewRouter()
@ -63,6 +65,17 @@ func New(
// Shared service layer. // Shared service layer.
sandboxSvc := &service.SandboxService{DB: queries, Pool: pool, Scheduler: sched} sandboxSvc := &service.SandboxService{DB: queries, Pool: pool, Scheduler: sched}
sandboxSvc.PublishEvent = func(ctx context.Context, event service.SandboxStateEvent) {
PublishSandboxEvent(ctx, rdb, SandboxEvent{
Event: event.Event,
SandboxID: event.SandboxID,
HostID: event.HostID,
HostIP: event.HostIP,
Metadata: event.Metadata,
Error: event.Error,
Timestamp: event.Timestamp,
})
}
apiKeySvc := &service.APIKeyService{DB: queries} apiKeySvc := &service.APIKeyService{DB: queries}
templateSvc := &service.TemplateService{DB: queries} templateSvc := &service.TemplateService{DB: queries}
hostSvc := &service.HostService{DB: queries, Redis: rdb, JWT: jwtSecret, Pool: pool, CA: ca} hostSvc := &service.HostService{DB: queries, Redis: rdb, JWT: jwtSecret, Pool: pool, CA: ca}
@ -83,7 +96,7 @@ func New(
authH := newAuthHandler(queries, pgPool, jwtSecret, mailer, rdb, oauthRedirectURL) authH := newAuthHandler(queries, pgPool, jwtSecret, mailer, rdb, oauthRedirectURL)
oauthH := newOAuthHandler(queries, pgPool, jwtSecret, oauthRegistry, oauthRedirectURL) oauthH := newOAuthHandler(queries, pgPool, jwtSecret, oauthRegistry, oauthRedirectURL)
apiKeys := newAPIKeyHandler(apiKeySvc, al) apiKeys := newAPIKeyHandler(apiKeySvc, al)
hostH := newHostHandler(hostSvc, queries, al) hostH := newHostHandler(hostSvc, queries, al, monitor)
teamH := newTeamHandler(teamSvc, al, mailer) teamH := newTeamHandler(teamSvc, al, mailer)
usersH := newUsersHandler(queries, userSvc, al) usersH := newUsersHandler(queries, userSvc, al)
auditH := newAuditHandler(auditSvc) auditH := newAuditHandler(auditSvc)
@ -95,6 +108,7 @@ func New(
ptyH := newPtyHandler(queries, pool, jwtSecret) ptyH := newPtyHandler(queries, pool, jwtSecret)
processH := newProcessHandler(queries, pool, jwtSecret) processH := newProcessHandler(queries, pool, jwtSecret)
adminCapsules := newAdminCapsuleHandler(sandboxSvc, queries, pool, al) adminCapsules := newAdminCapsuleHandler(sandboxSvc, queries, pool, al)
sandboxEvtH := newSandboxEventHandler(queries, rdb)
meH := newMeHandler(queries, pgPool, rdb, jwtSecret, mailer, oauthRegistry, oauthRedirectURL, teamSvc) meH := newMeHandler(queries, pgPool, rdb, jwtSecret, mailer, oauthRegistry, oauthRedirectURL, teamSvc)
// Health check. // Health check.
@ -221,8 +235,9 @@ func New(
// Unauthenticated: refresh token exchange. // Unauthenticated: refresh token exchange.
r.Post("/auth/refresh", hostH.RefreshToken) r.Post("/auth/refresh", hostH.RefreshToken)
// Host-token-authenticated: heartbeat. // Host-token-authenticated: heartbeat and lifecycle callbacks.
r.With(requireHostToken(jwtSecret)).Post("/{id}/heartbeat", hostH.Heartbeat) r.With(requireHostToken(jwtSecret)).Post("/{id}/heartbeat", hostH.Heartbeat)
r.With(requireHostToken(jwtSecret)).Post("/sandbox-events", sandboxEvtH.Handle)
// JWT-authenticated: host CRUD and tags. // JWT-authenticated: host CRUD and tags.
r.Group(func(r chi.Router) { r.Group(func(r chi.Router) {


@ -80,8 +80,8 @@ func (r *LoopRegistry) Release(imagePath string) {
e.refcount-- e.refcount--
if e.refcount <= 0 { if e.refcount <= 0 {
if err := losetupDetach(e.device); err != nil { if err := losetupDetachRetry(e.device); err != nil {
slog.Warn("losetup detach failed", "device", e.device, "error", err) slog.Error("losetup detach failed, loop device leaked", "device", e.device, "image", imagePath, "error", err)
} }
delete(r.entries, imagePath) delete(r.entries, imagePath)
slog.Info("loop device released", "image", imagePath, "device", e.device) slog.Info("loop device released", "image", imagePath, "device", e.device)
@ -94,8 +94,8 @@ func (r *LoopRegistry) ReleaseAll() {
defer r.mu.Unlock() defer r.mu.Unlock()
for path, e := range r.entries { for path, e := range r.entries {
if err := losetupDetach(e.device); err != nil { if err := losetupDetachRetry(e.device); err != nil {
slog.Warn("losetup detach failed", "device", e.device, "error", err) slog.Error("losetup detach failed during shutdown", "device", e.device, "image", path, "error", err)
} }
delete(r.entries, path) delete(r.entries, path)
} }
@ -109,6 +109,31 @@ type SnapshotDevice struct {
CowLoopDev string // loop device for the CoW file CowLoopDev string // loop device for the CoW file
} }
// attachCowAndCreate attaches a CoW file as a loop device, creates the
// dm-snapshot target, and returns the assembled SnapshotDevice. On failure
// it detaches the CoW loop device before returning.
func attachCowAndCreate(name, originLoopDev, cowPath string, originSizeBytes int64) (*SnapshotDevice, error) {
cowLoopDev, err := losetupCreateRW(cowPath)
if err != nil {
return nil, fmt.Errorf("losetup cow: %w", err)
}
sectors := originSizeBytes / 512
if err := dmsetupCreate(name, originLoopDev, cowLoopDev, sectors); err != nil {
if detachErr := losetupDetachRetry(cowLoopDev); detachErr != nil {
slog.Error("cow losetup detach failed during cleanup, loop device leaked", "device", cowLoopDev, "error", detachErr)
}
return nil, fmt.Errorf("dmsetup create: %w", err)
}
return &SnapshotDevice{
Name: name,
DevicePath: "/dev/mapper/" + name,
CowPath: cowPath,
CowLoopDev: cowLoopDev,
}, nil
}
// CreateSnapshot sets up a new dm-snapshot device. // CreateSnapshot sets up a new dm-snapshot device.
// //
// It creates a sparse CoW file, attaches it as a loop device, and creates // It creates a sparse CoW file, attaches it as a loop device, and creates
@ -117,45 +142,24 @@ type SnapshotDevice struct {
// //
// The origin loop device must already exist (from LoopRegistry.Acquire). // The origin loop device must already exist (from LoopRegistry.Acquire).
func CreateSnapshot(name, originLoopDev, cowPath string, originSizeBytes, cowSizeBytes int64) (*SnapshotDevice, error) { func CreateSnapshot(name, originLoopDev, cowPath string, originSizeBytes, cowSizeBytes int64) (*SnapshotDevice, error) {
// Create sparse CoW file. The logical size limits how many blocks can be
// modified; because the file is sparse, only written blocks use real disk.
if err := createSparseFile(cowPath, cowSizeBytes); err != nil { if err := createSparseFile(cowPath, cowSizeBytes); err != nil {
return nil, fmt.Errorf("create cow file: %w", err) return nil, fmt.Errorf("create cow file: %w", err)
} }
cowLoopDev, err := losetupCreateRW(cowPath) dev, err := attachCowAndCreate(name, originLoopDev, cowPath, originSizeBytes)
if err != nil { if err != nil {
os.Remove(cowPath) os.Remove(cowPath)
return nil, fmt.Errorf("losetup cow: %w", err) return nil, err
} }
// The dm-snapshot virtual device size must match the origin — the snapshot
// target maps 1:1 onto origin sectors. The CoW file just needs enough
// space to store all modified blocks (it's sparse, so 20GB costs nothing).
sectors := originSizeBytes / 512
if err := dmsetupCreate(name, originLoopDev, cowLoopDev, sectors); err != nil {
if detachErr := losetupDetach(cowLoopDev); detachErr != nil {
slog.Warn("cow losetup detach failed during cleanup", "device", cowLoopDev, "error", detachErr)
}
os.Remove(cowPath)
return nil, fmt.Errorf("dmsetup create: %w", err)
}
devPath := "/dev/mapper/" + name
slog.Info("dm-snapshot created", slog.Info("dm-snapshot created",
"name", name, "name", name,
"device", devPath, "device", dev.DevicePath,
"origin", originLoopDev, "origin", originLoopDev,
"cow", cowPath, "cow", cowPath,
) )
return &SnapshotDevice{ return dev, nil
Name: name,
DevicePath: devPath,
CowPath: cowPath,
CowLoopDev: cowLoopDev,
}, nil
} }
// RestoreSnapshot re-attaches a dm-snapshot from an existing persistent CoW file. // RestoreSnapshot re-attaches a dm-snapshot from an existing persistent CoW file.
@ -171,34 +175,19 @@ func RestoreSnapshot(ctx context.Context, name, originLoopDev, cowPath string, o
} }
} }
cowLoopDev, err := losetupCreateRW(cowPath) dev, err := attachCowAndCreate(name, originLoopDev, cowPath, originSizeBytes)
if err != nil { if err != nil {
return nil, fmt.Errorf("losetup cow: %w", err) return nil, err
} }
sectors := originSizeBytes / 512
if err := dmsetupCreate(name, originLoopDev, cowLoopDev, sectors); err != nil {
if detachErr := losetupDetach(cowLoopDev); detachErr != nil {
slog.Warn("cow losetup detach failed during cleanup", "device", cowLoopDev, "error", detachErr)
}
return nil, fmt.Errorf("dmsetup create: %w", err)
}
devPath := "/dev/mapper/" + name
slog.Info("dm-snapshot restored", slog.Info("dm-snapshot restored",
"name", name, "name", name,
"device", devPath, "device", dev.DevicePath,
"origin", originLoopDev, "origin", originLoopDev,
"cow", cowPath, "cow", cowPath,
) )
return &SnapshotDevice{ return dev, nil
Name: name,
DevicePath: devPath,
CowPath: cowPath,
CowLoopDev: cowLoopDev,
}, nil
} }
// RemoveSnapshot tears down a dm-snapshot device and its CoW loop device. // RemoveSnapshot tears down a dm-snapshot device and its CoW loop device.
@ -208,8 +197,8 @@ func RemoveSnapshot(ctx context.Context, dev *SnapshotDevice) error {
return fmt.Errorf("dmsetup remove %s: %w", dev.Name, err) return fmt.Errorf("dmsetup remove %s: %w", dev.Name, err)
} }
if err := losetupDetach(dev.CowLoopDev); err != nil { if err := losetupDetachRetry(dev.CowLoopDev); err != nil {
slog.Warn("cow losetup detach failed", "device", dev.CowLoopDev, "error", err) return fmt.Errorf("detach cow loop %s: %w", dev.CowLoopDev, err)
} }
slog.Info("dm-snapshot removed", "name", dev.Name) slog.Info("dm-snapshot removed", "name", dev.Name)
@ -297,6 +286,24 @@ func losetupDetach(dev string) error {
return exec.Command("losetup", "-d", dev).Run() return exec.Command("losetup", "-d", dev).Run()
} }
// losetupDetachRetry detaches a loop device with retries for transient
// "device busy" errors (kernel may still hold references briefly after
// dm-snapshot removal).
func losetupDetachRetry(dev string) error {
var lastErr error
for attempt := range 5 {
if attempt > 0 {
time.Sleep(200 * time.Millisecond)
}
if err := losetupDetach(dev); err == nil {
return nil
} else {
lastErr = err
}
}
return fmt.Errorf("after 5 attempts: %w", lastErr)
}
// dmsetupCreate creates a dm-snapshot device with persistent metadata. // dmsetupCreate creates a dm-snapshot device with persistent metadata.
func dmsetupCreate(name, originDev, cowDev string, sectors int64) error { func dmsetupCreate(name, originDev, cowDev string, sectors int64) error {
// Table format: <start> <size> snapshot <origin> <cow> P <chunk_size> // Table format: <start> <size> snapshot <origin> <cow> P <chunk_size>
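For a 10 GiB origin, originSizeBytes/512 = 20971520 sectors, so the loaded table would read roughly: 0 20971520 snapshot /dev/loop5 /dev/loop6 P 8 (loop device names and the 8-sector chunk size are illustrative).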
@ -316,7 +323,7 @@ func dmDeviceExists(name string) bool {
// dmsetupRemove removes a device-mapper device, retrying on transient // dmsetupRemove removes a device-mapper device, retrying on transient
// "device busy" errors that occur when the kernel hasn't fully released // "device busy" errors that occur when the kernel hasn't fully released
// the device after a Firecracker process exits. // the device after a VMM process exits.
func dmsetupRemove(ctx context.Context, name string) error { func dmsetupRemove(ctx context.Context, name string) error {
var lastErr error var lastErr error
for attempt := range 5 { for attempt := range 5 {
@ -361,5 +368,9 @@ func createSparseFile(path string, sizeBytes int64) error {
os.Remove(path) os.Remove(path)
return err return err
} }
return f.Close() if err := f.Close(); err != nil {
os.Remove(path)
return err
}
return nil
} }


@ -78,15 +78,30 @@ type ExecResult struct {
ExitCode int32 ExitCode int32
} }
// ExecOpts holds optional parameters for Exec.
type ExecOpts struct {
Envs map[string]string
Cwd string
}
// Exec runs a command inside the sandbox and collects all stdout/stderr output. // Exec runs a command inside the sandbox and collects all stdout/stderr output.
// It blocks until the command completes. // It blocks until the command completes.
func (c *Client) Exec(ctx context.Context, cmd string, args ...string) (*ExecResult, error) { func (c *Client) Exec(ctx context.Context, cmd string, args []string, opts *ExecOpts) (*ExecResult, error) {
stdin := false stdin := false
proc := &envdpb.ProcessConfig{
Cmd: cmd,
Args: args,
}
if opts != nil {
if len(opts.Envs) > 0 {
proc.Envs = opts.Envs
}
if opts.Cwd != "" {
proc.Cwd = &opts.Cwd
}
}
req := connect.NewRequest(&envdpb.StartRequest{ req := connect.NewRequest(&envdpb.StartRequest{
Process: &envdpb.ProcessConfig{ Process: proc,
Cmd: cmd,
Args: args,
},
Stdin: &stdin, Stdin: &stdin,
}) })
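Call sites now pass args as a slice plus optional opts; a usage sketch (command and values illustrative):

res, err := client.Exec(ctx, "npm", []string{"install"}, &envdclient.ExecOpts{
    Envs: map[string]string{"NODE_ENV": "production"},
    Cwd:  "/home/user/app",
})
// res.ExitCode carries the command's exit status; err covers transport failures.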
@ -294,7 +309,7 @@ func (c *Client) ReadFile(ctx context.Context, path string) ([]byte, error) {
// PrepareSnapshot calls envd's POST /snapshot/prepare endpoint, which stops // PrepareSnapshot calls envd's POST /snapshot/prepare endpoint, which stops
// the port scanner/forwarder and marks active connections for post-restore // the port scanner/forwarder and marks active connections for post-restore
// cleanup before Firecracker freezes vCPUs. // cleanup before the VMM freezes vCPUs.
// //
// Best-effort: the caller should log a warning on error but not abort the pause. // Best-effort: the caller should log a warning on error but not abort the pause.
func (c *Client) PrepareSnapshot(ctx context.Context) error { func (c *Client) PrepareSnapshot(ctx context.Context) error {
@ -317,27 +332,33 @@ func (c *Client) PrepareSnapshot(ctx context.Context) error {
return nil return nil
} }
// PostInit calls envd's POST /init endpoint, which triggers a re-read of // PostInit calls envd's POST /init endpoint to trigger post-boot or
// Firecracker MMDS metadata. This updates WRENN_SANDBOX_ID, WRENN_TEMPLATE_ID // post-restore initialization. sandbox_id and template_id are passed
// env vars and the corresponding files under /run/wrenn/ inside the guest. // so envd can set WRENN_SANDBOX_ID and WRENN_TEMPLATE_ID env vars.
// Must be called after snapshot restore so envd picks up the new sandbox's metadata.
func (c *Client) PostInit(ctx context.Context) error { func (c *Client) PostInit(ctx context.Context) error {
return c.PostInitWithDefaults(ctx, "", nil) return c.PostInitWithDefaults(ctx, "", nil, "", "")
} }
// PostInitWithDefaults calls envd's POST /init endpoint with optional default // PostInitWithDefaults calls envd's POST /init endpoint with optional default
// user and environment variables. These are applied to envd's defaults so all // user, environment variables, and sandbox metadata. These are applied to
// subsequent process executions use them. // envd's defaults so all subsequent process executions use them.
func (c *Client) PostInitWithDefaults(ctx context.Context, defaultUser string, envVars map[string]string) error { func (c *Client) PostInitWithDefaults(ctx context.Context, defaultUser string, envVars map[string]string, sandboxID, templateID string) error {
payload := make(map[string]any)
if defaultUser != "" {
payload["defaultUser"] = defaultUser
}
if len(envVars) > 0 {
payload["envVars"] = envVars
}
if sandboxID != "" {
payload["sandbox_id"] = sandboxID
}
if templateID != "" {
payload["template_id"] = templateID
}
var body io.Reader var body io.Reader
if defaultUser != "" || len(envVars) > 0 { if len(payload) > 0 {
payload := make(map[string]any)
if defaultUser != "" {
payload["defaultUser"] = defaultUser
}
if len(envVars) > 0 {
payload["envVars"] = envVars
}
data, err := json.Marshal(payload) data, err := json.Marshal(payload)
if err != nil { if err != nil {
return fmt.Errorf("marshal init body: %w", err) return fmt.Errorf("marshal init body: %w", err)


@ -0,0 +1,129 @@
package hostagent
import (
"bytes"
"context"
"encoding/json"
"fmt"
"log/slog"
"net/http"
"strings"
"sync"
"time"
)
// CallbackEvent is the payload sent to the CP's sandbox event callback endpoint.
type CallbackEvent struct {
Event string `json:"event"`
SandboxID string `json:"sandbox_id"`
HostID string `json:"host_id"`
Timestamp int64 `json:"timestamp"`
}
// CallbackSender sends sandbox lifecycle events to the CP via HTTP POST.
// Used for autonomous agent-side events (auto-pause, auto-destroy) that
// the CP cannot observe through its own RPC goroutines.
type CallbackSender struct {
cpURL string
hostID string
credFile string
client *http.Client
mu sync.RWMutex
jwt string
}
// NewCallbackSender creates a callback sender.
func NewCallbackSender(cpURL, credFile, hostID string) *CallbackSender {
jwt := ""
if tf, err := LoadTokenFile(credFile); err == nil {
jwt = tf.JWT
}
return &CallbackSender{
cpURL: strings.TrimRight(cpURL, "/"),
hostID: hostID,
credFile: credFile,
client: &http.Client{Timeout: 10 * time.Second},
jwt: jwt,
}
}
// UpdateJWT refreshes the JWT used for callback authentication.
// Called from the heartbeat's onCredsRefreshed hook.
func (s *CallbackSender) UpdateJWT(jwt string) {
s.mu.Lock()
s.jwt = jwt
s.mu.Unlock()
}
func (s *CallbackSender) getJWT() string {
s.mu.RLock()
defer s.mu.RUnlock()
return s.jwt
}
// Send sends a callback event to the CP synchronously with retries.
func (s *CallbackSender) Send(ctx context.Context, ev CallbackEvent) error {
ev.HostID = s.hostID
if ev.Timestamp == 0 {
ev.Timestamp = time.Now().Unix()
}
body, err := json.Marshal(ev)
if err != nil {
return fmt.Errorf("marshal callback event: %w", err)
}
url := s.cpURL + "/v1/hosts/sandbox-events"
var lastErr error
for attempt := 0; attempt < 3; attempt++ {
if attempt > 0 {
select {
case <-ctx.Done():
return ctx.Err()
case <-time.After(time.Duration(attempt) * 500 * time.Millisecond):
}
}
req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
if err != nil {
return fmt.Errorf("create callback request: %w", err)
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-Host-Token", s.getJWT())
resp, err := s.client.Do(req)
if err != nil {
lastErr = err
continue
}
resp.Body.Close()
if resp.StatusCode == http.StatusUnauthorized || resp.StatusCode == http.StatusForbidden {
if newCreds, refreshErr := RefreshCredentials(ctx, s.cpURL, s.credFile); refreshErr == nil {
s.UpdateJWT(newCreds.JWT)
}
lastErr = fmt.Errorf("callback auth failed: %d", resp.StatusCode)
continue
}
if resp.StatusCode >= 200 && resp.StatusCode < 300 {
return nil
}
lastErr = fmt.Errorf("callback failed: status %d", resp.StatusCode)
}
return fmt.Errorf("callback failed after 3 attempts: %w", lastErr)
}
// SendAsync sends a callback event in a background goroutine.
func (s *CallbackSender) SendAsync(ev CallbackEvent) {
go func() {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := s.Send(ctx, ev); err != nil {
slog.Warn("callback send failed (reconciler will catch it)", "event", ev.Event, "sandbox_id", ev.SandboxID, "error", err)
}
}()
}
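The TTL reaper's auto-pause path can then fire and forget (sketch; the reaper itself is not shown in this diff):

sender.SendAsync(CallbackEvent{
    Event:     "sandbox.auto_paused",
    SandboxID: sandboxID, // illustrative variable
})
// HostID and Timestamp are filled in by Send.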


@ -0,0 +1,22 @@
package hostagent
import (
"git.omukk.dev/wrenn/wrenn/internal/sandbox"
)
// callbackAdapter adapts CallbackSender to satisfy sandbox.EventSender.
type callbackAdapter struct {
sender *CallbackSender
}
// NewEventSender wraps a CallbackSender as a sandbox.EventSender.
func NewEventSender(sender *CallbackSender) sandbox.EventSender {
return &callbackAdapter{sender: sender}
}
func (a *callbackAdapter) SendAsync(event sandbox.LifecycleEvent) {
a.sender.SendAsync(CallbackEvent{
Event: event.Event,
SandboxID: event.SandboxID,
})
}
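Presumably wired up at agent startup along these lines (the manager field name is an assumption, not shown in this diff):

sender := NewCallbackSender(cfg.CPURL, cfg.CredFile, cfg.HostID) // illustrative config fields
mgr.Events = NewEventSender(sender)                              // assumed field on the sandbox manager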


@ -2,6 +2,7 @@ package hostagent
import ( import (
"context" "context"
"errors"
"fmt" "fmt"
"io" "io"
"log/slog" "log/slog"
@ -19,6 +20,7 @@ import (
pb "git.omukk.dev/wrenn/wrenn/proto/hostagent/gen" pb "git.omukk.dev/wrenn/wrenn/proto/hostagent/gen"
"git.omukk.dev/wrenn/wrenn/proto/hostagent/gen/hostagentv1connect" "git.omukk.dev/wrenn/wrenn/proto/hostagent/gen/hostagentv1connect"
"git.omukk.dev/wrenn/wrenn/internal/envdclient"
"git.omukk.dev/wrenn/wrenn/internal/sandbox" "git.omukk.dev/wrenn/wrenn/internal/sandbox"
) )
@ -89,7 +91,10 @@ func (s *Server) DestroySandbox(
req *connect.Request[pb.DestroySandboxRequest], req *connect.Request[pb.DestroySandboxRequest],
) (*connect.Response[pb.DestroySandboxResponse], error) { ) (*connect.Response[pb.DestroySandboxResponse], error) {
if err := s.mgr.Destroy(ctx, req.Msg.SandboxId); err != nil { if err := s.mgr.Destroy(ctx, req.Msg.SandboxId); err != nil {
return nil, connect.NewError(connect.CodeNotFound, err) if strings.Contains(err.Error(), "not found") {
return nil, connect.NewError(connect.CodeNotFound, err)
}
return nil, connect.NewError(connect.CodeInternal, err)
} }
return connect.NewResponse(&pb.DestroySandboxResponse{}), nil return connect.NewResponse(&pb.DestroySandboxResponse{}), nil
} }
@ -193,7 +198,7 @@ func (s *Server) PingSandbox(
req *connect.Request[pb.PingSandboxRequest], req *connect.Request[pb.PingSandboxRequest],
) (*connect.Response[pb.PingSandboxResponse], error) { ) (*connect.Response[pb.PingSandboxResponse], error) {
if err := s.mgr.Ping(req.Msg.SandboxId); err != nil { if err := s.mgr.Ping(req.Msg.SandboxId); err != nil {
if strings.Contains(err.Error(), "not found") { if errors.Is(err, sandbox.ErrNotFound) {
return nil, connect.NewError(connect.CodeNotFound, err) return nil, connect.NewError(connect.CodeNotFound, err)
} }
return nil, connect.NewError(connect.CodeFailedPrecondition, err) return nil, connect.NewError(connect.CodeFailedPrecondition, err)
@ -215,7 +220,12 @@ func (s *Server) Exec(
execCtx, cancel := context.WithTimeout(ctx, timeout) execCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel() defer cancel()
result, err := s.mgr.Exec(execCtx, msg.SandboxId, msg.Cmd, msg.Args...) var opts *envdclient.ExecOpts
if len(msg.Envs) > 0 || msg.Cwd != "" {
opts = &envdclient.ExecOpts{Envs: msg.Envs, Cwd: msg.Cwd}
}
result, err := s.mgr.Exec(execCtx, msg.SandboxId, msg.Cmd, msg.Args, opts)
if err != nil { if err != nil {
return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("exec: %w", err)) return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("exec: %w", err))
} }
@ -301,7 +311,7 @@ func (s *Server) MakeDir(
resp, err := client.MakeDir(ctx, msg.Path) resp, err := client.MakeDir(ctx, msg.Path)
if err != nil { if err != nil {
return nil, fmt.Errorf("make dir: %w", err) return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("make dir: %w", err))
} }
return connect.NewResponse(&pb.MakeDirResponse{ return connect.NewResponse(&pb.MakeDirResponse{
@ -373,6 +383,8 @@ func (s *Server) ExecStream(
Error: ev.Error, Error: ev.Error,
}, },
} }
default:
continue
} }
if err := stream.Send(&resp); err != nil { if err := stream.Send(&resp); err != nil {
return err return err
@ -588,7 +600,7 @@ func (s *Server) GetSandboxMetrics(
points, err := s.mgr.GetMetrics(msg.SandboxId, msg.Range) points, err := s.mgr.GetMetrics(msg.SandboxId, msg.Range)
if err != nil { if err != nil {
if strings.Contains(err.Error(), "not found") { if errors.Is(err, sandbox.ErrNotFound) {
return nil, connect.NewError(connect.CodeNotFound, err) return nil, connect.NewError(connect.CodeNotFound, err)
} }
if strings.Contains(err.Error(), "invalid range") { if strings.Contains(err.Error(), "invalid range") {
@ -606,7 +618,7 @@ func (s *Server) FlushSandboxMetrics(
) (*connect.Response[pb.FlushSandboxMetricsResponse], error) { ) (*connect.Response[pb.FlushSandboxMetricsResponse], error) {
pts10m, pts2h, pts24h, err := s.mgr.FlushMetrics(req.Msg.SandboxId) pts10m, pts2h, pts24h, err := s.mgr.FlushMetrics(req.Msg.SandboxId)
if err != nil { if err != nil {
if strings.Contains(err.Error(), "not found") { if errors.Is(err, sandbox.ErrNotFound) {
return nil, connect.NewError(connect.CodeNotFound, err) return nil, connect.NewError(connect.CodeNotFound, err)
} }
return nil, connect.NewError(connect.CodeInternal, err) return nil, connect.NewError(connect.CodeInternal, err)
@ -759,7 +771,7 @@ func (s *Server) StartBackground(
pid, err := s.mgr.StartBackground(ctx, msg.SandboxId, msg.Tag, msg.Cmd, msg.Args, msg.Envs, msg.Cwd) pid, err := s.mgr.StartBackground(ctx, msg.SandboxId, msg.Tag, msg.Cmd, msg.Args, msg.Envs, msg.Cwd)
if err != nil { if err != nil {
if strings.Contains(err.Error(), "not found") { if errors.Is(err, sandbox.ErrNotFound) {
return nil, connect.NewError(connect.CodeNotFound, err) return nil, connect.NewError(connect.CodeNotFound, err)
} }
return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("start background: %w", err)) return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("start background: %w", err))
@ -777,7 +789,7 @@ func (s *Server) ListProcesses(
) (*connect.Response[pb.ListProcessesResponse], error) { ) (*connect.Response[pb.ListProcessesResponse], error) {
procs, err := s.mgr.ListProcesses(ctx, req.Msg.SandboxId) procs, err := s.mgr.ListProcesses(ctx, req.Msg.SandboxId)
if err != nil { if err != nil {
if strings.Contains(err.Error(), "not found") { if errors.Is(err, sandbox.ErrNotFound) {
return nil, connect.NewError(connect.CodeNotFound, err) return nil, connect.NewError(connect.CodeNotFound, err)
} }
return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("list processes: %w", err)) return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("list processes: %w", err))
@ -828,7 +840,7 @@ func (s *Server) KillProcess(
} }
if err := s.mgr.KillProcess(ctx, msg.SandboxId, pid, tag, signal); err != nil { if err := s.mgr.KillProcess(ctx, msg.SandboxId, pid, tag, signal); err != nil {
if strings.Contains(err.Error(), "not found") { if errors.Is(err, sandbox.ErrNotFound) {
return nil, connect.NewError(connect.CodeNotFound, err) return nil, connect.NewError(connect.CodeNotFound, err)
} }
return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("kill process: %w", err)) return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("kill process: %w", err))
@ -857,7 +869,7 @@ func (s *Server) ConnectProcess(
events, err := s.mgr.ConnectProcess(ctx, msg.SandboxId, pid, tag) events, err := s.mgr.ConnectProcess(ctx, msg.SandboxId, pid, tag)
if err != nil { if err != nil {
if strings.Contains(err.Error(), "not found") { if errors.Is(err, sandbox.ErrNotFound) {
return connect.NewError(connect.CodeNotFound, err) return connect.NewError(connect.CodeNotFound, err)
} }
return connect.NewError(connect.CodeInternal, fmt.Errorf("connect process: %w", err)) return connect.NewError(connect.CodeInternal, fmt.Errorf("connect process: %w", err))
@ -889,6 +901,8 @@ func (s *Server) ConnectProcess(
Error: ev.Error, Error: ev.Error,
}, },
} }
default:
continue
} }
if err := stream.Send(&resp); err != nil { if err := stream.Send(&resp); err != nil {
return err return err


@ -46,7 +46,7 @@ func SandboxesDir(wrennDir string) string {
return filepath.Join(wrennDir, "sandboxes") return filepath.Join(wrennDir, "sandboxes")
} }
// KernelPath returns the path to the Firecracker kernel. // KernelPath returns the path to the VM kernel.
func KernelPath(wrennDir string) string { func KernelPath(wrennDir string) string {
return filepath.Join(wrennDir, "kernels", "vmlinux") return filepath.Join(wrennDir, "kernels", "vmlinux")
} }


@ -176,7 +176,7 @@ func NewSlot(index int) *Slot {
// CreateNetwork sets up the full network topology for a sandbox: // CreateNetwork sets up the full network topology for a sandbox:
// - Named network namespace // - Named network namespace
// - Veth pair bridging host and namespace // - Veth pair bridging host and namespace
// - TAP device inside namespace for Firecracker // - TAP device inside namespace for Cloud Hypervisor
// - Routes and NAT rules for connectivity // - Routes and NAT rules for connectivity
// //
// On error, all partially created resources are rolled back. // On error, all partially created resources are rolled back.
@ -430,6 +430,9 @@ func CreateNetwork(slot *Slot) error {
rollback() rollback()
return fmt.Errorf("add masquerade rule: %w", err) return fmt.Errorf("add masquerade rule: %w", err)
} }
rollbacks = append(rollbacks, func() {
_ = iptablesHost("-t", "nat", "-D", "POSTROUTING", "-s", fmt.Sprintf("%s/32", slot.VpeerIP.String()), "-o", defaultIface, "-j", "MASQUERADE")
})
slog.Info("network created", slog.Info("network created",
"ns", slot.NamespaceID, "ns", slot.NamespaceID,


@ -0,0 +1,28 @@
package sandbox
import (
"fmt"
"os/exec"
"strings"
)
// DetectCHVersion runs the cloud-hypervisor binary with --version and
// parses the semver from the output (e.g. "cloud-hypervisor v43.0" → "43.0").
func DetectCHVersion(binaryPath string) (string, error) {
out, err := exec.Command(binaryPath, "--version").Output()
if err != nil {
return "", fmt.Errorf("run %s --version: %w", binaryPath, err)
}
// Output is typically "cloud-hypervisor v43.0\n" or similar.
line := strings.TrimSpace(string(out))
for field := range strings.FieldsSeq(line) {
v := strings.TrimPrefix(field, "v")
// Either had a "v" prefix or contains a dot, so this is likely the version field.
if v != field || strings.Contains(field, ".") {
if strings.Count(v, ".") >= 1 {
return v, nil
}
}
}
return "", fmt.Errorf("could not parse version from cloud-hypervisor output: %q", line)
}
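Typically called once at agent startup to populate the vmm_version metadata field (call site not shown here; binary path illustrative):

ver, err := sandbox.DetectCHVersion("/usr/local/bin/cloud-hypervisor")
if err != nil {
    slog.Warn("cloud-hypervisor version detection failed", "error", err)
}
// e.g. ver == "43.0" for output "cloud-hypervisor v43.0"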


@ -1,30 +0,0 @@
package sandbox
import (
"fmt"
"os/exec"
"strings"
)
// DetectFirecrackerVersion runs the firecracker binary with --version and
// parses the semver from the output (e.g. "Firecracker v1.14.1" → "1.14.1").
func DetectFirecrackerVersion(binaryPath string) (string, error) {
out, err := exec.Command(binaryPath, "--version").Output()
if err != nil {
return "", fmt.Errorf("run %s --version: %w", binaryPath, err)
}
// Output is typically "Firecracker v1.14.1\n" or similar.
line := strings.TrimSpace(string(out))
for _, field := range strings.Fields(line) {
v := strings.TrimPrefix(field, "v")
if v != field || strings.Contains(field, ".") {
// Either had a "v" prefix or contains a dot — likely the version.
if strings.Count(v, ".") >= 1 {
return v, nil
}
}
}
return "", fmt.Errorf("could not parse version from firecracker output: %q", line)
}

File diff suppressed because it is too large


@ -1,9 +1,11 @@
package sandbox package sandbox
import ( import (
"context"
"encoding/json" "encoding/json"
"fmt" "fmt"
"io" "io"
"net/http"
"os" "os"
"strconv" "strconv"
"strings" "strings"
@ -50,10 +52,15 @@ func readCPUStat(pid int) (cpuStat, error) {
// readEnvdMemUsed fetches mem_used from envd's /metrics endpoint. Returns // readEnvdMemUsed fetches mem_used from envd's /metrics endpoint. Returns
// guest-side total - MemAvailable (actual process memory, excluding reclaimable // guest-side total - MemAvailable (actual process memory, excluding reclaimable
// page cache). VmRSS of the Firecracker process includes guest page cache and // page cache). VmRSS of the VMM process includes guest page cache and
// never decreases, so this is the accurate metric for dashboard display. // never decreases, so this is the accurate metric for dashboard display.
func readEnvdMemUsed(client *envdclient.Client) (int64, error) { func readEnvdMemUsed(ctx context.Context, client *envdclient.Client) (int64, error) {
resp, err := client.HTTPClient().Get(client.BaseURL() + "/metrics") req, err := http.NewRequestWithContext(ctx, http.MethodGet, client.BaseURL()+"/metrics", nil)
if err != nil {
return 0, fmt.Errorf("build metrics request: %w", err)
}
resp, err := client.HTTPClient().Do(req)
if err != nil { if err != nil {
return 0, fmt.Errorf("fetch envd metrics: %w", err) return 0, fmt.Errorf("fetch envd metrics: %w", err)
} }


@ -1,221 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
// Modifications by M/S Omukk
// Package snapshot implements snapshot storage, header-based memory mapping,
// and memory file processing for Firecracker VM snapshots.
//
// The header system implements a generational copy-on-write memory mapping.
// Each snapshot generation stores only the blocks that changed since the
// previous generation. A Header contains a sorted list of BuildMap entries
// that together cover the entire memory address space, with each entry
// pointing to a specific generation's diff file.
package snapshot
import (
"bytes"
"context"
"encoding/binary"
"errors"
"fmt"
"io"
"github.com/google/uuid"
)
const metadataVersion = 1
// Metadata is the fixed-size header prefix describing the snapshot memory layout.
// Binary layout (little-endian, 64 bytes total):
//
// Version uint64 (8 bytes)
// BlockSize uint64 (8 bytes)
// Size uint64 (8 bytes) — total memory size in bytes
// Generation uint64 (8 bytes)
// BuildID [16]byte (UUID)
// BaseBuildID [16]byte (UUID)
type Metadata struct {
Version uint64
BlockSize uint64
Size uint64
Generation uint64
BuildID uuid.UUID
BaseBuildID uuid.UUID
}
// NewMetadata creates metadata for a first-generation snapshot.
func NewMetadata(buildID uuid.UUID, blockSize, size uint64) *Metadata {
return &Metadata{
Version: metadataVersion,
Generation: 0,
BlockSize: blockSize,
Size: size,
BuildID: buildID,
BaseBuildID: buildID,
}
}
// NextGeneration creates metadata for the next generation in the chain.
func (m *Metadata) NextGeneration(buildID uuid.UUID) *Metadata {
return &Metadata{
Version: m.Version,
Generation: m.Generation + 1,
BlockSize: m.BlockSize,
Size: m.Size,
BuildID: buildID,
BaseBuildID: m.BaseBuildID,
}
}
// BuildMap maps a contiguous range of the memory address space to a specific
// generation's diff file. Binary layout (little-endian, 40 bytes):
//
// Offset uint64 — byte offset in the virtual address space
// Length uint64 — byte count (multiple of BlockSize)
// BuildID [16]byte — which generation's diff file, uuid.Nil = zero-fill
// BuildStorageOffset uint64 — byte offset within that generation's diff file
type BuildMap struct {
Offset uint64
Length uint64
BuildID uuid.UUID
BuildStorageOffset uint64
}
// Header is the in-memory representation of a snapshot's memory mapping.
// It provides O(log N) lookup from any memory offset to the correct
// generation's diff file and offset within it.
type Header struct {
Metadata *Metadata
Mapping []*BuildMap
// blockStarts tracks which block indices start a new BuildMap entry.
// startMap provides direct access from block index to the BuildMap.
blockStarts []bool
startMap map[int64]*BuildMap
}
// NewHeader creates a Header from metadata and mapping entries.
// If mapping is nil/empty, a single entry covering the full size is created.
func NewHeader(metadata *Metadata, mapping []*BuildMap) (*Header, error) {
if metadata.BlockSize == 0 {
return nil, fmt.Errorf("block size cannot be zero")
}
if len(mapping) == 0 {
mapping = []*BuildMap{{
Offset: 0,
Length: metadata.Size,
BuildID: metadata.BuildID,
BuildStorageOffset: 0,
}}
}
blocks := TotalBlocks(int64(metadata.Size), int64(metadata.BlockSize))
starts := make([]bool, blocks)
startMap := make(map[int64]*BuildMap, len(mapping))
for _, m := range mapping {
idx := BlockIdx(int64(m.Offset), int64(metadata.BlockSize))
if idx >= 0 && idx < blocks {
starts[idx] = true
startMap[idx] = m
}
}
return &Header{
Metadata: metadata,
Mapping: mapping,
blockStarts: starts,
startMap: startMap,
}, nil
}
// GetShiftedMapping resolves a memory offset to the corresponding diff file
// offset, remaining length, and build ID. This is the hot path called for
// every UFFD page fault.
func (h *Header) GetShiftedMapping(_ context.Context, offset int64) (mappedOffset int64, mappedLength int64, buildID *uuid.UUID, err error) {
if offset < 0 || offset >= int64(h.Metadata.Size) {
return 0, 0, nil, fmt.Errorf("offset %d out of bounds (size: %d)", offset, h.Metadata.Size)
}
blockSize := int64(h.Metadata.BlockSize)
block := BlockIdx(offset, blockSize)
// Walk backwards to find the BuildMap that contains this block.
start := block
for start >= 0 {
if h.blockStarts[start] {
break
}
start--
}
if start < 0 {
return 0, 0, nil, fmt.Errorf("no mapping found for offset %d", offset)
}
m, ok := h.startMap[start]
if !ok {
return 0, 0, nil, fmt.Errorf("no mapping at block %d", start)
}
shift := (block - start) * blockSize
if shift >= int64(m.Length) {
return 0, 0, nil, fmt.Errorf("offset %d beyond mapping end (mapping offset=%d, length=%d)", offset, m.Offset, m.Length)
}
return int64(m.BuildStorageOffset) + shift, int64(m.Length) - shift, &m.BuildID, nil
}
// Serialize writes metadata + mapping entries to binary (little-endian).
func Serialize(metadata *Metadata, mappings []*BuildMap) ([]byte, error) {
var buf bytes.Buffer
if err := binary.Write(&buf, binary.LittleEndian, metadata); err != nil {
return nil, fmt.Errorf("write metadata: %w", err)
}
for _, m := range mappings {
if err := binary.Write(&buf, binary.LittleEndian, m); err != nil {
return nil, fmt.Errorf("write mapping: %w", err)
}
}
return buf.Bytes(), nil
}
// Deserialize reads a header from binary data.
func Deserialize(data []byte) (*Header, error) {
reader := bytes.NewReader(data)
var metadata Metadata
if err := binary.Read(reader, binary.LittleEndian, &metadata); err != nil {
return nil, fmt.Errorf("read metadata: %w", err)
}
var mappings []*BuildMap
for {
var m BuildMap
if err := binary.Read(reader, binary.LittleEndian, &m); err != nil {
if errors.Is(err, io.EOF) {
break
}
return nil, fmt.Errorf("read mapping: %w", err)
}
mappings = append(mappings, &m)
}
return NewHeader(&metadata, mappings)
}
// Block index helpers.
func TotalBlocks(size, blockSize int64) int64 {
return (size + blockSize - 1) / blockSize
}
func BlockIdx(offset, blockSize int64) int64 {
return offset / blockSize
}
func BlockOffset(idx, blockSize int64) int64 {
return idx * blockSize
}


@ -7,14 +7,15 @@ import (
"os" "os"
"path/filepath" "path/filepath"
"syscall" "syscall"
"github.com/google/uuid"
) )
const ( const (
SnapFileName = "snapfile" // Cloud Hypervisor snapshot files.
MemDiffName = "memfile" CHConfigFile = "config.json"
MemHeaderName = "memfile.header" CHMemRangesFile = "memory-ranges"
CHStateFile = "state.json"
// Rootfs files.
RootfsFileName = "rootfs.ext4" RootfsFileName = "rootfs.ext4"
RootfsCowName = "rootfs.cow" RootfsCowName = "rootfs.cow"
RootfsMetaName = "rootfs.meta" RootfsMetaName = "rootfs.meta"
@ -25,27 +26,6 @@ func DirPath(baseDir, name string) string {
return filepath.Join(baseDir, name) return filepath.Join(baseDir, name)
} }
// SnapPath returns the path to the VM state snapshot file.
func SnapPath(baseDir, name string) string {
return filepath.Join(DirPath(baseDir, name), SnapFileName)
}
// MemDiffPath returns the path to the compact memory diff file (legacy single-generation).
func MemDiffPath(baseDir, name string) string {
return filepath.Join(DirPath(baseDir, name), MemDiffName)
}
// MemDiffPathForBuild returns the path to a specific generation's diff file.
// Format: memfile.{buildID}
func MemDiffPathForBuild(baseDir, name string, buildID uuid.UUID) string {
return filepath.Join(DirPath(baseDir, name), fmt.Sprintf("memfile.%s", buildID.String()))
}
// MemHeaderPath returns the path to the memory mapping header file.
func MemHeaderPath(baseDir, name string) string {
return filepath.Join(DirPath(baseDir, name), MemHeaderName)
}
// RootfsPath returns the path to the rootfs image. // RootfsPath returns the path to the rootfs image.
func RootfsPath(baseDir, name string) string { func RootfsPath(baseDir, name string) string {
return filepath.Join(DirPath(baseDir, name), RootfsFileName) return filepath.Join(DirPath(baseDir, name), RootfsFileName)
@ -61,10 +41,13 @@ func MetaPath(baseDir, name string) string {
return filepath.Join(DirPath(baseDir, name), RootfsMetaName) return filepath.Join(DirPath(baseDir, name), RootfsMetaName)
} }
// RootfsMeta records which base template a CoW file was created against. // RootfsMeta records which base template a CoW file was created against
// and the VM resource config needed to restart the sampler on resume.
type RootfsMeta struct { type RootfsMeta struct {
BaseTemplate string `json:"base_template"` BaseTemplate string `json:"base_template"`
TemplateID string `json:"template_id,omitempty"` TemplateID string `json:"template_id,omitempty"`
VCPUs int `json:"vcpus,omitempty"`
MemoryMB int `json:"memory_mb,omitempty"`
} }
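With the new fields, the pause path can persist enough state to restart the metrics sampler on resume; a sketch, assuming WriteMeta takes (baseDir, name, meta) and all values are illustrative:

meta := &RootfsMeta{
    BaseTemplate: "ubuntu-22.04", // illustrative
    TemplateID:   "tpl-9f2c",     // illustrative
    VCPUs:        2,
    MemoryMB:     2048,
}
if err := WriteMeta(baseDir, name, meta); err != nil {
    return fmt.Errorf("write rootfs meta: %w", err)
}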
// WriteMeta writes rootfs metadata to the snapshot directory. // WriteMeta writes rootfs metadata to the snapshot directory.
@ -92,102 +75,6 @@ func ReadMeta(baseDir, name string) (*RootfsMeta, error) {
return &meta, nil return &meta, nil
} }
// Exists reports whether a complete snapshot exists (all required files present).
// Supports both legacy (rootfs.ext4) and CoW-based (rootfs.cow + rootfs.meta) snapshots.
// Memory diff files can be either legacy "memfile" or generation-specific "memfile.{uuid}".
func Exists(baseDir, name string) bool {
dir := DirPath(baseDir, name)
// snapfile and header are always required.
for _, f := range []string{SnapFileName, MemHeaderName} {
if _, err := os.Stat(filepath.Join(dir, f)); err != nil {
return false
}
}
// Check that at least one memfile exists (legacy or generation-specific).
// We verify by reading the header and checking that referenced diff files exist.
// Fall back to checking for the legacy memfile name if header can't be read.
if _, err := os.Stat(filepath.Join(dir, MemDiffName)); err != nil {
// No legacy memfile — check if any memfile.{uuid} exists by
// looking for files matching the pattern.
matches, _ := filepath.Glob(filepath.Join(dir, "memfile.*"))
hasGenDiff := false
for _, m := range matches {
base := filepath.Base(m)
if base != MemHeaderName {
hasGenDiff = true
break
}
}
if !hasGenDiff {
return false
}
}
// Accept either rootfs.ext4 (legacy/template) or rootfs.cow + rootfs.meta (dm-snapshot).
if _, err := os.Stat(filepath.Join(dir, RootfsFileName)); err == nil {
return true
}
if _, err := os.Stat(filepath.Join(dir, RootfsCowName)); err == nil {
if _, err := os.Stat(filepath.Join(dir, RootfsMetaName)); err == nil {
return true
}
}
return false
}
// IsTemplate reports whether a template image directory exists (has rootfs.ext4).
func IsTemplate(baseDir, name string) bool {
_, err := os.Stat(filepath.Join(DirPath(baseDir, name), RootfsFileName))
return err == nil
}
// IsSnapshot reports whether a directory is a snapshot (has all snapshot files).
func IsSnapshot(baseDir, name string) bool {
return Exists(baseDir, name)
}
// HasCow reports whether a snapshot uses CoW format (rootfs.cow + rootfs.meta)
// as opposed to legacy full rootfs (rootfs.ext4).
func HasCow(baseDir, name string) bool {
dir := DirPath(baseDir, name)
_, cowErr := os.Stat(filepath.Join(dir, RootfsCowName))
_, metaErr := os.Stat(filepath.Join(dir, RootfsMetaName))
return cowErr == nil && metaErr == nil
}
// ListDiffFiles returns a map of build ID → file path for all memory diff files
// referenced by the given header. Handles both the legacy "memfile" name
// (single-generation) and generation-specific "memfile.{uuid}" names.
func ListDiffFiles(baseDir, name string, header *Header) (map[string]string, error) {
dir := DirPath(baseDir, name)
result := make(map[string]string)
for _, m := range header.Mapping {
if m.BuildID == uuid.Nil {
continue // zero-fill, no file needed
}
idStr := m.BuildID.String()
if _, exists := result[idStr]; exists {
continue
}
// Try generation-specific path first, fall back to legacy.
genPath := filepath.Join(dir, fmt.Sprintf("memfile.%s", idStr))
if _, err := os.Stat(genPath); err == nil {
result[idStr] = genPath
continue
}
legacyPath := filepath.Join(dir, MemDiffName)
if _, err := os.Stat(legacyPath); err == nil {
result[idStr] = legacyPath
continue
}
return nil, fmt.Errorf("diff file not found for build %s", idStr)
}
return result, nil
}
// EnsureDir creates the snapshot directory if it doesn't exist. // EnsureDir creates the snapshot directory if it doesn't exist.
func EnsureDir(baseDir, name string) error { func EnsureDir(baseDir, name string) error {
dir := DirPath(baseDir, name) dir := DirPath(baseDir, name)


@ -1,214 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
// Modifications by M/S Omukk
package snapshot
import "github.com/google/uuid"
// CreateMapping converts a dirty-block bitset (represented as a []bool) into
// a sorted list of BuildMap entries. Consecutive dirty blocks are merged into
// a single entry. BuildStorageOffset tracks the sequential position in the
// compact diff file.
func CreateMapping(buildID uuid.UUID, dirty []bool, blockSize int64) []*BuildMap {
var mappings []*BuildMap
var runStart int64 = -1
var runLength int64
var storageOffset uint64
for i, set := range dirty {
if !set {
if runLength > 0 {
mappings = append(mappings, &BuildMap{
Offset: uint64(runStart) * uint64(blockSize),
Length: uint64(runLength) * uint64(blockSize),
BuildID: buildID,
BuildStorageOffset: storageOffset,
})
storageOffset += uint64(runLength) * uint64(blockSize)
runLength = 0
}
runStart = -1
continue
}
if runStart < 0 {
runStart = int64(i)
runLength = 1
} else {
runLength++
}
}
if runLength > 0 {
mappings = append(mappings, &BuildMap{
Offset: uint64(runStart) * uint64(blockSize),
Length: uint64(runLength) * uint64(blockSize),
BuildID: buildID,
BuildStorageOffset: storageOffset,
})
}
return mappings
}
// MergeMappings overlays diffMapping on top of baseMapping. Where they overlap,
// diff takes priority. The result covers the entire address space.
//
// Both inputs must be sorted by Offset. The base mapping should cover the full size.
//
// Inspired by e2b's snapshot system (Apache 2.0, modified by Omukk).
func MergeMappings(baseMapping, diffMapping []*BuildMap) []*BuildMap {
if len(diffMapping) == 0 {
return baseMapping
}
// Work on a copy of baseMapping to avoid mutating the original.
baseCopy := make([]*BuildMap, len(baseMapping))
for i, m := range baseMapping {
cp := *m
baseCopy[i] = &cp
}
var result []*BuildMap
var bi, di int
for bi < len(baseCopy) && di < len(diffMapping) {
base := baseCopy[bi]
diff := diffMapping[di]
if base.Length == 0 {
bi++
continue
}
if diff.Length == 0 {
di++
continue
}
// No overlap: base entirely before diff.
if base.Offset+base.Length <= diff.Offset {
result = append(result, base)
bi++
continue
}
// No overlap: diff entirely before base.
if diff.Offset+diff.Length <= base.Offset {
result = append(result, diff)
di++
continue
}
// Base fully inside diff — skip base.
if base.Offset >= diff.Offset && base.Offset+base.Length <= diff.Offset+diff.Length {
bi++
continue
}
// Diff fully inside base — split base around diff.
if diff.Offset >= base.Offset && diff.Offset+diff.Length <= base.Offset+base.Length {
leftLen := int64(diff.Offset) - int64(base.Offset)
if leftLen > 0 {
result = append(result, &BuildMap{
Offset: base.Offset,
Length: uint64(leftLen),
BuildID: base.BuildID,
BuildStorageOffset: base.BuildStorageOffset,
})
}
result = append(result, diff)
di++
rightShift := int64(diff.Offset) + int64(diff.Length) - int64(base.Offset)
rightLen := int64(base.Length) - rightShift
if rightLen > 0 {
baseCopy[bi] = &BuildMap{
Offset: base.Offset + uint64(rightShift),
Length: uint64(rightLen),
BuildID: base.BuildID,
BuildStorageOffset: base.BuildStorageOffset + uint64(rightShift),
}
} else {
bi++
}
continue
}
// Base starts after diff with overlap — emit diff, trim base.
if base.Offset > diff.Offset {
result = append(result, diff)
di++
rightShift := int64(diff.Offset) + int64(diff.Length) - int64(base.Offset)
rightLen := int64(base.Length) - rightShift
if rightLen > 0 {
baseCopy[bi] = &BuildMap{
Offset: base.Offset + uint64(rightShift),
Length: uint64(rightLen),
BuildID: base.BuildID,
BuildStorageOffset: base.BuildStorageOffset + uint64(rightShift),
}
} else {
bi++
}
continue
}
// Diff starts after base with overlap — emit left part of base.
if diff.Offset > base.Offset {
leftLen := int64(diff.Offset) - int64(base.Offset)
if leftLen > 0 {
result = append(result, &BuildMap{
Offset: base.Offset,
Length: uint64(leftLen),
BuildID: base.BuildID,
BuildStorageOffset: base.BuildStorageOffset,
})
}
bi++
continue
}
}
// Append remaining entries.
result = append(result, baseCopy[bi:]...)
result = append(result, diffMapping[di:]...)
return result
}
// NormalizeMappings merges adjacent entries with the same BuildID.
func NormalizeMappings(mappings []*BuildMap) []*BuildMap {
if len(mappings) == 0 {
return nil
}
result := make([]*BuildMap, 0, len(mappings))
current := &BuildMap{
Offset: mappings[0].Offset,
Length: mappings[0].Length,
BuildID: mappings[0].BuildID,
BuildStorageOffset: mappings[0].BuildStorageOffset,
}
for i := 1; i < len(mappings); i++ {
m := mappings[i]
if m.BuildID == current.BuildID {
current.Length += m.Length
} else {
result = append(result, current)
current = &BuildMap{
Offset: m.Offset,
Length: m.Length,
BuildID: m.BuildID,
BuildStorageOffset: m.BuildStorageOffset,
}
}
}
result = append(result, current)
return result
}


@ -1,285 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
// Modifications by M/S Omukk
package snapshot
import (
"context"
"fmt"
"io"
"os"
"github.com/google/uuid"
)
const (
// DefaultBlockSize is 4KB — standard page size for Firecracker.
DefaultBlockSize int64 = 4096
)
// ProcessMemfile reads a full memory file produced by Firecracker's
// PUT /snapshot/create, identifies non-zero blocks, and writes only those
// blocks to a compact diff file. Returns the Header describing the mapping.
//
// The output diff file contains non-zero blocks written sequentially.
// The header maps each block in the full address space to either:
// - A position in the diff file (for non-zero blocks)
// - uuid.Nil (for zero/empty blocks, served as zeros without I/O)
//
// buildID identifies this snapshot generation in the header chain.
func ProcessMemfile(memfilePath, diffPath, headerPath string, buildID uuid.UUID) (*Header, error) {
src, err := os.Open(memfilePath)
if err != nil {
return nil, fmt.Errorf("open memfile: %w", err)
}
defer src.Close()
info, err := src.Stat()
if err != nil {
return nil, fmt.Errorf("stat memfile: %w", err)
}
memSize := info.Size()
dst, err := os.Create(diffPath)
if err != nil {
return nil, fmt.Errorf("create diff file: %w", err)
}
defer dst.Close()
totalBlocks := TotalBlocks(memSize, DefaultBlockSize)
dirty := make([]bool, totalBlocks)
empty := make([]bool, totalBlocks)
buf := make([]byte, DefaultBlockSize)
for i := int64(0); i < totalBlocks; i++ {
n, err := io.ReadFull(src, buf)
if err != nil && err != io.ErrUnexpectedEOF {
return nil, fmt.Errorf("read block %d: %w", i, err)
}
// Zero-pad the last block if it's short.
if int64(n) < DefaultBlockSize {
for j := n; j < int(DefaultBlockSize); j++ {
buf[j] = 0
}
}
if isZeroBlock(buf) {
empty[i] = true
continue
}
dirty[i] = true
if _, err := dst.Write(buf); err != nil {
return nil, fmt.Errorf("write diff block %d: %w", i, err)
}
}
// Build header.
dirtyMappings := CreateMapping(buildID, dirty, DefaultBlockSize)
emptyMappings := CreateMapping(uuid.Nil, empty, DefaultBlockSize)
merged := MergeMappings(dirtyMappings, emptyMappings)
normalized := NormalizeMappings(merged)
metadata := NewMetadata(buildID, uint64(DefaultBlockSize), uint64(memSize))
header, err := NewHeader(metadata, normalized)
if err != nil {
return nil, fmt.Errorf("create header: %w", err)
}
// Write header to disk.
headerData, err := Serialize(metadata, normalized)
if err != nil {
return nil, fmt.Errorf("serialize header: %w", err)
}
if err := os.WriteFile(headerPath, headerData, 0644); err != nil {
return nil, fmt.Errorf("write header: %w", err)
}
return header, nil
}
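
As a usage sketch, a first pause might drive ProcessMemfile like this (hypothetical call site in the same package; the paths are invented):

// pauseGen0 is illustrative only.
func pauseGen0(sandboxDir string) (*Header, error) {
    header, err := ProcessMemfile(
        sandboxDir+"/memfile",     // full memory file written by the VMM
        sandboxDir+"/gen0.diff",   // compact file of non-zero blocks
        sandboxDir+"/gen0.header", // serialized block mapping
        uuid.New(),
    )
    if err != nil {
        return nil, fmt.Errorf("process memfile: %w", err)
    }
    // Mostly-idle guests compact well: every all-zero 4 KiB block lives
    // only in the header (mapped to uuid.Nil) and costs no diff storage
    // and no read I/O at restore time.
    return header, nil
}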
// ProcessMemfileWithParent processes a memory file as a new generation on top
// of an existing parent header. The new diff file contains only blocks that
// differ from what the parent header maps. This is used for re-pause of a
// sandbox that was restored from a snapshot.
func ProcessMemfileWithParent(memfilePath, diffPath, headerPath string, parentHeader *Header, buildID uuid.UUID) (*Header, error) {
src, err := os.Open(memfilePath)
if err != nil {
return nil, fmt.Errorf("open memfile: %w", err)
}
defer src.Close()
info, err := src.Stat()
if err != nil {
return nil, fmt.Errorf("stat memfile: %w", err)
}
memSize := info.Size()
dst, err := os.Create(diffPath)
if err != nil {
return nil, fmt.Errorf("create diff file: %w", err)
}
defer dst.Close()
totalBlocks := TotalBlocks(memSize, DefaultBlockSize)
dirty := make([]bool, totalBlocks)
buf := make([]byte, DefaultBlockSize)
for i := int64(0); i < totalBlocks; i++ {
n, err := io.ReadFull(src, buf)
if err != nil && err != io.ErrUnexpectedEOF {
return nil, fmt.Errorf("read block %d: %w", i, err)
}
if int64(n) < DefaultBlockSize {
for j := n; j < int(DefaultBlockSize); j++ {
buf[j] = 0
}
}
if isZeroBlock(buf) {
// For a diff memfile, zero blocks mean "not dirtied since resume" —
// they should inherit the parent's mapping, not be zero-filled.
continue
}
dirty[i] = true
if _, err := dst.Write(buf); err != nil {
return nil, fmt.Errorf("write diff block %d: %w", i, err)
}
}
// Only dirty blocks go into the diff overlay; MergeMappings preserves the
// parent's mapping for everything else.
dirtyMappings := CreateMapping(buildID, dirty, DefaultBlockSize)
merged := MergeMappings(parentHeader.Mapping, dirtyMappings)
normalized := NormalizeMappings(merged)
metadata := parentHeader.Metadata.NextGeneration(buildID)
header, err := NewHeader(metadata, normalized)
if err != nil {
return nil, fmt.Errorf("create header: %w", err)
}
headerData, err := Serialize(metadata, normalized)
if err != nil {
return nil, fmt.Errorf("serialize header: %w", err)
}
if err := os.WriteFile(headerPath, headerData, 0644); err != nil {
return nil, fmt.Errorf("write header: %w", err)
}
return header, nil
}
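
A sketch of the re-pause chain this enables (hypothetical helper in the same package; names and paths are invented):

// repause layers generation 1 on top of generation 0; illustrative only.
func repause(sandboxDir string, gen0 *Header) (*Header, error) {
    // After restore, run, and a second pause, the VMM emits a diff memfile
    // in which pages untouched since resume read back as zeros.
    gen1, err := ProcessMemfileWithParent(
        sandboxDir+"/memfile.diff",
        sandboxDir+"/gen1.diff",
        sandboxDir+"/gen1.header",
        gen0,
        uuid.New(),
    )
    if err != nil {
        return nil, err
    }
    // gen1's mapping now mixes three kinds of entries: blocks served from
    // gen1.diff (re-dirtied), blocks still served from gen0.diff, and
    // uuid.Nil runs (never written; zero-filled with no I/O).
    return gen1, nil
}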
// MergeDiffs consolidates multiple generation diff files into a single diff
// file and resets the generation counter to 0. This is a pure file-level
// operation — no Firecracker involvement.
//
// It reads each non-nil block from the appropriate diff file (as mapped by
// the header), writes them all sequentially into a single new diff file,
// and produces a fresh header pointing only at that file.
//
// diffFiles maps build ID (string) → the path of that generation's diff file.
func MergeDiffs(header *Header, diffFiles map[string]string, mergedDiffPath, headerPath string) (*Header, error) {
blockSize := int64(header.Metadata.BlockSize)
mergedBuildID := uuid.New()
// Open all source diff files.
sources := make(map[string]*os.File, len(diffFiles))
for id, path := range diffFiles {
f, err := os.Open(path)
if err != nil {
// Close already opened files.
for _, sf := range sources {
sf.Close()
}
return nil, fmt.Errorf("open diff file for build %s: %w", id, err)
}
sources[id] = f
}
defer func() {
for _, f := range sources {
f.Close()
}
}()
dst, err := os.Create(mergedDiffPath)
if err != nil {
return nil, fmt.Errorf("create merged diff file: %w", err)
}
defer dst.Close()
totalBlocks := TotalBlocks(int64(header.Metadata.Size), blockSize)
dirty := make([]bool, totalBlocks)
empty := make([]bool, totalBlocks)
buf := make([]byte, blockSize)
for i := int64(0); i < totalBlocks; i++ {
offset := i * blockSize
mappedOffset, _, buildID, err := header.GetShiftedMapping(context.Background(), offset)
if err != nil {
return nil, fmt.Errorf("lookup block %d: %w", i, err)
}
if *buildID == uuid.Nil {
empty[i] = true
continue
}
src, ok := sources[buildID.String()]
if !ok {
return nil, fmt.Errorf("no diff file for build %s (block %d)", buildID, i)
}
if _, err := src.ReadAt(buf, mappedOffset); err != nil {
return nil, fmt.Errorf("read block %d from build %s: %w", i, buildID, err)
}
dirty[i] = true
if _, err := dst.Write(buf); err != nil {
return nil, fmt.Errorf("write merged block %d: %w", i, err)
}
}
// Build fresh header with generation 0.
dirtyMappings := CreateMapping(mergedBuildID, dirty, blockSize)
emptyMappings := CreateMapping(uuid.Nil, empty, blockSize)
merged := MergeMappings(dirtyMappings, emptyMappings)
normalized := NormalizeMappings(merged)
metadata := NewMetadata(mergedBuildID, uint64(blockSize), header.Metadata.Size)
newHeader, err := NewHeader(metadata, normalized)
if err != nil {
return nil, fmt.Errorf("create merged header: %w", err)
}
headerData, err := Serialize(metadata, normalized)
if err != nil {
return nil, fmt.Errorf("serialize merged header: %w", err)
}
if err := os.WriteFile(headerPath, headerData, 0644); err != nil {
return nil, fmt.Errorf("write merged header: %w", err)
}
return newHeader, nil
}
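
A usage sketch for compaction (hypothetical; the diffFiles keys must be the BuildID strings that actually appear in the header's mapping):

// flatten consolidates two generations into one; illustrative only.
func flatten(sandboxDir string, header *Header, gen0ID, gen1ID uuid.UUID) (*Header, error) {
    flat, err := MergeDiffs(header, map[string]string{
        gen0ID.String(): sandboxDir + "/gen0.diff",
        gen1ID.String(): sandboxDir + "/gen1.diff",
    }, sandboxDir+"/merged.diff", sandboxDir+"/merged.header")
    if err != nil {
        return nil, err
    }
    // The returned header points every non-zero block at a single fresh
    // build, so the old generation diffs can be deleted once the merged
    // files are durable.
    return flat, nil
}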
// isZeroBlock checks if a block is entirely zero bytes.
func isZeroBlock(block []byte) bool {
// Fast path: compare 8 bytes at a time.
for i := 0; i+8 <= len(block); i += 8 {
if block[i] != 0 || block[i+1] != 0 || block[i+2] != 0 || block[i+3] != 0 ||
block[i+4] != 0 || block[i+5] != 0 || block[i+6] != 0 || block[i+7] != 0 {
return false
}
}
// Tail bytes.
for i := len(block) &^ 7; i < len(block); i++ {
if block[i] != 0 {
return false
}
}
return true
}


@ -1,92 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
// Modifications by M/S Omukk
// Package uffd implements a userfaultfd-based memory server for Firecracker
// snapshot restore. When a VM is restored from a snapshot, instead of loading
// the entire memory file upfront, the UFFD handler intercepts page faults
// and serves memory pages on demand from the snapshot's compact diff file.
package uffd
/*
#include <sys/syscall.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
struct uffd_pagefault {
__u64 flags;
__u64 address;
__u32 ptid;
};
*/
import "C"
import (
"fmt"
"syscall"
"unsafe"
)
const (
UFFD_EVENT_PAGEFAULT = C.UFFD_EVENT_PAGEFAULT
UFFD_EVENT_FORK = C.UFFD_EVENT_FORK
UFFD_EVENT_REMAP = C.UFFD_EVENT_REMAP
UFFD_EVENT_REMOVE = C.UFFD_EVENT_REMOVE
UFFD_EVENT_UNMAP = C.UFFD_EVENT_UNMAP
UFFD_PAGEFAULT_FLAG_WRITE = C.UFFD_PAGEFAULT_FLAG_WRITE
UFFDIO_COPY = C.UFFDIO_COPY
UFFDIO_COPY_MODE_WP = C.UFFDIO_COPY_MODE_WP
)
type (
uffdMsg = C.struct_uffd_msg
uffdPagefault = C.struct_uffd_pagefault
uffdioCopy = C.struct_uffdio_copy
)
// fd wraps a userfaultfd file descriptor received from Firecracker.
type fd uintptr
// copy installs a page into guest memory at the given address using UFFDIO_COPY.
// mode controls write-protection: use UFFDIO_COPY_MODE_WP to preserve WP bit.
func (f fd) copy(addr, pagesize uintptr, data []byte, mode C.ulonglong) error {
alignedAddr := addr &^ (pagesize - 1)
cpy := uffdioCopy{
src: C.ulonglong(uintptr(unsafe.Pointer(&data[0]))),
dst: C.ulonglong(alignedAddr),
len: C.ulonglong(pagesize),
mode: mode,
copy: 0,
}
_, _, errno := syscall.Syscall(syscall.SYS_IOCTL, uintptr(f), UFFDIO_COPY, uintptr(unsafe.Pointer(&cpy)))
if errno != 0 {
return errno
}
if cpy.copy != C.longlong(pagesize) {
return fmt.Errorf("UFFDIO_COPY copied %d bytes, expected %d", cpy.copy, pagesize)
}
return nil
}
// close closes the userfaultfd file descriptor.
func (f fd) close() error {
return syscall.Close(int(f))
}
// getMsgEvent extracts the event type from a uffd_msg.
func getMsgEvent(msg *uffdMsg) C.uchar {
return msg.event
}
// getMsgArg extracts the arg union from a uffd_msg.
func getMsgArg(msg *uffdMsg) [24]byte {
return msg.arg
}
// getPagefaultAddress extracts the faulting address from a uffd_pagefault.
func getPagefaultAddress(pf *uffdPagefault) uintptr {
return uintptr(pf.address)
}


@ -1,41 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
// Modifications by M/S Omukk
//
// Modifications by Omukk (Wrenn Sandbox): merged Region and Mapping into
// single file, inlined shiftedOffset helper.
package uffd
import "fmt"
// Region is a mapping of guest memory to host virtual address space.
// Firecracker sends these as JSON when connecting to the UFFD socket.
// The JSON field names match Firecracker's UFFD protocol.
type Region struct {
BaseHostVirtAddr uintptr `json:"base_host_virt_addr"`
Size uintptr `json:"size"`
Offset uintptr `json:"offset"`
PageSize uintptr `json:"page_size_kib"` // Actually in bytes despite the name.
}
// Mapping translates between host virtual addresses and logical memory offsets.
type Mapping struct {
Regions []Region
}
// NewMapping creates a Mapping from a list of regions.
func NewMapping(regions []Region) *Mapping {
return &Mapping{Regions: regions}
}
// GetOffset converts a host virtual address to a logical memory file offset
// and returns the page size. This is called on every UFFD page fault.
func (m *Mapping) GetOffset(hostVirtAddr uintptr) (int64, uintptr, error) {
for _, r := range m.Regions {
if hostVirtAddr >= r.BaseHostVirtAddr && hostVirtAddr < r.BaseHostVirtAddr+r.Size {
offset := int64(hostVirtAddr-r.BaseHostVirtAddr) + int64(r.Offset)
return offset, r.PageSize, nil
}
}
return 0, 0, fmt.Errorf("address %#x not found in any memory region", hostVirtAddr)
}
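
A quick worked example of the translation (in-package sketch; the region values are invented):

func demoTranslate() {
    m := NewMapping([]Region{{
        BaseHostVirtAddr: 0x7f00_0000_0000,
        Size:             2 << 30, // 2 GiB of guest memory
        Offset:           0,
        PageSize:         4096,
    }})
    offset, pageSize, err := m.GetOffset(0x7f00_0000_3210)
    // offset == 0x3210, pageSize == 4096, err == nil. The returned offset
    // is not page-aligned; fd.copy later aligns the fault address down to
    // the page boundary before issuing UFFDIO_COPY.
    _, _, _ = offset, pageSize, err
}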


@ -1,451 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
// Modifications by M/S Omukk
//
// Modifications by Omukk (Wrenn Sandbox): replaced errgroup with WaitGroup
// + semaphore, replaced fdexit abstraction with pipe, integrated with
// snapshot.Header-based DiffFileSource instead of block.ReadonlyDevice,
// fixed EAGAIN handling in poll loop.
package uffd
/*
#include <linux/userfaultfd.h>
*/
import "C"
import (
"context"
"encoding/json"
"errors"
"fmt"
"log/slog"
"net"
"os"
"sync"
"syscall"
"unsafe"
"golang.org/x/sys/unix"
"git.omukk.dev/wrenn/wrenn/internal/snapshot"
)
const (
fdSize = 4
regionMappingsSize = 1024
maxConcurrentFaults = 4096
)
// MemorySource provides page data for the UFFD handler.
// Given a logical memory offset and a size, it returns the page data.
type MemorySource interface {
ReadPage(ctx context.Context, offset int64, size int64) ([]byte, error)
}
// Server manages the UFFD Unix socket lifecycle and page fault handling
// for a single Firecracker snapshot restore.
type Server struct {
socketPath string
source MemorySource
lis *net.UnixListener
readyCh chan struct{}
readyOnce sync.Once
doneCh chan struct{}
doneErr error
// exitPipe signals the poll loop to stop.
exitR *os.File
exitW *os.File
// Set by handle() after Firecracker connects; read by Prefetch()
// after waiting on readyCh (which establishes happens-before).
uffdFd fd
mapping *Mapping
// Prefetch lifecycle: cancel stops the goroutine, prefetchDone is
// closed when it exits. Stop() drains prefetchDone before returning
// so the caller can safely close diff file handles.
prefetchCancel context.CancelFunc
prefetchDone chan struct{}
}
// NewServer creates a UFFD server that will listen on the given socket path
// and serve memory pages from the given source.
func NewServer(socketPath string, source MemorySource) *Server {
return &Server{
socketPath: socketPath,
source: source,
readyCh: make(chan struct{}),
doneCh: make(chan struct{}),
}
}
// Start begins listening on the Unix socket. Firecracker will connect to this
// socket after loadSnapshot is called with the UFFD backend.
// Start returns immediately; the server runs in a background goroutine.
func (s *Server) Start(ctx context.Context) error {
lis, err := net.ListenUnix("unix", &net.UnixAddr{Name: s.socketPath, Net: "unix"})
if err != nil {
return fmt.Errorf("listen on uffd socket: %w", err)
}
s.lis = lis
if err := os.Chmod(s.socketPath, 0o777); err != nil {
lis.Close()
return fmt.Errorf("chmod uffd socket: %w", err)
}
// Create exit signal pipe.
r, w, err := os.Pipe()
if err != nil {
lis.Close()
return fmt.Errorf("create exit pipe: %w", err)
}
s.exitR = r
s.exitW = w
go func() {
defer close(s.doneCh)
s.doneErr = s.handle(ctx)
s.lis.Close()
s.exitR.Close()
s.exitW.Close()
s.readyOnce.Do(func() { close(s.readyCh) })
}()
return nil
}
// Ready returns a channel that is closed when the UFFD handler is ready
// (after Firecracker has connected and sent the uffd fd).
func (s *Server) Ready() <-chan struct{} {
return s.readyCh
}
// Stop signals the UFFD poll loop to exit and waits for it to finish.
// Also cancels and waits for any running prefetch goroutine.
func (s *Server) Stop() error {
if s.prefetchCancel != nil {
s.prefetchCancel()
}
// Write a byte to the exit pipe to wake the poll loop.
_, _ = s.exitW.Write([]byte{0})
<-s.doneCh
if s.prefetchDone != nil {
<-s.prefetchDone
}
return s.doneErr
}
// Wait blocks until the server exits.
func (s *Server) Wait() error {
<-s.doneCh
return s.doneErr
}
// handle accepts the Firecracker connection, receives the UFFD fd via
// SCM_RIGHTS, and runs the page fault poll loop.
func (s *Server) handle(ctx context.Context) error {
conn, err := s.lis.Accept()
if err != nil {
return fmt.Errorf("accept uffd connection: %w", err)
}
unixConn := conn.(*net.UnixConn)
defer unixConn.Close()
// Read the memory region mappings (JSON) and the UFFD fd (SCM_RIGHTS).
regionBuf := make([]byte, regionMappingsSize)
uffdBuf := make([]byte, syscall.CmsgSpace(fdSize))
nRegion, nFd, _, _, err := unixConn.ReadMsgUnix(regionBuf, uffdBuf)
if err != nil {
return fmt.Errorf("read uffd message: %w", err)
}
var regions []Region
if err := json.Unmarshal(regionBuf[:nRegion], &regions); err != nil {
return fmt.Errorf("parse memory regions: %w", err)
}
controlMsgs, err := syscall.ParseSocketControlMessage(uffdBuf[:nFd])
if err != nil {
return fmt.Errorf("parse control messages: %w", err)
}
if len(controlMsgs) != 1 {
return fmt.Errorf("expected 1 control message, got %d", len(controlMsgs))
}
fds, err := syscall.ParseUnixRights(&controlMsgs[0])
if err != nil {
return fmt.Errorf("parse unix rights: %w", err)
}
if len(fds) != 1 {
return fmt.Errorf("expected 1 fd, got %d", len(fds))
}
uffdFd := fd(fds[0])
defer uffdFd.close()
mapping := NewMapping(regions)
// Store for use by Prefetch().
s.uffdFd = uffdFd
s.mapping = mapping
slog.Info("uffd handler connected",
"regions", len(regions),
"fd", int(uffdFd),
)
// Signal readiness.
s.readyOnce.Do(func() { close(s.readyCh) })
// Run the poll loop.
return s.serve(ctx, uffdFd, mapping)
}
// serve is the main poll loop. It polls the UFFD fd for page fault events
// and the exit pipe for shutdown signals.
func (s *Server) serve(ctx context.Context, uffdFd fd, mapping *Mapping) error {
pollFds := []unix.PollFd{
{Fd: int32(uffdFd), Events: unix.POLLIN},
{Fd: int32(s.exitR.Fd()), Events: unix.POLLIN},
}
var wg sync.WaitGroup
sem := make(chan struct{}, maxConcurrentFaults)
// Always wait for in-flight goroutines before returning, so the caller
// can safely close the uffd fd after serve returns.
defer wg.Wait()
for {
if _, err := unix.Poll(pollFds, -1); err != nil {
if err == unix.EINTR || err == unix.EAGAIN {
continue
}
return fmt.Errorf("poll: %w", err)
}
// Check exit signal.
if pollFds[1].Revents&unix.POLLIN != 0 {
return nil
}
if pollFds[0].Revents&unix.POLLIN == 0 {
continue
}
// Read the uffd_msg. The fd is O_NONBLOCK (set by Firecracker),
// so EAGAIN is expected — just go back to poll.
buf := make([]byte, unsafe.Sizeof(uffdMsg{}))
n, err := readUffdMsg(uffdFd, buf)
if err == syscall.EAGAIN {
continue
}
if err != nil {
return fmt.Errorf("read uffd msg: %w", err)
}
if n == 0 {
continue
}
msg := *(*uffdMsg)(unsafe.Pointer(&buf[0]))
event := getMsgEvent(&msg)
switch event {
case UFFD_EVENT_PAGEFAULT:
// Handled below.
case UFFD_EVENT_REMOVE, UFFD_EVENT_UNMAP, UFFD_EVENT_REMAP, UFFD_EVENT_FORK:
// Non-fatal lifecycle events from the guest kernel (e.g. balloon
// deflation, mmap/munmap). No action needed — continue polling.
continue
default:
return fmt.Errorf("unexpected uffd event type: %d", event)
}
arg := getMsgArg(&msg)
pf := *(*uffdPagefault)(unsafe.Pointer(&arg[0]))
addr := getPagefaultAddress(&pf)
offset, pagesize, err := mapping.GetOffset(addr)
if err != nil {
return fmt.Errorf("resolve address %#x: %w", addr, err)
}
sem <- struct{}{}
wg.Add(1)
go func() {
defer wg.Done()
defer func() { <-sem }()
if err := s.faultPage(ctx, uffdFd, addr, offset, pagesize); err != nil {
slog.Error("uffd fault page error",
"addr", fmt.Sprintf("%#x", addr),
"offset", offset,
"error", err,
)
}
}()
}
}
// readUffdMsg reads a single uffd_msg, retrying on EINTR.
// Returns (n, EAGAIN) if the non-blocking read has nothing available.
func readUffdMsg(uffdFd fd, buf []byte) (int, error) {
for {
n, err := syscall.Read(int(uffdFd), buf)
if err == syscall.EINTR {
continue
}
return n, err
}
}
// faultPage fetches a page from the memory source and copies it into
// guest memory via UFFDIO_COPY.
func (s *Server) faultPage(ctx context.Context, uffdFd fd, addr uintptr, offset int64, pagesize uintptr) error {
data, err := s.source.ReadPage(ctx, offset, int64(pagesize))
if err != nil {
return fmt.Errorf("read page at offset %d: %w", offset, err)
}
// Mode 0: no write-protect. Standard Firecracker does not register
// UFFD ranges with WP support, so UFFDIO_COPY_MODE_WP would fail.
if err := uffdFd.copy(addr, pagesize, data, 0); err != nil {
if errors.Is(err, unix.EEXIST) {
// Page already mapped (race with prefetch or concurrent fault).
return nil
}
return fmt.Errorf("uffdio_copy: %w", err)
}
return nil
}
// Prefetch proactively loads all guest memory pages in the background.
// It iterates over every page in every UFFD region and copies it from the
// diff file into guest memory via UFFDIO_COPY. Pages already loaded by
// on-demand faults return nil from faultPage (EEXIST handled internally).
// This eliminates the per-request latency caused by lazy page faulting
// after snapshot restore.
//
// The goroutine blocks on readyCh before reading the uffd fd and mapping
// fields (establishes happens-before with handle()). It uses an internal
// context independent of the caller's RPC context so it survives after the
// create/resume RPC returns. Stop() cancels and joins the goroutine.
func (s *Server) Prefetch() {
ctx, cancel := context.WithCancel(context.Background())
s.prefetchCancel = cancel
s.prefetchDone = make(chan struct{})
go func() {
defer close(s.prefetchDone)
// Wait for Firecracker to connect and send the uffd fd.
select {
case <-s.readyCh:
case <-ctx.Done():
return
}
uffdFd := s.uffdFd
mapping := s.mapping
if mapping == nil {
return
}
var total, errored int
for _, region := range mapping.Regions {
pageSize := region.PageSize
if pageSize == 0 {
continue
}
for off := uintptr(0); off < region.Size; off += pageSize {
if ctx.Err() != nil {
slog.Debug("uffd prefetch cancelled",
"pages", total, "errors", errored)
return
}
addr := region.BaseHostVirtAddr + off
memOffset := int64(off) + int64(region.Offset)
if err := s.faultPage(ctx, uffdFd, addr, memOffset, pageSize); err != nil {
errored++
} else {
total++
}
}
}
slog.Info("uffd prefetch complete",
"pages", total, "errors", errored)
}()
}
// DiffFileSource serves pages from a snapshot's compact diff file using
// the header's block mapping to resolve offsets.
type DiffFileSource struct {
header *snapshot.Header
// diffs maps build ID → open file handle for each generation's diff file.
diffs map[string]*os.File
}
// NewDiffFileSource creates a memory source backed by snapshot diff files.
// diffs maps build ID string to the file path of each generation's diff file.
func NewDiffFileSource(header *snapshot.Header, diffPaths map[string]string) (*DiffFileSource, error) {
diffs := make(map[string]*os.File, len(diffPaths))
for id, path := range diffPaths {
f, err := os.Open(path)
if err != nil {
// Close already opened files.
for _, opened := range diffs {
opened.Close()
}
return nil, fmt.Errorf("open diff file %s: %w", path, err)
}
diffs[id] = f
}
return &DiffFileSource{header: header, diffs: diffs}, nil
}
// ReadPage resolves a memory offset through the header mapping and reads
// the corresponding page from the correct generation's diff file.
func (s *DiffFileSource) ReadPage(ctx context.Context, offset int64, size int64) ([]byte, error) {
mappedOffset, _, buildID, err := s.header.GetShiftedMapping(ctx, offset)
if err != nil {
return nil, fmt.Errorf("resolve offset %d: %w", offset, err)
}
// uuid.Nil means zero-fill (empty page).
var nilUUID [16]byte
if *buildID == nilUUID {
return make([]byte, size), nil
}
f, ok := s.diffs[buildID.String()]
if !ok {
return nil, fmt.Errorf("no diff file for build %s", buildID)
}
buf := make([]byte, size)
n, err := f.ReadAt(buf, mappedOffset)
if err != nil && int64(n) < size {
return nil, fmt.Errorf("read diff at offset %d: %w", mappedOffset, err)
}
return buf, nil
}
// Close closes all open diff file handles.
func (s *DiffFileSource) Close() error {
var errs []error
for _, f := range s.diffs {
if err := f.Close(); err != nil {
errs = append(errs, err)
}
}
return errors.Join(errs...)
}
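
For orientation, a hedged sketch of how the server and the diff-file source wire together at restore time (in-package; paths are invented and error handling is abbreviated):

// restoreMemory is illustrative only; the real call sites live in the
// host agent's restore path.
func restoreMemory(ctx context.Context, dir string, header *snapshot.Header, diffPaths map[string]string) (*Server, error) {
    src, err := NewDiffFileSource(header, diffPaths)
    if err != nil {
        return nil, err
    }
    srv := NewServer(dir+"/uffd.sock", src)
    if err := srv.Start(ctx); err != nil {
        src.Close()
        return nil, err
    }
    // The VMM is pointed at dir/uffd.sock as its memory backend; once it
    // connects and hands over the userfaultfd, Ready() closes.
    <-srv.Ready()
    // Optionally warm every page in the background instead of waiting
    // for on-demand faults.
    srv.Prefetch()
    return srv, nil
}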

internal/vm/ch.go (new file, 208 lines)

@ -0,0 +1,208 @@
package vm
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"net"
"net/http"
)
// chClient talks to the Cloud Hypervisor HTTP API over a Unix socket.
type chClient struct {
http *http.Client
socketPath string
}
func newCHClient(socketPath string) *chClient {
return &chClient{
socketPath: socketPath,
http: &http.Client{
Transport: &http.Transport{
DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
var d net.Dialer
return d.DialContext(ctx, "unix", socketPath)
},
},
},
}
}
func (c *chClient) do(ctx context.Context, method, path string, body any) error {
var bodyReader io.Reader
if body != nil {
data, err := json.Marshal(body)
if err != nil {
return fmt.Errorf("marshal request body: %w", err)
}
bodyReader = bytes.NewReader(data)
}
req, err := http.NewRequestWithContext(ctx, method, "http://localhost"+path, bodyReader)
if err != nil {
return fmt.Errorf("create request: %w", err)
}
if body != nil {
req.Header.Set("Content-Type", "application/json")
}
resp, err := c.http.Do(req)
if err != nil {
return fmt.Errorf("%s %s: %w", method, path, err)
}
defer resp.Body.Close()
if resp.StatusCode >= 300 {
respBody, _ := io.ReadAll(resp.Body)
return fmt.Errorf("%s %s: status %d: %s", method, path, resp.StatusCode, string(respBody))
}
return nil
}
// --- CH API payload types ---
type chPayload struct {
Firmware string `json:"firmware,omitempty"`
Kernel string `json:"kernel"`
Cmdline string `json:"cmdline"`
}
type chCPUs struct {
BootVCPUs int `json:"boot_vcpus"`
MaxVCPUs int `json:"max_vcpus"`
}
type chMemory struct {
Size uint64 `json:"size"`
Shared bool `json:"shared,omitempty"`
HotplugSize uint64 `json:"hotplug_size,omitempty"`
HotplugMethod string `json:"hotplug_method,omitempty"`
}
type chDisk struct {
Path string `json:"path"`
Readonly bool `json:"readonly,omitempty"`
ImageType string `json:"image_type,omitempty"`
}
type chNet struct {
Tap string `json:"tap"`
MAC string `json:"mac"`
NumQueues int `json:"num_queues,omitempty"`
QueueSize int `json:"queue_size,omitempty"`
}
type chBalloon struct {
Size int64 `json:"size"`
DeflateOnOOM bool `json:"deflate_on_oom"`
FreePageReporting bool `json:"free_page_reporting,omitempty"`
}
type chConsole struct {
Mode string `json:"mode"`
}
type chCreatePayload struct {
Payload chPayload `json:"payload"`
CPUs chCPUs `json:"cpus"`
Memory chMemory `json:"memory"`
Disks []chDisk `json:"disks"`
Net []chNet `json:"net"`
Balloon *chBalloon `json:"balloon,omitempty"`
Serial chConsole `json:"serial"`
Console chConsole `json:"console"`
}
// createVM sends the full VM configuration as a single payload.
func (c *chClient) createVM(ctx context.Context, cfg *VMConfig) error {
memBytes := uint64(cfg.MemoryMB) * 1024 * 1024
payload := chCreatePayload{
Payload: chPayload{
Kernel: cfg.KernelPath,
Cmdline: cfg.kernelArgs(),
},
CPUs: chCPUs{
BootVCPUs: cfg.VCPUs,
MaxVCPUs: cfg.VCPUs,
},
Memory: chMemory{
Size: memBytes,
Shared: true,
},
Disks: []chDisk{
{
Path: cfg.SandboxDir + "/rootfs.ext4",
ImageType: "Raw",
},
},
Net: []chNet{
{
Tap: cfg.TapDevice,
MAC: cfg.TapMAC,
},
},
Balloon: &chBalloon{
Size: 0,
DeflateOnOOM: true,
FreePageReporting: true,
},
Serial: chConsole{
Mode: "Tty",
},
Console: chConsole{
Mode: "Off",
},
}
return c.do(ctx, http.MethodPut, "/api/v1/vm.create", payload)
}
// bootVM starts the VM after creation.
func (c *chClient) bootVM(ctx context.Context) error {
return c.do(ctx, http.MethodPut, "/api/v1/vm.boot", nil)
}
// pauseVM pauses the microVM.
func (c *chClient) pauseVM(ctx context.Context) error {
return c.do(ctx, http.MethodPut, "/api/v1/vm.pause", nil)
}
// resumeVM resumes a paused microVM.
func (c *chClient) resumeVM(ctx context.Context) error {
return c.do(ctx, http.MethodPut, "/api/v1/vm.resume", nil)
}
// snapshotVM writes a VM snapshot to the given destination URL (a file:// directory URL).
func (c *chClient) snapshotVM(ctx context.Context, destURL string) error {
return c.do(ctx, http.MethodPut, "/api/v1/vm.snapshot", map[string]string{
"destination_url": destURL,
})
}
// restoreVM restores a VM from a snapshot via the API. Uses OnDemand memory
// restore mode for UFFD-based lazy page loading — only pages the guest
// actually touches are faulted in from disk.
func (c *chClient) restoreVM(ctx context.Context, sourceURL string) error {
return c.do(ctx, http.MethodPut, "/api/v1/vm.restore", map[string]any{
"source_url": sourceURL,
"memory_restore_mode": "OnDemand",
"resume": true,
})
}
// shutdownVMM cleanly shuts down the Cloud Hypervisor VMM process.
func (c *chClient) shutdownVMM(ctx context.Context) error {
return c.do(ctx, http.MethodPut, "/api/v1/vmm.shutdown", nil)
}
// resizeBalloon adjusts the balloon target at runtime.
// sizeBytes is memory to take FROM the guest (0 = give all back).
func (c *chClient) resizeBalloon(ctx context.Context, sizeBytes int64) error {
return c.do(ctx, http.MethodPut, "/api/v1/vm.resize", map[string]int64{
"desired_balloon": sizeBytes,
})
}
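
Taken together, a hedged sketch of the pause, snapshot, and restore flow against this client (illustrative only; the manager code later in this diff is the real caller, and the socket path is invented):

// snapshotThenRestore is illustrative only.
func snapshotThenRestore(ctx context.Context, c *chClient, snapshotDir string) error {
    // CH requires a paused VM before vm.snapshot.
    if err := c.pauseVM(ctx); err != nil {
        return err
    }
    // destination_url is a file:// URL to a directory; CH writes its
    // config, device state, and memory files there.
    if err := c.snapshotVM(ctx, "file://"+snapshotDir); err != nil {
        return err
    }
    if err := c.shutdownVMM(ctx); err != nil {
        return err
    }
    // Later, against a fresh bare CH process listening on a new socket:
    restored := newCHClient("/tmp/ch-restored.sock")
    // OnDemand memory plus resume=true means pages fault in lazily and
    // the guest starts running without a separate vm.resume call.
    return restored.restoreVM(ctx, "file://"+snapshotDir)
}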


@ -2,13 +2,12 @@ package vm
    import "fmt"

-   // VMConfig holds the configuration for creating a Firecracker microVM.
+   // VMConfig holds the configuration for creating a Cloud Hypervisor microVM.
    type VMConfig struct {
        // SandboxID is the unique identifier for this sandbox (e.g., "cl-a1b2c3d4").
        SandboxID string

-       // TemplateID is the template UUID string used to populate MMDS metadata
-       // so that envd can read WRENN_TEMPLATE_ID from inside the guest.
+       // TemplateID is the template UUID string, passed to envd via PostInit.
        TemplateID string

        // KernelPath is the path to the uncompressed Linux kernel (vmlinux).
@ -25,12 +24,12 @@ type VMConfig struct {
        MemoryMB int

        // NetworkNamespace is the name of the network namespace to launch
-       // Firecracker inside (e.g., "ns-1"). The namespace must already exist
+       // Cloud Hypervisor inside (e.g., "ns-1"). The namespace must already exist
        // with a TAP device configured.
        NetworkNamespace string

        // TapDevice is the name of the TAP device inside the network namespace
-       // that Firecracker will attach to (e.g., "tap0").
+       // that Cloud Hypervisor will attach to (e.g., "tap0").
        TapDevice string

        // TapMAC is the MAC address for the TAP device.
@ -45,19 +44,23 @@ type VMConfig struct {
        // NetMask is the subnet mask for the guest network (e.g., "255.255.255.252").
        NetMask string

-       // FirecrackerBin is the path to the firecracker binary.
-       FirecrackerBin string
+       // VMMBin is the path to the cloud-hypervisor binary.
+       VMMBin string

-       // SocketPath is the path for the Firecracker API Unix socket.
+       // SocketPath is the path for the Cloud Hypervisor API Unix socket.
        SocketPath string

        // SandboxDir is the tmpfs mount point for per-sandbox files inside the
-       // mount namespace (e.g., "/fc-vm").
+       // mount namespace (e.g., "/ch-vm").
        SandboxDir string

        // InitPath is the path to the init process inside the guest.
        // Defaults to "/sbin/init" if empty.
        InitPath string
+
+       // SnapshotDir is the path to the snapshot directory for restore.
+       // Only set when restoring from a snapshot.
+       SnapshotDir string
    }

    func (c *VMConfig) applyDefaults() {
@ -67,14 +70,14 @@ func (c *VMConfig) applyDefaults() {
        if c.MemoryMB == 0 {
            c.MemoryMB = 512
        }
-       if c.FirecrackerBin == "" {
-           c.FirecrackerBin = "/usr/local/bin/firecracker"
+       if c.VMMBin == "" {
+           c.VMMBin = "/usr/local/bin/cloud-hypervisor"
        }
        if c.SocketPath == "" {
-           c.SocketPath = fmt.Sprintf("/tmp/fc-%s.sock", c.SandboxID)
+           c.SocketPath = fmt.Sprintf("/tmp/ch-%s.sock", c.SandboxID)
        }
        if c.SandboxDir == "" {
-           c.SandboxDir = "/tmp/fc-vm"
+           c.SandboxDir = fmt.Sprintf("/tmp/ch-vm-%s", c.SandboxID)
        }
        if c.TapDevice == "" {
            c.TapDevice = "tap0"
@ -95,7 +98,7 @@ func (c *VMConfig) kernelArgs() string {
        )
        return fmt.Sprintf(
-           "console=ttyS0 reboot=k panic=1 pci=off quiet loglevel=1 clocksource=kvm-clock init=%s %s",
+           "console=ttyS0 root=/dev/vda rw reboot=k panic=1 quiet loglevel=1 init_on_free=1 clocksource=kvm-clock init=%s %s",
            c.InitPath, ipArg,
        )
    }


@ -1,202 +0,0 @@
package vm
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"net"
"net/http"
)
// fcClient talks to the Firecracker HTTP API over a Unix socket.
type fcClient struct {
http *http.Client
socketPath string
}
func newFCClient(socketPath string) *fcClient {
return &fcClient{
socketPath: socketPath,
http: &http.Client{
Transport: &http.Transport{
DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
var d net.Dialer
return d.DialContext(ctx, "unix", socketPath)
},
},
// No global timeout — callers pass context.Context with appropriate
// deadlines. A fixed 10s timeout was too short for snapshot/resume
// operations on large-memory VMs (20GB+ memfiles).
},
}
}
func (c *fcClient) do(ctx context.Context, method, path string, body any) error {
var bodyReader io.Reader
if body != nil {
data, err := json.Marshal(body)
if err != nil {
return fmt.Errorf("marshal request body: %w", err)
}
bodyReader = bytes.NewReader(data)
}
// The host in the URL is ignored for Unix sockets; we use "localhost" by convention.
req, err := http.NewRequestWithContext(ctx, method, "http://localhost"+path, bodyReader)
if err != nil {
return fmt.Errorf("create request: %w", err)
}
if body != nil {
req.Header.Set("Content-Type", "application/json")
}
resp, err := c.http.Do(req)
if err != nil {
return fmt.Errorf("%s %s: %w", method, path, err)
}
defer resp.Body.Close()
if resp.StatusCode >= 300 {
respBody, _ := io.ReadAll(resp.Body)
return fmt.Errorf("%s %s: status %d: %s", method, path, resp.StatusCode, string(respBody))
}
return nil
}
// setBootSource configures the kernel and boot args.
func (c *fcClient) setBootSource(ctx context.Context, kernelPath, bootArgs string) error {
return c.do(ctx, http.MethodPut, "/boot-source", map[string]string{
"kernel_image_path": kernelPath,
"boot_args": bootArgs,
})
}
// setRootfsDrive configures the root filesystem drive.
func (c *fcClient) setRootfsDrive(ctx context.Context, driveID, path string, readOnly bool) error {
return c.do(ctx, http.MethodPut, "/drives/"+driveID, map[string]any{
"drive_id": driveID,
"path_on_host": path,
"is_root_device": true,
"is_read_only": readOnly,
})
}
// setNetworkInterface configures a network interface attached to a TAP device.
// A tx_rate_limiter caps sustained guest→host throughput to prevent user
// application traffic from completely saturating the TAP device and starving
// envd control traffic (PTY, exec, file ops).
func (c *fcClient) setNetworkInterface(ctx context.Context, ifaceID, tapName, macAddr string) error {
return c.do(ctx, http.MethodPut, "/network-interfaces/"+ifaceID, map[string]any{
"iface_id": ifaceID,
"host_dev_name": tapName,
"guest_mac": macAddr,
"tx_rate_limiter": map[string]any{
"bandwidth": map[string]any{
"size": 209715200, // 200 MB/s sustained
"refill_time": 1000, // refill period: 1 second
"one_time_burst": 104857600, // 100 MB initial burst
},
},
})
}
// setMachineConfig configures vCPUs, memory, and other machine settings.
func (c *fcClient) setMachineConfig(ctx context.Context, vcpus, memMB int) error {
return c.do(ctx, http.MethodPut, "/machine-config", map[string]any{
"vcpu_count": vcpus,
"mem_size_mib": memMB,
"smt": false,
})
}
// setMMDSConfig enables MMDS V2 token-based access on the given network interface.
// Must be called before startVM.
func (c *fcClient) setMMDSConfig(ctx context.Context, ifaceID string) error {
return c.do(ctx, http.MethodPut, "/mmds/config", map[string]any{
"version": "V2",
"network_interfaces": []string{ifaceID},
})
}
// mmdsMetadata is the metadata payload written to the Firecracker MMDS store.
// envd reads this via PollForMMDSOpts to populate WRENN_SANDBOX_ID and WRENN_TEMPLATE_ID.
type mmdsMetadata struct {
SandboxID string `json:"instanceID"`
TemplateID string `json:"envID"`
}
// setMMDS writes sandbox metadata to the Firecracker MMDS store.
// Can be called after the VM has started.
func (c *fcClient) setMMDS(ctx context.Context, sandboxID, templateID string) error {
return c.do(ctx, http.MethodPut, "/mmds", mmdsMetadata{
SandboxID: sandboxID,
TemplateID: templateID,
})
}
// setBalloon configures the Firecracker balloon device for dynamic memory
// management. deflateOnOom lets the guest reclaim balloon pages under memory
// pressure. statsInterval enables periodic stats via GET /balloon/statistics.
// Must be called before startVM.
func (c *fcClient) setBalloon(ctx context.Context, amountMiB int, deflateOnOom bool, statsIntervalS int) error {
return c.do(ctx, http.MethodPut, "/balloon", map[string]any{
"amount_mib": amountMiB,
"deflate_on_oom": deflateOnOom,
"stats_polling_interval_s": statsIntervalS,
})
}
// updateBalloon adjusts the balloon target at runtime.
func (c *fcClient) updateBalloon(ctx context.Context, amountMiB int) error {
return c.do(ctx, http.MethodPatch, "/balloon", map[string]any{
"amount_mib": amountMiB,
})
}
// startVM issues the InstanceStart action.
func (c *fcClient) startVM(ctx context.Context) error {
return c.do(ctx, http.MethodPut, "/actions", map[string]string{
"action_type": "InstanceStart",
})
}
// pauseVM pauses the microVM.
func (c *fcClient) pauseVM(ctx context.Context) error {
return c.do(ctx, http.MethodPatch, "/vm", map[string]string{
"state": "Paused",
})
}
// resumeVM resumes a paused microVM.
func (c *fcClient) resumeVM(ctx context.Context) error {
return c.do(ctx, http.MethodPatch, "/vm", map[string]string{
"state": "Resumed",
})
}
// createSnapshot creates a VM snapshot.
// snapshotType is "Full" (all memory) or "Diff" (only dirty pages since last resume).
func (c *fcClient) createSnapshot(ctx context.Context, snapPath, memPath, snapshotType string) error {
return c.do(ctx, http.MethodPut, "/snapshot/create", map[string]any{
"snapshot_type": snapshotType,
"snapshot_path": snapPath,
"mem_file_path": memPath,
})
}
// loadSnapshotWithUffd loads a VM snapshot using a UFFD socket for
// lazy memory loading. Firecracker will connect to the socket and
// send the uffd fd + memory region mappings.
func (c *fcClient) loadSnapshotWithUffd(ctx context.Context, snapPath, uffdSocketPath string) error {
return c.do(ctx, http.MethodPut, "/snapshot/load", map[string]any{
"snapshot_path": snapPath,
"resume_vm": false,
"mem_backend": map[string]any{
"backend_type": "Uffd",
"backend_path": uffdSocketPath,
},
})
}


@ -9,14 +9,14 @@ import (
        "time"
    )

-   // VM represents a running Firecracker microVM.
+   // VM represents a running Cloud Hypervisor microVM.
    type VM struct {
        Config  VMConfig
        process *process
-       client  *fcClient
+       client  *chClient
    }

-   // Manager handles the lifecycle of Firecracker microVMs.
+   // Manager handles the lifecycle of Cloud Hypervisor microVMs.
    type Manager struct {
        mu sync.RWMutex
        // vms tracks running VMs by sandbox ID.
@ -30,7 +30,7 @@ func NewManager() *Manager {
        }
    }

-   // Create boots a new Firecracker microVM with the given configuration.
+   // Create boots a new Cloud Hypervisor microVM with the given configuration.
    // The network namespace and TAP device must already be set up.
    func (m *Manager) Create(ctx context.Context, cfg VMConfig) (*VM, error) {
        cfg.applyDefaults()
@ -38,7 +38,6 @@ func (m *Manager) Create(ctx context.Context, cfg VMConfig) (*VM, error) {
            return nil, fmt.Errorf("invalid config: %w", err)
        }

-       // Clean up any leftover socket from a previous run.
        os.Remove(cfg.SocketPath)

        slog.Info("creating VM",
@ -47,8 +46,8 @@ func (m *Manager) Create(ctx context.Context, cfg VMConfig) (*VM, error) {
            "memory_mb", cfg.MemoryMB,
        )

-       // Step 1: Launch the Firecracker process.
-       proc, err := startProcess(ctx, &cfg)
+       // Step 1: Launch the Cloud Hypervisor process.
+       proc, err := startProcess(&cfg)
        if err != nil {
            return nil, fmt.Errorf("start process: %w", err)
        }
@ -59,25 +58,18 @@ func (m *Manager) Create(ctx context.Context, cfg VMConfig) (*VM, error) {
            return nil, fmt.Errorf("wait for socket: %w", err)
        }

-       // Step 3: Configure the VM via the Firecracker API.
-       client := newFCClient(cfg.SocketPath)
-       if err := configureVM(ctx, client, &cfg); err != nil {
+       // Step 3: Configure and boot the VM via a single API call.
+       client := newCHClient(cfg.SocketPath)
+       if err := client.createVM(ctx, &cfg); err != nil {
            _ = proc.stop()
-           return nil, fmt.Errorf("configure VM: %w", err)
+           return nil, fmt.Errorf("create VM config: %w", err)
        }

-       // Step 4: Start the VM.
-       if err := client.startVM(ctx); err != nil {
+       // Step 4: Boot the VM.
+       if err := client.bootVM(ctx); err != nil {
            _ = proc.stop()
-           return nil, fmt.Errorf("start VM: %w", err)
-       }
-
-       // Step 5: Push sandbox metadata into MMDS so envd can read
-       // WRENN_SANDBOX_ID and WRENN_TEMPLATE_ID from inside the guest.
-       if err := client.setMMDS(ctx, cfg.SandboxID, cfg.TemplateID); err != nil {
-           _ = proc.stop()
-           return nil, fmt.Errorf("set MMDS metadata: %w", err)
+           return nil, fmt.Errorf("boot VM: %w", err)
        }

        vm := &VM{
@ -95,46 +87,6 @@ func (m *Manager) Create(ctx context.Context, cfg VMConfig) (*VM, error) {
        return vm, nil
    }

-   // configureVM sends the configuration to Firecracker via its HTTP API.
-   func configureVM(ctx context.Context, client *fcClient, cfg *VMConfig) error {
-       // Boot source (kernel + args)
-       if err := client.setBootSource(ctx, cfg.KernelPath, cfg.kernelArgs()); err != nil {
-           return fmt.Errorf("set boot source: %w", err)
-       }
-       // Root drive — use the symlink path inside the mount namespace so that
-       // snapshots record a stable path that works on restore.
-       rootfsSymlink := cfg.SandboxDir + "/rootfs.ext4"
-       if err := client.setRootfsDrive(ctx, "rootfs", rootfsSymlink, false); err != nil {
-           return fmt.Errorf("set rootfs drive: %w", err)
-       }
-       // Network interface
-       if err := client.setNetworkInterface(ctx, "eth0", cfg.TapDevice, cfg.TapMAC); err != nil {
-           return fmt.Errorf("set network interface: %w", err)
-       }
-       // Machine config (vCPUs + memory)
-       if err := client.setMachineConfig(ctx, cfg.VCPUs, cfg.MemoryMB); err != nil {
-           return fmt.Errorf("set machine config: %w", err)
-       }
-       // Balloon device — allows the host to reclaim unused guest memory.
-       // Start with 0 (no inflation). deflate_on_oom lets the guest reclaim
-       // balloon pages under memory pressure. Stats interval enables monitoring.
-       if err := client.setBalloon(ctx, 0, true, 5); err != nil {
-           slog.Warn("set balloon failed (non-fatal, VM will run without memory reclaim)", "error", err)
-       }
-       // MMDS config — enable V2 token access on eth0 so that envd can read
-       // WRENN_SANDBOX_ID and WRENN_TEMPLATE_ID from inside the guest.
-       if err := client.setMMDSConfig(ctx, "eth0"); err != nil {
-           return fmt.Errorf("set MMDS config: %w", err)
-       }
-       return nil
-   }

    // Pause pauses a running VM.
    func (m *Manager) Pause(ctx context.Context, sandboxID string) error {
        m.mu.RLock()
@ -179,7 +131,8 @@ func (m *Manager) UpdateBalloon(ctx context.Context, sandboxID string, amountMiB
            return fmt.Errorf("VM not found: %s", sandboxID)
        }

-       return vm.client.updateBalloon(ctx, amountMiB)
+       sizeBytes := int64(amountMiB) * 1024 * 1024
+       return vm.client.resizeBalloon(ctx, sizeBytes)
    }

    // Destroy stops and cleans up a VM.
@ -190,26 +143,34 @@ func (m *Manager) Destroy(ctx context.Context, sandboxID string) error {
            m.mu.Unlock()
            return fmt.Errorf("VM not found: %s", sandboxID)
        }
-       delete(m.vms, sandboxID)
        m.mu.Unlock()

        slog.Info("destroying VM", "sandbox", sandboxID)

-       // Stop the Firecracker process.
+       // Try clean shutdown first, fall back to process kill.
+       shutdownCtx, shutdownCancel := context.WithTimeout(ctx, 5*time.Second)
+       if err := vm.client.shutdownVMM(shutdownCtx); err != nil {
+           slog.Debug("clean VMM shutdown failed, killing process", "sandbox", sandboxID, "error", err)
+       }
+       shutdownCancel()
+
        if err := vm.process.stop(); err != nil {
            slog.Warn("error stopping process", "sandbox", sandboxID, "error", err)
        }

-       // Clean up the API socket.
        os.Remove(vm.Config.SocketPath)

+       m.mu.Lock()
+       delete(m.vms, sandboxID)
+       m.mu.Unlock()
+
        slog.Info("VM destroyed", "sandbox", sandboxID)
        return nil
    }

    // Snapshot creates a VM snapshot. The VM must already be paused.
-   // snapshotType is "Full" (all memory) or "Diff" (only dirty pages since last resume).
-   func (m *Manager) Snapshot(ctx context.Context, sandboxID, snapPath, memPath, snapshotType string) error {
+   // destURL is the file:// URL to the snapshot directory.
+   func (m *Manager) Snapshot(ctx context.Context, sandboxID, snapshotDir string) error {
        m.mu.RLock()
        vm, ok := m.vms[sandboxID]
        m.mu.RUnlock()
@ -217,29 +178,35 @@ func (m *Manager) Snapshot(ctx context.Context, sandboxID, snapPath, memPath, sn
            return fmt.Errorf("VM not found: %s", sandboxID)
        }

-       if err := vm.client.createSnapshot(ctx, snapPath, memPath, snapshotType); err != nil {
+       destURL := "file://" + snapshotDir
+       if err := vm.client.snapshotVM(ctx, destURL); err != nil {
            return fmt.Errorf("create snapshot: %w", err)
        }

-       slog.Info("VM snapshot created", "sandbox", sandboxID, "snap_path", snapPath, "type", snapshotType)
+       slog.Info("VM snapshot created", "sandbox", sandboxID, "snapshot_dir", snapshotDir)
        return nil
    }

-   // CreateFromSnapshot boots a new Firecracker VM by loading a snapshot
-   // using UFFD for lazy memory loading. The network namespace and TAP
-   // device must already be set up.
+   // CreateFromSnapshot boots a new Cloud Hypervisor VM by restoring from a
+   // snapshot directory. The network namespace and TAP device must already be set up.
    //
-   // No boot resources (kernel, drives, machine config) are configured —
-   // the snapshot carries all that state. The rootfs path recorded in the
-   // snapshot is resolved via a stable symlink at SandboxDir/rootfs.ext4
-   // inside the mount namespace (created by the start script in jailer.go).
+   // A bare CH process is started first, then the restore is performed via the API
+   // with memory_restore_mode=OnDemand for UFFD-based lazy page loading. This means
+   // only pages the guest actually touches are faulted in from disk — a 16GB template
+   // with 2GB active working set only loads ~2GB into RAM at restore time.
+   //
+   // The restore API also sets resume=true, so the VM starts running immediately
+   // without a separate resume call.
+   //
+   // The rootfs path recorded in the snapshot is resolved via a stable symlink at
+   // SandboxDir/rootfs.ext4 inside the mount namespace.
    //
    // The sequence is:
-   // 1. Start FC process in mount+network namespace (creates tmpfs + rootfs symlink)
+   // 1. Start bare CH process in mount+network namespace
    // 2. Wait for API socket
-   // 3. Load snapshot with UFFD backend
-   // 4. Resume VM execution
-   func (m *Manager) CreateFromSnapshot(ctx context.Context, cfg VMConfig, snapPath, uffdSocketPath string) (*VM, error) {
+   // 3. Restore VM via API (OnDemand memory + auto-resume)
+   func (m *Manager) CreateFromSnapshot(ctx context.Context, cfg VMConfig, snapshotDir string) (*VM, error) {
+       cfg.SnapshotDir = snapshotDir
        cfg.applyDefaults()
        if err := cfg.validate(); err != nil {
            return nil, fmt.Errorf("invalid config: %w", err)
@ -249,14 +216,11 @@ func (m *Manager) CreateFromSnapshot(ctx context.Context, cfg VMConfig, snapPath
        slog.Info("restoring VM from snapshot",
            "sandbox", cfg.SandboxID,
-           "snap_path", snapPath,
+           "snapshot_dir", snapshotDir,
        )

-       // Step 1: Launch the Firecracker process.
-       // The start script creates a tmpfs at SandboxDir and symlinks
-       // rootfs.ext4 → cfg.RootfsPath, so the snapshot's recorded rootfs
-       // path (/fc-vm/rootfs.ext4) resolves to the new clone.
-       proc, err := startProcess(ctx, &cfg)
+       // Step 1: Launch bare CH process (no --restore).
+       proc, err := startProcessForRestore(&cfg)
        if err != nil {
            return nil, fmt.Errorf("start process: %w", err)
        }
@ -267,26 +231,13 @@ func (m *Manager) CreateFromSnapshot(ctx context.Context, cfg VMConfig, snapPath
            return nil, fmt.Errorf("wait for socket: %w", err)
        }

-       client := newFCClient(cfg.SocketPath)
+       client := newCHClient(cfg.SocketPath)

-       // Step 3: Load the snapshot with UFFD backend.
-       // No boot resources are configured — the snapshot carries kernel,
-       // drive, network, and machine config state.
-       if err := client.loadSnapshotWithUffd(ctx, snapPath, uffdSocketPath); err != nil {
+       // Step 3: Restore via API with OnDemand memory + auto-resume.
+       sourceURL := "file://" + snapshotDir
+       if err := client.restoreVM(ctx, sourceURL); err != nil {
            _ = proc.stop()
-           return nil, fmt.Errorf("load snapshot: %w", err)
-       }
-
-       // Step 4: Resume the VM.
-       if err := client.resumeVM(ctx); err != nil {
-           _ = proc.stop()
-           return nil, fmt.Errorf("resume VM: %w", err)
-       }
-
-       // Step 5: Push sandbox metadata into MMDS.
-       if err := client.setMMDS(ctx, cfg.SandboxID, cfg.TemplateID); err != nil {
-           _ = proc.stop()
-           return nil, fmt.Errorf("set MMDS metadata: %w", err)
+           return nil, fmt.Errorf("restore VM: %w", err)
        }

        vm := &VM{
@ -304,11 +255,15 @@ func (m *Manager) CreateFromSnapshot(ctx context.Context, cfg VMConfig, snapPath
    }

    // PID returns the process ID of the unshare wrapper process.
-   // The actual Firecracker process is a direct child of this PID.
    func (v *VM) PID() int {
        return v.process.cmd.Process.Pid
    }

+   // Exited returns a channel that is closed when the VM process exits.
+   func (v *VM) Exited() <-chan struct{} {
+       return v.process.exited()
+   }
+
    // Get returns a running VM by sandbox ID.
    func (m *Manager) Get(sandboxID string) (*VM, bool) {
        m.mu.RLock()
@ -317,7 +272,7 @@ func (m *Manager) Get(sandboxID string) (*VM, bool) {
        return vm, ok
    }

-   // waitForSocket polls for the Firecracker API socket to appear on disk.
+   // waitForSocket polls for the Cloud Hypervisor API socket to appear on disk.
    func waitForSocket(ctx context.Context, socketPath string, proc *process) error {
        ticker := time.NewTicker(10 * time.Millisecond)
        defer ticker.Stop()
@ -329,7 +284,7 @@ func waitForSocket(ctx context.Context, socketPath string, proc *process) error
        case <-ctx.Done():
            return ctx.Err()
        case <-proc.exited():
-           return fmt.Errorf("firecracker process exited before socket was ready")
+           return fmt.Errorf("cloud-hypervisor process exited before socket was ready")
        case <-timeout:
            return fmt.Errorf("timed out waiting for API socket at %s", socketPath)
        case <-ticker.C:


@ -10,7 +10,7 @@ import (
"time" "time"
) )
// process represents a running Firecracker process with mount and network // process represents a running Cloud Hypervisor process with mount and network
// namespace isolation. // namespace isolation.
type process struct { type process struct {
cmd *exec.Cmd cmd *exec.Cmd
@ -20,33 +20,42 @@ type process struct {
exitErr error exitErr error
} }
// startProcess launches the Firecracker binary inside an isolated mount namespace // startProcess launches the Cloud Hypervisor binary inside an isolated mount
// and the specified network namespace. The launch sequence: // namespace and the specified network namespace. Used for fresh boot (no
// snapshot). The launch sequence:
// //
// 1. unshare -m: creates a private mount namespace // 1. unshare -m: creates a private mount namespace
// 2. mount --make-rprivate /: prevents mount propagation to host // 2. mount --make-rprivate /: prevents mount propagation to host
// 3. mount tmpfs at SandboxDir: ephemeral workspace for this VM // 3. mount tmpfs at SandboxDir: ephemeral workspace for this VM
// 4. symlink kernel and rootfs into SandboxDir // 4. symlink kernel and rootfs into SandboxDir
// 5. ip netns exec <ns>: enters the network namespace where TAP is configured // 5. ip netns exec <ns>: enters the network namespace where TAP is configured
// 6. exec firecracker with the API socket path // 6. exec cloud-hypervisor with the API socket path
func startProcess(ctx context.Context, cfg *VMConfig) (*process, error) { func startProcess(cfg *VMConfig) (*process, error) {
// Use a background context for the long-lived Firecracker process.
// The request context (ctx) is only used for the startup phase — we must
// not tie the VM's lifetime to the HTTP request that created it.
execCtx, cancel := context.WithCancel(context.Background())
script := buildStartScript(cfg) script := buildStartScript(cfg)
return launchScript(script, cfg)
}
// startProcessForRestore launches a bare Cloud Hypervisor process (no --restore).
// The restore is performed via the API after the socket is ready, which allows
// passing memory_restore_mode=OnDemand for UFFD lazy paging.
func startProcessForRestore(cfg *VMConfig) (*process, error) {
script := buildRestoreScript(cfg)
return launchScript(script, cfg)
}
func launchScript(script string, cfg *VMConfig) (*process, error) {
execCtx, cancel := context.WithCancel(context.Background())
cmd := exec.CommandContext(execCtx, "unshare", "-m", "--", "bash", "-c", script) cmd := exec.CommandContext(execCtx, "unshare", "-m", "--", "bash", "-c", script)
cmd.SysProcAttr = &syscall.SysProcAttr{ cmd.SysProcAttr = &syscall.SysProcAttr{
Setsid: true, // new session so signals don't propagate from parent Setsid: true,
} }
cmd.Stdout = os.Stdout cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr cmd.Stderr = os.Stderr
if err := cmd.Start(); err != nil { if err := cmd.Start(); err != nil {
cancel() cancel()
return nil, fmt.Errorf("start firecracker process: %w", err) return nil, fmt.Errorf("start cloud-hypervisor process: %w", err)
} }
p := &process{ p := &process{
@ -60,7 +69,7 @@ func startProcess(ctx context.Context, cfg *VMConfig) (*process, error) {
close(p.exitCh) close(p.exitCh)
}() }()
slog.Info("firecracker process started", slog.Info("cloud-hypervisor process started",
"pid", cmd.Process.Pid, "pid", cmd.Process.Pid,
"sandbox", cfg.SandboxID, "sandbox", cfg.SandboxID,
) )
@ -68,35 +77,56 @@ func startProcess(ctx context.Context, cfg *VMConfig) (*process, error) {
return p, nil return p, nil
} }
// buildStartScript generates the bash script that sets up the mount namespace, // buildStartScript generates the bash script for fresh boot: sets up mount
// symlinks kernel/rootfs, and execs Firecracker inside the network namespace. // namespace, symlinks kernel/rootfs, and execs Cloud Hypervisor.
func buildStartScript(cfg *VMConfig) string { func buildStartScript(cfg *VMConfig) string {
return fmt.Sprintf(` return fmt.Sprintf(`
set -euo pipefail set -euo pipefail
# Prevent mount propagation to the host
mount --make-rprivate / mount --make-rprivate /
# Create ephemeral tmpfs workspace
mkdir -p %[1]s mkdir -p %[1]s
mount -t tmpfs tmpfs %[1]s mount -t tmpfs tmpfs %[1]s
# Symlink kernel and rootfs into the workspace
ln -s %[2]s %[1]s/vmlinux ln -s %[2]s %[1]s/vmlinux
ln -s %[3]s %[1]s/rootfs.ext4 ln -s %[3]s %[1]s/rootfs.ext4
# Launch Firecracker inside the network namespace exec ip netns exec %[4]s %[5]s --api-socket path=%[6]s
exec ip netns exec %[4]s %[5]s --api-sock %[6]s
`, `,
cfg.SandboxDir, // 1 cfg.SandboxDir, // 1
cfg.KernelPath, // 2 cfg.KernelPath, // 2
cfg.RootfsPath, // 3 cfg.RootfsPath, // 3
cfg.NetworkNamespace, // 4 cfg.NetworkNamespace, // 4
cfg.FirecrackerBin, // 5 cfg.VMMBin, // 5
cfg.SocketPath, // 6 cfg.SocketPath, // 6
) )
} }
// buildRestoreScript generates the bash script for snapshot restore: sets up
// mount namespace, symlinks rootfs, and starts a bare Cloud Hypervisor process.
// The actual restore is done via the API (PUT /vm.restore) after the socket is
// ready, which enables memory_restore_mode=OnDemand for UFFD lazy paging.
func buildRestoreScript(cfg *VMConfig) string {
return fmt.Sprintf(`
set -euo pipefail
mount --make-rprivate /
mkdir -p %[1]s
mount -t tmpfs tmpfs %[1]s
ln -s %[2]s %[1]s/rootfs.ext4
exec ip netns exec %[3]s %[4]s --api-socket path=%[5]s
`,
cfg.SandboxDir, // 1
cfg.RootfsPath, // 2
cfg.NetworkNamespace, // 3
cfg.VMMBin, // 4
cfg.SocketPath, // 5
)
}
// stop sends SIGTERM and waits for the process to exit. If it doesn't exit // stop sends SIGTERM and waits for the process to exit. If it doesn't exit
// within 10 seconds, SIGKILL is sent. // within 10 seconds, SIGKILL is sent.
func (p *process) stop() error { func (p *process) stop() error {
@ -104,7 +134,6 @@ func (p *process) stop() error {
return nil
}
// Send SIGTERM to the process group (negative PID).
if err := syscall.Kill(-p.cmd.Process.Pid, syscall.SIGTERM); err != nil {
slog.Debug("sigterm failed, process may have exited", "error", err)
}
@ -113,7 +142,7 @@ func (p *process) stop() error {
case <-p.exitCh:
return nil
case <-time.After(10 * time.Second):
slog.Warn("cloud-hypervisor did not exit after SIGTERM, sending SIGKILL")
if err := syscall.Kill(-p.cmd.Process.Pid, syscall.SIGKILL); err != nil {
slog.Debug("sigkill failed", "error", err)
}

View File

@ -177,8 +177,14 @@ func Run(opts ...Option) {
Config: cfg,
}
// Host monitor (safety-net reconciliation every 5 minutes).
// Primary state sync is push-based (host agent callbacks + CP background
// goroutines). The monitor acts as a fallback for missed events, host death
// detection, and transient status resolution.
monitor := api.NewHostMonitor(queries, hostPool, al, 5*time.Minute)
// API server.
srv := api.New(queries, hostPool, hostScheduler, pool, rdb, []byte(cfg.JWTSecret), oauthRegistry, cfg.OAuthRedirectURL, ca, al, channelSvc, mailer, o.extensions, sctx, monitor, o.version)
// Start template build workers (2 concurrent).
stopBuildWorkers := srv.BuildSvc.StartWorkers(ctx, 2)
@ -187,8 +193,11 @@ func Run(opts ...Option) {
// Start channel event dispatcher.
channelDispatcher.Start(ctx)
// Start sandbox event consumer (processes lifecycle events from Redis stream).
sandboxEventConsumer := api.NewSandboxEventConsumer(rdb, queries, al)
sandboxEventConsumer.Start(ctx)
// Start host monitor loop.
monitor.Start(ctx)
// Hard-delete accounts that have been soft-deleted for more than 15 days (runs every 24h).
@ -246,7 +255,7 @@ func Run(opts ...Option) {
// Start extension background workers.
for _, ext := range o.extensions {
for _, worker := range ext.BackgroundWorkers(sctx) {
go worker(ctx)
}
}
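The SandboxEventConsumer wired up above is not shown in this diff. For orientation, a minimal sketch of the shape such a consumer loop could take with github.com/redis/go-redis/v9; the stream key, group name, and the single "payload" field are assumptions, and SandboxStateEvent is the payload type introduced in the sandbox service later in this diff:

    package api

    import (
        "context"
        "encoding/json"
        "time"

        "github.com/redis/go-redis/v9"
    )

    // consumeSandboxEvents blocks on the stream and applies lifecycle events.
    // Sketch only: key/group/field names are illustrative.
    func consumeSandboxEvents(ctx context.Context, rdb *redis.Client) {
        const stream, group, consumer = "sandbox:events", "controlplane", "cp-1"
        // Idempotent group creation; a BUSYGROUP error just means it exists.
        _ = rdb.XGroupCreateMkStream(ctx, stream, group, "$").Err()
        for ctx.Err() == nil {
            res, err := rdb.XReadGroup(ctx, &redis.XReadGroupArgs{
                Group:    group,
                Consumer: consumer,
                Streams:  []string{stream, ">"},
                Count:    16,
                Block:    5 * time.Second,
            }).Result()
            if err != nil {
                continue // redis.Nil on block timeout; otherwise transient
            }
            for _, s := range res {
                for _, msg := range s.Messages {
                    var evt SandboxStateEvent
                    if raw, ok := msg.Values["payload"].(string); ok && json.Unmarshal([]byte(raw), &evt) == nil {
                        // Dispatch on evt.Event (started/paused/resumed/stopped/failed).
                    }
                    _ = rdb.XAck(ctx, stream, group, msg.ID).Err()
                }
            }
        }
    }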

View File

@ -375,7 +375,7 @@ const markSandboxesMissingByHost = `-- name: MarkSandboxesMissingByHost :exec
UPDATE sandboxes
SET status = 'missing',
last_updated = NOW()
WHERE host_id = $1 AND status IN ('running', 'starting', 'pending', 'pausing', 'resuming', 'stopping')
`
// Called when the host monitor marks a host unreachable.
@ -470,6 +470,61 @@ func (q *Queries) UpdateSandboxRunning(ctx context.Context, arg UpdateSandboxRun
return i, err
}
const updateSandboxRunningIf = `-- name: UpdateSandboxRunningIf :one
UPDATE sandboxes
SET status = 'running',
host_ip = $3,
guest_ip = $4,
started_at = $5,
last_active_at = $5,
last_updated = NOW()
WHERE id = $1 AND status = $2
RETURNING id, team_id, host_id, template, status, vcpus, memory_mb, timeout_sec, disk_size_mb, guest_ip, host_ip, created_at, started_at, last_active_at, last_updated, template_id, template_team_id, metadata
`
type UpdateSandboxRunningIfParams struct {
ID pgtype.UUID `json:"id"`
Status string `json:"status"`
HostIp string `json:"host_ip"`
GuestIp string `json:"guest_ip"`
StartedAt pgtype.Timestamptz `json:"started_at"`
}
// Conditionally transition a sandbox to running only if the current status
// matches the expected value. Prevents races where a user destroys a sandbox
// while the create/resume goroutine is still in-flight.
func (q *Queries) UpdateSandboxRunningIf(ctx context.Context, arg UpdateSandboxRunningIfParams) (Sandbox, error) {
row := q.db.QueryRow(ctx, updateSandboxRunningIf,
arg.ID,
arg.Status,
arg.HostIp,
arg.GuestIp,
arg.StartedAt,
)
var i Sandbox
err := row.Scan(
&i.ID,
&i.TeamID,
&i.HostID,
&i.Template,
&i.Status,
&i.Vcpus,
&i.MemoryMb,
&i.TimeoutSec,
&i.DiskSizeMb,
&i.GuestIp,
&i.HostIp,
&i.CreatedAt,
&i.StartedAt,
&i.LastActiveAt,
&i.LastUpdated,
&i.TemplateID,
&i.TemplateTeamID,
&i.Metadata,
)
return i, err
}
const updateSandboxStatus = `-- name: UpdateSandboxStatus :one
UPDATE sandboxes
SET status = $2,
@ -508,3 +563,46 @@ func (q *Queries) UpdateSandboxStatus(ctx context.Context, arg UpdateSandboxStat
)
return i, err
}
const updateSandboxStatusIf = `-- name: UpdateSandboxStatusIf :one
UPDATE sandboxes
SET status = $3,
last_updated = NOW()
WHERE id = $1 AND status = $2
RETURNING id, team_id, host_id, template, status, vcpus, memory_mb, timeout_sec, disk_size_mb, guest_ip, host_ip, created_at, started_at, last_active_at, last_updated, template_id, template_team_id, metadata
`
type UpdateSandboxStatusIfParams struct {
ID pgtype.UUID `json:"id"`
Status string `json:"status"`
Status_2 string `json:"status_2"`
}
// Atomically update status only when the current status matches the expected value.
// Prevents background goroutines from overwriting a status that has since changed
// (e.g. user destroyed a sandbox while Create was in-flight).
func (q *Queries) UpdateSandboxStatusIf(ctx context.Context, arg UpdateSandboxStatusIfParams) (Sandbox, error) {
row := q.db.QueryRow(ctx, updateSandboxStatusIf, arg.ID, arg.Status, arg.Status_2)
var i Sandbox
err := row.Scan(
&i.ID,
&i.TeamID,
&i.HostID,
&i.Template,
&i.Status,
&i.Vcpus,
&i.MemoryMb,
&i.TimeoutSec,
&i.DiskSizeMb,
&i.GuestIp,
&i.HostIp,
&i.CreatedAt,
&i.StartedAt,
&i.LastActiveAt,
&i.LastUpdated,
&i.TemplateID,
&i.TemplateTeamID,
&i.Metadata,
)
return i, err
}
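Both conditional queries above implement the same compare-and-swap idiom: the UPDATE matches zero rows when the status has already moved on, and pgx surfaces that as pgx.ErrNoRows from Scan. A hypothetical call site showing the pattern, assuming the errors and pgx/v5 imports; the running → pausing transition mirrors the one Pause uses later in this diff:

    // Sketch: CAS status transition; losing the race surfaces as pgx.ErrNoRows.
    sb, err := q.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
        ID:       sandboxID,
        Status:   "running", // expected current status
        Status_2: "pausing", // desired next status
    })
    if errors.Is(err, pgx.ErrNoRows) {
        return fmt.Errorf("sandbox status changed concurrently")
    }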

View File

@ -282,69 +282,11 @@ func (s *BuildService) executeBuild(ctx context.Context, buildIDStr string) {
return
}
agent, sandboxIDStr, sandboxMetadata, err := s.provisionBuildSandbox(buildCtx, buildID, buildIDStr, build, log)
if err != nil {
return
}
log = log.With("sandbox_id", sandboxIDStr)
// Parse recipe steps. preBuildCmds and postBuildCmds are hardcoded and always
// valid; panic on error is appropriate here since it would be a programmer mistake.
@ -435,81 +377,162 @@ func (s *BuildService) executeBuild(ctx context.Context, buildIDStr string) {
}
}
// Finalize: healthcheck/snapshot/flatten → persist template → mark success.
s.finalizeBuild(buildCtx, buildID, build, agent, sandboxIDStr, templateDefaultUser, templateDefaultEnv, sandboxMetadata, log)
}
// provisionBuildSandbox picks a host, creates a sandbox, and uploads the build
// archive. On failure it calls failBuild and returns an error.
func (s *BuildService) provisionBuildSandbox(
ctx context.Context,
buildID pgtype.UUID,
buildIDStr string,
build db.TemplateBuild,
log *slog.Logger,
) (buildAgentClient, string, map[string]string, error) {
host, err := s.Scheduler.SelectHost(ctx, id.PlatformTeamID, false, build.MemoryMb, 5120)
if err != nil {
s.failBuild(ctx, buildID, fmt.Sprintf("no host available: %v", err))
return nil, "", nil, err
}
agent, err := s.Pool.GetForHost(host)
if err != nil {
s.failBuild(ctx, buildID, fmt.Sprintf("agent client error: %v", err))
return nil, "", nil, err
}
sandboxID := id.NewSandboxID()
sandboxIDStr := id.FormatSandboxID(sandboxID)
log.Info("provisioning build sandbox", "sandbox_id", sandboxIDStr, "host_id", id.FormatHostID(host.ID))
baseTeamID := id.PlatformTeamID
baseTemplateID := id.MinimalTemplateID
if build.BaseTemplate != "minimal" {
baseTmpl, err := s.DB.GetPlatformTemplateByName(ctx, build.BaseTemplate)
if err != nil {
s.failBuild(ctx, buildID, fmt.Sprintf("base template %q not found: %v", build.BaseTemplate, err))
return nil, "", nil, err
}
baseTeamID = baseTmpl.TeamID
baseTemplateID = baseTmpl.ID
}
resp, err := agent.CreateSandbox(ctx, connect.NewRequest(&pb.CreateSandboxRequest{
SandboxId: sandboxIDStr,
Template: build.BaseTemplate,
TeamId: id.UUIDString(baseTeamID),
TemplateId: id.UUIDString(baseTemplateID),
Vcpus: build.Vcpus,
MemoryMb: build.MemoryMb,
TimeoutSec: 0,
DiskSizeMb: 5120,
}))
if err != nil {
s.failBuild(ctx, buildID, fmt.Sprintf("create sandbox failed: %v", err))
return nil, "", nil, err
}
sandboxMetadata := resp.Msg.Metadata
_ = s.DB.UpdateBuildSandbox(ctx, db.UpdateBuildSandboxParams{
ID: buildID,
SandboxID: sandboxID,
HostID: host.ID,
})
archive := s.takeArchive(buildIDStr)
if len(archive) > 0 {
if err := s.uploadAndExtractArchive(ctx, agent, sandboxIDStr, archive, buildIDStr); err != nil {
s.destroySandbox(ctx, agent, sandboxIDStr)
s.failBuild(ctx, buildID, fmt.Sprintf("archive upload failed: %v", err))
return nil, "", nil, err
}
}
return agent, sandboxIDStr, sandboxMetadata, nil
}
// finalizeBuild handles the healthcheck/snapshot/flatten step and persists the
// template record. Called after all recipe phases complete successfully.
func (s *BuildService) finalizeBuild(
ctx context.Context,
buildID pgtype.UUID,
build db.TemplateBuild,
agent buildAgentClient,
sandboxIDStr string,
defaultUser string,
defaultEnv map[string]string,
sandboxMetadata map[string]string,
log *slog.Logger,
) {
var sizeBytes int64
if build.Healthcheck != "" {
hc, err := recipe.ParseHealthcheck(build.Healthcheck)
if err != nil {
s.destroySandbox(ctx, agent, sandboxIDStr)
s.failBuild(ctx, buildID, fmt.Sprintf("invalid healthcheck: %v", err))
return
}
log.Info("running healthcheck", "cmd", hc.Cmd, "interval", hc.Interval, "timeout", hc.Timeout, "start_period", hc.StartPeriod, "retries", hc.Retries)
if err := s.waitForHealthcheck(ctx, agent, sandboxIDStr, hc, defaultUser); err != nil {
s.destroySandbox(ctx, agent, sandboxIDStr)
if ctx.Err() != nil {
return
}
s.failBuild(ctx, buildID, fmt.Sprintf("healthcheck failed: %v", err))
return
}
log.Info("healthcheck passed, creating snapshot")
snapResp, err := agent.CreateSnapshot(ctx, connect.NewRequest(&pb.CreateSnapshotRequest{
SandboxId: sandboxIDStr,
Name: build.Name,
TeamId: id.UUIDString(build.TeamID),
TemplateId: id.UUIDString(build.TemplateID),
}))
if err != nil {
s.destroySandbox(ctx, agent, sandboxIDStr)
if ctx.Err() != nil {
return
}
s.failBuild(ctx, buildID, fmt.Sprintf("create snapshot failed: %v", err))
return
}
sizeBytes = snapResp.Msg.SizeBytes
} else {
log.Info("no healthcheck, flattening rootfs")
flatResp, err := agent.FlattenRootfs(ctx, connect.NewRequest(&pb.FlattenRootfsRequest{
SandboxId: sandboxIDStr,
Name: build.Name,
TeamId: id.UUIDString(build.TeamID),
TemplateId: id.UUIDString(build.TemplateID),
}))
if err != nil {
s.destroySandbox(ctx, agent, sandboxIDStr)
if ctx.Err() != nil {
return
}
s.failBuild(ctx, buildID, fmt.Sprintf("flatten rootfs failed: %v", err))
return
}
sizeBytes = flatResp.Msg.SizeBytes
}
templateType := "base"
if build.Healthcheck != "" {
templateType = "snapshot"
}
defaultEnvJSON, err := json.Marshal(defaultEnv)
if err != nil {
defaultEnvJSON = []byte("{}")
}
metadataJSON, err := json.Marshal(sandboxMetadata)
if err != nil || len(sandboxMetadata) == 0 {
metadataJSON = []byte("{}")
}
if _, err := s.DB.InsertTemplate(ctx, db.InsertTemplateParams{
ID: build.TemplateID,
Name: build.Name,
Type: templateType,
@ -517,28 +540,21 @@ func (s *BuildService) executeBuild(ctx context.Context, buildIDStr string) {
MemoryMb: build.MemoryMb,
SizeBytes: sizeBytes,
TeamID: id.PlatformTeamID,
DefaultUser: defaultUser,
DefaultEnv: defaultEnvJSON,
Metadata: metadataJSON,
}); err != nil {
log.Error("failed to insert template record", "error", err)
// Build succeeded on disk, just DB record failed — don't mark as failed.
}
_ = s.DB.UpdateBuildDefaults(ctx, db.UpdateBuildDefaultsParams{
ID: buildID,
DefaultUser: defaultUser,
DefaultEnv: defaultEnvJSON,
Metadata: metadataJSON,
})
// The snapshot/flatten step has already destroyed the sandbox, so no
// additional destroy is needed here. Mark the build as success.
if _, err := s.DB.UpdateBuildStatus(ctx, db.UpdateBuildStatusParams{
ID: buildID, Status: "success",
}); err != nil {
log.Error("failed to mark build as success", "error", err)
@ -768,7 +784,7 @@ var runtimeEnvVars = map[string]bool{
"HOME": true, "USER": true, "LOGNAME": true, "SHELL": true, "HOME": true, "USER": true, "LOGNAME": true, "SHELL": true,
"PWD": true, "OLDPWD": true, "HOSTNAME": true, "TERM": true, "PWD": true, "OLDPWD": true, "HOSTNAME": true, "TERM": true,
"SHLVL": true, "_": true, "SHLVL": true, "_": true,
// Per-sandbox identifiers set by envd at boot via MMDS. // Per-sandbox identifiers set by envd at boot via PostInit.
"WRENN_SANDBOX_ID": true, "WRENN_TEMPLATE_ID": true, "WRENN_SANDBOX_ID": true, "WRENN_TEMPLATE_ID": true,
} }

View File

@ -94,6 +94,31 @@ type regTokenPayload struct {
const regTokenTTL = time.Hour
func (s *HostService) issueRegistrationToken(ctx context.Context, hostID, createdBy pgtype.UUID) (string, error) {
token := id.NewRegistrationToken()
tokenID := id.NewHostTokenID()
payload, _ := json.Marshal(regTokenPayload{
HostID: id.FormatHostID(hostID),
TokenID: id.FormatHostTokenID(tokenID),
})
if err := s.Redis.Set(ctx, "host:reg:"+token, payload, regTokenTTL).Err(); err != nil {
return "", fmt.Errorf("store registration token: %w", err)
}
now := time.Now()
if _, err := s.DB.InsertHostToken(ctx, db.InsertHostTokenParams{
ID: tokenID,
HostID: hostID,
CreatedBy: createdBy,
ExpiresAt: pgtype.Timestamptz{Time: now.Add(regTokenTTL), Valid: true},
}); err != nil {
slog.Warn("failed to insert host token audit record", "host_id", id.FormatHostID(hostID), "error", err)
}
return token, nil
}
// requireAdminOrOwner returns nil iff the role is "owner" or "admin".
func requireAdminOrOwner(role string) error {
if role == "owner" || role == "admin" {
@ -159,26 +184,9 @@ func (s *HostService) Create(ctx context.Context, p HostCreateParams) (HostCreat
return HostCreateResult{}, fmt.Errorf("insert host: %w", err) return HostCreateResult{}, fmt.Errorf("insert host: %w", err)
} }
// Generate registration token and store in Redis + Postgres audit trail. token, err := s.issueRegistrationToken(ctx, hostID, p.RequestingUserID)
token := id.NewRegistrationToken() if err != nil {
tokenID := id.NewHostTokenID() return HostCreateResult{}, err
}
return HostCreateResult{Host: host, RegistrationToken: token}, nil
@ -218,25 +226,9 @@ func (s *HostService) RegenerateToken(ctx context.Context, hostID, userID, teamI
}
}
token, err := s.issueRegistrationToken(ctx, hostID, userID)
if err != nil {
return HostCreateResult{}, err
}
return HostCreateResult{Host: host, RegistrationToken: token}, nil

View File

@ -18,12 +18,27 @@ import (
pb "git.omukk.dev/wrenn/wrenn/proto/hostagent/gen" pb "git.omukk.dev/wrenn/wrenn/proto/hostagent/gen"
) )
// SandboxEventPublisher writes sandbox lifecycle events to the Redis stream.
type SandboxEventPublisher func(ctx context.Context, event SandboxStateEvent)
// SandboxStateEvent is the event payload published to the Redis stream.
type SandboxStateEvent struct {
Event string `json:"event"`
SandboxID string `json:"sandbox_id"`
HostID string `json:"host_id"`
HostIP string `json:"host_ip,omitempty"`
Metadata map[string]string `json:"metadata,omitempty"`
Error string `json:"error,omitempty"`
Timestamp int64 `json:"timestamp"`
}
// SandboxService provides sandbox lifecycle operations shared between the
// REST API and the dashboard.
type SandboxService struct {
DB *db.Queries
Pool *lifecycle.HostClientPool
Scheduler scheduler.HostScheduler
PublishEvent SandboxEventPublisher
}
// SandboxCreateParams holds the parameters for creating a sandbox.
@ -53,6 +68,12 @@ func (s *SandboxService) agentForSandbox(ctx context.Context, sandboxID pgtype.U
return agent, sb, nil
}
func (s *SandboxService) publishEvent(ctx context.Context, event SandboxStateEvent) {
if s.PublishEvent != nil {
s.PublishEvent(ctx, event)
}
}
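PublishEvent is injected as a plain function value, so the service stays decoupled from Redis. One way it could be backed by the stream the consumer reads, sketched with go-redis XAdd; the stream key and "payload" field are the same assumptions as in the consumer sketch earlier, and the constructor name is illustrative:

    // Sketch: a SandboxEventPublisher backed by a Redis stream via XAdd.
    // Assumes context, encoding/json, log/slog, and go-redis/v9 imports.
    func NewRedisEventPublisher(rdb *redis.Client) SandboxEventPublisher {
        return func(ctx context.Context, event SandboxStateEvent) {
            payload, err := json.Marshal(event)
            if err != nil {
                return
            }
            if err := rdb.XAdd(ctx, &redis.XAddArgs{
                Stream: "sandbox:events",
                Values: map[string]any{"payload": payload},
            }).Err(); err != nil {
                slog.Warn("publish sandbox event failed",
                    "event", event.Event, "sandbox_id", event.SandboxID, "error", err)
            }
        }
    }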
// hostagentClient is a local alias to avoid the full package path in signatures.
type hostagentClient = interface {
CreateSandbox(ctx context.Context, req *connect.Request[pb.CreateSandboxRequest]) (*connect.Response[pb.CreateSandboxResponse], error)
@ -64,8 +85,10 @@ type hostagentClient = interface {
FlushSandboxMetrics(ctx context.Context, req *connect.Request[pb.FlushSandboxMetricsRequest]) (*connect.Response[pb.FlushSandboxMetricsResponse], error)
}
// Create creates a new sandbox asynchronously: picks a host, inserts a
// "starting" DB record, fires the agent RPC in a background goroutine, and
// returns the sandbox immediately. The background goroutine publishes a
// sandbox event to the Redis stream when the operation completes.
func (s *SandboxService) Create(ctx context.Context, p SandboxCreateParams) (db.Sandbox, error) {
if p.Template == "" {
p.Template = "minimal"
@ -96,11 +119,9 @@ func (s *SandboxService) Create(ctx context.Context, p SandboxCreateParams) (db.
templateTeamID = tmpl.TeamID
templateID = tmpl.ID
templateDefaultUser = tmpl.DefaultUser
if len(tmpl.DefaultEnv) > 0 {
_ = json.Unmarshal(tmpl.DefaultEnv, &templateDefaultEnv)
}
if tmpl.Type == "snapshot" {
p.VCPUs = tmpl.Vcpus
p.MemoryMB = tmpl.MemoryMb
@ -111,13 +132,11 @@ func (s *SandboxService) Create(ctx context.Context, p SandboxCreateParams) (db.
return db.Sandbox{}, fmt.Errorf("invalid request: team_id is required") return db.Sandbox{}, fmt.Errorf("invalid request: team_id is required")
} }
// Determine whether this team uses BYOC hosts or platform hosts.
team, err := s.DB.GetTeam(ctx, p.TeamID) team, err := s.DB.GetTeam(ctx, p.TeamID)
if err != nil { if err != nil {
return db.Sandbox{}, fmt.Errorf("team not found: %w", err) return db.Sandbox{}, fmt.Errorf("team not found: %w", err)
} }
// Pick a host for this sandbox.
host, err := s.Scheduler.SelectHost(ctx, p.TeamID, team.IsByoc, p.MemoryMB, p.DiskSizeMB) host, err := s.Scheduler.SelectHost(ctx, p.TeamID, team.IsByoc, p.MemoryMB, p.DiskSizeMB)
if err != nil { if err != nil {
return db.Sandbox{}, fmt.Errorf("select host: %w", err) return db.Sandbox{}, fmt.Errorf("select host: %w", err)
@ -130,13 +149,14 @@ func (s *SandboxService) Create(ctx context.Context, p SandboxCreateParams) (db.
sandboxID := id.NewSandboxID()
sandboxIDStr := id.FormatSandboxID(sandboxID)
hostIDStr := id.FormatHostID(host.ID)
sb, err := s.DB.InsertSandbox(ctx, db.InsertSandboxParams{
ID: sandboxID,
TeamID: p.TeamID,
HostID: host.ID,
Template: p.Template,
Status: "starting",
Vcpus: p.VCPUs,
MemoryMb: p.MemoryMB,
TimeoutSec: p.TimeoutSec,
@ -144,11 +164,26 @@ func (s *SandboxService) Create(ctx context.Context, p SandboxCreateParams) (db.
TemplateID: templateID,
TemplateTeamID: templateTeamID,
Metadata: []byte("{}"),
})
if err != nil {
return db.Sandbox{}, fmt.Errorf("insert sandbox: %w", err)
}
go s.createInBackground(sandboxID, sandboxIDStr, hostIDStr, agent, p, templateTeamID, templateID, templateDefaultUser, templateDefaultEnv)
return sb, nil
}
func (s *SandboxService) createInBackground(
sandboxID pgtype.UUID, sandboxIDStr, hostIDStr string,
agent hostagentClient, p SandboxCreateParams,
templateTeamID, templateID pgtype.UUID,
defaultUser string, defaultEnv map[string]string,
) {
bgCtx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
defer cancel()
resp, err := agent.CreateSandbox(bgCtx, connect.NewRequest(&pb.CreateSandboxRequest{
SandboxId: sandboxIDStr,
Template: p.Template,
TeamId: id.UUIDString(templateTeamID),
@ -157,45 +192,52 @@ func (s *SandboxService) Create(ctx context.Context, p SandboxCreateParams) (db.
MemoryMb: p.MemoryMB,
TimeoutSec: p.TimeoutSec,
DiskSizeMb: p.DiskSizeMB,
DefaultUser: defaultUser,
DefaultEnv: defaultEnv,
}))
if err != nil {
slog.Warn("background create failed", "sandbox_id", sandboxIDStr, "error", err)
errCtx, errCancel := context.WithTimeout(context.Background(), 10*time.Second)
defer errCancel()
if _, dbErr := s.DB.UpdateSandboxStatusIf(errCtx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: "starting", Status_2: "error",
}); dbErr != nil {
slog.Warn("failed to update sandbox to error after create failure", "id", sandboxIDStr, "error", dbErr)
}
s.publishEvent(errCtx, SandboxStateEvent{
Event: "sandbox.failed", SandboxID: sandboxIDStr, HostID: hostIDStr,
Error: err.Error(), Timestamp: time.Now().Unix(),
})
return
}
now := time.Now()
if _, dbErr := s.DB.UpdateSandboxRunningIf(bgCtx, db.UpdateSandboxRunningIfParams{
ID: sandboxID,
Status: "starting",
HostIp: resp.Msg.HostIp,
StartedAt: pgtype.Timestamptz{
Time: now,
Valid: true,
},
}); dbErr != nil {
slog.Warn("failed to update sandbox running after create", "id", sandboxIDStr, "error", dbErr)
}
if meta := resp.Msg.Metadata; len(meta) > 0 {
metaJSON, _ := json.Marshal(meta)
if err := s.DB.UpdateSandboxMetadata(bgCtx, db.UpdateSandboxMetadataParams{
ID: sandboxID, Metadata: metaJSON,
}); err != nil {
slog.Warn("failed to store sandbox metadata", "id", sandboxIDStr, "error", err)
}
}
s.publishEvent(bgCtx, SandboxStateEvent{
Event: "sandbox.started", SandboxID: sandboxIDStr, HostID: hostIDStr,
HostIP: resp.Msg.HostIp, Metadata: resp.Msg.Metadata,
Timestamp: now.Unix(),
})
}
// List returns active sandboxes (excludes stopped/error) belonging to the given team.
@ -208,7 +250,9 @@ func (s *SandboxService) Get(ctx context.Context, sandboxID, teamID pgtype.UUID)
return s.DB.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: teamID})
}
// Pause snapshots and freezes a running sandbox to disk asynchronously.
// Pre-marks the DB status as "pausing" and fires the agent RPC in a
// background goroutine.
func (s *SandboxService) Pause(ctx context.Context, sandboxID, teamID pgtype.UUID) (db.Sandbox, error) {
sb, err := s.DB.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: teamID})
if err != nil {
@ -224,25 +268,29 @@ func (s *SandboxService) Pause(ctx context.Context, sandboxID, teamID pgtype.UUI
}
sandboxIDStr := id.FormatSandboxID(sandboxID)
hostIDStr := id.FormatHostID(sb.HostID)
sb, err = s.DB.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: "running", Status_2: "pausing",
})
if err != nil {
return db.Sandbox{}, fmt.Errorf("sandbox status changed concurrently")
}
go s.pauseInBackground(sandboxID, sandboxIDStr, hostIDStr, agent)
return sb, nil
}
func (s *SandboxService) pauseInBackground(sandboxID pgtype.UUID, sandboxIDStr, hostIDStr string, agent hostagentClient) {
bgCtx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
defer cancel()
s.flushAndPersistMetrics(bgCtx, agent, sandboxID, true)
if _, err := agent.PauseSandbox(bgCtx, connect.NewRequest(&pb.PauseSandboxRequest{
SandboxId: sandboxIDStr,
})); err != nil {
// Check if the agent still has this sandbox. If it was destroyed
// (e.g. frozen VM couldn't be resumed), mark as "error" instead of
// reverting to "running" — which would create a ghost record.
// Use a fresh context since the original ctx may already be expired.
revertStatus := "running" revertStatus := "running"
pingCtx, pingCancel := context.WithTimeout(context.Background(), 10*time.Second) pingCtx, pingCancel := context.WithTimeout(context.Background(), 10*time.Second)
if _, pingErr := agent.PingSandbox(pingCtx, connect.NewRequest(&pb.PingSandboxRequest{ if _, pingErr := agent.PingSandbox(pingCtx, connect.NewRequest(&pb.PingSandboxRequest{
@ -253,23 +301,37 @@ func (s *SandboxService) Pause(ctx context.Context, sandboxID, teamID pgtype.UUI
}
pingCancel()
dbCtx, dbCancel := context.WithTimeout(context.Background(), 5*time.Second)
if _, dbErr := s.DB.UpdateSandboxStatusIf(dbCtx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: "pausing", Status_2: revertStatus,
}); dbErr != nil {
slog.Warn("failed to revert sandbox status after pause error", "sandbox_id", sandboxIDStr, "error", dbErr)
}
dbCancel()
evtCtx, evtCancel := context.WithTimeout(context.Background(), 5*time.Second)
s.publishEvent(evtCtx, SandboxStateEvent{
Event: "sandbox.failed", SandboxID: sandboxIDStr, HostID: hostIDStr,
Error: err.Error(), Timestamp: time.Now().Unix(),
})
evtCancel()
return
}
if _, err := s.DB.UpdateSandboxStatusIf(bgCtx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: "pausing", Status_2: "paused",
}); err != nil {
slog.Warn("failed to update sandbox to paused", "sandbox_id", sandboxIDStr, "error", err)
}
s.publishEvent(bgCtx, SandboxStateEvent{
Event: "sandbox.paused", SandboxID: sandboxIDStr, HostID: hostIDStr,
Timestamp: time.Now().Unix(),
})
}
// Resume restores a paused sandbox from snapshot asynchronously.
// Pre-marks the DB status as "resuming" and fires the agent RPC in a
// background goroutine.
func (s *SandboxService) Resume(ctx context.Context, sandboxID, teamID pgtype.UUID) (db.Sandbox, error) {
sb, err := s.DB.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: teamID})
if err != nil {
@ -285,8 +347,8 @@ func (s *SandboxService) Resume(ctx context.Context, sandboxID, teamID pgtype.UU
}
sandboxIDStr := id.FormatSandboxID(sandboxID)
hostIDStr := id.FormatHostID(sb.HostID)
var resumeDefaultUser string
var resumeDefaultEnv map[string]string
if sb.TemplateID.Valid {
@ -299,7 +361,6 @@ func (s *SandboxService) Resume(ctx context.Context, sandboxID, teamID pgtype.UU
}
}
var kernelVersion string
if len(sb.Metadata) > 0 {
var meta map[string]string
@ -308,52 +369,88 @@ func (s *SandboxService) Resume(ctx context.Context, sandboxID, teamID pgtype.UU
}
}
sb, err = s.DB.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: "paused", Status_2: "resuming",
})
if err != nil {
return db.Sandbox{}, fmt.Errorf("sandbox status changed concurrently")
}
go s.resumeInBackground(sandboxID, sandboxIDStr, hostIDStr, agent, sb.TimeoutSec, resumeDefaultUser, resumeDefaultEnv, kernelVersion)
return sb, nil
}
func (s *SandboxService) resumeInBackground(
sandboxID pgtype.UUID, sandboxIDStr, hostIDStr string,
agent hostagentClient, timeoutSec int32,
defaultUser string, defaultEnv map[string]string, kernelVersion string,
) {
bgCtx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
resp, err := agent.ResumeSandbox(bgCtx, connect.NewRequest(&pb.ResumeSandboxRequest{
SandboxId: sandboxIDStr,
TimeoutSec: timeoutSec,
DefaultUser: defaultUser,
DefaultEnv: defaultEnv,
KernelVersion: kernelVersion,
}))
if err != nil {
slog.Warn("background resume failed", "sandbox_id", sandboxIDStr, "error", err)
errCtx, errCancel := context.WithTimeout(context.Background(), 10*time.Second)
defer errCancel()
if _, dbErr := s.DB.UpdateSandboxStatusIf(errCtx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: "resuming", Status_2: "paused",
}); dbErr != nil {
slog.Warn("failed to revert sandbox to paused after resume failure", "id", sandboxIDStr, "error", dbErr)
}
s.publishEvent(errCtx, SandboxStateEvent{
Event: "sandbox.failed", SandboxID: sandboxIDStr, HostID: hostIDStr,
Error: err.Error(), Timestamp: time.Now().Unix(),
})
return
}
now := time.Now()
if _, dbErr := s.DB.UpdateSandboxRunningIf(bgCtx, db.UpdateSandboxRunningIfParams{
ID: sandboxID,
Status: "resuming",
HostIp: resp.Msg.HostIp,
StartedAt: pgtype.Timestamptz{
Time: now,
Valid: true,
},
}); dbErr != nil {
slog.Warn("failed to update sandbox running after resume", "id", sandboxIDStr, "error", dbErr)
}
if meta := resp.Msg.Metadata; len(meta) > 0 {
metaJSON, _ := json.Marshal(meta)
if err := s.DB.UpdateSandboxMetadata(bgCtx, db.UpdateSandboxMetadataParams{
ID: sandboxID, Metadata: metaJSON,
}); err != nil {
slog.Warn("failed to update sandbox metadata after resume", "id", sandboxIDStr, "error", err)
}
}
s.publishEvent(bgCtx, SandboxStateEvent{
Event: "sandbox.resumed", SandboxID: sandboxIDStr, HostID: hostIDStr,
HostIP: resp.Msg.HostIp, Metadata: resp.Msg.Metadata,
Timestamp: now.Unix(),
})
}
// Destroy stops a sandbox asynchronously. Pre-marks the DB status as
// "stopping" and fires the agent RPC in a background goroutine.
func (s *SandboxService) Destroy(ctx context.Context, sandboxID, teamID pgtype.UUID) error {
sb, err := s.DB.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: teamID})
if err != nil {
return fmt.Errorf("sandbox not found: %w", err)
}
if sb.Status == "stopped" || sb.Status == "error" {
return nil
}
agent, _, err := s.agentForSandbox(ctx, sandboxID)
if err != nil {
@ -361,35 +458,53 @@ func (s *SandboxService) Destroy(ctx context.Context, sandboxID, teamID pgtype.U
}
sandboxIDStr := id.FormatSandboxID(sandboxID)
hostIDStr := id.FormatHostID(sb.HostID)
prevStatus := sb.Status
if _, err := s.DB.UpdateSandboxStatus(ctx, db.UpdateSandboxStatusParams{
ID: sandboxID, Status: "stopping",
}); err != nil {
return fmt.Errorf("pre-mark stopping: %w", err)
}
go s.destroyInBackground(sandboxID, sandboxIDStr, hostIDStr, agent, prevStatus)
return nil
}
func (s *SandboxService) destroyInBackground(sandboxID pgtype.UUID, sandboxIDStr, hostIDStr string, agent hostagentClient, prevStatus string) {
bgCtx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
if prevStatus == "running" || prevStatus == "pausing" {
s.flushAndPersistMetrics(bgCtx, agent, sandboxID, false)
}
if _, err := agent.DestroySandbox(bgCtx, connect.NewRequest(&pb.DestroySandboxRequest{
SandboxId: sandboxIDStr,
})); err != nil && connect.CodeOf(err) != connect.CodeNotFound {
slog.Warn("background destroy failed", "sandbox_id", sandboxIDStr, "error", err)
}
if prevStatus == "paused" {
_ = s.DB.DeleteSandboxMetricPointsByTier(bgCtx, db.DeleteSandboxMetricPointsByTierParams{
SandboxID: sandboxID, Tier: "10m",
})
_ = s.DB.DeleteSandboxMetricPointsByTier(bgCtx, db.DeleteSandboxMetricPointsByTierParams{
SandboxID: sandboxID, Tier: "2h",
})
}
if _, err := s.DB.UpdateSandboxStatusIf(bgCtx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: "stopping", Status_2: "stopped",
}); err != nil {
slog.Warn("failed to update sandbox to stopped", "sandbox_id", sandboxIDStr, "error", err)
}
s.publishEvent(bgCtx, SandboxStateEvent{
Event: "sandbox.stopped", SandboxID: sandboxIDStr, HostID: hostIDStr,
Timestamp: time.Now().Unix(),
})
}
// flushAndPersistMetrics calls FlushSandboxMetrics on the agent and stores

View File

@ -155,7 +155,7 @@ type CreateSandboxResponse struct {
Status string `protobuf:"bytes,2,opt,name=status,proto3" json:"status,omitempty"`
HostIp string `protobuf:"bytes,3,opt,name=host_ip,json=hostIp,proto3" json:"host_ip,omitempty"`
// Runtime metadata collected during sandbox creation (e.g. envd_version,
// kernel_version, vmm_version, agent_version).
Metadata map[string]string `protobuf:"bytes,4,rep,name=metadata,proto3" json:"metadata,omitempty" protobuf_key:"bytes,1,opt,name=key" protobuf_val:"bytes,2,opt,name=value"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
@ -759,7 +759,11 @@ type ExecRequest struct {
Cmd string `protobuf:"bytes,2,opt,name=cmd,proto3" json:"cmd,omitempty"`
Args []string `protobuf:"bytes,3,rep,name=args,proto3" json:"args,omitempty"`
// Timeout for the command in seconds (default: 30).
TimeoutSec int32 `protobuf:"varint,4,opt,name=timeout_sec,json=timeoutSec,proto3" json:"timeout_sec,omitempty"`
// Environment variables to set for the command.
Envs map[string]string `protobuf:"bytes,5,rep,name=envs,proto3" json:"envs,omitempty" protobuf_key:"bytes,1,opt,name=key" protobuf_val:"bytes,2,opt,name=value"`
// Working directory for the command.
Cwd string `protobuf:"bytes,6,opt,name=cwd,proto3" json:"cwd,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
@ -822,6 +826,20 @@ func (x *ExecRequest) GetTimeoutSec() int32 {
return 0
}
func (x *ExecRequest) GetEnvs() map[string]string {
if x != nil {
return x.Envs
}
return nil
}
func (x *ExecRequest) GetCwd() string {
if x != nil {
return x.Cwd
}
return ""
}
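With envs and cwd now on ExecRequest, a foreground exec can carry the same environment and working-directory controls that background starts already had. A hypothetical call through the generated connect client (the agent variable, sandbox ID, and values are illustrative):

    // Sketch: foreground exec using the new envs/cwd fields.
    resp, err := agent.Exec(ctx, connect.NewRequest(&pb.ExecRequest{
        SandboxId:  "sbx_0123456789",
        Cmd:        "npm",
        Args:       []string{"test"},
        TimeoutSec: 120,
        Envs:       map[string]string{"CI": "true"},
        Cwd:        "/home/user/app",
    }))
    if err != nil {
        return err
    }
    fmt.Printf("stdout: %s", resp.Msg.Stdout)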
type ExecResponse struct {
state protoimpl.MessageState `protogen:"open.v1"`
Stdout []byte `protobuf:"bytes,1,opt,name=stdout,proto3" json:"stdout,omitempty"`
@ -4196,14 +4214,19 @@ const file_hostagent_proto_rawDesc = "" +
"\ateam_id\x18\x02 \x01(\tR\x06teamId\x12\x1f\n" + "\ateam_id\x18\x02 \x01(\tR\x06teamId\x12\x1f\n" +
"\vtemplate_id\x18\x03 \x01(\tR\n" + "\vtemplate_id\x18\x03 \x01(\tR\n" +
"templateId\"\x18\n" + "templateId\"\x18\n" +
"\x16DeleteSnapshotResponse\"s\n" + "\x16DeleteSnapshotResponse\"\xf7\x01\n" +
"\vExecRequest\x12\x1d\n" + "\vExecRequest\x12\x1d\n" +
"\n" + "\n" +
"sandbox_id\x18\x01 \x01(\tR\tsandboxId\x12\x10\n" + "sandbox_id\x18\x01 \x01(\tR\tsandboxId\x12\x10\n" +
"\x03cmd\x18\x02 \x01(\tR\x03cmd\x12\x12\n" + "\x03cmd\x18\x02 \x01(\tR\x03cmd\x12\x12\n" +
"\x04args\x18\x03 \x03(\tR\x04args\x12\x1f\n" + "\x04args\x18\x03 \x03(\tR\x04args\x12\x1f\n" +
"\vtimeout_sec\x18\x04 \x01(\x05R\n" + "\vtimeout_sec\x18\x04 \x01(\x05R\n" +
"timeoutSec\"[\n" + "timeoutSec\x127\n" +
"\x04envs\x18\x05 \x03(\v2#.hostagent.v1.ExecRequest.EnvsEntryR\x04envs\x12\x10\n" +
"\x03cwd\x18\x06 \x01(\tR\x03cwd\x1a7\n" +
"\tEnvsEntry\x12\x10\n" +
"\x03key\x18\x01 \x01(\tR\x03key\x12\x14\n" +
"\x05value\x18\x02 \x01(\tR\x05value:\x028\x01\"[\n" +
"\fExecResponse\x12\x16\n" + "\fExecResponse\x12\x16\n" +
"\x06stdout\x18\x01 \x01(\fR\x06stdout\x12\x16\n" + "\x06stdout\x18\x01 \x01(\fR\x06stdout\x12\x16\n" +
"\x06stderr\x18\x02 \x01(\fR\x06stderr\x12\x1b\n" + "\x06stderr\x18\x02 \x01(\fR\x06stderr\x12\x1b\n" +
@ -4486,7 +4509,7 @@ func file_hostagent_proto_rawDescGZIP() []byte {
return file_hostagent_proto_rawDescData
}
var file_hostagent_proto_msgTypes = make([]protoimpl.MessageInfo, 77)
var file_hostagent_proto_goTypes = []any{
(*CreateSandboxRequest)(nil), // 0: hostagent.v1.CreateSandboxRequest
(*CreateSandboxResponse)(nil), // 1: hostagent.v1.CreateSandboxResponse
@ -4561,99 +4584,101 @@ var file_hostagent_proto_goTypes = []any{
nil, // 70: hostagent.v1.CreateSandboxResponse.MetadataEntry
nil, // 71: hostagent.v1.ResumeSandboxRequest.DefaultEnvEntry
nil, // 72: hostagent.v1.ResumeSandboxResponse.MetadataEntry
nil, // 73: hostagent.v1.ExecRequest.EnvsEntry
nil, // 74: hostagent.v1.SandboxInfo.MetadataEntry
nil, // 75: hostagent.v1.PtyAttachRequest.EnvsEntry
nil, // 76: hostagent.v1.StartBackgroundRequest.EnvsEntry
}
var file_hostagent_proto_depIdxs = []int32{
69, // 0: hostagent.v1.CreateSandboxRequest.default_env:type_name -> hostagent.v1.CreateSandboxRequest.DefaultEnvEntry
70, // 1: hostagent.v1.CreateSandboxResponse.metadata:type_name -> hostagent.v1.CreateSandboxResponse.MetadataEntry
71, // 2: hostagent.v1.ResumeSandboxRequest.default_env:type_name -> hostagent.v1.ResumeSandboxRequest.DefaultEnvEntry
72, // 3: hostagent.v1.ResumeSandboxResponse.metadata:type_name -> hostagent.v1.ResumeSandboxResponse.MetadataEntry
73, // 4: hostagent.v1.ExecRequest.envs:type_name -> hostagent.v1.ExecRequest.EnvsEntry
16, // 5: hostagent.v1.ListSandboxesResponse.sandboxes:type_name -> hostagent.v1.SandboxInfo
74, // 6: hostagent.v1.SandboxInfo.metadata:type_name -> hostagent.v1.SandboxInfo.MetadataEntry
23, // 7: hostagent.v1.ExecStreamResponse.start:type_name -> hostagent.v1.ExecStreamStart
24, // 8: hostagent.v1.ExecStreamResponse.data:type_name -> hostagent.v1.ExecStreamData
25, // 9: hostagent.v1.ExecStreamResponse.end:type_name -> hostagent.v1.ExecStreamEnd
27, // 10: hostagent.v1.WriteFileStreamRequest.meta:type_name -> hostagent.v1.WriteFileStreamMeta
33, // 11: hostagent.v1.ListDirResponse.entries:type_name -> hostagent.v1.FileEntry
33, // 12: hostagent.v1.MakeDirResponse.entry:type_name -> hostagent.v1.FileEntry
42, // 13: hostagent.v1.GetSandboxMetricsResponse.points:type_name -> hostagent.v1.MetricPoint
42, // 14: hostagent.v1.FlushSandboxMetricsResponse.points_10m:type_name -> hostagent.v1.MetricPoint
42, // 15: hostagent.v1.FlushSandboxMetricsResponse.points_2h:type_name -> hostagent.v1.MetricPoint
42, // 16: hostagent.v1.FlushSandboxMetricsResponse.points_24h:type_name -> hostagent.v1.MetricPoint
75, // 17: hostagent.v1.PtyAttachRequest.envs:type_name -> hostagent.v1.PtyAttachRequest.EnvsEntry
51, // 18: hostagent.v1.PtyAttachResponse.started:type_name -> hostagent.v1.PtyStarted
52, // 19: hostagent.v1.PtyAttachResponse.output:type_name -> hostagent.v1.PtyOutput
53, // 20: hostagent.v1.PtyAttachResponse.exited:type_name -> hostagent.v1.PtyExited
76, // 21: hostagent.v1.StartBackgroundRequest.envs:type_name -> hostagent.v1.StartBackgroundRequest.EnvsEntry
63, // 22: hostagent.v1.ListProcessesResponse.processes:type_name -> hostagent.v1.ProcessEntry
23, // 23: hostagent.v1.ConnectProcessResponse.start:type_name -> hostagent.v1.ExecStreamStart
24, // 24: hostagent.v1.ConnectProcessResponse.data:type_name -> hostagent.v1.ExecStreamData
25, // 25: hostagent.v1.ConnectProcessResponse.end:type_name -> hostagent.v1.ExecStreamEnd
0, // 26: hostagent.v1.HostAgentService.CreateSandbox:input_type -> hostagent.v1.CreateSandboxRequest
2, // 27: hostagent.v1.HostAgentService.DestroySandbox:input_type -> hostagent.v1.DestroySandboxRequest
4, // 28: hostagent.v1.HostAgentService.PauseSandbox:input_type -> hostagent.v1.PauseSandboxRequest
6, // 29: hostagent.v1.HostAgentService.ResumeSandbox:input_type -> hostagent.v1.ResumeSandboxRequest
12, // 30: hostagent.v1.HostAgentService.Exec:input_type -> hostagent.v1.ExecRequest
14, // 31: hostagent.v1.HostAgentService.ListSandboxes:input_type -> hostagent.v1.ListSandboxesRequest
17, // 32: hostagent.v1.HostAgentService.WriteFile:input_type -> hostagent.v1.WriteFileRequest
19, // 33: hostagent.v1.HostAgentService.ReadFile:input_type -> hostagent.v1.ReadFileRequest
31, // 34: hostagent.v1.HostAgentService.ListDir:input_type -> hostagent.v1.ListDirRequest
34, // 35: hostagent.v1.HostAgentService.MakeDir:input_type -> hostagent.v1.MakeDirRequest
36, // 36: hostagent.v1.HostAgentService.RemovePath:input_type -> hostagent.v1.RemovePathRequest
8, // 37: hostagent.v1.HostAgentService.CreateSnapshot:input_type -> hostagent.v1.CreateSnapshotRequest
10, // 38: hostagent.v1.HostAgentService.DeleteSnapshot:input_type -> hostagent.v1.DeleteSnapshotRequest
21, // 39: hostagent.v1.HostAgentService.ExecStream:input_type -> hostagent.v1.ExecStreamRequest
26, // 40: hostagent.v1.HostAgentService.WriteFileStream:input_type -> hostagent.v1.WriteFileStreamRequest
29, // 41: hostagent.v1.HostAgentService.ReadFileStream:input_type -> hostagent.v1.ReadFileStreamRequest
38, // 42: hostagent.v1.HostAgentService.PingSandbox:input_type -> hostagent.v1.PingSandboxRequest
40, // 43: hostagent.v1.HostAgentService.Terminate:input_type -> hostagent.v1.TerminateRequest
43, // 44: hostagent.v1.HostAgentService.GetSandboxMetrics:input_type -> hostagent.v1.GetSandboxMetricsRequest
45, // 45: hostagent.v1.HostAgentService.FlushSandboxMetrics:input_type -> hostagent.v1.FlushSandboxMetricsRequest
47, // 46: hostagent.v1.HostAgentService.FlattenRootfs:input_type -> hostagent.v1.FlattenRootfsRequest
49, // 47: hostagent.v1.HostAgentService.PtyAttach:input_type -> hostagent.v1.PtyAttachRequest
56, // 48: hostagent.v1.HostAgentService.PtyResize:input_type -> hostagent.v1.PtyResizeRequest 54, // 48: hostagent.v1.HostAgentService.PtySendInput:input_type -> hostagent.v1.PtySendInputRequest
58, // 49: hostagent.v1.HostAgentService.PtyKill:input_type -> hostagent.v1.PtyKillRequest 56, // 49: hostagent.v1.HostAgentService.PtyResize:input_type -> hostagent.v1.PtyResizeRequest
60, // 50: hostagent.v1.HostAgentService.StartBackground:input_type -> hostagent.v1.StartBackgroundRequest 58, // 50: hostagent.v1.HostAgentService.PtyKill:input_type -> hostagent.v1.PtyKillRequest
62, // 51: hostagent.v1.HostAgentService.ListProcesses:input_type -> hostagent.v1.ListProcessesRequest 60, // 51: hostagent.v1.HostAgentService.StartBackground:input_type -> hostagent.v1.StartBackgroundRequest
65, // 52: hostagent.v1.HostAgentService.KillProcess:input_type -> hostagent.v1.KillProcessRequest 62, // 52: hostagent.v1.HostAgentService.ListProcesses:input_type -> hostagent.v1.ListProcessesRequest
67, // 53: hostagent.v1.HostAgentService.ConnectProcess:input_type -> hostagent.v1.ConnectProcessRequest 65, // 53: hostagent.v1.HostAgentService.KillProcess:input_type -> hostagent.v1.KillProcessRequest
1, // 54: hostagent.v1.HostAgentService.CreateSandbox:output_type -> hostagent.v1.CreateSandboxResponse 67, // 54: hostagent.v1.HostAgentService.ConnectProcess:input_type -> hostagent.v1.ConnectProcessRequest
3, // 55: hostagent.v1.HostAgentService.DestroySandbox:output_type -> hostagent.v1.DestroySandboxResponse 1, // 55: hostagent.v1.HostAgentService.CreateSandbox:output_type -> hostagent.v1.CreateSandboxResponse
5, // 56: hostagent.v1.HostAgentService.PauseSandbox:output_type -> hostagent.v1.PauseSandboxResponse 3, // 56: hostagent.v1.HostAgentService.DestroySandbox:output_type -> hostagent.v1.DestroySandboxResponse
7, // 57: hostagent.v1.HostAgentService.ResumeSandbox:output_type -> hostagent.v1.ResumeSandboxResponse 5, // 57: hostagent.v1.HostAgentService.PauseSandbox:output_type -> hostagent.v1.PauseSandboxResponse
13, // 58: hostagent.v1.HostAgentService.Exec:output_type -> hostagent.v1.ExecResponse 7, // 58: hostagent.v1.HostAgentService.ResumeSandbox:output_type -> hostagent.v1.ResumeSandboxResponse
15, // 59: hostagent.v1.HostAgentService.ListSandboxes:output_type -> hostagent.v1.ListSandboxesResponse 13, // 59: hostagent.v1.HostAgentService.Exec:output_type -> hostagent.v1.ExecResponse
18, // 60: hostagent.v1.HostAgentService.WriteFile:output_type -> hostagent.v1.WriteFileResponse 15, // 60: hostagent.v1.HostAgentService.ListSandboxes:output_type -> hostagent.v1.ListSandboxesResponse
20, // 61: hostagent.v1.HostAgentService.ReadFile:output_type -> hostagent.v1.ReadFileResponse 18, // 61: hostagent.v1.HostAgentService.WriteFile:output_type -> hostagent.v1.WriteFileResponse
32, // 62: hostagent.v1.HostAgentService.ListDir:output_type -> hostagent.v1.ListDirResponse 20, // 62: hostagent.v1.HostAgentService.ReadFile:output_type -> hostagent.v1.ReadFileResponse
35, // 63: hostagent.v1.HostAgentService.MakeDir:output_type -> hostagent.v1.MakeDirResponse 32, // 63: hostagent.v1.HostAgentService.ListDir:output_type -> hostagent.v1.ListDirResponse
37, // 64: hostagent.v1.HostAgentService.RemovePath:output_type -> hostagent.v1.RemovePathResponse 35, // 64: hostagent.v1.HostAgentService.MakeDir:output_type -> hostagent.v1.MakeDirResponse
9, // 65: hostagent.v1.HostAgentService.CreateSnapshot:output_type -> hostagent.v1.CreateSnapshotResponse 37, // 65: hostagent.v1.HostAgentService.RemovePath:output_type -> hostagent.v1.RemovePathResponse
11, // 66: hostagent.v1.HostAgentService.DeleteSnapshot:output_type -> hostagent.v1.DeleteSnapshotResponse 9, // 66: hostagent.v1.HostAgentService.CreateSnapshot:output_type -> hostagent.v1.CreateSnapshotResponse
22, // 67: hostagent.v1.HostAgentService.ExecStream:output_type -> hostagent.v1.ExecStreamResponse 11, // 67: hostagent.v1.HostAgentService.DeleteSnapshot:output_type -> hostagent.v1.DeleteSnapshotResponse
28, // 68: hostagent.v1.HostAgentService.WriteFileStream:output_type -> hostagent.v1.WriteFileStreamResponse 22, // 68: hostagent.v1.HostAgentService.ExecStream:output_type -> hostagent.v1.ExecStreamResponse
30, // 69: hostagent.v1.HostAgentService.ReadFileStream:output_type -> hostagent.v1.ReadFileStreamResponse 28, // 69: hostagent.v1.HostAgentService.WriteFileStream:output_type -> hostagent.v1.WriteFileStreamResponse
39, // 70: hostagent.v1.HostAgentService.PingSandbox:output_type -> hostagent.v1.PingSandboxResponse 30, // 70: hostagent.v1.HostAgentService.ReadFileStream:output_type -> hostagent.v1.ReadFileStreamResponse
41, // 71: hostagent.v1.HostAgentService.Terminate:output_type -> hostagent.v1.TerminateResponse 39, // 71: hostagent.v1.HostAgentService.PingSandbox:output_type -> hostagent.v1.PingSandboxResponse
44, // 72: hostagent.v1.HostAgentService.GetSandboxMetrics:output_type -> hostagent.v1.GetSandboxMetricsResponse 41, // 72: hostagent.v1.HostAgentService.Terminate:output_type -> hostagent.v1.TerminateResponse
46, // 73: hostagent.v1.HostAgentService.FlushSandboxMetrics:output_type -> hostagent.v1.FlushSandboxMetricsResponse 44, // 73: hostagent.v1.HostAgentService.GetSandboxMetrics:output_type -> hostagent.v1.GetSandboxMetricsResponse
48, // 74: hostagent.v1.HostAgentService.FlattenRootfs:output_type -> hostagent.v1.FlattenRootfsResponse 46, // 74: hostagent.v1.HostAgentService.FlushSandboxMetrics:output_type -> hostagent.v1.FlushSandboxMetricsResponse
50, // 75: hostagent.v1.HostAgentService.PtyAttach:output_type -> hostagent.v1.PtyAttachResponse 48, // 75: hostagent.v1.HostAgentService.FlattenRootfs:output_type -> hostagent.v1.FlattenRootfsResponse
55, // 76: hostagent.v1.HostAgentService.PtySendInput:output_type -> hostagent.v1.PtySendInputResponse 50, // 76: hostagent.v1.HostAgentService.PtyAttach:output_type -> hostagent.v1.PtyAttachResponse
57, // 77: hostagent.v1.HostAgentService.PtyResize:output_type -> hostagent.v1.PtyResizeResponse 55, // 77: hostagent.v1.HostAgentService.PtySendInput:output_type -> hostagent.v1.PtySendInputResponse
59, // 78: hostagent.v1.HostAgentService.PtyKill:output_type -> hostagent.v1.PtyKillResponse 57, // 78: hostagent.v1.HostAgentService.PtyResize:output_type -> hostagent.v1.PtyResizeResponse
61, // 79: hostagent.v1.HostAgentService.StartBackground:output_type -> hostagent.v1.StartBackgroundResponse 59, // 79: hostagent.v1.HostAgentService.PtyKill:output_type -> hostagent.v1.PtyKillResponse
64, // 80: hostagent.v1.HostAgentService.ListProcesses:output_type -> hostagent.v1.ListProcessesResponse 61, // 80: hostagent.v1.HostAgentService.StartBackground:output_type -> hostagent.v1.StartBackgroundResponse
66, // 81: hostagent.v1.HostAgentService.KillProcess:output_type -> hostagent.v1.KillProcessResponse 64, // 81: hostagent.v1.HostAgentService.ListProcesses:output_type -> hostagent.v1.ListProcessesResponse
68, // 82: hostagent.v1.HostAgentService.ConnectProcess:output_type -> hostagent.v1.ConnectProcessResponse 66, // 82: hostagent.v1.HostAgentService.KillProcess:output_type -> hostagent.v1.KillProcessResponse
54, // [54:83] is the sub-list for method output_type 68, // 83: hostagent.v1.HostAgentService.ConnectProcess:output_type -> hostagent.v1.ConnectProcessResponse
25, // [25:54] is the sub-list for method input_type 55, // [55:84] is the sub-list for method output_type
25, // [25:25] is the sub-list for extension type_name 26, // [26:55] is the sub-list for method input_type
25, // [25:25] is the sub-list for extension extendee 26, // [26:26] is the sub-list for extension type_name
0, // [0:25] is the sub-list for field type_name 26, // [26:26] is the sub-list for extension extendee
0, // [0:26] is the sub-list for field type_name
} }
 func init() { file_hostagent_proto_init() }
@@ -4699,7 +4724,7 @@ func file_hostagent_proto_init() {
 		GoPackagePath: reflect.TypeOf(x{}).PkgPath(),
 		RawDescriptor: unsafe.Slice(unsafe.StringData(file_hostagent_proto_rawDesc), len(file_hostagent_proto_rawDesc)),
 		NumEnums:      0,
-		NumMessages:   76,
+		NumMessages:   77,
 		NumExtensions: 0,
 		NumServices:   1,
 	},
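A note on the index churn above: it is mechanical fallout from one schema change. protoc compiles every `map<K, V>` field into a synthetic nested message type, so giving `ExecRequest` an `envs` map adds one message to the descriptor (hence `NumMessages` going 76 to 77) and shifts every later `depIdxs` slot by one, which is why `PtyAttachRequest.EnvsEntry` moves from 74 to 75. A conceptual Go sketch of that equivalence, with illustrative names only (nothing here is copied from the generated file):

```go
package sketch // illustrative only, not code from this repo

// For a proto3 field declared as:
//
//	map<string, string> envs = 5;
//
// protoc behaves as if the schema contained a nested entry message:
//
//	message EnvsEntry {
//	  string key = 1;
//	  string value = 2;
//	}
//	repeated EnvsEntry envs = 5;
//
// The synthetic EnvsEntry counts as a real message type in the descriptor,
// which is what bumps NumMessages. In Go it never surfaces; the generated
// field is an ordinary map.
type ExecRequest struct {
	Envs map[string]string // wire format: repeated ExecRequest.EnvsEntry {key, value}
}
```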

hostagent.proto

@@ -146,7 +146,7 @@ message CreateSandboxResponse {
   string host_ip = 3;
   // Runtime metadata collected during sandbox creation (e.g. envd_version,
-  // kernel_version, firecracker_version, agent_version).
+  // kernel_version, vmm_version, agent_version).
   map<string, string> metadata = 4;
 }
@@ -222,6 +222,10 @@ message ExecRequest {
   repeated string args = 3;
   // Timeout for the command in seconds (default: 30).
   int32 timeout_sec = 4;
+  // Environment variables to set for the command.
+  map<string, string> envs = 5;
+  // Working directory for the command.
+  string cwd = 6;
 }

 message ExecResponse {
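The two new fields give foreground `Exec` the same env/cwd knobs that `StartBackgroundRequest` already carried. A minimal caller sketch, assuming the conventional protoc-gen-go(-grpc) output (`Envs`, `Cwd`, `NewHostAgentServiceClient`) and a hypothetical import path; the `SandboxId` and `Command` field names are likewise assumptions, since they fall outside the hunk above:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	hostagentv1 "example.invalid/wrenn/gen/hostagent/v1" // hypothetical import path
)

func main() {
	conn, err := grpc.NewClient("127.0.0.1:9090",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := hostagentv1.NewHostAgentServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 35*time.Second)
	defer cancel()

	// envs/cwd are the fields added in this change.
	resp, err := client.Exec(ctx, &hostagentv1.ExecRequest{
		SandboxId:  "sb-123", // assumed field name
		Command:    "make",   // assumed field name
		Args:       []string{"build"},
		TimeoutSec: 30,
		Envs:       map[string]string{"CI": "1"},
		Cwd:        "/workspace",
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("exec response: %v", resp)
}
```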

prepare-wrenn-user.sh

@@ -22,7 +22,7 @@
 # Prerequisites:
 #   - wrenn-agent binary at /usr/local/bin/wrenn-agent
 #   - wrenn-cp binary at /usr/local/bin/wrenn-cp
-#   - firecracker binary at /usr/local/bin/firecracker
+#   - cloud-hypervisor binary at /usr/local/bin/cloud-hypervisor
 #   - libcap2-bin installed (for setcap)

 set -euo pipefail
@@ -41,7 +41,7 @@ WRENN_GROUP="wrenn"
 WRENN_DIR="/var/lib/wrenn"
 AGENT_BIN="/usr/local/bin/wrenn-agent"
 CP_BIN="/usr/local/bin/wrenn-cp"
-FC_BIN="/usr/local/bin/firecracker"
+CH_BIN="/usr/local/bin/cloud-hypervisor"
 RESTORE_CAPS_SCRIPT="/etc/wrenn/restore-caps.sh"

 # ── 1. Create system user ───────────────────────────────────────────────────
@@ -100,7 +100,7 @@ done
 #                      routing table manipulation
 #   CAP_NET_RAW      — raw socket access (needed by iptables internally)
 #   CAP_SYS_PTRACE   — reading /proc/self/ns/net (netns.Get)
-#   CAP_KILL         — sending SIGTERM/SIGKILL to Firecracker processes
+#   CAP_KILL         — sending SIGTERM/SIGKILL to Cloud Hypervisor processes
 #   CAP_DAC_OVERRIDE — accessing /dev/loop*, /dev/mapper/*, /dev/net/tun,
 #                      /proc/sys/net/ipv4/ip_forward
 #   CAP_MKNOD        — creating device nodes (dm-snapshot)
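The CAP_KILL line is easy to gloss over: the agent runs unprivileged yet must signal VMM processes that do not necessarily share its uid. A minimal Go sketch of the terminate-then-escalate pattern such an agent typically uses; the names are hypothetical and this is not the repo's actual shutdown path:

```go
package sketch // illustrative only

import (
	"errors"
	"os"
	"syscall"
	"time"
)

// stopVMM sends SIGTERM to a VMM process and escalates to SIGKILL if it has
// not exited within the grace period. Signaling a pid owned by another uid
// is exactly the operation CAP_KILL authorizes for a non-root agent.
func stopVMM(pid int, grace time.Duration) error {
	proc, err := os.FindProcess(pid) // on Unix this always succeeds
	if err != nil {
		return err
	}
	if err := proc.Signal(syscall.SIGTERM); err != nil {
		if errors.Is(err, os.ErrProcessDone) {
			return nil // already exited
		}
		return err // e.g. EPERM: the failure CAP_KILL exists to prevent
	}
	deadline := time.Now().Add(grace)
	for time.Now().Before(deadline) {
		// Signal 0 delivers nothing; it only probes whether the pid is alive.
		if errors.Is(proc.Signal(syscall.Signal(0)), os.ErrProcessDone) {
			return nil
		}
		time.Sleep(100 * time.Millisecond)
	}
	return proc.Signal(syscall.SIGKILL) // grace period expired; escalate
}
```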
@@ -120,12 +120,12 @@ else
   getcap "${AGENT_BIN}"
 fi

-# Firecracker also needs capabilities when spawned by a non-root parent.
+# Cloud Hypervisor also needs capabilities when spawned by a non-root parent.
 # CAP_NET_ADMIN is required for network device access inside the netns.
-if [[ -f "${FC_BIN}" ]]; then
-  setcap cap_net_admin,cap_sys_admin,cap_dac_override+ep "${FC_BIN}"
-  echo "  Capabilities set on ${FC_BIN}:"
-  getcap "${FC_BIN}"
+if [[ -f "${CH_BIN}" ]]; then
+  setcap cap_net_admin,cap_sys_admin,cap_dac_override+ep "${CH_BIN}"
+  echo "  Capabilities set on ${CH_BIN}:"
+  getcap "${CH_BIN}"
 fi

 # ── Helper: resolve binary path and apply setcap ────────────────────────────
@@ -191,13 +191,13 @@ setcap_binary() {
   setcap "$caps" "$bin" 2>/dev/null || true
 }

-# wrenn-agent and firecracker (only if present — they aren't package-managed).
+# wrenn-agent and cloud-hypervisor (only if present — they aren't package-managed).
 [[ -f /usr/local/bin/wrenn-agent ]] && \
   setcap cap_sys_admin,cap_net_admin,cap_net_raw,cap_sys_ptrace,cap_kill,cap_dac_override,cap_mknod+ep \
   /usr/local/bin/wrenn-agent 2>/dev/null || true
-[[ -f /usr/local/bin/firecracker ]] && \
+[[ -f /usr/local/bin/cloud-hypervisor ]] && \
   setcap cap_net_admin,cap_sys_admin,cap_dac_override+ep \
-  /usr/local/bin/firecracker 2>/dev/null || true
+  /usr/local/bin/cloud-hypervisor 2>/dev/null || true

 # Child binaries (these are the ones wiped by apt).
 setcap_binary iptables "cap_net_admin,cap_net_raw+ep"
@@ -315,14 +315,14 @@ ExecStart=/usr/local/bin/wrenn-agent --address ${WRENN_ADVERTISE_ADDR}
 Restart=on-failure
 RestartSec=5

-# File descriptor limits (Firecracker + loop devices + sockets).
+# File descriptor limits (Cloud Hypervisor + loop devices + sockets).
 LimitNOFILE=65536
 LimitNPROC=4096

 # Protect host filesystem — only allow access to what's needed.
 ProtectHome=true
 ReadWritePaths=/var/lib/wrenn /tmp /run/netns /dev/mapper
-ReadOnlyPaths=/usr/local/bin/firecracker
+ReadOnlyPaths=/usr/local/bin/cloud-hypervisor

 [Install]
 WantedBy=multi-user.target