Commit Graph

316 Commits

SHA1 Message Date
62e7e9c4e1 Merge branch 'dev' into ci/build-tag-and-release 2026-05-19 06:42:02 +00:00
e87506ce6b Merge pull request 'Migrate to Cloud Hypervisor' (#49) from feat/migrate-to-ch into dev
Reviewed-on: #49
2026-05-18 22:59:57 +00:00
e843be21a4 extensions: auth/sandbox hooks, limits/usage providers, exported session middleware
Cloud-repo extensions could not build on the cookie-session migration —
they still called the removed JWT helpers and duplicated middleware. Move
the session/CSRF middleware and cookie helpers into pkg/auth/session/middleware
as the single source of truth, with thin re-exports on cpextension.

Add hook interfaces so cloud can plug billing without forking OSS:
- AuthHook (OnSignup fail-fast; OnLogin / OnAccount*Delete log+ignore)
- SandboxEventHook (un-acks on error so messages redeliver; idempotent)
- LimitsProvider / UsageProvider (402 on overage; DB-backed usage default)

ServerContext gains OAuthRegistry, Channels, ChannelPub so extensions stop
reimplementing them.
2026-05-19 04:49:19 +06:00
9b34d6a82f events/capsules: dedupe channel notifications, harden create dialog
- channels dispatcher: drop capsule.{create,pause,resume,destroy} events
  with system actor and no reason metadata. Suppresses the goroutine /
  host-callback follow-up that duplicated every user-initiated action in
  notification channels (Telegram, webhooks). Genuinely system-only
  emitters (TTL auto-pause, host monitor reconciler, host failures) all
  set reason, so they continue to notify.
- CreateCapsuleDialog: wrap submit in try/finally so the creating flag
  always clears, and close the dialog before invoking oncreated to avoid
  the parent receiving the new capsule while the dialog is still open.
- capsules page: guard against double-insertion of the same capsule when
  the SSE event arrives before the dialog's oncreated callback resolves.
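The dispatcher rule in the first bullet can be sketched as a single predicate. This assumes a trimmed-down `Event` whose actor is the string `"system"` and whose reason lives under a `"reason"` metadata key. Both encodings are assumptions, not the real event schema.

```go
package main

// Event is a hypothetical, trimmed-down canonical event.
type Event struct {
	Type     string            // e.g. "capsule.pause"
	Actor    string            // "user" or "system" (assumed encoding)
	Metadata map[string]string // "reason" is set by system-only emitters
}

// dedupedTypes lists the lifecycle events whose system-actor echoes are suppressed.
var dedupedTypes = map[string]bool{
	"capsule.create":  true,
	"capsule.pause":   true,
	"capsule.resume":  true,
	"capsule.destroy": true,
}

// shouldNotify reports whether the channels dispatcher should forward e.
// Indexing a nil Metadata map is safe and yields "".
func shouldNotify(e Event) bool {
	if dedupedTypes[e.Type] && e.Actor == "system" && e.Metadata["reason"] == "" {
		return false // follow-up echo of a user action: drop it
	}
	return true
}
```

A TTL auto-pause with a reason still notifies. The reasonless system echo of a user's own pause does not.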
2026-05-19 04:27:11 +06:00
42af7c4357 auth: replace user JWTs with cookie sessions
User authentication moves from short-lived JWT bearer tokens to opaque
session cookies (wrenn_sid) backed by a Postgres sessions table and a
Redis hot cache. Browsers get a paired wrenn_csrf cookie; all mutating
requests must echo it via X-CSRF-Token (double-submit).

- New pkg/auth/session service: issue/revoke, idle (6h) + absolute
  (24h) lifetimes, switch-team rotation, RevokeAllForUser on password
  events, per-user listing for self-service.
- Middleware: requireSession + requireCSRF replace requireJWT and the
  WS first-message JWT exchange. SSE/WS endpoints rely on the cookie
  flowing on the upgrade — SSE ticket store deleted.
- API keys (wrn_<32hex>) remain for SDK/server use; capsule routes
  accept either via requireSessionOrAPIKey.
- Host-agent JWTs (signed by JWT_SECRET) are unchanged — that channel
  is wrenn-cp ↔ wrenn-agent and unrelated to user identity.
- Frontend client drops bearer-token plumbing, sends credentials and
  the CSRF header on every mutating call.
- OpenAPI + dashboard host-registration docs updated.
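A minimal sketch of the double-submit check, using only `net/http`. The cookie and header names come from the message above. Everything else is an assumption: treating GET/HEAD/OPTIONS as non-mutating, the 403 status, and the `/v1/capsules` test path. This is not the actual `requireCSRF` implementation.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"net/http/httptest"
)

// Cookie and header names come from the commit message.
const (
	csrfCookie = "wrenn_csrf"
	csrfHeader = "X-CSRF-Token"
)

// requireCSRF is a sketch of a double-submit check: on mutating methods the
// X-CSRF-Token header must equal the wrenn_csrf cookie value.
func requireCSRF(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodGet, http.MethodHead, http.MethodOptions:
			next.ServeHTTP(w, r) // non-mutating: no token required
			return
		}
		c, err := r.Cookie(csrfCookie)
		h := r.Header.Get(csrfHeader)
		if err != nil || c.Value == "" || h == "" ||
			subtle.ConstantTimeCompare([]byte(c.Value), []byte(h)) != 1 {
			http.Error(w, "CSRF token mismatch", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor is a test helper: it sends one request through requireCSRF.
func statusFor(method, cookieVal, headerVal string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(method, "/v1/capsules", nil)
	if cookieVal != "" {
		req.AddCookie(&http.Cookie{Name: csrfCookie, Value: cookieVal})
	}
	if headerVal != "" {
		req.Header.Set(csrfHeader, headerVal)
	}
	rec := httptest.NewRecorder()
	requireCSRF(ok).ServeHTTP(rec, req)
	return rec.Code
}
```

The check works because a cross-site page can make the browser send the cookie but cannot read it to copy into the header.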
2026-05-19 04:01:24 +06:00
4b58dc32ab events: outcome + metadata + transient channel
Rename CapsuleCreated→CapsuleCreate (and its sibling event types) to action
verbs, add Outcome (success/error), Metadata, and Error fields to the
canonical Event. Introduce PublishTransient for ephemeral SSE-only
signals (capsule.state.changed) so dashboard transitions don't reach
webhook/telegram subscribers.

Audit logger now publishes the canonical event itself with the derived
outcome, collapsing the old "audit then separately publish" split.
Sandbox event consumer rebuilt around the unified stream: host-agent
callbacks are translated once into canonical events, then fan out via
DB reconciler, channel dispatcher, and SSE relay independently.

Documents the channels subscription model in the README.
2026-05-19 04:01:06 +06:00
09cb78f1b8 feat(frontend): typed capsule/SSE state machine, resilient event stream
Introduce CapsuleStatus union + RESUMABLE_STATUSES / TRANSIENT_STATUSES
sets that mirror the backend state machine; the routes and SnapshotDialog
now derive button enablement from the sets instead of ad-hoc string
checks. Add disk_size_mb + metadata to the Capsule shape.

SSEEventKind union + isSSEEvent guard so malformed wire payloads can't
reach handlers via blind casts. Event stream reconnect now:
  - retries with backoff when the ticket fetch itself fails (previously
    gave up on a single 401/network blip),
  - reconnects immediately on window 'online' and document visibilitychange
    (back to visible) when the EventSource is not OPEN,
  - subscribes to capsule.error.

openapi.yaml: align OAuth paths (/v1/auth/oauth → /auth/oauth to match
the actual mount point), document bearerAuth on capsule routes, fix
'capsulees' typos, and expand schemas for the new state machine surfaces
the frontend now consumes.
2026-05-19 01:29:38 +06:00
f002839c48 feat(api): plumb root ctx through SSE store, surface pause/resume failures, clamp TTL
NewSSETicketStore now takes a context so its cleanup goroutine exits on
server shutdown instead of leaking for the process lifetime. Threaded
through api.New and pkg/cpserver/run.go.

SandboxEventConsumer learns sandbox.pause_failed / sandbox.resume_failed
event types and forwards TeamID from the publisher; server.go propagates
TeamID into the SSE broker so per-team subscribers receive failure
events. resumeInBackground now rolls resuming → paused on failure (was
resuming → error) so the user can retry without manual intervention.

pkg/service/sandbox: mirror internal/sandbox.MinTimeoutSec + clampTimeout
on the control plane so the DB row's timeout_sec agrees with what the
agent runs after its own silent clamp.
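A minimal sketch of the mirrored clamp, using `MinTimeoutSec=60` as quoted in a later commit. It assumes every requested value below the minimum, including 0, is raised to the minimum. The real helper may treat 0 or negative values differently.

```go
package main

// MinTimeoutSec matches the value quoted in the commit history (60s).
const MinTimeoutSec = 60

// clampTimeout raises any requested TTL below the minimum to MinTimeoutSec,
// so the value stored in timeout_sec matches what the agent will enforce.
func clampTimeout(sec int) int {
	if sec < MinTimeoutSec {
		return MinTimeoutSec
	}
	return sec
}
```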
2026-05-19 01:29:28 +06:00
cab50db1c1 refactor(sandbox): per-sandbox dir layout, atomic envd client, sentinel errors
Move CoW from sandboxes/{id}.cow to sandboxes/{id}/rootfs.cow so every
per-sandbox artifact lives under one parent. PauseSnapshotDir now aliases
SandboxDir; Pause stages the CoW into the staging dir before swapDir so
the swap carries it through.

Publish sb.client via atomic.Pointer so Exec/Pty/Process callers can load
without holding lifecycleMu; Pause's releaseRuntime stores nil, Resume
stores a fresh client. Funnel every caller through new activeClient()
that nil-checks after Load to close the pause-vs-exec race.

Replace string-sniffing for "not found" / "not running" with sentinel
errors (ErrNotFound, ErrNotRunning, ErrNotPaused, ErrInvalidRange) and a
single mapSandboxError switch in the hostagent server. Add
parseSandboxIDs helper for the repeated team+template UUID parse.

Rewrite ConnTracker off sync.WaitGroup onto an explicit counter + zeroCh
so Drain/ForceClose can select on cancellation and timeout without
leaking the waiter goroutine on repeated pause failures.

Add internal/sandbox/punch.go: post-snapshot SEEK_DATA scan that
fallocate-punches any 4 KiB block of zeros in CH memory-* files (guest
dirty-then-free pages CH writes verbatim). Run after both pause snapshot
and CreateSnapshot. Bump envd quiesce sleep 500ms → 1s so the kernel
fully flushes before CH dumps memory.
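The detection half of the punch pass can be sketched as a pure scan for block-aligned, all-zero 4 KiB blocks. The actual pass described above would also use `lseek(SEEK_DATA)` to skip regions that are already holes, and `fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)` to release each block. Those Linux syscalls are left out so the sketch stays portable. `zeroBlocks` and `isZero` are hypothetical names.

```go
package main

// blockSize is the 4 KiB granularity named in the commit message.
const blockSize = 4096

// zeroBlocks returns the file offsets of every block-aligned, all-zero
// 4 KiB block in buf, where buf was read from file offset base.
// A trailing partial block is ignored.
func zeroBlocks(buf []byte, base int64) []int64 {
	var offs []int64
	for off := 0; off+blockSize <= len(buf); off += blockSize {
		if isZero(buf[off : off+blockSize]) {
			offs = append(offs, base+int64(off))
		}
	}
	return offs
}

func isZero(b []byte) bool {
	for _, v := range b {
		if v != 0 {
			return false
		}
	}
	return true
}
```

For example, in a 12 KiB chunk read at offset 8192 whose middle block has one non-zero byte, the first and last blocks (offsets 8192 and 16384) are returned as punch candidates.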

Add sandboxDirOverride threaded through snapshotMeta + restoreVMConfig:
sandboxes launched from snapshot templates carry the original source
sandbox's tmpfs path in CH's saved config.json, so every subsequent
restore must reuse it.
2026-05-19 01:29:20 +06:00
802af222ee feat(sandbox): launch sandboxes from snapshot templates
New createFromSnapshotTemplate path branches off Manager.Create when the
template directory contains a CH memory snapshot (state.json + config.json
+ rootfs.ext4). Mirrors the pause/resume restore mechanics — same UFFD
lazy memory + post-restore memory loader — but produces a fresh sandbox
per call (new ID, new slot, new CoW on the shared flattened rootfs).

Shared restore primitives extracted to restore.go (buildRestoreVMConfig,
launchRestoredVM, initAndStartMemoryLoader) and reused by resumeFromMeta.

Chain correctness: descendants of snapshot templates start the memory
loader so subsequent CreateSnapshot from them is self-contained.

Defensive guards:
- CreateSnapshot refuses to overwrite an existing template dir.
- DeleteSnapshot refuses when running sandboxes still reference it.
- TimeoutSec clamped to MinTimeoutSec=60 to keep TTL reaper well clear of
  the post-create startup window.
- Snapshot routing skips minimal template even if a stray state.json lands.

vm.SandboxTmpDir / vm.SandboxSocketPath extracted so launchers don't
re-derive CH disk paths independently.
2026-05-18 15:20:45 +06:00
8262a4999e feat(vm): CH live snapshot, pause/resume with UFFD memory restore
Pause and live-snapshot share one CH primitive (ch.pause + ch.snapshot +
ch.destroy/resume). Pause writes artefacts to a staging dir and
atomically swaps to avoid CH re-reading a memory-ranges file mid-rewrite
across pause-resume-pause chains. Resume uses
memory_restore_mode=ondemand backed by userfaultfd; CH lazily faults
pages from the source file. A new envd /memory/preload endpoint
materialises every physical page (one byte per page via /dev/mem,
fallback /proc/kcore) so a subsequent snapshot writes a self-contained
file instead of holes.

Sandbox manager refactor: lifecycle / pause / resume code extracted to
internal/sandbox/pause.go, leaving manager.go focused on the in-memory
state map and orchestration entry points (-871 / +72). Stale CH process
and dm-snapshot cleanup runs at agent startup (internal/vm/cleanup.go)
and via scripts/cleanup-stale.sh for operator use.

Host monitor honors the agent's reported per-sandbox status when
reconciling missing rows (so an agent-side pause during a CP
disconnect isn't silently promoted back to running). New
BulkRestoreMissingToStatus query replaces the running-only path.
Transient statuses (pausing/resuming/starting/stopping) defer
reconciliation to the next tick.
2026-05-18 14:05:27 +06:00
b9cb3998f8 feat(api): real-time SSE event stream for sandbox lifecycle
In-process broker fans out sandbox state events (created/paused/running/
destroyed) to connected SSE clients, filtered by team. Backend publishes
through the channels Publisher; an SSE relay subscribes to Redis Pub/Sub
and dispatches to subscribers. Browser auth uses short-lived tickets
issued via /v1/events/token; SDKs use header auth. Admin routes get a
parallel stream that sees all teams. Frontend dashboard and admin
capsule pages subscribe to push state changes instead of polling.

Sandbox event publishing moved out of AuditLogger into the service layer
so callbacks from the host agent and direct state changes share one
path.
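A minimal in-process sketch of the team-filtered fan-out. It omits the Redis Pub/Sub relay and ticket auth, and it models the admin stream as a subscriber with an empty team ID. `Event`, `Broker` and the drop-on-full policy are assumptions, not the actual broker.

```go
package main

import "sync"

// Event is a trimmed, hypothetical lifecycle event.
type Event struct {
	TeamID    string
	Type      string // e.g. "paused"
	CapsuleID string
}

// Broker fans events out to SSE subscribers. A subscriber registered with an
// empty team ID plays the role of the admin stream and sees every team.
type Broker struct {
	mu   sync.Mutex
	subs map[chan Event]string
}

func NewBroker() *Broker { return &Broker{subs: make(map[chan Event]string)} }

func (b *Broker) Subscribe(teamID string) chan Event {
	ch := make(chan Event, 16)
	b.mu.Lock()
	b.subs[ch] = teamID
	b.mu.Unlock()
	return ch
}

// Unsubscribe removes ch under the lock before closing it, so Publish can
// never send on a closed channel.
func (b *Broker) Unsubscribe(ch chan Event) {
	b.mu.Lock()
	delete(b.subs, ch)
	b.mu.Unlock()
	close(ch)
}

func (b *Broker) Publish(e Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for ch, team := range b.subs {
		if team != "" && team != e.TeamID {
			continue // other team's event
		}
		select {
		case ch <- e:
		default: // slow client: drop instead of blocking the publisher
		}
	}
}
```

Dropping on a full buffer keeps one stalled browser tab from delaying every other subscriber. The client is expected to refetch state on reconnect.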
2026-05-18 14:05:06 +06:00
62bede5dae fix: resolve bugs and DRY violations in sandbox manager and API handlers
- Fix createFromSnapshot discarding memoryMB param (balloon optimization was dead)
- Fix double dm-snapshot removal in Pause() cleanupPauseFailure path
- Fix DestroySandbox RPC mapping all errors to CodeNotFound
- Fix handleFailed event consumer missing pausing/resuming → error transitions
- Fix stream resource leak in StreamUpload on early-return paths
- Add envs/cwd fields to ExecRequest proto for foreground exec parity
- Extract createResources rollback helper to eliminate 4x duplicated teardown
- Remove unused chClient.ping method
- Add .mcp.json to gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-17 02:30:32 +06:00
74f85ce4e9 refactor: polish control plane and host agent code
- Decompose executeBuild (318 lines) into provisionBuildSandbox and
  finalizeBuild helpers for readability
- Extract cleanupPauseFailure in sandbox manager to unify 3 inconsistent
  inline teardown paths (also fixes CoW file leak on rename failure)
- Remove unused ctx parameter from startProcess/startProcessForRestore
- Add missing MASQUERADE rollback entry in CreateNetwork for symmetry
- Consolidate duplicate writeJSON for UTF-8/base64 exec response
2026-05-17 02:11:48 +06:00
124e097e23 refactor: eliminate DRY violations across control plane and host agent
Extract shared helpers to consolidate repeated patterns:
- requireRunningSandbox: sandbox lookup + running check (10 call sites)
- upgradeAndAuthenticate: WS upgrade + JWT/API-key auth (3 handlers)
- updateLastActive: last_active_at update with background context (5 sites)
- attachCowAndCreate: cow loop attach + dmsetup create (devicemapper)
- issueRegistrationToken: token gen + Redis + audit (host service)
- ErrNotFound sentinel: replaces string matching in hostagent server

Also merges duplicate wsProcessOut/wsOutMsg types into one.

Net: -208 lines, zero behavior change.
2026-05-17 02:03:06 +06:00
a5425969ed fix: assorted bug fixes for CH migration
Fix resource leaks, race conditions, and error handling across host
agent and control plane: proper sparse file cleanup on close error,
connect error wrapping for MakeDir, CoW file cleanup on pause failure,
per-sandbox VM directories, deferred map deletion to avoid race in VM
destroy, and goroutine launch for extension background workers.
2026-05-17 01:47:56 +06:00
fb16bc9ed1 chore: update proto, scripts, and docs for CH migration
- Update hostagent proto: firecracker_version → vmm_version in metadata
- Regenerate hostagent.pb.go
- Update .env.example: WRENN_FIRECRACKER_BIN → WRENN_CH_BIN
- Update Makefile: remove --isnotfc from dev-envd target
- Update prepare-wrenn-user.sh: firecracker → cloud-hypervisor paths
  and capability assignments
- Update wrenn-init.sh: disable write_zeroes on rootfs for dm-snapshot
  compatibility with CH
- Update README.md and CLAUDE.md: Firecracker → Cloud Hypervisor
  throughout
2026-05-17 01:33:35 +06:00
dd8a940431 feat(envd): update guest agent for Cloud Hypervisor
Remove Firecracker-specific MMDS metadata fetching and metrics host
module. CH communicates with the guest purely over TAP networking,
so MMDS (Firecracker's guest-facing metadata service) is no longer
needed.

- Remove src/host/ module (mmds.rs, metrics.rs)
- Remove reqwest dependency (was only used for MMDS HTTP calls)
- Remove --isnotfc CLI flag (no longer dual-mode)
- Simplify health endpoint and init handler
- Update state management for CH snapshot lifecycle
- Bump version to 0.3.0
2026-05-17 01:33:25 +06:00
eaa6b8576d feat(vm): replace Firecracker with Cloud Hypervisor
Migrate the entire VM layer from Firecracker to Cloud Hypervisor (CH).
CH provides native snapshot/restore via its HTTP API, eliminating the
need for custom UFFD handling, memfile processing, and snapshot header
management that Firecracker required.

Key changes:
- Remove fc.go, jailer.go (FC process management)
- Remove internal/uffd/ package (userfaultfd lazy page loading)
- Remove snapshot/header.go, mapping.go, memfile.go (FC snapshot format)
- Add ch.go (CH HTTP API client over Unix socket)
- Add process.go (CH process lifecycle with unshare+netns)
- Add chversion.go (CH version detection)
- Refactor sandbox manager: remove UFFD socket tracking, snapshot
  parent/diff chaining, FC-specific balloon logic; add crash watcher
- Simplify snapshot/local.go to CH's native snapshot format
- Update VM config: FirecrackerBin → VMMBin, new CH-specific fields
- Update envdclient, devicemapper, network for CH compatibility
2026-05-17 01:33:12 +06:00
c2dc382787 Updated openapi schema 2026-05-16 18:32:37 +06:00
3671af2498 feat: immediate sandbox reconciliation on host reconnect
When a host transitions from unreachable → online via heartbeat, trigger
ReconcileHost in a background goroutine so "missing" sandboxes are
resolved instantly instead of waiting up to 60s for the next monitor tick.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-16 16:15:49 +06:00
e34bcedc31 Merge pull request 'fix/remove-sync-updates' (#47) from fix/remove-sync-updates into dev
Reviewed-on: #47
2026-05-15 08:08:07 +00:00
ff91ef3edf Bump versions 2026-05-15 13:56:04 +06:00
ba3a3db98c Updated openapi specs 2026-05-15 12:39:06 +06:00
6faad45a28 feat: async sandbox lifecycle with Redis Stream events
Replace synchronous RPC-based CP-host communication for sandbox
lifecycle operations (Create, Pause, Resume, Destroy) with an async
pattern. CP handlers now return 202 Accepted immediately, fire agent
RPCs in background goroutines, and publish state events to a Redis
Stream. A background consumer processes events as a fallback writer.

Agent-side auto-pause events are pushed to the CP via HTTP callback
(POST /v1/hosts/sandbox-events), keeping Redis internal to the CP.

All DB status transitions use conditional updates
(UpdateSandboxStatusIf, UpdateSandboxRunningIf) to prevent race
conditions between concurrent operations and background goroutines.
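The conditional-update idea can be sketched as an in-memory compare-and-set. It corresponds roughly to `UPDATE sandboxes SET status = $3 WHERE id = $1 AND status = $2`, with the caller checking the affected row count. The table, column and method names here are illustrative assumptions, not the actual `UpdateSandboxStatusIf` query.

```go
package main

import "sync"

// statusStore is an in-memory stand-in for the sandboxes table.
type statusStore struct {
	mu     sync.Mutex
	status map[string]string
}

// UpdateStatusIf changes the status only if it still equals expected and
// reports whether it did, like checking RowsAffected on a conditional UPDATE.
func (s *statusStore) UpdateStatusIf(id, expected, next string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.status[id] != expected {
		return false
	}
	s.status[id] = next
	return true
}
```

If a background goroutine moves a sandbox from resuming to running first, a slower writer still expecting resuming finds no match. It leaves the row alone instead of clobbering it.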

The HostMonitor reconciler is kept at 60s as a safety net, extended
to handle transient statuses (starting, pausing, resuming, stopping).

Frontend updated to handle 202 responses with empty bodies and render
transient statuses with blue indicators.
2026-05-15 12:25:16 +06:00
cdb178d8d1 Created CI pipeline for building, tagging and publishing to GitHub 2026-05-13 23:27:33 +06:00
c08884fa2c Merge branch 'main' of git.omukk.dev:wrenn/wrenn into dev 2026-05-13 11:05:49 +06:00
4707f16c76 v0.1.6 (#45)
## What's New?
Performance updates for large capsules, admin panel enhancements, and bug fixes

### Envd
- Fixed bug with sandbox metrics calculation
- Page cache drop and balloon inflation to reduce memfile snapshot size
- Updated RPC timeout logic for better control
- Added tests

### Admin Panel
- Add/Remove platform admin
- Updated template deletion logic for fine grained permission

### Others
- Minor frontend visual improvement
- Minor bugfixes
- Version bump

Co-authored-by: Tasnim Kabir Sadik <tksadik92@gmail.com>
Reviewed-on: #45
Co-authored-by: pptx704 <rafeed@omukk.dev>
Co-committed-by: pptx704 <rafeed@omukk.dev>
v0.1.6
2026-05-13 05:05:35 +00:00
6164d7cae3 version bump 2026-05-13 10:58:54 +06:00
dc6776cc8f fix(agent): register with CP before inflating rootfs images 2026-05-13 10:52:22 +06:00
0bfda08f47 Merge pull request 'test (envd): add 136 unit tests across 12 modules' (#44) from testing/envd into dev
Reviewed-on: #44
2026-05-13 04:42:06 +00:00
485be22a16 test(envd): add 136 unit tests across 12 modules
Cover all pure-function modules with inline #[cfg(test)] blocks:
crypto (NIST/RFC 4231 known-answer vectors), auth (SecureToken ops,
signature generation/validation), conntracker (snapshot lifecycle),
execcontext, util (AtomicMax concurrent correctness), http/encoding
(RFC 7231 negotiation), port/conn (/proc/net/tcp parsing),
rpc/entry (format_permissions), and permissions/path (tilde expansion,
ensure_dirs). Add tempfile dev-dep for filesystem tests. Update
Makefile test target to include cargo test.
2026-05-13 10:39:54 +06:00
ead406bdac Merge pull request 'fix: resolve large operation reliability — stream hangs, pause races, and memory bloat' (#43) from fix/large-operations into dev
Reviewed-on: #43
2026-05-13 03:44:41 +00:00
1472d77b52 Merge branch 'dev' into fix/large-operations 2026-05-13 03:44:19 +00:00
6a0fea30a6 Rootfs script updated 2026-05-13 09:35:06 +06:00
8c34388fc2 Changed commands to check if envd is statically linked or not 2026-05-12 23:19:30 +06:00
aca43d51eb fix: resolve process stream hangs, pause race, and PTY signal loss
- Cache terminal EndEvent on ProcessHandle so connect() can detect
  already-exited processes instead of hanging forever on broadcast
  receivers that missed the event. Subscribe before checking cache
  to close the TOCTOU window.

- Protect sb.Status writes in Pause with m.mu to prevent data race
  with concurrent readers (AcquireProxyConn, Exec, etc.).

- Restart metrics sampler in restoreRunning so a failed pause attempt
  doesn't permanently kill sandbox metrics collection.

- Return dequeued non-input messages from coalescePtyInput instead of
  dropping them, preventing silent loss of kill/resize signals during
  typing bursts.
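The first fix (cache the end event, and subscribe before checking the cache) is in envd's Rust code. To keep the examples in one language, here is a Go analogue. Taking the cache check and the subscription under one mutex closes the same window in which a fast-exiting process could be missed. `ProcessHandle`, `EndEvent`, `Subscribe` and `Finish` are hypothetical names.

```go
package main

import "sync"

// EndEvent is a hypothetical terminal event carrying the exit code.
type EndEvent struct{ ExitCode int }

// ProcessHandle caches the end event so late subscribers still see it.
type ProcessHandle struct {
	mu   sync.Mutex
	end  *EndEvent
	subs []chan EndEvent
}

// Subscribe registers under the same lock that Finish takes, so there is no
// window in which the process can exit between "check cache" and "subscribe".
func (p *ProcessHandle) Subscribe() <-chan EndEvent {
	p.mu.Lock()
	defer p.mu.Unlock()
	ch := make(chan EndEvent, 1) // buffered: Finish never blocks
	if p.end != nil {
		ch <- *p.end // already exited: deliver the cached event
		return ch
	}
	p.subs = append(p.subs, ch)
	return ch
}

func (p *ProcessHandle) Finish(e EndEvent) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.end = &e
	for _, ch := range p.subs {
		ch <- e
	}
	p.subs = nil
}
```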
2026-05-09 18:11:15 +06:00
522e1c5e90 fix: subscribe to process channels before spawning threads to prevent event loss
Fast-exiting processes (e.g. echo) sent data/end events before
start() subscribed to the broadcast channels, causing the stream
to hang indefinitely and the exec RPC to time out with 502.

Move channel subscription into spawn_process, before reader/waiter
threads start, and return pre-subscribed receivers via SpawnedProcess.
2026-05-09 17:28:37 +06:00
d1d316f35c fix: resolve exec 502 by terminating process streams on exit
The start() and connect() streaming RPCs blocked forever in the data
event loop because ProcessHandle retains a broadcast sender (needed for
reconnection via connect()), preventing the channel from closing.

Race data_rx against end_rx with tokio::select! so the stream terminates
when the process exits. Remaining buffered data is drained before
yielding the end event.
2026-05-09 16:36:33 +06:00
2af8412cdc fix: use RwLock for envd Defaults to fix silent mutation loss
The /init handler's default_user mutation cloned the Defaults struct,
mutated the clone, then dropped it — the actual state was never updated.
This caused processes to always run as "root" regardless of the user
set via POST /init. Additionally, default_workdir was accepted in the
init request but never applied.

Wrap user and workdir fields in RwLock with accessor methods so mutations
propagate correctly through the shared AppState.
2026-05-09 15:28:09 +06:00
c93ad5e2db fix: harden pause flow with connection isolation and UFFD event handling
Restructure pause to: block new operations (StatusPausing), drain proxy
connections with 5s grace, force-close remaining via context cancellation,
drop page cache, inflate balloon, then freeze vCPUs. Previously connections
could arrive during the pause window and API operations weren't blocked.

Handle UFFD_EVENT_REMOVE/UNMAP/REMAP/FORK gracefully instead of crashing
the UFFD server. These events fire during balloon deflation on snapshot
restore, killing the page fault handler and preventing VM boot.

Also adds ConnTracker.ForceClose() with cancellable context propagated
through the proxy handler, so lingering proxy connections are actively
terminated rather than left dangling.
2026-05-09 14:51:19 +06:00
38799770db fix: inflate balloon before snapshot to reduce memfile size
Firecracker dumps the entire VM memory region regardless of guest
usage. A 20GB VM using 500MB still produces a ~20GB memfile because
freed pages retain stale data (non-zero blocks).

Inflate the balloon device before snapshot to reclaim free guest
memory. Balloon pages become zero from FC's perspective, allowing
ProcessMemfile to skip them. This reduces memfile size from ~20GB
to ~1-2GB for lightly-used VMs.

- Pause: read guest memory usage, inflate balloon to reclaim free
  pages, wait 2s for guest kernel to process, then proceed
- Resume: deflate balloon to 0 after PostInit so guest gets full
  memory back
- createFromSnapshot: same deflation since template snapshots
  inherit inflated balloon state
- All balloon ops are best-effort with debug logging on failure
2026-05-05 15:38:04 +06:00
51b5d7b3ba fix: resolve pause/snapshot failures and CoW exhaustion on large VMs
Remove hard 10s timeout from Firecracker HTTP client — callers already
pass context.Context with appropriate deadlines, and 20GB+ memfile
writes easily exceed 10s.

Ensure CoW file is at least as large as the origin rootfs. Previously,
WRENN_DEFAULT_ROOTFS_SIZE=30Gi expanded the base image to 30GB but the
default 5GB CoW could not hold all writes, causing dm-snapshot
invalidation and EIO on all guest I/O.
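The sizing rule can be sketched in a few lines. It reads the "5GB" default as 5 GiB, which is an assumption, and the function name is hypothetical.

```go
package main

// defaultCowBytes is the 5 GB default mentioned above, taken here as 5 GiB.
const defaultCowBytes int64 = 5 << 30

// cowSize never returns less than the origin rootfs size, matching the
// "at least as large as the origin" rule; smaller origins get the default.
func cowSize(originBytes int64) int64 {
	if originBytes > defaultCowBytes {
		return originBytes
	}
	return defaultCowBytes
}
```

With `WRENN_DEFAULT_ROOTFS_SIZE=30Gi`, this yields a 30 GiB CoW instead of the 5 GiB default that filled up and invalidated the snapshot.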

Destroy frozen VMs in resumeOnError instead of leaving zombies that
report "running" but can't execute. Use fresh context for the resume
attempt so a cancelled caller context doesn't falsely trigger destroy.

Increase CP→Agent ResponseHeaderTimeout from 45s to 5min and
PrepareSnapshot timeout from 3s to 30s for large-memory VMs.

After failed pause, ping agent to detect destroyed sandboxes and mark
DB status as "error" instead of reverting to "running".
2026-05-04 01:46:57 +06:00
fd5fa28205 Merge pull request 'Enhanced frontend ux' (#42) from enhance/frontend into dev
Reviewed-on: #42
2026-05-03 11:08:48 +00:00
1244c08e42 fix: fetch sandbox metrics immediately on page load
Metrics data was only fetched after Chart.js dynamic import completed,
leaving graphs empty until the first poll interval fired. Now
loadMetrics() runs in parallel with the Chart.js import, and
initCharts() resets the dedup key so pre-fetched data populates
newly created chart instances.
2026-05-03 16:43:26 +06:00
021d709de2 feat: show template owner and restrict delete in admin panel
Add Owner column to admin templates table, resolving team IDs to names
via admin teams API. Disable delete for non-platform templates and the
minimal template, with contextual tooltips explaining why.
2026-05-03 15:51:20 +06:00
cac6fcd626 feat: admin grant/revoke from admin panel
Add PUT /v1/admin/users/{id}/admin endpoint and frontend UI for
granting and revoking platform admin status. Uses atomic conditional
SQL (RevokeUserAdmin) to prevent race conditions that could remove
the last admin. Includes idempotency check, audit logging, and
confirmation dialog with self-demotion warning.
2026-05-03 15:24:34 +06:00
4954b19d7c fix: merge capsule data in-place to prevent visual refresh on poll
Replaces full array assignment with granular merge that reuses existing
Svelte proxy objects, so only rows with actual data changes re-render.
2026-05-03 15:09:21 +06:00
01819642cc fix: drop page cache before snapshot to reduce memory dump size
Linux keeps freed memory as page cache, which Firecracker snapshots
as non-zero blocks. A 16GB VM with 12GB stale cache would write all
12GB to disk. Dropping pagecache (not dentries/inodes) in
/snapshot/prepare before blocking the reclaimer shrinks snapshots
to actual working set size with minimal resume latency impact.
2026-05-03 14:27:49 +06:00
cb28f7759d Merge pull request 'fix: accurate sandbox metrics and memory management' (#41) from bugfix/sandbox-metrics-calculations into dev
Reviewed-on: #41
2026-05-03 06:41:41 +00:00