sandbox/CLAUDE.md
pptx704 a1bd439c75 Add sandbox snapshot and restore with UFFD lazy memory loading
Implement full snapshot lifecycle: pause (snapshot + free resources),
resume (UFFD-based lazy restore), and named snapshot templates that
can spawn new sandboxes from frozen VM state.

Key changes:
- Snapshot header system with generational diff mapping (inspired by e2b)
- UFFD server for lazy page fault handling during snapshot restore
- Stable rootfs symlink path (/tmp/fc-vm/) for snapshot compatibility
- Templates DB table and CRUD API endpoints (POST/GET/DELETE /v1/snapshots)
- CreateSnapshot/DeleteSnapshot RPCs in hostagent proto
- Reconciler excludes paused sandboxes (expected absent from host agent)
- Snapshot templates lock vcpus/memory to baked-in values
- Proper cleanup of uffd sockets and pause snapshot files on destroy
2026-03-12 09:19:37 +06:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Wrenn Sandbox is a microVM-based code execution platform. Users create isolated sandboxes (Firecracker microVMs), run code inside them, and get output back via SDKs. Think E2B but with persistent sandboxes, pool-based pricing, and a single-binary deployment story.

Build & Development Commands

All commands go through the Makefile. Never use raw go build or go run.

make build              # Build all binaries → builds/
make build-cp           # Control plane only
make build-agent        # Host agent only
make build-envd         # envd static binary (verified statically linked)

make dev                # Full local dev: infra + migrate + control plane
make dev-infra          # Start PostgreSQL + Prometheus + Grafana (Docker)
make dev-down           # Stop dev infra
make dev-cp             # Control plane with hot reload (if air installed)
make dev-agent          # Host agent (sudo required)
make dev-envd           # envd in TCP debug mode

make check              # fmt + vet + lint + test (CI order)
make test               # Unit tests: go test -race -v ./internal/...
make test-integration   # Integration tests (require host agent + Firecracker)
make fmt                # gofmt both modules
make vet                # go vet both modules
make lint               # golangci-lint

make migrate-up         # Apply pending migrations
make migrate-down       # Rollback last migration
make migrate-create name=xxx  # Scaffold new goose migration (never create manually)
make migrate-reset      # Drop + re-apply all

make generate           # Proto (buf) + sqlc codegen
make proto              # buf generate for all proto dirs
make tidy               # go mod tidy both modules

Run a single test: go test -race -v -run TestName ./internal/path/...

Architecture

User SDK → HTTPS/WS → Control Plane → Connect RPC → Host Agent → HTTP/Connect RPC over TAP → envd (inside VM)

Three binaries, two Go modules:

| Binary | Module | Entry point | Runs as |
|---|---|---|---|
| wrenn-cp | git.omukk.dev/wrenn/sandbox | cmd/control-plane/main.go | Unprivileged |
| wrenn-agent | git.omukk.dev/wrenn/sandbox | cmd/host-agent/main.go | Root (NET_ADMIN + /dev/kvm) |
| envd | git.omukk.dev/wrenn/sandbox/envd (standalone envd/go.mod) | envd/main.go | PID 1 inside guest VM |

envd is a completely independent Go module. It is never imported by the main module. The only connection is the protobuf contract. It compiles to a static binary baked into rootfs images.

Key architectural invariant: The host agent is stateful (in-memory boxes map is the source of truth for running VMs). The control plane is stateless (all persistent state in PostgreSQL). The reconciler (internal/api/reconciler.go) bridges the gap — it periodically compares DB records against the host agent's live state and marks orphaned sandboxes as "stopped".
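
The reconciliation pass can be sketched as a set difference between what the database believes is running and what the host agent reports. This is a minimal illustration, not the code in internal/api/reconciler.go: the Record type, the status strings, and findOrphans are hypothetical names, and paused sandboxes are skipped on the assumption that they are expected to be absent from the host agent.

```go
package main

import "fmt"

// Status values are assumptions for illustration; the real schema may differ.
const (
	StatusRunning = "running"
	StatusPaused  = "paused"
)

// Record is a hypothetical stand-in for a sandbox row loaded from PostgreSQL.
type Record struct {
	ID     string
	Status string
}

// findOrphans returns the IDs that the DB lists as running but that the host
// agent does not report as live. Rows in any other status (paused included)
// are ignored, since they are not expected to exist on the agent.
func findOrphans(db []Record, live map[string]struct{}) []string {
	var orphans []string
	for _, r := range db {
		if r.Status != StatusRunning {
			continue
		}
		if _, ok := live[r.ID]; !ok {
			orphans = append(orphans, r.ID)
		}
	}
	return orphans
}

func main() {
	db := []Record{
		{ID: "sb-aaaaaaaa", Status: StatusRunning},
		{ID: "sb-bbbbbbbb", Status: StatusRunning},
		{ID: "sb-cccccccc", Status: StatusPaused},
	}
	live := map[string]struct{}{"sb-aaaaaaaa": {}}
	fmt.Println(findOrphans(db, live)) // [sb-bbbbbbbb]
}
```

A real pass would then mark each returned ID as "stopped" in the database.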

Control Plane

Packages: internal/api/, internal/admin/, internal/auth/, internal/scheduler/, internal/lifecycle/, internal/config/, internal/db/

Startup (cmd/control-plane/main.go) wires: config (env vars) → pgxpool → db.Queries (sqlc-generated) → Connect RPC client to host agent → api.Server. Everything flows through constructor injection.

  • API Server (internal/api/server.go): chi router with middleware. Creates handler structs (sandboxHandler, execHandler, filesHandler, etc.) injected with db.Queries and the host agent Connect RPC client. Routes under /v1/sandboxes/*.
  • Reconciler (internal/api/reconciler.go): background goroutine (every 30s) that compares DB records against agent.ListSandboxes() RPC. Marks orphaned DB entries as "stopped".
  • Admin UI at /admin/ (htmx + Go html/template, no SPA, no build step)
  • Database: PostgreSQL via pgx/v5. Queries generated by sqlc from db/queries/sandboxes.sql. Migrations in db/migrations/ (goose, plain SQL).
  • Config (internal/config/config.go): purely environment variables (DATABASE_URL, CP_LISTEN_ADDR, CP_HOST_AGENT_ADDR), no YAML/file config.

Host Agent

Packages: internal/hostagent/, internal/sandbox/, internal/vm/, internal/network/, internal/filesystem/, internal/envdclient/, internal/snapshot/

Startup (cmd/host-agent/main.go) wires: root check → enable IP forwarding → sandbox.Manager (containing vm.Manager + network.SlotAllocator) → hostagent.Server (Connect RPC handler) → HTTP server.

  • RPC Server (internal/hostagent/server.go): implements hostagentv1connect.HostAgentServiceHandler. Thin wrapper — every method delegates to sandbox.Manager. Maps Connect error codes on return.
  • Sandbox Manager (internal/sandbox/manager.go): the core orchestration layer. Maintains in-memory state in boxes map[string]*sandboxState (protected by sync.RWMutex). Each sandboxState holds a models.Sandbox, a *network.Slot, and an *envdclient.Client. Runs a TTL reaper (every 10s) that auto-destroys timed-out sandboxes.
  • VM Manager (internal/vm/manager.go, fc.go, config.go): manages Firecracker processes. Uses raw HTTP API over Unix socket (/tmp/fc-{sandboxID}.sock), not the firecracker-go-sdk Machine type. Launches Firecracker via unshare -m + ip netns exec. Configures VM via PUT to /boot-source, /drives/rootfs, /network-interfaces/eth0, /machine-config, then starts with PUT /actions.
  • Network (internal/network/setup.go, allocator.go): per-sandbox network namespace with veth pair + TAP device. See Networking section below.
  • Filesystem (internal/filesystem/clone.go): CoW rootfs clones via cp --reflink=auto.
  • envd Client (internal/envdclient/client.go, health.go): dual interface to the guest agent. Connect RPC for streaming process exec (process.Start() bidirectional stream). Plain HTTP for file operations (POST/GET /files?path=...&username=root). Health check polls GET /health every 100ms until ready (30s timeout).

envd (Guest Agent)

Module: envd/ with its own go.mod (git.omukk.dev/wrenn/sandbox/envd)

Runs as PID 1 inside the microVM via wrenn-init.sh (mounts procfs/sysfs/dev, sets hostname, writes resolv.conf, then execs envd). Extracted from E2B (Apache 2.0), with shared packages internalized into envd/internal/shared/. Listens on TCP 0.0.0.0:49983.

  • ProcessService: start processes, stream stdout/stderr, signal handling, PTY support
  • FilesystemService: stat/list/mkdir/move/remove/watch files
  • Health: GET /health

Networking (per sandbox)

Each sandbox gets its own Linux network namespace (ns-{idx}). Slot index (1-based, up to 65534) determines all addressing:

Host Namespace                      Namespace "ns-{idx}"                   Guest VM
──────────────────────────────────────────────────────────────────────────────────────
veth-{idx}  ←──── veth pair ────→  eth0
10.12.0.{idx*2}/31                 10.12.0.{idx*2+1}/31
                                     │
                                   tap0 (169.254.0.22/30) ←── TAP ──→ eth0 (169.254.0.21)
                                                                          ↑ kernel ip= boot arg
  • Host-reachable IP: 10.11.0.{idx}/32 — routed through veth to namespace, DNAT'd to guest
  • Outbound NAT: guest (169.254.0.21) → SNAT to vpeerIP inside namespace → MASQUERADE on host to default interface
  • Inbound NAT: host traffic to 10.11.0.{idx} → DNAT to 169.254.0.21 inside namespace
  • IP forwarding enabled inside each namespace
  • All details in internal/network/setup.go
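
To make the slot arithmetic concrete, the sketch below derives the three per-slot addresses from an index. It assumes the {idx} and {idx*2} notation above is shorthand for adding an offset to the base address as a 32-bit integer, so that large indices carry into the third octet. That reading is an assumption, and addrsForSlot and SlotAddrs are hypothetical names rather than the API of internal/network.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// addOffset returns base + n, treating the IPv4 address as a 32-bit integer.
func addOffset(base netip.Addr, n uint32) netip.Addr {
	b := base.As4()
	binary.BigEndian.PutUint32(b[:], binary.BigEndian.Uint32(b[:])+n)
	return netip.AddrFrom4(b)
}

// SlotAddrs holds the addresses derived from one slot index.
type SlotAddrs struct {
	HostIP netip.Addr // host-reachable /32
	Veth   netip.Addr // host side of the veth pair (/31)
	Vpeer  netip.Addr // namespace side of the veth pair (/31)
}

// addrsForSlot maps a 1-based slot index to its addresses.
func addrsForSlot(idx uint32) (SlotAddrs, error) {
	if idx < 1 || idx > 65534 {
		return SlotAddrs{}, fmt.Errorf("slot index %d out of range [1, 65534]", idx)
	}
	hostBase := netip.MustParseAddr("10.11.0.0")
	vethBase := netip.MustParseAddr("10.12.0.0")
	return SlotAddrs{
		HostIP: addOffset(hostBase, idx),
		Veth:   addOffset(vethBase, idx*2),
		Vpeer:  addOffset(vethBase, idx*2+1),
	}, nil
}

func main() {
	a, err := addrsForSlot(1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(a.HostIP, a.Veth, a.Vpeer) // 10.11.0.1 10.12.0.2 10.12.0.3
}
```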

Sandbox State Machine

PENDING → STARTING → RUNNING → PAUSED → HIBERNATED
                       │          │
                       ↓          ↓
                    STOPPED    STOPPED → (destroyed)

Any state → ERROR (on crash/failure)
PAUSED → RUNNING (warm snapshot resume)
HIBERNATED → RUNNING (cold snapshot resume, slower)
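
One way to encode the diagram is a transition table keyed by the source state. The sketch below reads the arrows literally; the State type and CanTransition are hypothetical names, and excluding an ERROR to ERROR self-transition is an assumption.

```go
package main

import "fmt"

type State string

const (
	Pending    State = "PENDING"
	Starting   State = "STARTING"
	Running    State = "RUNNING"
	Paused     State = "PAUSED"
	Hibernated State = "HIBERNATED"
	Stopped    State = "STOPPED"
	Error      State = "ERROR"
)

// transitions lists the allowed targets for each source state, following the
// arrows in the diagram above. ERROR is handled separately in CanTransition.
var transitions = map[State][]State{
	Pending:    {Starting},
	Starting:   {Running},
	Running:    {Paused, Stopped},
	Paused:     {Hibernated, Running, Stopped},
	Hibernated: {Running},
}

// CanTransition reports whether moving from one state to another is allowed.
func CanTransition(from, to State) bool {
	if to == Error {
		return from != Error // any state may fail; the self-transition is excluded by assumption
	}
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Paused, Running), CanTransition(Stopped, Running)) // true false
}
```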

Key Request Flows

Sandbox creation (POST /v1/sandboxes):

  1. API handler generates sandbox ID, inserts into DB as "pending"
  2. RPC CreateSandbox → host agent → sandbox.Manager.Create()
  3. Manager: resolve base rootfs → cp --reflink clone → allocate network slot → CreateNetwork() (netns + veth + tap + NAT) → vm.Create() (start Firecracker, configure via HTTP API, boot) → envdclient.WaitUntilReady() (poll /health) → store in-memory state
  4. API handler updates DB to "running" with host_ip

Command execution (POST /v1/sandboxes/{id}/exec):

  1. API handler verifies sandbox is "running" in DB
  2. RPC Exec → host agent → sandbox.Manager.Exec() → envdclient.Exec()
  3. envd client opens bidirectional Connect RPC stream (process.Start), collects stdout/stderr/exit_code
  4. API handler checks UTF-8 validity (base64-encodes if binary), updates last_active_at, returns result
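
The UTF-8 check in step 4 can be illustrated with the standard library. The encodeOutput helper and its "utf-8" / "base64" labels are hypothetical; the actual response fields may be named differently.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"unicode/utf8"
)

// encodeOutput returns the payload as a string plus a label saying how it was
// encoded: valid UTF-8 passes through unchanged, anything else is base64-encoded.
func encodeOutput(b []byte) (data, encoding string) {
	if utf8.Valid(b) {
		return string(b), "utf-8"
	}
	return base64.StdEncoding.EncodeToString(b), "base64"
}

func main() {
	fmt.Println(encodeOutput([]byte("hello")))    // hello utf-8
	fmt.Println(encodeOutput([]byte{0xff, 0xfe})) // //4= base64
}
```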

Streaming exec (WS /v1/sandboxes/{id}/exec/stream):

  1. WebSocket upgrade, read first message for cmd/args
  2. RPC ExecStream → host agent → sandbox.Manager.ExecStream() → envdclient.ExecStream()
  3. envd client returns a channel of events; host agent forwards events through the RPC stream
  4. API handler forwards stream events to WebSocket as JSON messages ({type: "stdout"|"stderr"|"exit", ...})

File transfer: Write uses multipart POST to envd /files; read uses GET. Streaming variants chunk in 64KB pieces through the RPC stream.

REST API

Routes defined in internal/api/server.go, handlers in internal/api/handlers_*.go. OpenAPI spec embedded via //go:embed and served at /openapi.yaml (Swagger UI at /docs). JSON request/response. API key auth via X-API-Key header. Error responses: {"error": {"code": "...", "message": "..."}}.

Code Generation

Proto (Connect RPC)

Proto source of truth is proto/envd/*.proto and proto/hostagent/*.proto. Run make proto to regenerate. Three buf.gen.yaml files control output:

| buf.gen.yaml location | Generates to | Used by |
|---|---|---|
| proto/envd/buf.gen.yaml | proto/envd/gen/ | Main module (host agent's envd client) |
| proto/hostagent/buf.gen.yaml | proto/hostagent/gen/ | Main module (control plane ↔ host agent) |
| envd/spec/buf.gen.yaml | envd/internal/services/spec/ | envd module (guest agent server) |

The envd buf.gen.yaml reads from ../../proto/envd/ (same source protos) but generates into envd's own module. This means the same .proto files produce two independent sets of Go stubs — one for each Go module.

To add a new RPC method: edit the .proto file → make proto → implement the handler on both sides.

sqlc

Config: sqlc.yaml (project root). Reads queries from db/queries/*.sql, reads schema from db/migrations/, outputs to internal/db/.

To add a new query: add it to the appropriate .sql file in db/queries/ → make generate → use the new method on *db.Queries.

Key Technical Decisions

  • Connect RPC (not gRPC) for all RPC communication between components
  • Buf + protoc-gen-connect-go for code generation (not protoc-gen-go-grpc)
  • Raw Firecracker HTTP API via Unix socket (not firecracker-go-sdk Machine type)
  • TAP networking (not vsock) for host-to-envd communication
  • PostgreSQL via pgx/v5 + sqlc (type-safe query generation). Goose for migrations (plain SQL, up/down)
  • Admin UI: htmx + Go html/template + chi router. No SPA, no React, no build step
  • Lago for billing (external service, not in this codebase)

Coding Conventions

  • Go style: gofmt, go vet, context.Context everywhere, errors wrapped with fmt.Errorf("action: %w", err), slog for logging, no global state
  • Naming: Sandbox IDs sb- + 8 hex, API keys wrn_ + 32 chars, Host IDs host- + 8 hex
  • Dependencies: Use go get to add deps, never hand-edit go.mod. For envd deps: cd envd && go get ... (separate module)
  • Generated code: Always commit generated code (proto stubs, sqlc). Never add generated code to .gitignore
  • Migrations: Always use make migrate-create name=xxx, never create migration files manually
  • Testing: Table-driven tests for handlers and state machine transitions
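
As a small illustration of the naming and error-wrapping conventions, the sketch below generates and validates sandbox IDs. NewSandboxID and ValidSandboxID are hypothetical helpers, and lowercase hex is an assumption ("8 hex" does not specify case). The loop in main uses the table-driven shape; in the repository it would live in a _test.go file using the testing package.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"regexp"
)

// sandboxIDPattern assumes lowercase hex digits.
var sandboxIDPattern = regexp.MustCompile(`^sb-[0-9a-f]{8}$`)

// NewSandboxID returns "sb-" followed by 8 random hex characters.
func NewSandboxID() (string, error) {
	var b [4]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", fmt.Errorf("generate sandbox id: %w", err)
	}
	return "sb-" + hex.EncodeToString(b[:]), nil
}

// ValidSandboxID reports whether id matches the sandbox ID format.
func ValidSandboxID(id string) bool {
	return sandboxIDPattern.MatchString(id)
}

func main() {
	cases := []struct {
		name string
		id   string
		want bool
	}{
		{"well formed", "sb-1a2b3c4d", true},
		{"too short", "sb-1a2b", false},
		{"wrong prefix", "host-1a2b3c4d", false},
		{"non-hex", "sb-zzzzzzzz", false},
	}
	for _, tc := range cases {
		if got := ValidSandboxID(tc.id); got != tc.want {
			fmt.Printf("%s: got %v, want %v\n", tc.name, got, tc.want)
		}
	}
	id, err := NewSandboxID()
	fmt.Println(ValidSandboxID(id), err) // true <nil>
}
```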

Two-module gotcha

The main module (go.mod) and envd (envd/go.mod) are fully independent. make tidy, make fmt, and make vet already operate on both. When adding dependencies by hand, target the correct module (cd envd && go get ... for envd deps). make proto also generates stubs for both modules from the same proto sources.

Rootfs & Guest Init

  • wrenn-init (images/wrenn-init.sh): the PID 1 init script baked into every rootfs. Mounts virtual filesystems, sets hostname, writes /etc/resolv.conf, then execs envd.
  • Updating the rootfs after changing envd or wrenn-init: bash scripts/update-debug-rootfs.sh [rootfs_path]. This builds envd via make build-envd, mounts the rootfs image, copies in the new binaries, and unmounts. Defaults to /var/lib/wrenn/images/minimal.ext4.
  • Rootfs images are minimal debootstrap — no systemd, no coreutils beyond busybox. Use /bin/sh -c for shell builtins inside the guest.

Fixed Paths (on host machine)

  • Kernel: /var/lib/wrenn/kernels/vmlinux
  • Base rootfs images: /var/lib/wrenn/images/{template}.ext4
  • Sandbox clones: /var/lib/wrenn/sandboxes/
  • Firecracker: /usr/local/bin/firecracker (e2b's fork of firecracker)

Web UI Styling

Wrenn brand: Warm earthy developer tool with crafted organic character.

Color palette (light/dark): Background scale: #f8f6f1 → #f1eeea → #e8e5e0 → #dedbd5 (light); #090b0a → #0f1211 → #151918 → #1b201e → #222826 (dark). Text hierarchy: bright #2c2a26 / body #4a4740 / dim #7a766e / faint #a09b93 (light); #e8e5df / #c8c4bc / #8a867f / #5f5c57 (dark). Sage green brand accent: #5e8c58 (light) / #89a785 (dark), with glow variant rgba(94,140,88,0.08). Borders: #e2dfd9 (light) / #262c2a (dark). Semantic status colors: amber #9e7c2e (warning/building), red #b35544 (error/failed), blue #3d7aac (info/stopped) — each with a color-dim transparent bg variant for badge backgrounds. Destructive: #b35544 light / #c27b6d dark.

Typography: Four fonts. Manrope (variable, weights 300 to 700) for all UI labels, nav, body. Instrument Serif (400) for page titles, empty-state headings, large metric values. JetBrains Mono (400/500) for code, env var keys/values, deployment IDs, commit SHAs, log viewer, URL paths. Alice for the sidebar wordmark only. Base body size 14px. Headings: h1 24px serif, h2 20px, h3 18px, h4 to h6 11px sans-serif uppercase wide-tracked. Metric card values 34px serif at letter-spacing: -0.08em. Section labels at 0.06 to 0.07em tracking, weight 550 to 600.

Spacing: 4px base unit (Tailwind scale). Page content p-8 (32px). Cards p-4 to p-5. Sidebar nav items 7px 10px. Consistent, moderate density: functional but not cramped.

Borders & depth: Flat aesthetic (--shadow-sm: 0 0 #0000, no drop shadows). Depth is achieved through background color stepping (bg → bg-3 → bg-4 → bg-5), not shadows. Borders 1px solid in warm muted tones. Corner radii: cards/surfaces 12px, inputs/small buttons 6 to 8px, avatars 8px, dots 50%.

Components: Active sidebar nav items use a 3px left-border in sage green rather than filled backgrounds, with a sage glow bg (rgba(94,140,88,0.08)). Focus rings are double-ring: 0 0 0 2px background, 0 0 0 4px ring. Status system has four states (Live/sage, Building/amber+pulse, Failed/red, Stopped/faint) each with solid dot + transparent-bg badge pair. Buttons follow ghost → outline → filled hierarchy. Tables wrapped in rounded-xl border. Dialogs via native <dialog>. Toasts bottom-anchored.

Animation: Crisp 150ms transitions on all interactive elements. Sidebar width 250ms ease. Custom wrenn-pulse keyframe (2.5s ease infinite box-shadow bloom) on live/building status dots. Top-of-page loading bar (h-0.5, sage green) on navigation.

Dark mode: Full support. Very dark near-black-green backgrounds with warm off-white text and desaturated sage accent. Flat (no card shadows). System preference detection + localStorage persistence.

Overall feel: Warm, earthy, semi-flat. Avoids cold grays entirely — palette leans slightly warm/brown-tinted throughout. The serif + mono + geometric sans type stack gives a designed but unfussy developer-tool character. Organic and considered, not sterile.