diff --git a/CLAUDE.md b/CLAUDE.md index c265f8e..e1f0991 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,848 +1,234 @@ -# CLAUDE.md — Wrenn Sandbox +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. ## Project Overview -Wrenn Sandbox is a microVM-based code execution platform. Users create isolated sandboxes (Firecracker microVMs), run code inside them, and get output back via SDKs (Python, TypeScript, Go). Think E2B but with persistent sandboxes, pool-based pricing, and a single-binary deployment story. +Wrenn Sandbox is a microVM-based code execution platform. Users create isolated sandboxes (Firecracker microVMs), run code inside them, and get output back via SDKs. Think E2B but with persistent sandboxes, pool-based pricing, and a single-binary deployment story. + +## Build & Development Commands + +All commands go through the Makefile. Never use raw `go build` or `go run`. + +```bash +make build # Build all binaries → builds/ +make build-cp # Control plane only +make build-agent # Host agent only +make build-envd # envd static binary (verified statically linked) + +make dev # Full local dev: infra + migrate + control plane +make dev-infra # Start PostgreSQL + Prometheus + Grafana (Docker) +make dev-down # Stop dev infra +make dev-cp # Control plane with hot reload (if air installed) +make dev-agent # Host agent (sudo required) +make dev-envd # envd in TCP debug mode + +make check # fmt + vet + lint + test (CI order) +make test # Unit tests: go test -race -v ./internal/... 
+make test-integration # Integration tests (require host agent + Firecracker) +make fmt # gofmt both modules +make vet # go vet both modules +make lint # golangci-lint + +make migrate-up # Apply pending migrations +make migrate-down # Rollback last migration +make migrate-create name=xxx # Scaffold new goose migration (never create manually) +make migrate-reset # Drop + re-apply all + +make generate # Proto (buf) + sqlc codegen +make proto # buf generate for all proto dirs +make tidy # go mod tidy both modules +``` + +Run a single test: `go test -race -v -run TestName ./internal/path/...` ## Architecture ``` -User SDK (Python/TS/Go) - │ - │ HTTPS / WebSocket - ▼ -Control Plane (Go binary, single process) - ├── REST API (chi router) - ├── Admin UI (htmx + Go templates) - ├── Scheduler (picks host for new sandboxes) - ├── State DB (PostgreSQL via pgx + goose migrations) - ├── Lifecycle Manager (background goroutine) - └── gRPC client → Host Agent - │ - │ gRPC (mTLS) - ▼ -Host Agent (Go binary, one per physical machine) - ├── VM Manager (Firecracker HTTP API via Unix socket) - ├── Network Manager (TAP devices, NAT, IP allocator) - ├── Filesystem Manager (CoW rootfs clones) - ├── Envd Client (HTTP/Connect RPC to guest agent via TAP network) - ├── Snapshot Manager (pause/hibernate/resume) - ├── Metrics Exporter (Prometheus) - └── gRPC server (listens for control plane) - │ - │ HTTP over TAP network (veth + namespace isolation) - ▼ -envd (Go binary, runs inside each microVM via wrenn-init) - ├── ProcessService (exec commands, stream stdout/stderr) - ├── FilesystemService (read/write/list files) - └── Terminal (PTY handling for interactive sessions) +User SDK → HTTPS/WS → Control Plane → Connect RPC → Host Agent → HTTP/Connect RPC over TAP → envd (inside VM) ``` -## Key Decisions +**Three binaries, two Go modules:** -- **Language**: Everything is Go. No Python, no Node.js, no separate frontend. 
-- **Guest agent**: envd is extracted from E2B's open-source repo (e2b-dev/infra, Apache 2.0). The orchestrator VM management code is also adapted from E2B. -- **Database**: PostgreSQL. Migrations via goose (plain SQL files). -- **Admin UI**: htmx + Go html/template + chi router, served from the control plane binary. No SPA, no React, no build step. -- **API framework**: chi router for HTTP. Standard grpc-go for gRPC. -- **Billing**: Lago (external service, integrated via API). Not part of this codebase — we send usage events to Lago. -- **No separate reverse proxy binary**. Port forwarding is handled within the control plane or host agent directly if needed later. +| Binary | Module | Entry point | Runs as | +|--------|--------|-------------|---------| +| wrenn-cp | `git.omukk.dev/wrenn/sandbox` | `cmd/control-plane/main.go` | Unprivileged | +| wrenn-agent | `git.omukk.dev/wrenn/sandbox` | `cmd/host-agent/main.go` | Root (NET_ADMIN + /dev/kvm) | +| envd | `git.omukk.dev/wrenn/sandbox/envd` (standalone `envd/go.mod`) | `envd/main.go` | PID 1 inside guest VM | -## Directory Structure +envd is a **completely independent Go module**. It is never imported by the main module. The only connection is the protobuf contract. It compiles to a static binary baked into rootfs images. + +**Key architectural invariant:** The host agent is **stateful** (in-memory `boxes` map is the source of truth for running VMs). The control plane is **stateless** (all persistent state in PostgreSQL). The reconciler (`internal/api/reconciler.go`) bridges the gap — it periodically compares DB records against the host agent's live state and marks orphaned sandboxes as "stopped". 
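The reconciliation step described above can be sketched as a pure set difference. This is a minimal illustration, not the code in `internal/api/reconciler.go`: the function name `orphanedIDs` and the use of plain string slices are assumptions made for the example, and the real reconciler presumably works with sqlc rows and the `ListSandboxes` RPC response.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedIDs is a hypothetical helper: it returns the sandbox IDs that the
// database still lists as active but that the host agent no longer reports.
func orphanedIDs(dbActive, agentLive []string) []string {
	// Index the agent's live sandboxes for O(1) membership checks.
	live := make(map[string]struct{}, len(agentLive))
	for _, id := range agentLive {
		live[id] = struct{}{}
	}

	var orphans []string
	for _, id := range dbActive {
		if _, ok := live[id]; !ok {
			orphans = append(orphans, id)
		}
	}
	// Sort so the result is deterministic regardless of input order.
	sort.Strings(orphans)
	return orphans
}

func main() {
	// Illustrative IDs only, following the "sb-" + 8 hex convention.
	db := []string{"sb-a1b2c3d4", "sb-deadbeef", "sb-00112233"}
	agent := []string{"sb-a1b2c3d4"}

	for _, id := range orphanedIDs(db, agent) {
		fmt.Printf("mark %s as stopped\n", id)
	}
}
```

Keeping the comparison free of I/O makes it straightforward to cover with table-driven tests, separate from the database and RPC calls around it.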
+ +### Control Plane + +**Packages:** `internal/api/`, `internal/admin/`, `internal/auth/`, `internal/scheduler/`, `internal/lifecycle/`, `internal/config/`, `internal/db/` + +Startup (`cmd/control-plane/main.go`) wires: config (env vars) → pgxpool → `db.Queries` (sqlc-generated) → Connect RPC client to host agent → `api.Server`. Everything flows through constructor injection. + +- **API Server** (`internal/api/server.go`): chi router with middleware. Creates handler structs (`sandboxHandler`, `execHandler`, `filesHandler`, etc.) injected with `db.Queries` and the host agent Connect RPC client. Routes under `/v1/sandboxes/*`. +- **Reconciler** (`internal/api/reconciler.go`): background goroutine (every 30s) that compares DB records against `agent.ListSandboxes()` RPC. Marks orphaned DB entries as "stopped". +- **Admin UI** at `/admin/` (htmx + Go html/template, no SPA, no build step) +- **Database**: PostgreSQL via pgx/v5. Queries generated by sqlc from `db/queries/sandboxes.sql`. Migrations in `db/migrations/` (goose, plain SQL). +- **Config** (`internal/config/config.go`): purely environment variables (`DATABASE_URL`, `CP_LISTEN_ADDR`, `CP_HOST_AGENT_ADDR`), no YAML/file config. + +### Host Agent + +**Packages:** `internal/hostagent/`, `internal/sandbox/`, `internal/vm/`, `internal/network/`, `internal/filesystem/`, `internal/envdclient/`, `internal/snapshot/` + +Startup (`cmd/host-agent/main.go`) wires: root check → enable IP forwarding → `sandbox.Manager` (containing `vm.Manager` + `network.SlotAllocator`) → `hostagent.Server` (Connect RPC handler) → HTTP server. + +- **RPC Server** (`internal/hostagent/server.go`): implements `hostagentv1connect.HostAgentServiceHandler`. Thin wrapper — every method delegates to `sandbox.Manager`. Maps Connect error codes on return. +- **Sandbox Manager** (`internal/sandbox/manager.go`): the core orchestration layer. Maintains in-memory state in `boxes map[string]*sandboxState` (protected by `sync.RWMutex`). 
Each `sandboxState` holds a `models.Sandbox`, a `*network.Slot`, and an `*envdclient.Client`. Runs a TTL reaper (every 10s) that auto-destroys timed-out sandboxes. +- **VM Manager** (`internal/vm/manager.go`, `fc.go`, `config.go`): manages Firecracker processes. Uses raw HTTP API over Unix socket (`/tmp/fc-{sandboxID}.sock`), not the firecracker-go-sdk Machine type. Launches Firecracker via `unshare -m` + `ip netns exec`. Configures VM via PUT to `/boot-source`, `/drives/rootfs`, `/network-interfaces/eth0`, `/machine-config`, then starts with PUT `/actions`. +- **Network** (`internal/network/setup.go`, `allocator.go`): per-sandbox network namespace with veth pair + TAP device. See Networking section below. +- **Filesystem** (`internal/filesystem/clone.go`): CoW rootfs clones via `cp --reflink=auto`. +- **envd Client** (`internal/envdclient/client.go`, `health.go`): dual interface to the guest agent. Connect RPC for streaming process exec (`process.Start()` bidirectional stream). Plain HTTP for file operations (POST/GET `/files?path=...&username=root`). Health check polls `GET /health` every 100ms until ready (30s timeout). + +### envd (Guest Agent) + +**Module:** `envd/` with its own `go.mod` (`git.omukk.dev/wrenn/sandbox/envd`) + +Runs as PID 1 inside the microVM via `wrenn-init.sh` (mounts procfs/sysfs/dev, sets hostname, writes resolv.conf, then execs envd). Extracted from E2B (Apache 2.0), with shared packages internalized into `envd/internal/shared/`. Listens on TCP `0.0.0.0:49983`. + +- **ProcessService**: start processes, stream stdout/stderr, signal handling, PTY support +- **FilesystemService**: stat/list/mkdir/move/remove/watch files +- **Health**: GET `/health` + +### Networking (per sandbox) + +Each sandbox gets its own Linux network namespace (`ns-{idx}`). 
Slot index (1-based, up to 65534) determines all addressing: ``` -wrenn-sandbox/ -├── CLAUDE.md # This file -├── Makefile # Build all binaries, run migrations, generate proto -├── go.mod # github.com/wrenn-dev/wrenn-sandbox -├── go.sum -├── .env.example -│ -├── cmd/ -│ ├── control-plane/ -│ │ └── main.go # Entry: HTTP server + gRPC client + lifecycle manager -│ └── host-agent/ -│ └── main.go # Entry: gRPC server + VM management -│ -├── envd/ # Guest agent (extracted from E2B, separate go.mod) -│ ├── go.mod -│ ├── main.go -│ ├── Makefile -│ └── internal/ # Process exec, filesystem, PTY, PID 1 handling -│ -├── proto/ -│ ├── envd/ # From E2B: ProcessService, FilesystemService -│ │ ├── process.proto -│ │ ├── filesystem.proto -│ │ └── gen/ # Generated Go stubs -│ └── hostagent/ # Our definition: control plane ↔ host agent -│ ├── hostagent.proto -│ └── gen/ -│ -├── internal/ -│ │ -│ │ ── CONTROL PLANE ── -│ ├── api/ -│ │ ├── server.go # chi router setup, middleware -│ │ ├── middleware.go # Auth, rate limiting, request logging -│ │ ├── handlers_sandbox.go # CRUD for sandboxes -│ │ ├── handlers_exec.go # Execute commands in sandboxes -│ │ ├── handlers_files.go # Upload/download files -│ │ └── handlers_terminal.go # WebSocket terminal sessions -│ │ -│ ├── admin/ # Admin UI (htmx + Go templates) -│ │ ├── handlers.go # Page handlers (dashboard, sandbox detail, etc.) 
-│ │ ├── templates/ -│ │ │ ├── layout.html # Base layout with htmx, navigation -│ │ │ ├── dashboard.html # Overview: active sandboxes, resource usage -│ │ │ ├── sandboxes.html # List all sandboxes with status -│ │ │ ├── sandbox_detail.html # Single sandbox: logs, metrics, audit trail -│ │ │ └── partials/ # htmx partial templates for dynamic updates -│ │ │ ├── sandbox_row.html -│ │ │ ├── metrics_card.html -│ │ │ └── audit_log.html -│ │ └── static/ # Minimal CSS (no build step) -│ │ └── style.css -│ │ -│ ├── auth/ -│ │ ├── apikey.go # API key validation -│ │ └── ratelimit.go -│ │ -│ ├── scheduler/ -│ │ ├── scheduler.go # Interface definition -│ │ ├── single_host.go # Default: always picks the one registered host -│ │ └── least_loaded.go # Multi-host: picks host with most available capacity -│ │ -│ ├── lifecycle/ -│ │ └── manager.go # Background goroutine: auto-pause, auto-hibernate, auto-destroy -│ │ -│ │ ── HOST AGENT ── -│ ├── vm/ -│ │ ├── manager.go # CreateVM, DestroyVM (wraps Firecracker Go SDK) -│ │ ├── config.go # Build Firecracker config from sandbox request -│ │ └── jailer.go # Jailer configuration for production -│ │ -│ ├── network/ -│ │ ├── manager.go # SetupNetwork, TeardownNetwork (TAP + NAT) -│ │ ├── allocator.go # IP pool allocator (/30 subnets from 10.0.0.0/16) -│ │ └── nat.go # iptables/nftables rule management -│ │ -│ ├── filesystem/ -│ │ ├── images.go # Base image registry -│ │ └── clone.go # CoW rootfs clones (cp --reflink) -│ │ -│ ├── envdclient/ -│ │ ├── client.go # gRPC client wrapper for envd -│ │ ├── dialer.go # HTTP transport to envd via TAP network -│ │ └── health.go # Health check with retry -│ │ -│ ├── snapshot/ -│ │ ├── manager.go # Pause/resume coordination -│ │ ├── local.go # Local disk snapshot storage -│ │ └── remote.go # S3/GCS upload/download for hibernate -│ │ -│ ├── metrics/ -│ │ ├── collector.go # Read cgroup stats per sandbox -│ │ └── exporter.go # Prometheus /metrics endpoint -│ │ -│ │ ── SHARED ── -│ ├── models/ -│ │ ├── 
sandbox.go # Sandbox struct, status enum, state machine -│ │ └── host.go # Host struct, capacity tracking -│ │ -│ ├── id/ -│ │ └── id.go # Generate sandbox IDs: "sb-" + 8 hex chars -│ │ -│ └── config/ -│ └── config.go # Configuration loading (env vars + YAML) -│ -├── db/ -│ ├── migrations/ # goose SQL migrations -│ │ ├── 00001_initial.sql -│ │ ├── 00002_add_persistence.sql -│ │ └── 00003_add_audit_events.sql -│ └── queries/ # SQL queries (used with sqlc or raw pgx) -│ ├── sandboxes.sql -│ ├── hosts.sql -│ └── audit.sql -│ -├── images/ # Rootfs build scripts -│ ├── build-rootfs.sh -│ ├── docker-to-rootfs.sh -│ └── templates/ -│ ├── minimal/build.sh -│ ├── python311/build.sh -│ └── node20/build.sh -| -├── deploy/ -│ ├── systemd/ -│ │ ├── wrenn-control-plane.service -│ │ └── wrenn-host-agent.service -│ └── ansible/ -│ └── playbook.yml -│ -├── scripts/ -│ ├── setup-host.sh -│ ├── generate-proto.sh -│ └── dev.sh -│ -└── tests/ - ├── integration/ - │ ├── sandbox_lifecycle_test.go - │ ├── networking_test.go - │ └── snapshot_test.go - └── load/ - └── concurrent_test.go +Host Namespace Namespace "ns-{idx}" Guest VM +────────────────────────────────────────────────────────────────────────────────────── +veth-{idx} ←──── veth pair ────→ eth0 +10.12.0.{idx*2}/31 10.12.0.{idx*2+1}/31 + │ + tap0 (169.254.0.22/30) ←── TAP ──→ eth0 (169.254.0.21) + ↑ kernel ip= boot arg ``` -## Database - -### Tech Stack -- PostgreSQL (via pgx/v5 driver, no ORM) -- goose for migrations (plain SQL, up/down) -- sqlc for type-safe query generation (optional, can use raw pgx) - -### Migration Convention -``` -db/migrations/ -├── 00001_initial.sql -├── 00002_add_persistence.sql -└── ... 
-``` - -Each migration file uses goose format: -```sql --- +goose Up -CREATE TABLE sandboxes (...); - --- +goose Down -DROP TABLE sandboxes; -``` - -Run migrations: -```bash -# Apply all pending migrations -goose -dir db/migrations postgres "$DATABASE_URL" up - -# Rollback last migration -goose -dir db/migrations postgres "$DATABASE_URL" down - -# Check current status -goose -dir db/migrations postgres "$DATABASE_URL" status - -# Create a new migration -goose -dir db/migrations create add_new_table sql -``` - -### Core Tables - -**sandboxes** — Every sandbox created on the platform -```sql -CREATE TABLE sandboxes ( - id TEXT PRIMARY KEY, - owner_id TEXT NOT NULL, - host_id TEXT NOT NULL, - template TEXT NOT NULL, - status TEXT NOT NULL DEFAULT 'pending', - vcpus INTEGER DEFAULT 1, - memory_mb INTEGER DEFAULT 512, - timeout_sec INTEGER DEFAULT 300, - guest_ip TEXT, - vsock_cid INTEGER, - snapshot_path TEXT, - created_at TIMESTAMPTZ DEFAULT NOW(), - started_at TIMESTAMPTZ, - paused_at TIMESTAMPTZ, - last_active_at TIMESTAMPTZ, - metadata JSONB DEFAULT '{}' -); - -CREATE INDEX idx_sandboxes_owner ON sandboxes(owner_id); -CREATE INDEX idx_sandboxes_status ON sandboxes(status); -CREATE INDEX idx_sandboxes_host ON sandboxes(host_id); -``` - -**hosts** — Registered host agents -```sql -CREATE TABLE hosts ( - id TEXT PRIMARY KEY, - grpc_endpoint TEXT NOT NULL, - total_vcpus INTEGER, - total_memory_mb INTEGER, - used_vcpus INTEGER DEFAULT 0, - used_memory_mb INTEGER DEFAULT 0, - sandbox_count INTEGER DEFAULT 0, - status TEXT DEFAULT 'healthy', - last_heartbeat TIMESTAMPTZ -); -``` - -**audit_events** — Every exec/file operation -```sql -CREATE TABLE audit_events ( - id BIGSERIAL PRIMARY KEY, - sandbox_id TEXT NOT NULL REFERENCES sandboxes(id), - owner_id TEXT NOT NULL, - event_type TEXT NOT NULL, - command TEXT, - exit_code INTEGER, - duration_ms INTEGER, - stdout_bytes INTEGER, - stderr_bytes INTEGER, - created_at TIMESTAMPTZ DEFAULT NOW() -); - -CREATE INDEX 
idx_audit_sandbox ON audit_events(sandbox_id); -CREATE INDEX idx_audit_owner ON audit_events(owner_id); -CREATE INDEX idx_audit_created ON audit_events(created_at); -``` - -**api_keys** — Authentication -```sql -CREATE TABLE api_keys ( - id TEXT PRIMARY KEY, - key_hash TEXT NOT NULL UNIQUE, - owner_id TEXT NOT NULL, - plan TEXT DEFAULT 'hobby', - pool_vcpus INTEGER DEFAULT 2, - pool_memory_mb INTEGER DEFAULT 8192, - pool_storage_mb INTEGER DEFAULT 20480, - is_active BOOLEAN DEFAULT true, - created_at TIMESTAMPTZ DEFAULT NOW() -); -``` +- **Host-reachable IP**: `10.11.0.{idx}/32` — routed through veth to namespace, DNAT'd to guest +- **Outbound NAT**: guest (169.254.0.21) → SNAT to vpeerIP inside namespace → MASQUERADE on host to default interface +- **Inbound NAT**: host traffic to 10.11.0.{idx} → DNAT to 169.254.0.21 inside namespace +- IP forwarding enabled inside each namespace +- All details in `internal/network/setup.go` ### Sandbox State Machine ``` PENDING → STARTING → RUNNING → PAUSED → HIBERNATED │ │ ↓ ↓ - STOPPED STOPPED - │ - ↓ - (destroyed/cleaned up) + STOPPED STOPPED → (destroyed) -Also: any state → ERROR (on crash/failure) -PAUSED → RUNNING (resume from warm snapshot) -HIBERNATED → RUNNING (resume from cold snapshot, slower) +Any state → ERROR (on crash/failure) +PAUSED → RUNNING (warm snapshot resume) +HIBERNATED → RUNNING (cold snapshot resume, slower) ``` -## Admin UI (htmx) +### Key Request Flows -The control plane serves an admin dashboard at `/admin/`. It uses: -- Go `html/template` for server-side rendering -- htmx for dynamic updates (no JavaScript framework) -- Minimal custom CSS — no Tailwind, no build step +**Sandbox creation** (`POST /v1/sandboxes`): +1. API handler generates sandbox ID, inserts into DB as "pending" +2. RPC `CreateSandbox` → host agent → `sandbox.Manager.Create()` +3. 
Manager: resolve base rootfs → `cp --reflink` clone → allocate network slot → `CreateNetwork()` (netns + veth + tap + NAT) → `vm.Create()` (start Firecracker, configure via HTTP API, boot) → `envdclient.WaitUntilReady()` (poll /health) → store in-memory state +4. API handler updates DB to "running" with host_ip -### Pages -- `/admin/` — Dashboard: active sandbox count, resource pool usage, recent activity -- `/admin/sandboxes` — List all sandboxes (filterable by status, owner, template) -- `/admin/sandboxes/{id}` — Sandbox detail: status, metrics, audit log, actions (pause/resume/destroy) -- `/admin/hosts` — Host agent list with capacity and health -- `/admin/keys` — API key management +**Command execution** (`POST /v1/sandboxes/{id}/exec`): +1. API handler verifies sandbox is "running" in DB +2. RPC `Exec` → host agent → `sandbox.Manager.Exec()` → `envdclient.Exec()` +3. envd client opens bidirectional Connect RPC stream (`process.Start`), collects stdout/stderr/exit_code +4. API handler checks UTF-8 validity (base64-encodes if binary), updates last_active_at, returns result -### htmx Patterns -- Sandbox list auto-refreshes via `hx-trigger="every 5s"` -- Actions (pause, resume, destroy) use `hx-post` with `hx-swap="outerHTML"` to update the row -- Audit log on sandbox detail uses `hx-get` with infinite scroll -- Metrics cards use `hx-trigger="every 10s"` for live updates +**Streaming exec** (`WS /v1/sandboxes/{id}/exec/stream`): +1. WebSocket upgrade, read first message for cmd/args +2. RPC `ExecStream` → host agent → `sandbox.Manager.ExecStream()` → `envdclient.ExecStream()` +3. envd client returns a channel of events; host agent forwards events through the RPC stream +4. 
API handler forwards stream events to WebSocket as JSON messages (`{type: "stdout"|"stderr"|"exit", ...}`) -### Styling -Wrenn brand colors: -- Background: obsidian (#0c0c0c, #131313, #1a1a1a for raised surfaces) -- Text: warm off-white (#e8e6e3), dim (#9a9890) -- Accent: sage green (#8fbc8f) -- Borders: #2a2a2a -- Font: system monospace for data, system sans-serif for prose -- Minimal, developer-tool aesthetic. Dense, functional, sharp edges. - -## Proto Definitions - -### hostagent.proto (control plane ↔ host agent) -```protobuf -syntax = "proto3"; -package hostagent; -option go_package = "github.com/wrenn-dev/wrenn-sandbox/proto/hostagent/gen"; - -service HostAgentService { - rpc CreateSandbox(CreateSandboxRequest) returns (CreateSandboxResponse); - rpc DestroySandbox(DestroySandboxRequest) returns (DestroySandboxResponse); - rpc PauseSandbox(PauseSandboxRequest) returns (PauseSandboxResponse); - rpc ResumeSandbox(ResumeSandboxRequest) returns (ResumeSandboxResponse); - rpc Exec(ExecRequest) returns (stream ExecOutput); - rpc WriteFile(WriteFileRequest) returns (WriteFileResponse); - rpc ReadFile(ReadFileRequest) returns (ReadFileResponse); - rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse); -} -``` - -### envd protos (host agent ↔ guest agent) -Extracted from E2B's spec/ directory. ProcessService and FilesystemService. Do not modify these unless you also modify envd. +**File transfer**: Write uses multipart POST to envd `/files`; read uses GET. Streaming variants chunk in 64KB pieces through the RPC stream. ## REST API -All endpoints under `/v1/`. JSON request/response. API key auth via `X-API-Key` header. +Routes defined in `internal/api/server.go`, handlers in `internal/api/handlers_*.go`. OpenAPI spec embedded via `//go:embed` and served at `/openapi.yaml` (Swagger UI at `/docs`). JSON request/response. API key auth via `X-API-Key` header. Error responses: `{"error": {"code": "...", "message": "..."}}`. 
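As a rough sketch of the error envelope mentioned above, the JSON shape maps onto two small Go structs. The type and function names here (`apiError`, `errorEnvelope`, `encodeError`) are assumptions for illustration and may not match what `internal/api/` actually defines; the `pool_exhausted` code is taken from the earlier version of this file.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError and errorEnvelope are hypothetical types mirroring the documented
// shape {"error": {"code": "...", "message": "..."}}.
type apiError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

type errorEnvelope struct {
	Error apiError `json:"error"`
}

// encodeError builds the JSON body for an error response.
func encodeError(code, message string) (string, error) {
	b, err := json.Marshal(errorEnvelope{Error: apiError{Code: code, Message: message}})
	if err != nil {
		return "", fmt.Errorf("encode error response: %w", err)
	}
	return string(b), nil
}

func main() {
	body, err := encodeError("pool_exhausted", "vCPU pool is fully allocated")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

A handler would write this body alongside an appropriate HTTP status; which status goes with which code is not specified here.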
-``` -POST /v1/sandboxes Create sandbox -GET /v1/sandboxes List sandboxes -GET /v1/sandboxes/{id} Get sandbox status -POST /v1/sandboxes/{id}/exec Execute command -PUT /v1/sandboxes/{id}/files Upload file -GET /v1/sandboxes/{id}/files/{path} Download file -POST /v1/sandboxes/{id}/pause Pause sandbox -POST /v1/sandboxes/{id}/resume Resume sandbox -DELETE /v1/sandboxes/{id} Destroy sandbox -WS /v1/sandboxes/{id}/terminal Interactive terminal +## Code Generation -GET /v1/hosts List hosts (admin) -GET /v1/keys List API keys (admin) -POST /v1/keys Create API key (admin) -``` +### Proto (Connect RPC) + +Proto source of truth is `proto/envd/*.proto` and `proto/hostagent/*.proto`. Run `make proto` to regenerate. Three `buf.gen.yaml` files control output: + +| buf.gen.yaml location | Generates to | Used by | +|---|---|---| +| `proto/envd/buf.gen.yaml` | `proto/envd/gen/` | Main module (host agent's envd client) | +| `proto/hostagent/buf.gen.yaml` | `proto/hostagent/gen/` | Main module (control plane ↔ host agent) | +| `envd/spec/buf.gen.yaml` | `envd/internal/services/spec/` | envd module (guest agent server) | + +The envd `buf.gen.yaml` reads from `../../proto/envd/` (same source protos) but generates into envd's own module. This means the same `.proto` files produce two independent sets of Go stubs — one for each Go module. + +To add a new RPC method: edit the `.proto` file → `make proto` → implement the handler on both sides. + +### sqlc + +Config: `sqlc.yaml` (project root). Reads queries from `db/queries/*.sql`, reads schema from `db/migrations/`, outputs to `internal/db/`. + +To add a new query: add it to the appropriate `.sql` file in `db/queries/` → `make generate` → use the new method on `*db.Queries`. 
+ +## Key Technical Decisions + +- **Connect RPC** (not gRPC) for all RPC communication between components +- **Buf + protoc-gen-connect-go** for code generation (not protoc-gen-go-grpc) +- **Raw Firecracker HTTP API** via Unix socket (not firecracker-go-sdk Machine type) +- **TAP networking** (not vsock) for host-to-envd communication +- **PostgreSQL** via pgx/v5 + sqlc (type-safe query generation). Goose for migrations (plain SQL, up/down) +- **Admin UI**: htmx + Go html/template + chi router. No SPA, no React, no build step +- **Lago** for billing (external service, not in this codebase) ## Coding Conventions -### Go Style -- Follow standard Go conventions. Run `gofmt` and `go vet`. -- Use `context.Context` everywhere. Pass it through the full call chain. -- Error handling: wrap errors with `fmt.Errorf("create sandbox: %w", err)`. No bare returns. -- Logging: use `slog` (Go 1.21+ structured logging). No third-party loggers. -- No global state. Everything injected via constructors. +- **Go style**: `gofmt`, `go vet`, `context.Context` everywhere, errors wrapped with `fmt.Errorf("action: %w", err)`, `slog` for logging, no global state +- **Naming**: Sandbox IDs `sb-` + 8 hex, API keys `wrn_` + 32 chars, Host IDs `host-` + 8 hex +- **Dependencies**: Use `go get` to add deps, never hand-edit go.mod. For envd deps: `cd envd && go get ...` (separate module) +- **Generated code**: Always commit generated code (proto stubs, sqlc). 
Never add generated code to .gitignore +- **Migrations**: Always use `make migrate-create name=xxx`, never create migration files manually +- **Testing**: Table-driven tests for handlers and state machine transitions -### Naming -- Sandbox IDs: `sb-` prefix + 8 hex chars (e.g., `sb-a1b2c3d4`) -- API keys: `wrn_` prefix + 32 random chars -- Host IDs: hostname or `host-` prefix + 8 hex chars -- TAP devices: `tap-` + first 8 chars of sandbox ID -- Network slot index: 1-based, determines all per-sandbox IPs +### Two-module gotcha -### Error Responses -```json -{ - "error": { - "code": "pool_exhausted", - "message": "Your vCPU pool is fully allocated. Upgrade your plan or destroy idle sandboxes." - } -} -``` +The main module (`go.mod`) and envd (`envd/go.mod`) are fully independent. `make tidy`, `make fmt`, `make vet` already operate on both. But when adding dependencies manually, remember to target the correct module (`cd envd && go get ...` for envd deps). `make proto` also generates stubs for both modules from the same proto sources. -### Testing -- Unit tests: `go test ./internal/...` -- Integration tests: `go test ./tests/integration/...` (require running host agent + Firecracker) -- Table-driven tests for handlers and state machine transitions +## Rootfs & Guest Init +- **wrenn-init** (`images/wrenn-init.sh`): the PID 1 init script baked into every rootfs. Mounts virtual filesystems, sets hostname, writes `/etc/resolv.conf`, then execs envd. +- **Updating the rootfs** after changing envd or wrenn-init: `bash scripts/update-debug-rootfs.sh [rootfs_path]`. This builds envd via `make build-envd`, mounts the rootfs image, copies in the new binaries, and unmounts. Defaults to `/var/lib/wrenn/images/minimal.ext4`. +- Rootfs images are minimal debootstrap — no systemd, no coreutils beyond busybox. Use `/bin/sh -c` for shell builtins inside the guest. -## envd — Standalone Binary +## Fixed Paths (on host machine) -envd is a **completely independent Go project**. 
It has its own `go.mod`, its own dependencies, and its own build. It is never imported by the control plane or host agent as a Go package. The only connection is the protobuf contract — both envd and the host agent generate code from the same `.proto` files. +- Kernel: `/var/lib/wrenn/kernels/vmlinux` +- Base rootfs images: `/var/lib/wrenn/images/{template}.ext4` +- Sandbox clones: `/var/lib/wrenn/sandboxes/` +- Firecracker: `/usr/local/bin/firecracker` -**Why standalone:** envd runs inside microVMs. It gets compiled once as a static binary, baked into rootfs images, and then used across thousands of sandboxes. It has zero runtime dependency on the rest of the Wrenn codebase. The host agent talks to it over HTTP/Connect RPC via TAP networking — same as talking to any remote service. +## Web UI Styling -**envd's own structure:** -``` -envd/ -├── go.mod # module github.com/wrenn-dev/envd (NOT the parent module) -├── go.sum -├── Makefile # self-contained build -├── main.go # Entry point, boots as PID 1 -└── internal/ - ├── server/ # gRPC service implementations - ├── process/ # Process exec, PTY, signal handling - ├── filesystem/ # File read/write/list/watch - └── network/ # Guest-side network config on boot/resume -``` +**Wrenn brand:** +Warm earthy developer tool with crafted organic character. -**Building envd:** -```bash -cd envd -CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags='-s -w' -o envd . -file envd # MUST say "statically linked" -# Binary goes into rootfs images at /usr/local/bin/envd -``` +**Color palette (light/dark):** +Background scale: #f8f6f1 → #f1eeea → #e8e5e0 → #dedbd5 (light); #090b0a → #0f1211 → #151918 → #1b201e → #222826 (dark). Text hierarchy: bright #2c2a26 / body #4a4740 / dim #7a766e / faint #a09b93 (light); #e8e5df / #c8c4bc / #8a867f / #5f5c57 (dark). Sage green brand accent: #5e8c58 (light) / #89a785 (dark), with glow variant rgba(94,140,88,0.08). Borders: #e2dfd9 (light) / #262c2a (dark). 
Semantic status colors: amber #9e7c2e (warning/building), red #b35544 (error/failed), blue #3d7aac (info/stopped) — each with a color-dim transparent bg variant for badge backgrounds. Destructive: #b35544 light / #c27b6d dark. -**Versioning:** envd has its own version, independent of the control plane or host agent. When you update envd, you rebuild rootfs images. Existing sandboxes keep the old envd. +**Typography:** +Four fonts. Manrope (variable, weights 300–700) for all UI labels, nav, body. Instrument Serif (400) for page titles, empty-state headings, large metric values. JetBrains Mono (400/500) for code, env var keys/values, deployment IDs, commit SHAs, log viewer, URL paths. Alice for the sidebar wordmark only. Base body size 14px. Headings: h1 24px serif, h2 20px, h3 18px, h4–h6 11px sans-serif uppercase wide-tracked. Metric card values 34px serif at letter-spacing: -0.08em. Section labels at 0.06–0.07em tracking, weight 550–600. +Spacing: 4px base unit (Tailwind scale). Page content p-8 (32px). Cards p-4–p-5. Sidebar nav items 7px 10px. Consistent, moderate density — functional but not cramped. -## Build Commands (Makefile) +**Borders & depth:** Flat aesthetic — --shadow-sm: 0 0 #0000, no drop shadows. Depth is achieved through background color stepping (bg → bg-3 → bg-4 → bg-5), not shadows. Borders 1px solid in warm muted tones. Corner radii: cards/surfaces 12px, inputs/small buttons 6–8px, avatars 8px, dots 50%. -```makefile -# ═══════════════════════════════════════════════════ -# Variables -# ═══════════════════════════════════════════════════ -DATABASE_URL ?= postgres://wrenn:wrenn@localhost:5432/wrenn?sslmode=disable -GOBIN := $(shell pwd)/bin -ENVD_DIR := envd -LDFLAGS := -s -w +**Components:** Active sidebar nav items use a 3px left-border in sage green rather than filled backgrounds, with a sage glow bg (rgba(94,140,88,0.08)). Focus rings are double-ring: 0 0 0 2px background, 0 0 0 4px ring. 
Status system has four states (Live/sage, Building/amber+pulse, Failed/red, Stopped/faint) each with solid dot + transparent-bg badge pair. Buttons follow ghost → outline → filled hierarchy. Tables wrapped in rounded-xl border. Dialogs via native <dialog>. Toasts bottom-anchored. -# ═══════════════════════════════════════════════════ -# Build -# ═══════════════════════════════════════════════════ -.PHONY: build build-cp build-agent build-envd +**Animation:** Crisp 150ms transitions on all interactive elements. Sidebar width 250ms ease. Custom wrenn-pulse keyframe (2.5s ease infinite box-shadow bloom) on live/building status dots. Top-of-page loading bar (h-0.5, sage green) on navigation. -build: build-cp build-agent build-envd +**Dark mode:** Full support. Very dark near-black-green backgrounds with warm off-white text and desaturated sage accent. Flat (no card shadows). System preference detection + localStorage persistence. -build-cp: - go build -v -ldflags="$(LDFLAGS)" -o $(GOBIN)/wrenn-cp ./cmd/control-plane - -build-agent: - go build -v -ldflags="$(LDFLAGS)" -o $(GOBIN)/wrenn-agent ./cmd/host-agent - -build-envd: - cd $(ENVD_DIR) && CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \ - go build -ldflags="$(LDFLAGS)" -o ../$(GOBIN)/envd . - @file $(GOBIN)/envd | grep -q "statically linked" || \ - (echo "ERROR: envd is not statically linked!" && exit 1) - -# ═══════════════════════════════════════════════════ -# Development -# ═══════════════════════════════════════════════════ -.PHONY: dev dev-cp dev-agent dev-envd dev-infra dev-down dev-seed - -## One command to start everything for local dev -dev: dev-infra migrate-up dev-seed dev-cp - -dev-infra: - docker compose -f deploy/docker-compose.dev.yml up -d - @echo "Waiting for PostgreSQL..." - @until pg_isready -h localhost -p 5432 -q; do sleep 0.5; done - @echo "Dev infrastructure ready." 
- -dev-down: - docker compose -f deploy/docker-compose.dev.yml down -v - -dev-cp: - @if command -v air > /dev/null; then air -c .air.cp.toml; \ - else go run ./cmd/control-plane; fi - -dev-agent: - sudo go run ./cmd/host-agent - -dev-envd: - cd $(ENVD_DIR) && go run . --debug --listen-tcp :3002 - -dev-seed: - go run ./scripts/seed.go - -# ═══════════════════════════════════════════════════ -# Database (goose) -# ═══════════════════════════════════════════════════ -.PHONY: migrate-up migrate-down migrate-status migrate-create migrate-reset - -migrate-up: - goose -dir db/migrations postgres "$(DATABASE_URL)" up - -migrate-down: - goose -dir db/migrations postgres "$(DATABASE_URL)" down - -migrate-status: - goose -dir db/migrations postgres "$(DATABASE_URL)" status - -migrate-create: - goose -dir db/migrations create $(name) sql - -migrate-reset: - goose -dir db/migrations postgres "$(DATABASE_URL)" reset - goose -dir db/migrations postgres "$(DATABASE_URL)" up - -# ═══════════════════════════════════════════════════ -# Code Generation -# ═══════════════════════════════════════════════════ -.PHONY: generate proto sqlc - -generate: proto sqlc - -proto: - protoc --go_out=. --go_opt=paths=source_relative \ - --go-grpc_out=. --go-grpc_opt=paths=source_relative \ - proto/hostagent/hostagent.proto - protoc --go_out=. --go_opt=paths=source_relative \ - --go-grpc_out=. --go-grpc_opt=paths=source_relative \ - proto/envd/process.proto proto/envd/filesystem.proto - -sqlc: - @if command -v sqlc > /dev/null; then sqlc generate; \ - else echo "sqlc not installed, skipping"; fi - -# ═══════════════════════════════════════════════════ -# Quality & Testing -# ═══════════════════════════════════════════════════ -.PHONY: fmt lint vet test test-integration test-all tidy check - -fmt: - gofmt -w . - cd $(ENVD_DIR) && gofmt -w . - -lint: - golangci-lint run ./... - -vet: - go vet ./... - cd $(ENVD_DIR) && go vet ./... - -test: - go test -race -v ./internal/... 
- -test-integration: - go test -race -v -tags=integration ./tests/integration/... - -test-all: test test-integration - -tidy: - go mod tidy - cd $(ENVD_DIR) && go mod tidy - -## Run all quality checks in CI order -check: fmt vet lint test - -# ═══════════════════════════════════════════════════ -# Rootfs Images -# ═══════════════════════════════════════════════════ -.PHONY: images image-minimal image-python image-node - -images: build-envd image-minimal image-python image-node - -image-minimal: - sudo bash images/templates/minimal/build.sh - -image-python: - sudo bash images/templates/python311/build.sh - -image-node: - sudo bash images/templates/node20/build.sh - -# ═══════════════════════════════════════════════════ -# Deployment -# ═══════════════════════════════════════════════════ -.PHONY: setup-host install - -setup-host: - sudo bash scripts/setup-host.sh - -install: build - sudo cp $(GOBIN)/wrenn-cp /usr/local/bin/ - sudo cp $(GOBIN)/wrenn-agent /usr/local/bin/ - sudo cp deploy/systemd/*.service /etc/systemd/system/ - sudo systemctl daemon-reload - -# ═══════════════════════════════════════════════════ -# Clean -# ═══════════════════════════════════════════════════ -.PHONY: clean - -clean: - rm -rf bin/ - cd $(ENVD_DIR) && rm -f envd - -# ═══════════════════════════════════════════════════ -# Help -# ═══════════════════════════════════════════════════ -.DEFAULT_GOAL := help -.PHONY: help -help: - @echo "Wrenn Sandbox" - @echo "" - @echo " make dev Full local dev (infra + migrate + seed + control plane)" - @echo " make dev-infra Start PostgreSQL + Prometheus + Grafana" - @echo " make dev-down Stop dev infra" - @echo " make dev-cp Control plane (hot reload if air installed)" - @echo " make dev-agent Host agent (sudo required)" - @echo " make dev-envd envd in TCP debug mode" - @echo "" - @echo " make build Build all binaries → bin/" - @echo " make build-envd Build envd static binary" - @echo "" - @echo " make migrate-up Apply migrations" - @echo " make 
migrate-create name=xxx New migration" - @echo " make migrate-reset Drop + re-apply all" - @echo "" - @echo " make generate Proto + sqlc codegen" - @echo " make check fmt + vet + lint + test" - @echo " make test-all Unit + integration tests" - @echo "" - @echo " make images Build all rootfs images" - @echo " make setup-host One-time host setup" - @echo " make install Install binaries + systemd units" -``` - -### docker-compose.dev.yml - -```yaml -# deploy/docker-compose.dev.yml -services: - postgres: - image: postgres:16-alpine - environment: - POSTGRES_USER: wrenn - POSTGRES_PASSWORD: wrenn - POSTGRES_DB: wrenn - ports: - - "5432:5432" - volumes: - - pgdata:/var/lib/postgresql/data - - prometheus: - image: prom/prometheus:latest - ports: - - "9090:9090" - volumes: - - ./deploy/prometheus.yml:/etc/prometheus/prometheus.yml - - grafana: - image: grafana/grafana:latest - ports: - - "3001:3000" - environment: - GF_SECURITY_ADMIN_PASSWORD: admin - -volumes: - pgdata: -``` - -### .env.example - -```bash -# Database -DATABASE_URL=postgres://wrenn:wrenn@localhost:5432/wrenn?sslmode=disable - -# Control Plane -CP_LISTEN_ADDR=:8000 -CP_HOST_AGENT_ADDR=localhost:50051 - -# Host Agent -AGENT_LISTEN_ADDR=:50051 -AGENT_KERNEL_PATH=/var/lib/wrenn/kernels/vmlinux -AGENT_IMAGES_PATH=/var/lib/wrenn/images -AGENT_SANDBOXES_PATH=/var/lib/wrenn/sandboxes -AGENT_HOST_INTERFACE=eth0 - -# Lago (billing — external service) -LAGO_API_URL=http://localhost:3000 -LAGO_API_KEY= - -# Object Storage (hibernate snapshots — Hetzner Object Storage, S3-compatible) -# Hetzner Object Storage uses the S3-compatible API, so we use standard AWS SDK environment variables -S3_BUCKET=wrenn-snapshots -S3_REGION=fsn1 -S3_ENDPOINT=https://fsn1.your-objectstorage.com -AWS_ACCESS_KEY_ID= # Hetzner Object Storage access key (S3-compatible) -AWS_SECRET_ACCESS_KEY= # Hetzner Object Storage secret key (S3-compatible) -``` - -### Development Workflow - -```bash -# First time -git clone 
https://github.com/wrenn-dev/wrenn-sandbox && cd wrenn-sandbox -make tidy - -# Install tools -go install github.com/pressly/goose/v3/cmd/goose@latest -go install google.golang.org/protobuf/cmd/protoc-gen-go@latest -go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest -go install github.com/air-verse/air@latest -go install github.com/fullstorydev/grpcurl/cmd/grpcurl@latest - -# Start everything -make dev-infra # PostgreSQL + monitoring -make migrate-up # Create tables -make dev-seed # Test API key - -# Terminal 1 -make dev-cp # → http://localhost:8000 (API + admin UI) - -# Terminal 2 -make dev-agent # → gRPC on :50051 - -# Terminal 3 -curl http://localhost:8000/v1/sandboxes -open http://localhost:8000/admin/ -``` - -## Implementation Priority - -### Phase 1: Boot a VM -1. Build envd static binary -2. Create minimal rootfs with envd baked in -3. Write `internal/vm/` — boot Firecracker -4. Write `internal/envdclient/` — connect to envd over TAP network -5. Test: boot VM, run "echo hello", get output back - -### Phase 2: Host Agent -1. Write `internal/network/` — TAP + NAT per sandbox -2. Write `internal/filesystem/` — CoW rootfs clones -3. Define hostagent.proto, generate stubs -4. Write host agent rpc server -5. Test: curl to create/exec/destroy - -### Phase 3: Control Plane -1. Set up PostgreSQL, write goose migrations -2. Write `internal/api/` — REST handlers -3. Write `internal/auth/` — API key validation -4. Write `internal/scheduler/` — SingleHostScheduler -5. Test: curl to create/exec/destroy via REST - -### Phase 4: Admin UI -1. Write `internal/admin/` — htmx templates -2. Dashboard, sandbox list, sandbox detail -3. Host status, API key management -4. Test: browser, see sandboxes, perform actions - -### Phase 5: Persistence -1. Write `internal/snapshot/` — Firecracker snapshots -2. Add pause/hibernate/resume states -3. Write `internal/lifecycle/` — auto-pause idle sandboxes -4. Test: pause, resume, verify state intact - -### Phase 6: SDKs -1. 
Python SDK -2. TypeScript SDK -3. Go SDK -4. Test: end-to-end from SDK - -### Phase 7: Hardening -1. Jailer integration -2. cgroup resource limits -3. Egress filtering -4. Prometheus metrics -5. Stress testing - -## Dependencies - -### Go modules (main project) -``` -github.com/go-chi/chi/v5 -github.com/jackc/pgx/v5 -github.com/pressly/goose/v3 -github.com/firecracker-microvm/firecracker-go-sdk -github.com/vishvananda/netlink -google.golang.org/grpc -google.golang.org/protobuf -github.com/prometheus/client_golang -github.com/gorilla/websocket -github.com/rs/cors -golang.org/x/crypto -``` - -### envd Go modules (separate go.mod — minimal deps only) -``` -google.golang.org/grpc -google.golang.org/protobuf -github.com/vishvananda/netlink -``` - -### External services -- PostgreSQL (local Docker or managed) -- Lago (billing, HTTP API only) -- S3/GCS (hibernate snapshot storage) - -### Dev tools -``` -goose, protoc, protoc-gen-go, protoc-gen-go-grpc, air, golangci-lint, grpcurl, sqlc -``` - -## Important Notes - -- Host agent MUST run as root (NET_ADMIN + /dev/kvm). -- Control plane does NOT need root. -- envd is a **standalone Go module** (`envd/go.mod`). Never imported by other Go code. Static binary. Baked into rootfs images. -- `make dev` is the one command for local development. -- For dev without Firecracker, `make dev-envd` runs envd in TCP mode. \ No newline at end of file +**Overall feel:** Warm, earthy, semi-flat. Avoids cold grays entirely — palette leans slightly warm/brown-tinted throughout. The serif + mono + geometric sans type stack gives a designed but unfussy developer-tool character. Organic and considered, not sterile. diff --git a/README.md b/README.md index 28ae648..c5d2c4c 100644 --- a/README.md +++ b/README.md @@ -2,211 +2,92 @@ MicroVM-based code execution platform. Firecracker VMs, not containers. Pool-based pricing, persistent sandboxes, Python/TS/Go SDKs. 
-## Stack +## Deployment -| Component | Tech | -|---|---| -| Control plane | Go, chi, pgx, goose, htmx | -| Host agent | Go, Firecracker Go SDK, vsock | -| Guest agent (envd) | Go (extracted from E2B, standalone binary) | -| Database | PostgreSQL | -| Cache | Redis | -| Billing | Lago (external) | -| Snapshot storage | S3 (Seaweedfs for dev) | -| Monitoring | Prometheus + Grafana | -| Admin UI | htmx + Go html/template | +### Prerequisites -## Architecture +- Linux host with `/dev/kvm` access (bare metal or nested virt) +- Firecracker binary at `/usr/local/bin/firecracker` +- PostgreSQL +- Go 1.25+ -``` -SDK → HTTPS → Control Plane → gRPC → Host Agent → vsock → envd (inside VM) - │ │ - ├── PostgreSQL ├── Firecracker - ├── Redis ├── TAP/NAT networking - └── Lago (billing) ├── CoW rootfs clones - └── Prometheus /metrics -``` - -Control plane is stateless (state in Postgres + Redis). Host agent is stateful (manages VMs on the local machine). envd is a static binary baked into rootfs images — separate Go module, separate build, never imported by anything. 
- -## Prerequisites - -- Linux with `/dev/kvm` (bare metal or nested virt) -- Go 1.22+ -- Docker (for dev infra) -- Firecracker + jailer installed at `/usr/local/bin/` -- `protoc` + Go plugins for proto generation +### Build ```bash -# Firecracker -ARCH=$(uname -m) VERSION="v1.6.0" -curl -L "https://github.com/firecracker-microvm/firecracker/releases/download/${VERSION}/firecracker-${VERSION}-${ARCH}.tgz" | tar xz -sudo mv release-*/firecracker-* /usr/local/bin/firecracker -sudo mv release-*/jailer-* /usr/local/bin/jailer - -# Go tools -go install github.com/pressly/goose/v3/cmd/goose@latest -go install google.golang.org/protobuf/cmd/protoc-gen-go@latest -go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest -go install github.com/air-verse/air@latest -go install github.com/fullstorydev/grpcurl/cmd/grpcurl@latest - -# KVM -ls /dev/kvm && sudo setfacl -m u:${USER}:rw /dev/kvm +make build # outputs to builds/ ``` -## Quick Start +Produces three binaries: `wrenn-cp` (control plane), `wrenn-agent` (host agent), `envd` (guest agent). 
+ +### Host setup + +The host agent machine needs: ```bash -cp .env.example .env -make tidy -make dev-infra # Postgres, Redis, Prometheus, Grafana +# Kernel for guest VMs +mkdir -p /var/lib/wrenn/kernels +# Place a vmlinux kernel at /var/lib/wrenn/kernels/vmlinux + +# Rootfs images +mkdir -p /var/lib/wrenn/images +# Build or place .ext4 rootfs images (e.g., minimal.ext4) + +# Sandbox working directory +mkdir -p /var/lib/wrenn/sandboxes + +# Enable IP forwarding +sysctl -w net.ipv4.ip_forward=1 +``` + +### Configure + +Copy `.env.example` to `.env` and edit: + +```bash +# Required +DATABASE_URL=postgres://wrenn:wrenn@localhost:5432/wrenn?sslmode=disable + +# Control plane +CP_LISTEN_ADDR=:8000 +CP_HOST_AGENT_ADDR=http://localhost:50051 + +# Host agent +AGENT_LISTEN_ADDR=:50051 +AGENT_KERNEL_PATH=/var/lib/wrenn/kernels/vmlinux +AGENT_IMAGES_PATH=/var/lib/wrenn/images +AGENT_SANDBOXES_PATH=/var/lib/wrenn/sandboxes +``` + +### Run + +```bash +# Apply database migrations make migrate-up -make dev-seed -# Terminal 1 -make dev-cp # :8000 +# Start host agent (requires root) +sudo ./builds/wrenn-agent -# Terminal 2 -make dev-agent # :50051 (sudo) +# Start control plane +./builds/wrenn-cp ``` -- API: `http://localhost:8000/v1/sandboxes` -- Admin: `http://localhost:8000/admin/` -- Grafana: `http://localhost:3001` (admin/admin) -- Prometheus: `http://localhost:9090` +Control plane listens on `CP_LISTEN_ADDR` (default `:8000`). Host agent listens on `AGENT_LISTEN_ADDR` (default `:50051`). 
-## Layout +### Rootfs images -``` -cmd/ - control-plane/ REST API + admin UI + gRPC client + lifecycle manager - host-agent/ gRPC server + Firecracker + networking + metrics - -envd/ standalone Go module — separate go.mod, static binary - extracted from e2b-dev/infra, talks gRPC over vsock - -proto/ - hostagent/ control plane ↔ host agent - envd/ host agent ↔ guest agent (from E2B spec/) - -internal/ - api/ chi handlers - admin/ htmx + Go templates - auth/ API key + rate limiting - scheduler/ SingleHost → LeastLoaded - lifecycle/ auto-pause, auto-hibernate, auto-destroy - vm/ Firecracker config, boot, stop, jailer - network/ TAP, NAT, IP allocator (/30 subnets) - filesystem/ base images, CoW clones (cp --reflink) - envdclient/ vsock dialer + gRPC client to envd - snapshot/ pause/resume + S3 offload - metrics/ cgroup stats + Prometheus exporter - models/ Sandbox, Host structs - config/ env + YAML loading - id/ sb-xxxxxxxx generation - -db/migrations/ goose SQL (00001_initial.sql, ...) -db/queries/ raw SQL or sqlc - -images/templates/ rootfs build scripts (minimal, python311, node20) -sdk/ Python, TypeScript, Go client SDKs -deploy/ systemd units, ansible, docker-compose.dev.yml -``` - -## Commands +envd must be baked into every rootfs image. 
After building: ```bash -# Dev -make dev # everything: infra + migrate + seed + control plane -make dev-infra # just Postgres/Redis/Prometheus/Grafana -make dev-down # tear down -make dev-cp # control plane (hot reload with air) -make dev-agent # host agent (sudo) -make dev-envd # envd in TCP debug mode (no Firecracker) -make dev-seed # test API key + data - -# Build -make build # all → bin/ -make build-envd # static binary, verified - -# DB -make migrate-up -make migrate-down -make migrate-create name=xxx -make migrate-reset # drop + re-apply - -# Codegen -make generate # proto + sqlc -make proto - -# Quality -make check # fmt + vet + lint + test -make test # unit -make test-all # unit + integration -make tidy # go mod tidy (both modules) - -# Images -make images # all rootfs (needs sudo + envd) - -# Deploy -make setup-host # one-time KVM/networking setup -make install # binaries + systemd +make build-envd +bash scripts/update-debug-rootfs.sh /var/lib/wrenn/images/minimal.ext4 ``` -## Database +## Development -Postgres via pgx. No ORM. Migrations via goose (plain SQL). - -Tables: `sandboxes`, `hosts`, `audit_events`, `api_keys`. - -States: `pending → starting → running → paused → hibernated → stopped`. Any → `error`. - -## envd - -From [e2b-dev/infra](https://github.com/e2b-dev/infra) (Apache 2.0). PID 1 inside every VM. Exposes ProcessService + FilesystemService over gRPC on vsock. - -Own `go.mod`. Must be `CGO_ENABLED=0`. Baked into rootfs at `/usr/local/bin/envd`. Kernel args: `init=/usr/local/bin/envd`. - -Host agent connects via Firecracker vsock UDS using `CONNECT <port>\n` handshake. - -## Networking - -Each sandbox: `/30` from `10.0.0.0/16` (~16K per host).
- -``` -Host: tap-sb-a1b2c3d4 (10.0.0.1/30) ↔ Guest eth0 (10.0.0.2/30) -NAT: iptables MASQUERADE via host internet interface +```bash +make dev # Start PostgreSQL (Docker), run migrations, start control plane +make dev-agent # Start host agent (separate terminal, sudo) +make check # fmt + vet + lint + test ``` -## Snapshots - -- **Warm pause**: Firecracker snapshot on local NVMe. Resume <1s. -- **Cold hibernate**: zstd compressed, uploaded to S3/MinIO. Resume 5-10s. - -## API - -``` -POST /v1/sandboxes create -GET /v1/sandboxes list -GET /v1/sandboxes/{id} status -POST /v1/sandboxes/{id}/exec exec -PUT /v1/sandboxes/{id}/files upload -GET /v1/sandboxes/{id}/files/* download -POST /v1/sandboxes/{id}/pause pause -POST /v1/sandboxes/{id}/resume resume -DELETE /v1/sandboxes/{id} destroy -WS /v1/sandboxes/{id}/terminal shell -``` - -Auth: `X-API-Key` header. Prefix: `wrn_`. - -## Phases - -1. Boot VM + exec via vsock (W1) -2. Host agent + networking (W2) -3. Control plane + DB + REST (W3) -4. Admin UI / htmx (W4) -5. Pause / hibernate / resume (W5) -6. SDKs (W6) -7. Jailer, cgroups, egress, metrics (W7-8) \ No newline at end of file +See `CLAUDE.md` for full architecture documentation.