Rafeed M. Bhuiyan 05ddf62399 v0.2.0 (#50)
Co-authored-by: Tasnim Kabir Sadik <tksadik@omukk.dev>

Reviewed-on: wrenn/wrenn#50
2026-05-24 21:10:37 +00:00

# Wrenn
Secure infrastructure for AI
## Prerequisites
- Linux host with `/dev/kvm` access (bare metal or nested virt)
- Cloud Hypervisor binary at `/usr/local/bin/cloud-hypervisor`
- PostgreSQL
- Go 1.25+
- Rust 1.88+ with `x86_64-unknown-linux-musl` target (`rustup target add x86_64-unknown-linux-musl`)
- Bun (for frontend)
- Docker (for dev infra and rootfs builds)
## Build
```bash
make build # outputs to builds/
```
Produces three binaries: `wrenn-cp` (control plane), `wrenn-agent` (host agent), `envd` (guest agent).
## Host setup
The host agent needs a kernel, the system base rootfs images, and working directories on the host machine.
### Directory structure
```
/var/lib/wrenn/
├── kernels/
│   └── vmlinux                                        # uncompressed Linux kernel (not bzImage)
├── images/
│   └── teams/
│       └── 0000000000000000000000000/                 # platform team (base36 all-zeros)
│           ├── 0000000000000000000000000/rootfs.ext4  # minimal-ubuntu (id 0)
│           ├── 0000000000000000000000001/rootfs.ext4  # minimal-alpine (id 1)
│           ├── 0000000000000000000000002/rootfs.ext4  # minimal-arch (id 2)
│           └── 0000000000000000000000003/rootfs.ext4  # minimal-fedora (id 3)
├── sandboxes/                                         # per-sandbox CoW files (created at runtime)
└── snapshots/                                         # pause/hibernate snapshot files (created at runtime)
```
Create the base directories (the per-template image dirs are created by the build scripts):
```bash
sudo mkdir -p /var/lib/wrenn/{kernels,images,sandboxes,snapshots}
```
### Kernel
Place an uncompressed `vmlinux` kernel at `/var/lib/wrenn/kernels/vmlinux`. Versioned kernels (`vmlinux-{semver}`) are also supported — the agent picks the latest by semver.
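To make "latest by semver" concrete, here is a minimal sketch of how such a selection could work. It is not the agent's actual code: `parseKernelVersion` and `latestKernel` are hypothetical names, only plain `MAJOR.MINOR.PATCH` suffixes are handled, and falling back to the unversioned `vmlinux` when no versioned kernel exists is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelVersion extracts MAJOR.MINOR.PATCH from a name such as
// "vmlinux-6.1.102". Pre-release and build suffixes are not handled.
func parseKernelVersion(name string) ([3]int, bool) {
	var v [3]int
	rest, ok := strings.CutPrefix(name, "vmlinux-")
	if !ok {
		return v, false
	}
	parts := strings.Split(rest, ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// older reports whether version a sorts before version b.
func older(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// latestKernel picks the highest versioned kernel from a directory listing.
// Assumption: an unversioned "vmlinux" is used only when no versioned file exists.
func latestKernel(names []string) (string, bool) {
	best, hasPlain := "", false
	var bestV [3]int
	for _, n := range names {
		if n == "vmlinux" {
			hasPlain = true
			continue
		}
		if v, ok := parseKernelVersion(n); ok && (best == "" || older(bestV, v)) {
			best, bestV = n, v
		}
	}
	if best != "" {
		return best, true
	}
	return "vmlinux", hasPlain
}

func main() {
	name, ok := latestKernel([]string{"vmlinux", "vmlinux-6.1.9", "vmlinux-6.1.102"})
	fmt.Println(name, ok)
}
```

The point of comparing numerically is that `6.1.102` outranks `6.1.9`, which a plain string sort would get wrong.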
### System base rootfs images
There are four built-in **system base templates**, one per distro, that all other
templates snapshot from via device-mapper. They are platform-owned (visible to every
team) and protected from deletion (reserved template IDs 0–1024):
| Template | Distro | ID |
|----------|--------|----|
| `minimal-ubuntu` | `ubuntu:26.04` | 0 |
| `minimal-alpine` | `alpine:3.22` | 1 |
| `minimal-arch` | `archlinux:base` | 2 |
| `minimal-fedora` | `fedora:45` | 3 |
`minimal-ubuntu` is the default template for new sandboxes and builds. The same
statically-linked `envd` + `tini` run on all four regardless of the distro's libc
(glibc on Ubuntu/Arch/Fedora, musl on Alpine).
Each image contains these packages plus a `wrenn-user` account with passwordless `sudo`:
| Package | Why |
|---------|-----|
| `socat` | Bidirectional relay for port forwarding |
| `chrony` | Time sync from KVM PTP clock (`/dev/ptp0`) |
| `iproute2` (`iproute` on Fedora) | `ip` for guest network setup in `wrenn-init` |
| `tini` | PID 1 zombie reaper |
| `sudo` | User privilege management inside the guest |
| `wget` | HTTP fetching |
| `curl` | HTTP client |
| `ca-certificates` | TLS certificate verification |
| `git` | Version control |
**To build all four images**, run `make images` (requires Docker and sudo). Each build
spawns a distro container, installs the packages and `wrenn-user`, builds `envd`, injects
`wrenn-init` and `tini`, and exports the image to the team-scoped path:
```bash
make images
```
Or build a single distro: `make rootfs-ubuntu` / `rootfs-alpine` / `rootfs-arch` / `rootfs-fedora`.
**To update the images** after changing `envd` or `wrenn-init.sh` (rebuilds `envd` once,
then re-injects `envd` + `wrenn-init` + `tini` into every system base image):
```bash
bash scripts/update-minimal-rootfs.sh
```
### IP forwarding
```bash
sudo sysctl -w net.ipv4.ip_forward=1
```
## Configure
Copy `.env.example` to `.env` and edit:
```bash
# Required
DATABASE_URL=postgres://wrenn:wrenn@localhost:5432/wrenn?sslmode=disable
# Control plane
WRENN_CP_LISTEN_ADDR=:8000
CP_HOST_AGENT_ADDR=http://localhost:50051
# Host agent
WRENN_HOST_LISTEN_ADDR=:50051
WRENN_DIR=/var/lib/wrenn
```
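For readers wiring these variables into their own tooling, the sketch below shows one way to read them in Go. The `Config` struct and `loadConfig` function are hypothetical, and using the example values above as fallbacks is an assumption rather than documented behavior; only `DATABASE_URL` is treated as required, as the comment in `.env` indicates.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config mirrors the variables from the .env example (hypothetical type).
type Config struct {
	DatabaseURL    string
	CPListenAddr   string
	HostAgentAddr  string
	HostListenAddr string
	WrennDir       string
}

// loadConfig reads settings through lookup, which has the same shape as
// os.LookupEnv so tests can substitute a map.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, fallback string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return fallback
	}
	cfg := Config{
		DatabaseURL: get("DATABASE_URL", ""),
		// Fallbacks below are the example values, assumed here as defaults.
		CPListenAddr:   get("WRENN_CP_LISTEN_ADDR", ":8000"),
		HostAgentAddr:  get("CP_HOST_AGENT_ADDR", "http://localhost:50051"),
		HostListenAddr: get("WRENN_HOST_LISTEN_ADDR", ":50051"),
		WrennDir:       get("WRENN_DIR", "/var/lib/wrenn"),
	}
	if cfg.DatabaseURL == "" {
		return Config{}, errors.New("DATABASE_URL is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```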
## Development
```bash
make dev # Start PostgreSQL (Docker), run migrations, start control plane
make dev-agent # Start host agent (separate terminal, sudo)
make dev-frontend # Vite dev server with HMR (port 5173)
make check # fmt + vet + lint + test
```
### Host registration
Hosts must be registered with the control plane before they can serve sandboxes.
1. **Create a host record** in the dashboard (admin only — host management is not exposed over the SDK / API keys). Sign in at `/login`, open the admin hosts page, and click **Add host**. The dashboard returns a `registration_token` valid for 1 hour.
2. **Start the host agent** with the registration token and its externally-reachable address:
```bash
sudo WRENN_CP_URL=http://localhost:8000 \
./builds/wrenn-agent \
--register <token-from-step-1> \
--address <host-ip>:50051
```
On first startup the agent sends its specs (arch, CPU, memory, disk) to the control plane, receives a long-lived host JWT, and saves it to `$WRENN_DIR/host-token`.
3. **Subsequent startups** don't need `--register` — the agent loads the saved JWT automatically:
```bash
sudo ./builds/wrenn-agent --address <host-ip>:50051
```
4. **If registration fails** (e.g., network error after token was consumed), regenerate a token from the dashboard host detail page, then restart the agent with the new token.
The agent sends heartbeats to the control plane every 30 seconds.
## Notification channels
Teams can subscribe to lifecycle events via webhook, Discord, Slack, Teams, Google Chat, Telegram, or Matrix. All providers consume the same event stream (durable Redis stream `wrenn:events`, consumer group `wrenn-channels-v1`, at-least-once delivery with two retries at 10s / 30s).
### Subscribable event types
| Event | Emitted on | Has outcome |
|-------|-----------|-------------|
| `capsule.create` | First boot of a sandbox | yes |
| `capsule.pause` | Manual pause, TTL auto-pause, or reconciler-detected pause | yes |
| `capsule.resume` | Unpause (any subsequent boot after `capsule.create`) | yes |
| `capsule.destroy` | Stop / destroy, including system cleanup-on-error | yes |
| `template.snapshot.create` | Snapshot taken from a running sandbox | yes |
| `template.snapshot.delete` | Snapshot deletion (including cleanup-on-error) | yes |
| `host.up` | Host agent comes online | no |
| `host.down` | Host agent crashes or misses heartbeats | no |
Subscribing to an event type delivers **both success and failure**. The `outcome` field on the payload (`success` or `error`) distinguishes them. `error` events carry an `error` string with the failure reason.
The transient `capsule.state.changed` event (intermediate transitions like `starting`, `pausing`, `resuming`) is **not** subscribable — it is delivered to the dashboard via SSE only and never written to the durable stream.
### Event payload
All channels receive the same canonical JSON shape:
```json
{
  "event": "capsule.pause",
  "outcome": "success",
  "timestamp": "2026-05-19T14:23:01Z",
  "team_id": "tm_...",
  "actor": {
    "type": "user",
    "id": "usr_...",
    "name": "alice@example.com"
  },
  "resource": {
    "id": "sb_a1b2c3d4",
    "type": "sandbox"
  },
  "metadata": {
    "reason": "ttl_expired"
  },
  "error": ""
}
```
| Field | Type | Notes |
|-------|------|-------|
| `event` | string | Event type (see table above) |
| `outcome` | `"success"` \| `"error"` \| `""` | Omitted for host.up/host.down |
| `timestamp` | RFC3339 UTC | When the event was published |
| `team_id` | string | Owning team |
| `actor.type` | `"user"` \| `"api_key"` \| `"system"` | System = TTL reaper, reconciler, cleanup-on-error |
| `actor.id` | string | User ID, API key ID, or empty for system |
| `actor.name` | string | Display name (email for user, label for api_key) |
| `resource.id` | string | Sandbox ID, snapshot ID, or host ID |
| `resource.type` | `"sandbox"` \| `"snapshot"` \| `"host"` | |
| `metadata` | object\<string,string\> | Event-specific context (e.g., `reason`, `from`/`to`, `inferred`) |
| `error` | string | Failure reason when `outcome == "error"` |
`metadata` keys you may observe:
- `reason` — `ttl_expired` (auto-pause), `orphaned` (reconciler cleanup), `cleanup_after_create_error`, `restored_after_host_recovery`, `host_state_sync`, `transient_timeout`, `transient_timeout_inferred`
- `inferred` — `"true"` when the reconciler derived the event from host state, not a direct host callback
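A webhook consumer written in Go might decode the payload roughly as follows. The struct and helper names (`Event`, `Actor`, `Resource`, `decodeEvent`, `Failed`, `Inferred`) are illustrative and not part of any Wrenn SDK; the JSON field names are taken from the table above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Actor identifies who triggered the event.
type Actor struct {
	Type string `json:"type"` // "user", "api_key", or "system"
	ID   string `json:"id"`
	Name string `json:"name"`
}

// Resource identifies the sandbox, snapshot, or host the event is about.
type Resource struct {
	ID   string `json:"id"`
	Type string `json:"type"`
}

// Event mirrors the canonical JSON shape documented above.
type Event struct {
	Event     string            `json:"event"`
	Outcome   string            `json:"outcome"` // empty for host.up / host.down
	Timestamp time.Time         `json:"timestamp"`
	TeamID    string            `json:"team_id"`
	Actor     Actor             `json:"actor"`
	Resource  Resource          `json:"resource"`
	Metadata  map[string]string `json:"metadata"`
	Error     string            `json:"error"`
}

// Failed reports whether the event describes a failed operation.
func (e Event) Failed() bool { return e.Outcome == "error" }

// Inferred reports whether the reconciler derived the event from host state.
func (e Event) Inferred() bool { return e.Metadata["inferred"] == "true" }

// decodeEvent parses a raw payload into an Event.
func decodeEvent(body []byte) (Event, error) {
	var e Event
	if err := json.Unmarshal(body, &e); err != nil {
		return Event{}, fmt.Errorf("decode event: %w", err)
	}
	return e, nil
}

func main() {
	body := []byte(`{"event":"capsule.pause","outcome":"success",` +
		`"timestamp":"2026-05-19T14:23:01Z","metadata":{"reason":"ttl_expired"}}`)
	e, err := decodeEvent(body)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(e.Event, e.Outcome, e.Metadata["reason"], e.Timestamp.Format(time.RFC3339))
}
```

Keeping `Metadata` as a `map[string]string` means unknown keys are preserved without code changes if new ones appear.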
### Webhook delivery
Webhook channels receive a raw `POST` with the JSON payload as the body.
Headers:
| Header | Value |
|--------|-------|
| `Content-Type` | `application/json` |
| `X-Wrenn-Delivery` | UUID, unique per delivery attempt |
| `X-Wrenn-Timestamp` | RFC3339 UTC, used for signature verification |
| `X-Wrenn-Signature` | `sha256=<hex>` HMAC over `<timestamp>.<body>` using the channel's signing secret |
The signing secret is shown **once** at channel creation. Verify signatures by computing `HMAC-SHA256(secret, timestamp + "." + body)` and comparing to the header (constant-time compare). Reject deliveries where `X-Wrenn-Timestamp` is outside your acceptable clock skew window. Redirects are not followed.
Any non-2xx response triggers retry (10s, then 30s). After three total failures the event is dropped (logged on the control plane).
### Other providers
Discord, Slack, Teams, Google Chat, Telegram, and Matrix receive a formatted text message — the same fields, rendered as human-readable text — not the JSON payload. Use webhook if you need the structured event.
## Extending the control plane
The OSS control plane is designed to be embedded by a private cloud distribution without forking. Import this module, implement the `Extension` interface from `pkg/cpextension`, and pass it to `cpserver.Run`:
```go
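package main

import (
	"git.omukk.dev/wrenn/wrenn/pkg/cpextension"
	"git.omukk.dev/wrenn/wrenn/pkg/cpserver"
)

// myExtension is your own type; this line checks at compile time that it
// satisfies the Extension interface.
var _ cpextension.Extension = (*myExtension)(nil)

func main() {
	cpserver.Run(
		cpserver.WithVersion("cloud-1.0.0"),
		cpserver.WithExtensions(&myExtension{}),
	)
}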
```
Every extension implements two methods:
```go
RegisterRoutes(r chi.Router, sctx cpextension.ServerContext)
BackgroundWorkers(sctx cpextension.ServerContext) []func(context.Context)
```
`ServerContext` exposes the initialized OSS services so extensions never re-implement them: `Queries`, `PgPool`, `Redis`, `HostPool`, `Scheduler`, `CA`, `Audit`, `Mailer`, `OAuthRegistry`, `Channels`, `ChannelPub`, `JWTSecret`, `Sessions`, `Config`.
### Optional hook interfaces
An extension can also implement any subset of these — the OSS server type-asserts at startup:
| Interface | When it fires | Failure semantics |
|---|---|---|
| `MiddlewareProvider` | Wraps every OSS route before registration | n/a |
| `AuthHook.OnSignup(ctx, userID, teamID, email)` | After team provisioning on email-activate or OAuth-new-signup | Error aborts signup with 500 `signup_hook_failed` (billing customer creation must succeed) |
| `AuthHook.OnLogin(ctx, userID)` | After a successful login or OAuth callback | Error logged, login still succeeds |
| `AuthHook.OnAccountSoftDelete(ctx, userID)` | After `DELETE /v1/me` commits | Error logged, request still succeeds |
| `AuthHook.OnAccountHardDelete(ctx, userID)` | After the 15-day cleanup goroutine purges a soft-deleted account | Error logged, cleanup continues |
| `SandboxEventHook.OnSandboxEvent(ctx, ev)` | Capsule create/pause/resume/destroy success, from the Redis stream consumer | Error leaves the message un-acked — hooks **must** be idempotent |
| `LimitsProvider.EffectiveLimits(ctx, teamID)` | `POST /v1/capsules` consults before scheduling | Returns 402 (`concurrent_sandbox_limit` / `vcpu_limit` / `memory_limit`) when over |
| `UsageProvider.CurrentUsage(ctx, teamID)` | Feeds `LimitsProvider` checks; falls back to OSS DB-backed default | Error → 402 `usage_unavailable` |
### Auth middleware helpers
For extensions that gate their own routes:
```go
r.With(cpextension.RequireSession(sctx)).Get("/billing", handler)
r.With(cpextension.RequireSessionOrAPIKey(sctx)).Get("/usage", handler)
r.With(cpextension.RequireSession(sctx), cpextension.RequireAdmin(sctx)).Get("/admin/exports", handler)
// Issue a session from a custom flow (e.g. invite-accept):
sess, err := cpextension.IssueSession(w, r, sctx, userID, teamID)
```
Cookie/header names are exported as `cpextension.SessionCookieName`, `CSRFCookieName`, `CSRFHeaderName`.
See `CLAUDE.md` for full architecture documentation.