pptx704 0012088191 fix(envd): avoid lost prune on already-exited process

Check cached_end() after subscribing so the cleanup task does not block
on a broadcast receiver that already missed the end event from a
short-lived process.

2026-05-19 23:41:29 +06:00

cmd

feat(vm): CH live snapshot, pause/resume with UFFD memory restore

2026-05-18 14:05:27 +06:00

auth: replace user JWTs with cookie sessions

2026-05-19 04:01:24 +06:00

deploy

v0.1.0 (#17 )

2026-04-16 19:24:25 +00:00

envd-rs

fix(envd): avoid lost prune on already-exited process

2026-05-19 23:41:29 +06:00

frontend

events/capsules: dedupe channel notifications, harden create dialog

2026-05-19 04:27:11 +06:00

images

feat(vm): CH live snapshot, pause/resume with UFFD memory restore

2026-05-18 14:05:27 +06:00

internal

refactor(cpextension): remove LimitsProvider/UsageProvider hooks

2026-05-19 18:56:47 +06:00

pkg

refactor(cpextension): remove LimitsProvider/UsageProvider hooks

2026-05-19 18:56:47 +06:00

proto

fix: resolve bugs and DRY violations in sandbox manager and API handlers

2026-05-17 02:30:32 +06:00

recipes

v0.1.0 (#17 )

2026-04-16 19:24:25 +00:00

scripts

Updated cleanup script

2026-05-19 17:28:36 +06:00

tests

Initial project structure for Wrenn Sandbox

2026-03-09 17:22:47 +06:00

.env.example

chore: update proto, scripts, and docs for CH migration

2026-05-17 01:33:35 +06:00

.gitignore

fix: resolve bugs and DRY violations in sandbox manager and API handlers

2026-05-17 02:30:32 +06:00

CLAUDE.md

refactor(cpextension): remove LimitsProvider/UsageProvider hooks

2026-05-19 18:56:47 +06:00

go.mod

v0.1.0 (#17 )

2026-04-16 19:24:25 +00:00

go.sum

v0.1.0 (#17 )

2026-04-16 19:24:25 +00:00

LICENSE

v0.0.1 (#8 )

2026-04-09 19:24:49 +00:00

Makefile

chore: update proto, scripts, and docs for CH migration

2026-05-17 01:33:35 +06:00

README.md

extensions: auth/sandbox hooks, limits/usage providers, exported session middleware

2026-05-19 04:49:19 +06:00

sqlc.yaml

v0.1.0 (#17 )

2026-04-16 19:24:25 +00:00

VERSION_AGENT

Bump versions

2026-05-15 13:56:04 +06:00

VERSION_CP

Bump versions

2026-05-15 13:56:04 +06:00

README.md

Wrenn

Secure infrastructure for AI

Prerequisites

Linux host with /dev/kvm access (bare metal or nested virt)
Cloud Hypervisor binary at /usr/local/bin/cloud-hypervisor
PostgreSQL
Go 1.25+
Rust 1.88+ with x86_64-unknown-linux-musl target (rustup target add x86_64-unknown-linux-musl)
Bun (for frontend)
Docker (for dev infra and rootfs builds)

Build

make build    # outputs to builds/

Produces three binaries: wrenn-cp (control plane), wrenn-agent (host agent), envd (guest agent).

Host setup

The host agent needs a kernel, a minimal rootfs image, and working directories on the host machine.

Directory structure

/var/lib/wrenn/
├── kernels/
│   └── vmlinux              # uncompressed Linux kernel (not bzImage)
├── images/
│   └── minimal/
│       └── rootfs.ext4      # base rootfs (all other templates snapshot from this)
├── sandboxes/               # per-sandbox CoW files (created at runtime)
└── snapshots/               # pause/hibernate snapshot files (created at runtime)

Create the directories:

sudo mkdir -p /var/lib/wrenn/{kernels,images/minimal,sandboxes,snapshots}

Kernel

Place an uncompressed vmlinux kernel at /var/lib/wrenn/kernels/vmlinux. Versioned kernels (vmlinux-{semver}) are also supported — the agent picks the latest by semver.
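
The "latest by semver" rule can be sketched as below. This is a hedged illustration, not the agent's actual code: the function names are hypothetical, only plain MAJOR.MINOR.PATCH suffixes are handled, and falling back to the unversioned vmlinux when no versioned file exists is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelVersion extracts MAJOR.MINOR.PATCH from a name such as
// "vmlinux-6.1.102". Pre-release and build suffixes are not handled.
func parseKernelVersion(name string) ([3]int, bool) {
	var v [3]int
	rest, ok := strings.CutPrefix(name, "vmlinux-")
	if !ok {
		return v, false
	}
	parts := strings.Split(rest, ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// newer reports whether a is a higher version than b (numeric comparison).
func newer(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return false
}

// latestKernel picks the highest versioned kernel from a directory listing.
// Assumption: the unversioned "vmlinux" is used only when no versioned
// kernel is present.
func latestKernel(names []string) (string, bool) {
	var best [3]int
	bestName, hasPlain := "", false
	for _, name := range names {
		if name == "vmlinux" {
			hasPlain = true
			continue
		}
		if v, ok := parseKernelVersion(name); ok && (bestName == "" || newer(v, best)) {
			best, bestName = v, name
		}
	}
	if bestName != "" {
		return bestName, true
	}
	if hasPlain {
		return "vmlinux", true
	}
	return "", false
}

func main() {
	names := []string{"vmlinux", "vmlinux-6.1.9", "vmlinux-6.1.10", "notes.txt"}
	name, ok := latestKernel(names)
	fmt.Println(name, ok)
}
```

Comparing the components numerically matters: a plain string sort would rank vmlinux-6.1.9 above vmlinux-6.1.10.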

Minimal rootfs

The minimal rootfs is the base image that all other templates (Python, Node, etc.) are built on top of via device-mapper snapshots. It must contain:

| Package | Why |
| --- | --- |
| `socat` | Bidirectional relay for port forwarding |
| `chrony` | Time sync from KVM PTP clock (`/dev/ptp0`) |
| `tini` | PID 1 zombie reaper (injected by build script, not apt) |
| `sudo` | User privilege management inside the guest |
| `wget` | HTTP fetching |
| `curl` | HTTP client |
| `ca-certificates` | TLS certificate verification |

To build a rootfs from a Docker container:

1. Create and configure a container with the required packages:

docker run -it --name wrenn-minimal debian:bookworm bash
# Inside the container:
apt update && apt install -y socat chrony sudo wget curl ca-certificates
exit

2. Export the container to a rootfs image (this builds envd, injects wrenn-init and tini, and shrinks the image to its minimum size):
```
sudo bash scripts/rootfs-from-container.sh wrenn-minimal minimal
```

To update an existing rootfs after changing envd or wrenn-init.sh:

bash scripts/update-minimal-rootfs.sh

This rebuilds envd via make build-envd and copies the fresh binaries into the mounted rootfs image.

IP forwarding

sudo sysctl -w net.ipv4.ip_forward=1

Configure

Copy .env.example to .env and edit:

# Required
DATABASE_URL=postgres://wrenn:wrenn@localhost:5432/wrenn?sslmode=disable

# Control plane
WRENN_CP_LISTEN_ADDR=:8000
CP_HOST_AGENT_ADDR=http://localhost:50051

# Host agent
WRENN_HOST_LISTEN_ADDR=:50051
WRENN_DIR=/var/lib/wrenn
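
As a rough sketch of how a process might read these variables, the Go snippet below requires DATABASE_URL and falls back to the example values for everything else. The config struct and loadConfig are hypothetical names, and treating the example values as defaults is an assumption; the binaries may behave differently.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// config mirrors the variables shown in .env.example (hypothetical struct).
type config struct {
	DatabaseURL    string
	CPListenAddr   string
	HostAgentAddr  string
	HostListenAddr string
	WrennDir       string
}

// loadConfig reads settings through lookup, which has the same signature as
// os.LookupEnv so it can be swapped out in tests.
func loadConfig(lookup func(string) (string, bool)) (config, error) {
	get := func(key, fallback string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return fallback
	}
	cfg := config{
		DatabaseURL: get("DATABASE_URL", ""),
		// Assumption: the example values double as defaults.
		CPListenAddr:   get("WRENN_CP_LISTEN_ADDR", ":8000"),
		HostAgentAddr:  get("CP_HOST_AGENT_ADDR", "http://localhost:50051"),
		HostListenAddr: get("WRENN_HOST_LISTEN_ADDR", ":50051"),
		WrennDir:       get("WRENN_DIR", "/var/lib/wrenn"),
	}
	if cfg.DatabaseURL == "" {
		return config{}, errors.New("DATABASE_URL is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// Avoid printing DatabaseURL, which usually embeds a password.
	fmt.Println("control plane listens on", cfg.CPListenAddr, "with data in", cfg.WrennDir)
}
```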

Development

make dev          # Start PostgreSQL (Docker), run migrations, start control plane
make dev-agent    # Start host agent (separate terminal, sudo)
make dev-frontend # Vite dev server with HMR (port 5173)
make check        # fmt + vet + lint + test

Host registration

Hosts must be registered with the control plane before they can serve sandboxes.

1. Create a host record in the dashboard (admin only; host management is not exposed over the SDK or API keys). Sign in at /login, open the admin hosts page, and click Add host. The dashboard returns a registration_token valid for 1 hour.
2. Start the host agent with the registration token and its externally reachable address:
```
sudo WRENN_CP_URL=http://localhost:8000 \
     ./builds/wrenn-agent \
     --register <token-from-step-1> \
     --address <host-ip>:50051
```
On first startup the agent sends its specs (arch, CPU, memory, disk) to the control plane, receives a long-lived host JWT, and saves it to $WRENN_DIR/host-token.
3. Subsequent startups don't need --register; the agent loads the saved JWT automatically:
```
sudo ./builds/wrenn-agent --address <host-ip>:50051
```
4. If registration fails (e.g., a network error after the token was consumed), regenerate a token from the dashboard host detail page, then restart the agent with the new token.

The agent sends heartbeats to the control plane every 30 seconds.

Notification channels

Teams can subscribe to lifecycle events via webhook, Discord, Slack, Teams, Google Chat, Telegram, or Matrix. All providers consume the same event stream (durable Redis stream wrenn:events, consumer group wrenn-channels-v1, at-least-once delivery with two retries at 10s / 30s).

Subscribable event types

| Event | Emitted on | Has outcome |
| --- | --- | --- |
| `capsule.create` | First boot of a sandbox | yes |
| `capsule.pause` | Manual pause, TTL auto-pause, or reconciler-detected pause | yes |
| `capsule.resume` | Unpause (any subsequent boot after `capsule.create`) | yes |
| `capsule.destroy` | Stop / destroy, including system cleanup-on-error | yes |
| `template.snapshot.create` | Snapshot taken from a running sandbox | yes |
| `template.snapshot.delete` | Snapshot deletion (including cleanup-on-error) | yes |
| `host.up` | Host agent comes online | no |
| `host.down` | Host agent crashes or misses heartbeats | no |

Subscribing to an event type delivers both success and failure. The outcome field on the payload (success or error) distinguishes them. error events carry an error string with the failure reason.

The transient capsule.state.changed event (intermediate transitions like starting, pausing, resuming) is not subscribable — it is delivered to the dashboard via SSE only and never written to the durable stream.

Event payload

All channels receive the same canonical JSON shape:

{
  "event": "capsule.pause",
  "outcome": "success",
  "timestamp": "2026-05-19T14:23:01Z",
  "team_id": "tm_...",
  "actor": {
    "type": "user",
    "id": "usr_...",
    "name": "alice@example.com"
  },
  "resource": {
    "id": "sb_a1b2c3d4",
    "type": "sandbox"
  },
  "metadata": {
    "reason": "ttl_expired"
  },
  "error": ""
}

| Field | Type | Notes |
| --- | --- | --- |
| `event` | string | Event type (see table above) |
| `outcome` | `"success"` \| `"error"` \| `""` | Omitted for host.up/host.down |
| `timestamp` | RFC3339 UTC | When the event was published |
| `team_id` | string | Owning team |
| `actor.type` | `"user"` \| `"api_key"` \| `"system"` | System = TTL reaper, reconciler, cleanup-on-error |
| `actor.id` | string | User ID, API key ID, or empty for system |
| `actor.name` | string | Display name (email for user, label for api_key) |
| `resource.id` | string | Sandbox ID, snapshot ID, or host ID |
| `resource.type` | `"sandbox"` \| `"snapshot"` \| `"host"` | |
| `metadata` | object<string,string> | Event-specific context (e.g., `reason`, `from`/`to`, `inferred`) |
| `error` | string | Failure reason when `outcome == "error"` |

metadata keys you may observe:

reason — ttl_expired (auto-pause), orphaned (reconciler cleanup), cleanup_after_create_error, restored_after_host_recovery, host_state_sync, transient_timeout, transient_timeout_inferred
inferred — "true" when the reconciler derived the event from host state, not a direct host callback
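
A webhook consumer written in Go could decode the canonical payload along these lines. The struct and function names are hypothetical; the JSON keys follow the field table above, and the sample reuses the values from the example payload.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Actor and Resource mirror the nested objects of the payload.
type Actor struct {
	Type string `json:"type"`
	ID   string `json:"id"`
	Name string `json:"name"`
}

type Resource struct {
	ID   string `json:"id"`
	Type string `json:"type"`
}

// Event is a hypothetical Go view of the canonical JSON shape.
type Event struct {
	Event     string            `json:"event"`
	Outcome   string            `json:"outcome"` // empty for host.up / host.down
	Timestamp time.Time         `json:"timestamp"`
	TeamID    string            `json:"team_id"`
	Actor     Actor             `json:"actor"`
	Resource  Resource          `json:"resource"`
	Metadata  map[string]string `json:"metadata"`
	Error     string            `json:"error"`
}

// decodeEvent parses a raw webhook body into an Event.
func decodeEvent(body []byte) (Event, error) {
	var ev Event
	err := json.Unmarshal(body, &ev)
	return ev, err
}

// summarize branches on the outcome field to tell success from failure.
func summarize(ev Event) string {
	switch ev.Outcome {
	case "error":
		return fmt.Sprintf("%s failed for %s: %s", ev.Event, ev.Resource.ID, ev.Error)
	case "success":
		return fmt.Sprintf("%s succeeded for %s", ev.Event, ev.Resource.ID)
	default: // host events carry no outcome
		return fmt.Sprintf("%s for %s", ev.Event, ev.Resource.ID)
	}
}

const sample = `{
  "event": "capsule.pause",
  "outcome": "success",
  "timestamp": "2026-05-19T14:23:01Z",
  "team_id": "tm_...",
  "actor": {"type": "user", "id": "usr_...", "name": "alice@example.com"},
  "resource": {"id": "sb_a1b2c3d4", "type": "sandbox"},
  "metadata": {"reason": "ttl_expired"},
  "error": ""
}`

func main() {
	ev, err := decodeEvent([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(summarize(ev), "reason:", ev.Metadata["reason"])
}
```

Decoding metadata into a map[string]string keeps unknown keys available without a schema change on the consumer side.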

Webhook delivery

Webhook channels receive a raw POST with the JSON payload as the body.

Headers:

| Header | Value |
| --- | --- |
| `Content-Type` | `application/json` |
| `X-Wrenn-Delivery` | UUID, unique per delivery attempt |
| `X-Wrenn-Timestamp` | RFC3339 UTC, used for signature verification |
| `X-Wrenn-Signature` | `sha256=<hex>` HMAC over `<timestamp>.<body>` using the channel's signing secret |

The signing secret is shown once at channel creation. Verify signatures by computing HMAC-SHA256(secret, timestamp + "." + body) and comparing to the header (constant-time compare). Reject deliveries where X-Wrenn-Timestamp is outside your acceptable clock skew window. Redirects are not followed.
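
The verification steps above might look like this on the receiving side. It is a minimal sketch using only the Go standard library: the function names are hypothetical, and the five-minute skew window is an assumed value, since the acceptable window is left to the receiver.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
	"time"
)

// maxSkew is an assumed tolerance; pick a window that suits your receiver.
const maxSkew = 5 * time.Minute

// computeMAC returns HMAC-SHA256(secret, timestamp + "." + body).
func computeMAC(secret []byte, timestamp string, body []byte) []byte {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(timestamp))
	mac.Write([]byte("."))
	mac.Write(body)
	return mac.Sum(nil)
}

// sign renders the MAC in the "sha256=<hex>" header format.
func sign(secret []byte, timestamp string, body []byte) string {
	return "sha256=" + hex.EncodeToString(computeMAC(secret, timestamp, body))
}

// verify checks the timestamp against the skew window, then compares the
// signature header with the expected MAC in constant time.
func verify(secret []byte, timestamp, signature string, body []byte, now time.Time) error {
	ts, err := time.Parse(time.RFC3339, timestamp)
	if err != nil {
		return fmt.Errorf("bad timestamp: %w", err)
	}
	if d := now.Sub(ts); d > maxSkew || d < -maxSkew {
		return errors.New("timestamp outside allowed skew")
	}
	hexSig, ok := strings.CutPrefix(signature, "sha256=")
	if !ok {
		return errors.New("unsupported signature scheme")
	}
	got, err := hex.DecodeString(hexSig)
	if err != nil {
		return errors.New("signature is not valid hex")
	}
	if !hmac.Equal(got, computeMAC(secret, timestamp, body)) {
		return errors.New("signature mismatch")
	}
	return nil
}

// mustParse is a small helper for the example below.
func mustParse(ts string) time.Time {
	t, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"event":"capsule.pause"}`)
	ts := "2026-05-19T14:23:01Z"
	err := verify(secret, ts, sign(secret, ts, body), body, mustParse("2026-05-19T14:23:30Z"))
	fmt.Println("verification error:", err)
}
```

In an HTTP handler, read the raw request body before any JSON decoding and pass those exact bytes to verify; re-serialized JSON will generally not match the signed bytes.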

Any non-2xx response triggers retry (10s, then 30s). After three total failures the event is dropped (logged on the control plane).
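
To make the schedule concrete, the sketch below models it as one initial attempt plus two retries. It illustrates the behavior described here and is not the control plane's delivery code; deliver, noSleep, and the injected sleep function are hypothetical, and a transport failure is represented as status 0.

```go
package main

import (
	"fmt"
	"time"
)

// retryDelays mirrors the documented schedule: retry after 10s, then 30s.
var retryDelays = []time.Duration{10 * time.Second, 30 * time.Second}

// deliver calls send until it returns a 2xx status or the schedule is
// exhausted. It reports how many attempts were made and whether one succeeded.
func deliver(send func() int, sleep func(time.Duration)) (attempts int, delivered bool) {
	for {
		attempts++
		if status := send(); status >= 200 && status < 300 {
			return attempts, true
		}
		if attempts > len(retryDelays) {
			return attempts, false // third failure: the event is dropped
		}
		sleep(retryDelays[attempts-1])
	}
}

// noSleep lets the example walk the schedule without waiting.
func noSleep(time.Duration) {}

func main() {
	statuses := []int{500, 503, 204}
	i := 0
	attempts, ok := deliver(func() int {
		s := statuses[i]
		i++
		return s
	}, noSleep)
	fmt.Println("attempts:", attempts, "delivered:", ok)
}
```

Because a slow or failing receiver can see the same event up to three times, handlers should tolerate duplicates.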

Other providers

Discord, Slack, Teams, Google Chat, Telegram, and Matrix receive a formatted text message — the same fields, rendered as human-readable text — not the JSON payload. Use webhook if you need the structured event.

Extending the control plane

The OSS control plane is designed to be embedded by a private cloud distribution without forking. Import this module, implement the Extension interface from pkg/cpextension, and pass it to cpserver.Run:

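package main

import (
    "git.omukk.dev/wrenn/wrenn/pkg/cpextension"
    "git.omukk.dev/wrenn/wrenn/pkg/cpserver"
)

// myExtension is your own type (definition omitted here). The assertion
// below fails at compile time if it stops satisfying the Extension interface.
var _ cpextension.Extension = (*myExtension)(nil)

func main() {
    cpserver.Run(
        cpserver.WithVersion("cloud-1.0.0"),
        cpserver.WithExtensions(&myExtension{}),
    )
}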

Every extension implements two methods:

RegisterRoutes(r chi.Router, sctx cpextension.ServerContext)
BackgroundWorkers(sctx cpextension.ServerContext) []func(context.Context)

ServerContext exposes the initialized OSS services so extensions never re-implement them: Queries, PgPool, Redis, HostPool, Scheduler, CA, Audit, Mailer, OAuthRegistry, Channels, ChannelPub, JWTSecret, Sessions, Config.

Optional hook interfaces

An extension can also implement any subset of these — the OSS server type-asserts at startup:

| Interface | When it fires | Failure semantics |
| --- | --- | --- |
| `MiddlewareProvider` | Wraps every OSS route before registration | n/a |
| `AuthHook.OnSignup(ctx, userID, teamID, email)` | After team provisioning on email-activate or OAuth-new-signup | Error aborts signup with 500 `signup_hook_failed` (billing customer creation must succeed) |
| `AuthHook.OnLogin(ctx, userID)` | After a successful login or OAuth callback | Error logged, login still succeeds |
| `AuthHook.OnAccountSoftDelete(ctx, userID)` | After `DELETE /v1/me` commits | Error logged, request still succeeds |
| `AuthHook.OnAccountHardDelete(ctx, userID)` | After the 15-day cleanup goroutine purges a soft-deleted account | Error logged, cleanup continues |
| `SandboxEventHook.OnSandboxEvent(ctx, ev)` | Capsule create/pause/resume/destroy success, from the Redis stream consumer | Error leaves the message un-acked; hooks must be idempotent |
| `LimitsProvider.EffectiveLimits(ctx, teamID)` | `POST /v1/capsules` consults before scheduling | Returns 402 (`concurrent_sandbox_limit` / `vcpu_limit` / `memory_limit`) when over |
| `UsageProvider.CurrentUsage(ctx, teamID)` | Feeds `LimitsProvider` checks; falls back to OSS DB-backed default | Error → 402 `usage_unavailable` |
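
Since a failing SandboxEventHook leaves the stream message un-acked, the same event can presumably reach the hook more than once. One way to stay idempotent is to record which events have already been applied, as in the sketch below. Everything here is hypothetical: sandboxEvent stands in for the real event type (whose fields are not shown in this README), its ID field is an assumed unique key, the ctx argument is dropped for brevity, and a real extension would keep the record in durable storage instead of memory.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// sandboxEvent is a hypothetical stand-in for the hook's event argument.
type sandboxEvent struct {
	ID    string // assumed unique per event
	Event string
}

// idempotentHook wraps a side-effecting function so that it runs at most
// once per successfully handled event ID.
type idempotentHook struct {
	mu    sync.Mutex
	done  map[string]bool
	apply func(sandboxEvent) error
}

func newIdempotentHook(apply func(sandboxEvent) error) *idempotentHook {
	return &idempotentHook{done: make(map[string]bool), apply: apply}
}

// OnSandboxEvent skips events that were already applied and records an
// event only after apply succeeds, so a failed attempt is retried.
func (h *idempotentHook) OnSandboxEvent(ev sandboxEvent) error {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.done[ev.ID] {
		return nil // duplicate delivery of an event already handled
	}
	if err := h.apply(ev); err != nil {
		return err // not recorded; the next delivery tries again
	}
	h.done[ev.ID] = true
	return nil
}

// errTransient simulates a temporary downstream failure.
var errTransient = errors.New("transient failure")

func main() {
	calls := 0
	hook := newIdempotentHook(func(ev sandboxEvent) error {
		calls++
		if calls == 1 {
			return errTransient
		}
		return nil
	})
	ev := sandboxEvent{ID: "evt-1", Event: "capsule.create"}
	for i := 0; i < 3; i++ {
		fmt.Println("delivery", i+1, "error:", hook.OnSandboxEvent(ev))
	}
	fmt.Println("apply ran", calls, "times")
}
```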

Auth middleware helpers

For extensions that gate their own routes:

r.With(cpextension.RequireSession(sctx)).Get("/billing", handler)
r.With(cpextension.RequireSessionOrAPIKey(sctx)).Get("/usage", handler)
r.With(cpextension.RequireSession(sctx), cpextension.RequireAdmin(sctx)).Get("/admin/exports", handler)

// Issue a session from a custom flow (e.g. invite-accept):
sess, err := cpextension.IssueSession(w, r, sctx, userID, teamID)

Cookie/header names are exported as cpextension.SessionCookieName, CSRFCookieName, CSRFHeaderName.

See CLAUDE.md for full architecture documentation.
