forked from wrenn/wrenn

Merge pull request 'Improved codebase to prepare for production' (#32) from chore/hardening into dev

Reviewed-on: wrenn/wrenn#32
2026-04-16 13:00:06 +00:00
13 changed files with 761 additions and 79 deletions


@@ -1,3 +1,7 @@
# Shared (applies to both control plane and host agent)
WRENN_DIR=/var/lib/wrenn
LOG_LEVEL=info
# Database
DATABASE_URL=postgres://wrenn:wrenn@localhost:5432/wrenn?sslmode=disable
@@ -9,7 +13,6 @@ WRENN_CP_LISTEN_ADDR=:9725
# Host Agent
WRENN_HOST_LISTEN_ADDR=:50051
WRENN_DIR=/var/lib/wrenn
WRENN_HOST_INTERFACE=eth0
WRENN_CP_URL=http://localhost:9725
WRENN_DEFAULT_ROOTFS_SIZE=5Gi


@@ -12,10 +12,10 @@ All commands go through the Makefile. Never use raw `go build` or `go run`.
```bash
make build # Build all binaries → builds/
make build-cp # Control plane only (builds frontend first)
make build-cp # Control plane only
make build-agent # Host agent only
make build-envd # envd static binary (verified statically linked)
make build-frontend # SvelteKit dashboard → internal/dashboard/static/
make build-frontend # SvelteKit dashboard → frontend/build/ (served by Caddy)
make dev # Full local dev: infra + migrate + control plane
make dev-infra # Start PostgreSQL + Prometheus + Grafana (Docker)
@@ -55,7 +55,7 @@ User SDK → HTTPS/WS → Control Plane → Connect RPC → Host Agent → HTTP/
| Binary | Module | Entry point | Runs as |
|--------|--------|-------------|---------|
| wrenn-cp | `git.omukk.dev/wrenn/wrenn` | `cmd/control-plane/main.go` | Unprivileged |
| wrenn-agent | `git.omukk.dev/wrenn/wrenn` | `cmd/host-agent/main.go` | Root (NET_ADMIN + /dev/kvm) |
| wrenn-agent | `git.omukk.dev/wrenn/wrenn` | `cmd/host-agent/main.go` | `wrenn` user with capabilities (SYS_ADMIN, NET_ADMIN, NET_RAW, SYS_PTRACE, KILL, DAC_OVERRIDE, MKNOD) via setcap; also accepts root |
| envd | `git.omukk.dev/wrenn/wrenn/envd` (standalone `envd/go.mod`) | `envd/main.go` | PID 1 inside guest VM |
envd is a **completely independent Go module**. It is never imported by the main module. The only connection is the protobuf contract. It compiles to a static binary baked into rootfs images.
@@ -64,7 +64,7 @@ envd is a **completely independent Go module**. It is never imported by the main
### Control Plane
**Internal packages:** `internal/api/`, `internal/dashboard/`, `internal/email/`
**Internal packages:** `internal/api/`, `internal/email/`
**Public packages (importable by cloud repo):** `pkg/config/`, `pkg/db/`, `pkg/auth/`, `pkg/auth/oauth/`, `pkg/scheduler/`, `pkg/lifecycle/`, `pkg/channels/`, `pkg/audit/`, `pkg/service/`, `pkg/events/`, `pkg/id/`, `pkg/validate/`
@@ -78,7 +78,7 @@ Startup (`cmd/control-plane/main.go`) is a thin wrapper: `cpserver.Run(cpserver.
- **API Server** (`internal/api/server.go`): chi router with middleware. Creates handler structs (`sandboxHandler`, `execHandler`, `filesHandler`, etc.) injected with `db.Queries` and the host agent Connect RPC client. Routes under `/v1/capsules/*`. Accepts `[]cpextension.Extension` — each extension's `RegisterRoutes()` is called after all core routes are registered.
- **Reconciler** (`internal/api/reconciler.go`): background goroutine (every 30s) that compares DB records against `agent.ListSandboxes()` RPC. Marks orphaned DB entries as "stopped".
- **Dashboard** (SvelteKit + Tailwind + Bits UI, statically built and embedded via `go:embed`, served as catch-all at root)
- **Dashboard** (SvelteKit + Tailwind + Bits UI, built to static files in `frontend/build/`, served by Caddy as a reverse proxy)
- **Database**: PostgreSQL via pgx/v5. Queries generated by sqlc from `db/queries/*.sql` → `pkg/db/`. Migrations in `db/migrations/` (goose, plain SQL). `db/migrations/embed.go` exposes `migrations.FS` so the cloud repo can run OSS migrations via `go:embed`.
- **Config** (`pkg/config/config.go`): purely environment variables (`DATABASE_URL`, `CP_LISTEN_ADDR`, `CP_HOST_AGENT_ADDR`), no YAML/file config.
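The reconciler's comparison boils down to a set difference between DB rows and the sandbox IDs returned by `agent.ListSandboxes()`. The following is a minimal sketch, not the code in `internal/api/reconciler.go`. It assumes a hypothetical `dbSandbox` row type, a `"running"` status string, and that only running rows can become orphans.

```go
package main

import (
	"fmt"
	"sort"
)

// dbSandbox is a hypothetical, trimmed-down view of a sandbox row.
type dbSandbox struct {
	ID     string
	Status string
}

// findOrphans returns the IDs of rows the database still considers running
// but that the host agent did not report. A reconciler pass would mark these
// rows as "stopped".
func findOrphans(dbRows []dbSandbox, agentIDs []string) []string {
	live := make(map[string]struct{}, len(agentIDs))
	for _, id := range agentIDs {
		live[id] = struct{}{}
	}
	var orphans []string
	for _, row := range dbRows {
		if row.Status != "running" {
			continue // assumption: only running rows are candidates
		}
		if _, ok := live[row.ID]; !ok {
			orphans = append(orphans, row.ID)
		}
	}
	sort.Strings(orphans) // deterministic order for logs and tests
	return orphans
}

func main() {
	rows := []dbSandbox{
		{ID: "sb-1", Status: "running"},
		{ID: "sb-2", Status: "running"},
		{ID: "sb-3", Status: "stopped"},
	}
	fmt.Println(findOrphans(rows, []string{"sb-1"})) // [sb-2]
}
```

In the real service this comparison would run inside a goroutine driven by a 30-second `time.Ticker`, followed by a DB update for each returned ID.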
@@ -86,7 +86,9 @@ Startup (`cmd/control-plane/main.go`) is a thin wrapper: `cpserver.Run(cpserver.
**Packages:** `internal/hostagent/`, `internal/sandbox/`, `internal/vm/`, `internal/network/`, `internal/devicemapper/`, `internal/envdclient/`, `internal/snapshot/`
Startup (`cmd/host-agent/main.go`) wires: root check → enable IP forwarding → clean up stale dm devices → `sandbox.Manager` (containing `vm.Manager` + `network.SlotAllocator` + `devicemapper.LoopRegistry`) → `hostagent.Server` (Connect RPC handler) → HTTP server.
**Production deployment:** `scripts/prepare-wrenn-user.sh` creates the `wrenn` system user, sets Linux capabilities (setcap) on wrenn-agent and all child binaries (iptables, losetup, dmsetup, etc.), installs an apt hook to restore capabilities after package updates, configures udev rules for `/dev/net/tun`, loads required kernel modules, and writes systemd unit files for both services. No sudo grants — all privilege is via capabilities.
Startup (`cmd/host-agent/main.go`) wires: root/capabilities check → enable IP forwarding → clean up stale dm devices → `sandbox.Manager` (containing `vm.Manager` + `network.SlotAllocator` + `devicemapper.LoopRegistry`) → `hostagent.Server` (Connect RPC handler) → HTTP server.
- **RPC Server** (`internal/hostagent/server.go`): implements `hostagentv1connect.HostAgentServiceHandler`. Thin wrapper — every method delegates to `sandbox.Manager`. Maps Connect error codes on return.
- **Sandbox Manager** (`internal/sandbox/manager.go`): the core orchestration layer. Maintains in-memory state in `boxes map[string]*sandboxState` (protected by `sync.RWMutex`). Each `sandboxState` holds a `models.Sandbox`, a `*network.Slot`, and an `*envdclient.Client`. Runs a TTL reaper (every 10s) that auto-destroys timed-out sandboxes.
@@ -113,8 +115,8 @@ Runs as PID 1 inside the microVM via `wrenn-init.sh` (mounts procfs/sysfs/dev, s
- **Package manager**: pnpm
- **Routing**: SvelteKit file-based routing under `frontend/src/routes/`
- **Routing layout**: `/login` and `/signup` at root, authenticated pages under `/dashboard/*` (e.g. `/dashboard/capsules`, `/dashboard/keys`)
- **Build output**: `frontend/build/` → copied to `internal/dashboard/static/` → embedded via `go:embed` into the control plane binary
- **Serving**: `internal/dashboard/dashboard.go` registers a `NotFound` catch-all SPA handler with fallback to `index.html`. API routes (`/v1/*`, `/openapi.yaml`, `/docs`) are registered first and take priority
- **Build output**: `frontend/build/` — static files served by Caddy
- **Serving**: Caddy reverse-proxies API requests to the control plane and serves the SvelteKit SPA directly. The control plane does not serve frontend assets.
- **Dev workflow**: `make dev-frontend` runs Vite dev server on port 5173 with HMR. API calls proxy to `http://localhost:8000`
- **Fonts**: Manrope (UI), Instrument Serif (headings), JetBrains Mono (code), Alice (brand wordmark) — all self-hosted via `@fontsource`
- **Dark mode**: class-based (`.dark` on `<html>`) with system preference detection + localStorage persistence
@@ -209,7 +211,7 @@ To add a new query: add it to the appropriate `.sql` file in `db/queries/` → `
- **TAP networking** (not vsock) for host-to-envd communication
- **Device-mapper snapshots** for rootfs CoW — shared read-only loop device per base template, per-sandbox sparse CoW file, Firecracker gets `/dev/mapper/wrenn-{id}`
- **PostgreSQL** via pgx/v5 + sqlc (type-safe query generation). Goose for migrations (plain SQL, up/down)
- **Dashboard**: SvelteKit (Svelte 5, adapter-static) + Tailwind CSS v4 + Bits UI. Built to static files, embedded into the Go binary via `go:embed`, served as catch-all at root
- **Dashboard**: SvelteKit (Svelte 5, adapter-static) + Tailwind CSS v4 + Bits UI. Built to static files in `frontend/build/`, served by Caddy (not embedded in the Go binary)
- **Lago** for billing (external service, not in this codebase)
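The device-mapper CoW design maps onto the kernel's `snapshot` target, whose table line has the form `<start> <length> snapshot <origin> <cow> <P|N> <chunksize>`, with lengths in 512-byte sectors. The sketch below builds such a line. It assumes that the base image and the per-sandbox sparse CoW file are both attached as loop devices, that the snapshot is persistent (`P`), and that the chunk size is 8 sectors. `snapshotTable` and `mapperName` are hypothetical helpers, not wrenn's `internal/devicemapper` API.

```go
package main

import "fmt"

const sectorSize = 512 // device-mapper lengths are counted in 512-byte sectors

// snapshotTable builds a dm "snapshot" table line. origin is the shared
// read-only loop device for the base template; cow is the loop device backing
// the per-sandbox sparse CoW file.
func snapshotTable(originBytes int64, origin, cow string, chunkSectors int) (string, error) {
	if originBytes <= 0 || originBytes%sectorSize != 0 {
		return "", fmt.Errorf("origin size %d is not a positive multiple of %d", originBytes, sectorSize)
	}
	// "P" = persistent: exception metadata is kept in the CoW device.
	return fmt.Sprintf("0 %d snapshot %s %s P %d", originBytes/sectorSize, origin, cow, chunkSectors), nil
}

// mapperName follows the /dev/mapper/wrenn-{id} naming used for Firecracker.
func mapperName(sandboxID string) string { return "wrenn-" + sandboxID }

func main() {
	table, err := snapshotTable(5<<30, "/dev/loop0", "/dev/loop1", 8)
	if err != nil {
		panic(err)
	}
	// Roughly equivalent to: dmsetup create wrenn-abc123 --table "<table>"
	fmt.Println(mapperName("abc123"), table)
	// wrenn-abc123 0 10485760 snapshot /dev/loop0 /dev/loop1 P 8
}
```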
## Coding Conventions

README.md

@@ -2,16 +2,16 @@
Secure infrastructure for AI
## Deployment
### Prerequisites
## Prerequisites
- Linux host with `/dev/kvm` access (bare metal or nested virt)
- Firecracker binary at `/usr/local/bin/firecracker`
- PostgreSQL
- Go 1.25+
- pnpm (for frontend)
- Docker (for dev infra and rootfs builds)
### Build
## Build
```bash
make build # outputs to builds/
@@ -19,30 +19,77 @@ make build # outputs to builds/
Produces three binaries: `wrenn-cp` (control plane), `wrenn-agent` (host agent), `envd` (guest agent).
### Host setup
## Host setup
The host agent machine needs:
The host agent needs a kernel, a minimal rootfs image, and working directories on the host machine.
```bash
# Kernel for guest VMs
mkdir -p /var/lib/wrenn/kernels
# Place a vmlinux kernel at /var/lib/wrenn/kernels/vmlinux
### Directory structure
# Rootfs images
mkdir -p /var/lib/wrenn/images
# Build or place .ext4 rootfs images (e.g., minimal.ext4)
# Sandbox working directory
mkdir -p /var/lib/wrenn/sandboxes
# Snapshots directory
mkdir -p /var/lib/wrenn/snapshots
# Enable IP forwarding
sysctl -w net.ipv4.ip_forward=1
```
/var/lib/wrenn/
├── kernels/
│ └── vmlinux # uncompressed Linux kernel (not bzImage)
├── images/
│ └── minimal/
│ └── rootfs.ext4 # base rootfs (all other templates snapshot from this)
├── sandboxes/ # per-sandbox CoW files (created at runtime)
└── snapshots/ # pause/hibernate snapshot files (created at runtime)
```
### Configure
Create the directories:
```bash
sudo mkdir -p /var/lib/wrenn/{kernels,images/minimal,sandboxes,snapshots}
```
### Kernel
Place an uncompressed `vmlinux` kernel at `/var/lib/wrenn/kernels/vmlinux`. Versioned kernels (`vmlinux-{semver}`) are also supported — the agent picks the latest by semver.
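"Latest by semver" has to compare version components numerically, otherwise `6.1.10` would sort before `6.1.9`. This is a minimal sketch of that selection. `pickKernel` and `parseSemver` are hypothetical names. The sketch assumes plain `X.Y.Z` suffixes (no pre-release tags) and that any versioned kernel is preferred over an unversioned `vmlinux`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// parseSemver parses "X.Y.Z" into three integers. Pre-release and build
// suffixes are not handled in this sketch.
func parseSemver(s string) ([3]int, bool) {
	var v [3]int
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// less compares two versions component by component.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// pickKernel chooses the highest vmlinux-X.Y.Z from a directory listing,
// falling back to a plain "vmlinux".
func pickKernel(names []string) (string, error) {
	best, bestVer, found := "", [3]int{}, false
	for _, name := range names {
		ver, ok := strings.CutPrefix(name, "vmlinux-")
		if !ok {
			continue
		}
		v, ok := parseSemver(ver)
		if ok && (!found || less(bestVer, v)) {
			best, bestVer, found = name, v, true
		}
	}
	if found {
		return best, nil
	}
	for _, name := range names {
		if name == "vmlinux" {
			return name, nil
		}
	}
	return "", fmt.Errorf("no vmlinux kernel found")
}

func main() {
	names := []string{"vmlinux", "vmlinux-6.1.9", "vmlinux-6.1.10", "vmlinux-5.10.0"}
	k, err := pickKernel(names)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(filepath.Join("/var/lib/wrenn/kernels", k)) // /var/lib/wrenn/kernels/vmlinux-6.1.10
}
```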
### Minimal rootfs
The minimal rootfs is the base image that all other templates (Python, Node, etc.) are built on top of via device-mapper snapshots. It must contain:
| Package | Why |
|---------|-----|
| `socat` | Bidirectional relay for port forwarding |
| `chrony` | Time sync from KVM PTP clock (`/dev/ptp0`) |
| `tini` | PID 1 zombie reaper (injected by build script, not apt) |
| `sudo` | User privilege management inside the guest |
| `wget` | HTTP fetching |
| `curl` | HTTP client |
| `ca-certificates` | TLS certificate verification |
**To build a rootfs from a Docker container:**
1. Create and configure a container with the required packages:
```bash
docker run -it --name wrenn-minimal debian:bookworm bash
# Inside the container:
apt update && apt install -y socat chrony sudo wget curl ca-certificates
exit
```
2. Export to a rootfs image (builds envd, injects wrenn-init + tini, shrinks to minimum size):
```bash
sudo bash scripts/rootfs-from-container.sh wrenn-minimal minimal
```
**To update an existing rootfs** after changing envd or `wrenn-init.sh`:
```bash
bash scripts/update-minimal-rootfs.sh
```
This rebuilds envd via `make build-envd` and copies the fresh binaries into the mounted rootfs image.
### IP forwarding
```bash
sudo sysctl -w net.ipv4.ip_forward=1
```
## Configure
Copy `.env.example` to `.env` and edit:
@@ -59,25 +106,21 @@ WRENN_HOST_LISTEN_ADDR=:50051
WRENN_DIR=/var/lib/wrenn
```
### Run
## Development
```bash
# Apply database migrations
make migrate-up
# Start control plane
./builds/wrenn-cp
make dev # Start PostgreSQL (Docker), run migrations, start control plane
make dev-agent # Start host agent (separate terminal, sudo)
make dev-frontend # Vite dev server with HMR (port 5173)
make check # fmt + vet + lint + test
```
Control plane listens on `WRENN_CP_LISTEN_ADDR` (default `:8000`).
### Host registration
Hosts must be registered with the control plane before they can serve sandboxes.
1. **Create a host record** (via API or dashboard):
```bash
# As an admin (JWT auth)
curl -X POST http://localhost:8000/v1/hosts \
-H "Authorization: Bearer $JWT_TOKEN" \
-H "Content-Type: application/json" \
@@ -87,17 +130,16 @@ Hosts must be registered with the control plane before they can serve sandboxes.
2. **Start the host agent** with the registration token and its externally-reachable address:
```bash
sudo WRENN_CP_URL=http://cp-host:8000 \
sudo WRENN_CP_URL=http://localhost:8000 \
./builds/wrenn-agent \
--register <token-from-step-1> \
--address 10.0.1.5:50051
--address <host-ip>:50051
```
On first startup the agent sends its specs (arch, CPU, memory, disk) to the control plane, receives a long-lived host JWT, and saves it to `$WRENN_DIR/host-token`.
3. **Subsequent startups** don't need `--register` — the agent loads the saved JWT automatically:
```bash
sudo WRENN_CP_URL=http://cp-host:8000 \
./builds/wrenn-agent --address 10.0.1.5:50051
sudo ./builds/wrenn-agent --address <host-ip>:50051
```
4. **If registration fails** (e.g., network error after token was consumed), regenerate a token:
@@ -107,23 +149,6 @@ Hosts must be registered with the control plane before they can serve sandboxes.
```
Then restart the agent with the new token.
The agent sends heartbeats to the control plane every 30 seconds. Host agent listens on `WRENN_HOST_LISTEN_ADDR` (default `:50051`).
### Rootfs images
envd must be baked into every rootfs image. After building:
```bash
make build-envd
bash scripts/update-debug-rootfs.sh /var/lib/wrenn/images/minimal.ext4
```
## Development
```bash
make dev # Start PostgreSQL (Docker), run migrations, start control plane
make dev-agent # Start host agent (separate terminal, sudo)
make check # fmt + vet + lint + test
```
The agent sends heartbeats to the control plane every 30 seconds.
See `CLAUDE.md` for full architecture documentation.


@@ -1,14 +1,18 @@
package main
import (
"bufio"
"context"
"crypto/tls"
"flag"
"fmt"
"log/slog"
"net/http"
"os"
"os/signal"
"path/filepath"
"strconv"
"strings"
"sync"
"syscall"
"time"
@@ -21,6 +25,7 @@ import (
"git.omukk.dev/wrenn/wrenn/internal/network"
"git.omukk.dev/wrenn/wrenn/internal/sandbox"
"git.omukk.dev/wrenn/wrenn/pkg/auth"
"git.omukk.dev/wrenn/wrenn/pkg/logging"
"git.omukk.dev/wrenn/wrenn/proto/hostagent/gen/hostagentv1connect"
)
@@ -38,18 +43,24 @@ func main() {
advertiseAddr := flag.String("address", "", "Externally-reachable address (ip:port) for this host agent")
flag.Parse()
slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
Level: slog.LevelDebug,
})))
rootDir := envOrDefault("WRENN_DIR", "/var/lib/wrenn")
cleanupLog := logging.Setup(filepath.Join(rootDir, "logs"), "host-agent")
defer cleanupLog()
if os.Geteuid() != 0 {
slog.Error("host agent must run as root")
if err := checkPrivileges(); err != nil {
slog.Error("insufficient privileges", "error", err)
os.Exit(1)
}
// Enable IP forwarding (required for NAT).
// Enable IP forwarding (required for NAT). The write may fail if running
// as non-root without DAC_OVERRIDE on this path — that's OK if the systemd
// unit's ExecStartPre already set it. We verify the value regardless.
if err := os.WriteFile("/proc/sys/net/ipv4/ip_forward", []byte("1"), 0644); err != nil {
slog.Warn("failed to enable ip_forward", "error", err)
slog.Warn("failed to enable ip_forward (may have been set by systemd unit)", "error", err)
}
if b, err := os.ReadFile("/proc/sys/net/ipv4/ip_forward"); err != nil || strings.TrimSpace(string(b)) != "1" {
slog.Error("ip_forward is not enabled — sandbox networking will be broken", "error", err)
os.Exit(1)
}
// Clean up stale resources from a previous crash.
@@ -57,7 +68,6 @@ func main() {
network.CleanupStaleNamespaces()
listenAddr := envOrDefault("WRENN_HOST_LISTEN_ADDR", ":50051")
rootDir := envOrDefault("WRENN_DIR", "/var/lib/wrenn")
cpURL := os.Getenv("WRENN_CP_URL")
credsFile := filepath.Join(rootDir, "host-credentials.json")
@@ -170,6 +180,7 @@ func main() {
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer shutdownCancel()
mgr.Shutdown(shutdownCtx)
sandbox.ShrinkMinimalImage(rootDir)
if err := httpServer.Shutdown(shutdownCtx); err != nil {
slog.Error("http server shutdown error", "error", err)
}
@@ -245,3 +256,63 @@ func envOrDefault(key, def string) string {
}
return def
}
// checkPrivileges verifies the process has the required Linux capabilities.
// Always reads CapEff — even for root — because a root process inside a
// restricted container (e.g. docker --cap-drop=all) may not have all caps.
func checkPrivileges() error {
capEff, err := readEffectiveCaps()
if err != nil {
return fmt.Errorf("read capabilities: %w", err)
}
// All capabilities required by the host agent at runtime.
required := []struct {
bit uint
name string
}{
{1, "CAP_DAC_OVERRIDE"}, // /dev/loop*, /dev/mapper/*, /dev/net/tun
{5, "CAP_KILL"}, // SIGTERM/SIGKILL to Firecracker processes
{12, "CAP_NET_ADMIN"}, // netlink, iptables, routing, TAP/veth
{13, "CAP_NET_RAW"}, // raw sockets (iptables)
{19, "CAP_SYS_PTRACE"}, // reading /proc/self/ns/net (netns.Get)
{21, "CAP_SYS_ADMIN"}, // netns, mount ns, losetup, dmsetup
{27, "CAP_MKNOD"}, // device-mapper node creation
}
var missing []string
for _, cap := range required {
if capEff&(1<<cap.bit) == 0 {
missing = append(missing, cap.name)
}
}
if len(missing) > 0 {
return fmt.Errorf("missing capabilities: %s — run as root or apply setcap to the binary",
strings.Join(missing, ", "))
}
return nil
}
// readEffectiveCaps parses the CapEff bitmask from /proc/self/status.
func readEffectiveCaps() (uint64, error) {
f, err := os.Open("/proc/self/status")
if err != nil {
return 0, err
}
defer f.Close()
scanner := bufio.NewScanner(f)
for scanner.Scan() {
line := scanner.Text()
if hexStr, ok := strings.CutPrefix(line, "CapEff:"); ok {
return strconv.ParseUint(strings.TrimSpace(hexStr), 16, 64)
}
}
if err := scanner.Err(); err != nil {
return 0, fmt.Errorf("read /proc/self/status: %w", err)
}
return 0, fmt.Errorf("CapEff not found in /proc/self/status")
}
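When `checkPrivileges` reports missing capabilities, it can help to decode the `CapEff` mask of a running process by hand (for example, the value from `grep CapEff /proc/<pid>/status`). The sketch below reuses the same bit table as `checkPrivileges`. `missingCaps` is a hypothetical helper. libcap's `capsh --decode` gives a full decoding of all bits.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// requiredCaps mirrors the bit numbers used by checkPrivileges.
var requiredCaps = []struct {
	bit  uint
	name string
}{
	{1, "CAP_DAC_OVERRIDE"},
	{5, "CAP_KILL"},
	{12, "CAP_NET_ADMIN"},
	{13, "CAP_NET_RAW"},
	{19, "CAP_SYS_PTRACE"},
	{21, "CAP_SYS_ADMIN"},
	{27, "CAP_MKNOD"},
}

// missingCaps parses a hex CapEff value and lists the required capabilities
// whose bits are not set.
func missingCaps(capEffHex string) ([]string, error) {
	mask, err := strconv.ParseUint(strings.TrimSpace(capEffHex), 16, 64)
	if err != nil {
		return nil, fmt.Errorf("parse CapEff %q: %w", capEffHex, err)
	}
	var missing []string
	for _, c := range requiredCaps {
		if mask&(1<<c.bit) == 0 {
			missing = append(missing, c.name)
		}
	}
	return missing, nil
}

func main() {
	// First value: all capability bits 0-40 set (a full root set on recent kernels).
	// Second value: only CAP_NET_ADMIN (bit 12 = 0x1000).
	for _, hex := range []string{"000001ffffffffff", "0000000000001000"} {
		missing, err := missingCaps(hex)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("CapEff %s missing: %v\n", hex, missing)
	}
}
```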

deploy/logrotate/wrenn Normal file

@@ -0,0 +1,19 @@
/var/lib/wrenn/logs/control-plane.log
/var/lib/wrenn/logs/host-agent.log
{
daily
rotate 3
missingok
notifempty
dateext
dateformat -%Y-%m-%d
compress
delaycompress
sharedscripts
postrotate
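# Signal the processes to reopen their log files (both handle SIGHUP).
# Match the exact process name with -x: with -f, the pattern would also
# match this postrotate shell, whose command line contains both names.
pkill -HUP -x wrenn-cp || true
pkill -HUP -x wrenn-agent || true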
endscript
}


@@ -0,0 +1 @@
export const prerender = false;


@@ -28,6 +28,7 @@ var openapiYAML []byte
type Server struct {
router chi.Router
BuildSvc *service.BuildService
version string
}
// New constructs the chi router and registers all routes.
@@ -48,6 +49,7 @@ func New(
mailer email.Mailer,
extensions []cpextension.Extension,
sctx cpextension.ServerContext,
version string,
) *Server {
r := chi.NewRouter()
r.Use(requestLogger())
@@ -86,6 +88,12 @@ func New(
adminCapsules := newAdminCapsuleHandler(sandboxSvc, queries, pool, al)
meH := newMeHandler(queries, pgPool, rdb, jwtSecret, mailer, oauthRegistry, oauthRedirectURL, teamSvc)
// Health check.
r.Get("/health", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
fmt.Fprintf(w, `{"status":"ok","version":%q}`, version)
})
// OpenAPI spec and docs.
r.Get("/openapi.yaml", serveOpenAPI)
r.Get("/docs", serveDocs)
@@ -270,7 +278,7 @@ func New(
ext.RegisterRoutes(r, sctx)
}
return &Server{router: r, BuildSvc: buildSvc}
return &Server{router: r, BuildSvc: buildSvc, version: version}
}
// Handler returns the HTTP handler.


@@ -24,7 +24,7 @@ func (a *SlotAllocator) Allocate() (int, error) {
a.mu.Lock()
defer a.mu.Unlock()
for i := 1; i <= 65534; i++ {
for i := 1; i <= 32767; i++ {
if !a.inUse[i] {
a.inUse[i] = true
return i, nil


@@ -104,6 +104,37 @@ func ParseSizeToMB(s string) (int, error) {
}
}
// ShrinkMinimalImage shrinks the built-in minimal rootfs back to its minimum
// size using resize2fs -M. This is the inverse of EnsureImageSizes and should
// be called during graceful shutdown so the image is stored compactly on disk.
func ShrinkMinimalImage(wrennDir string) {
minimalRootfs := layout.TemplateRootfs(wrennDir, id.PlatformTeamID, id.MinimalTemplateID)
shrinkImage(minimalRootfs)
}
// shrinkImage shrinks a single rootfs image to its minimum size.
func shrinkImage(rootfs string) {
if _, err := os.Stat(rootfs); err != nil {
return
}
slog.Info("shrinking base image", "path", rootfs)
if out, err := exec.Command("e2fsck", "-fy", rootfs).CombinedOutput(); err != nil {
if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() > 1 {
slog.Warn("e2fsck before shrink failed", "path", rootfs, "output", string(out), "error", err)
return
}
}
if out, err := exec.Command("resize2fs", "-M", rootfs).CombinedOutput(); err != nil {
slog.Warn("resize2fs -M failed", "path", rootfs, "output", string(out), "error", err)
return
}
slog.Info("base image shrunk", "path", rootfs)
}
// expandImage expands a single rootfs image if it is smaller than targetBytes.
func expandImage(rootfs string, targetBytes int64, targetMB int) error {
info, err := os.Stat(rootfs)


@@ -14,6 +14,7 @@ type Config struct {
RedisURL string
ListenAddr string
JWTSecret string
WrennDir string // WRENN_DIR — base directory for wrenn data (logs, etc.)
// mTLS — CP→Agent channel. Both must be set to enable mTLS; omitting either
// disables cert issuance and leaves agent connections on plain HTTP (dev mode).
@@ -48,6 +49,7 @@ func Load() Config {
RedisURL: envOrDefault("REDIS_URL", "redis://localhost:6379/0"),
ListenAddr: envOrDefault("WRENN_CP_LISTEN_ADDR", ":8080"),
JWTSecret: os.Getenv("JWT_SECRET"),
WrennDir: envOrDefault("WRENN_DIR", "/var/lib/wrenn"),
CACert: os.Getenv("WRENN_CA_CERT"),
CAKey: os.Getenv("WRENN_CA_KEY"),


@@ -6,6 +6,7 @@ import (
"net/http"
"os"
"os/signal"
"path/filepath"
"strings"
"syscall"
"time"
@@ -22,6 +23,7 @@ import (
"git.omukk.dev/wrenn/wrenn/pkg/config"
"git.omukk.dev/wrenn/wrenn/pkg/db"
"git.omukk.dev/wrenn/wrenn/pkg/lifecycle"
"git.omukk.dev/wrenn/wrenn/pkg/logging"
"git.omukk.dev/wrenn/wrenn/pkg/scheduler"
)
@@ -39,11 +41,9 @@ func Run(opts ...Option) {
opt(o)
}
slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
Level: slog.LevelDebug,
})))
cfg := config.Load()
cleanupLog := logging.Setup(filepath.Join(cfg.WrennDir, "logs"), "control-plane")
defer cleanupLog()
if len(cfg.JWTSecret) < 32 {
slog.Error("JWT_SECRET must be at least 32 characters")
@@ -175,7 +175,7 @@ func Run(opts ...Option) {
}
// API server.
srv := api.New(queries, hostPool, hostScheduler, pool, rdb, []byte(cfg.JWTSecret), oauthRegistry, cfg.OAuthRedirectURL, ca, al, channelSvc, mailer, o.extensions, sctx)
srv := api.New(queries, hostPool, hostScheduler, pool, rdb, []byte(cfg.JWTSecret), oauthRegistry, cfg.OAuthRedirectURL, ca, al, channelSvc, mailer, o.extensions, sctx, o.version)
// Start template build workers (2 concurrent).
stopBuildWorkers := srv.BuildSvc.StartWorkers(ctx, 2)

pkg/logging/logging.go Normal file

@@ -0,0 +1,135 @@
package logging
import (
"io"
"log/slog"
"os"
"os/signal"
"path/filepath"
"strings"
"sync"
"syscall"
)
// Setup configures the global slog logger with dual output (stderr + rotating
// log file). logsDir is the directory where log files are written. binaryName
// is used as the log filename (e.g. "control-plane" → "control-plane.log").
//
// If logsDir is empty or the directory cannot be created, Setup falls back to
// stderr-only logging and returns a no-op cleanup function.
//
// The returned cleanup function closes the log file and must be deferred.
// Setup also installs a SIGHUP handler that reopens the log file, allowing
// external log rotation tools (e.g. logrotate) to rotate files in place.
func Setup(logsDir, binaryName string) func() {
level := parseLevel(os.Getenv("LOG_LEVEL"))
if logsDir == "" {
slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
Level: level,
})))
return func() {}
}
if err := os.MkdirAll(logsDir, 0750); err != nil {
// Fall back to stderr-only; log the error so operators notice.
slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
Level: level,
})))
slog.Warn("file logging unavailable: failed to create log directory", "dir", logsDir, "error", err)
return func() {}
}
logPath := filepath.Join(logsDir, binaryName+".log")
rf, err := newReopenableFile(logPath)
if err != nil {
slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
Level: level,
})))
slog.Warn("file logging unavailable: failed to open log file", "path", logPath, "error", err)
return func() {}
}
mw := io.MultiWriter(os.Stderr, rf)
slog.SetDefault(slog.New(slog.NewTextHandler(mw, &slog.HandlerOptions{
Level: level,
})))
// SIGHUP reopens the log file so logrotate can rotate in place.
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGHUP)
go func() {
for range sigCh {
if err := rf.Reopen(); err != nil {
slog.Error("failed to reopen log file on SIGHUP", "path", logPath, "error", err)
} else {
slog.Info("log file reopened", "path", logPath)
}
}
}()
return func() {
signal.Stop(sigCh)
close(sigCh)
rf.Close()
}
}
func parseLevel(s string) slog.Level {
switch strings.ToLower(strings.TrimSpace(s)) {
case "debug":
return slog.LevelDebug
case "warn", "warning":
return slog.LevelWarn
case "error":
return slog.LevelError
default:
return slog.LevelInfo
}
}
// reopenableFile is an io.Writer backed by an *os.File that can be atomically
// reopened (for log rotation via SIGHUP). All operations are goroutine-safe.
type reopenableFile struct {
path string
mu sync.Mutex
f *os.File
}
func newReopenableFile(path string) (*reopenableFile, error) {
f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0640)
if err != nil {
return nil, err
}
return &reopenableFile{path: path, f: f}, nil
}
func (r *reopenableFile) Write(p []byte) (int, error) {
r.mu.Lock()
defer r.mu.Unlock()
return r.f.Write(p)
}
// Reopen closes the current file and opens a new one at the same path.
// This is the mechanism that makes logrotate's copytruncate-free rotation work:
// logrotate renames the old file, then sends SIGHUP, and the process opens a
// fresh file at the original path.
func (r *reopenableFile) Reopen() error {
r.mu.Lock()
defer r.mu.Unlock()
// Open the new file before closing the old one so a failed open doesn't
// leave the writer in a broken state with a closed fd.
f, err := os.OpenFile(r.path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0640)
if err != nil {
return err
}
r.f.Close()
r.f = f
return nil
}
func (r *reopenableFile) Close() error {
r.mu.Lock()
defer r.mu.Unlock()
return r.f.Close()
}

scripts/prepare-wrenn-user.sh Executable file

@@ -0,0 +1,385 @@
#!/usr/bin/env bash
#
# prepare-wrenn-user.sh — Create the wrenn system user and configure minimal privileges.
#
# Creates a locked-down 'wrenn' system user that can run wrenn-agent and wrenn-cp
# with only the privileges they need. The agent binary gets Linux capabilities
# via setcap — no sudo is configured for the wrenn user at all. If an attacker
# compromises the wrenn user, they cannot escalate via sudo.
#
# What this script does:
# 1. Creates the 'wrenn' system user (bash shell for debugging, no home dir)
# 2. Creates required directories with correct ownership
# 3. Sets Linux capabilities on wrenn-agent and all child binaries
# 4. Installs an apt hook to restore capabilities after package updates
# 5. Installs a sudoers drop-in (comment-only, no grants — absence is the cage)
# 6. Ensures required kernel modules are loaded
# 7. Writes systemd unit files for both wrenn-agent and wrenn-cp
#
# Usage:
# sudo bash scripts/prepare-wrenn-user.sh
#
# Prerequisites:
# - wrenn-agent binary at /usr/local/bin/wrenn-agent
# - wrenn-cp binary at /usr/local/bin/wrenn-cp
# - firecracker binary at /usr/local/bin/firecracker
# - libcap2-bin installed (for setcap)
set -euo pipefail
# ── Guard ────────────────────────────────────────────────────────────────────
if [[ $EUID -ne 0 ]]; then
echo "ERROR: This script must be run as root."
exit 1
fi
# ── Configuration ────────────────────────────────────────────────────────────
WRENN_USER="wrenn"
WRENN_GROUP="wrenn"
WRENN_DIR="/var/lib/wrenn"
AGENT_BIN="/usr/local/bin/wrenn-agent"
CP_BIN="/usr/local/bin/wrenn-cp"
FC_BIN="/usr/local/bin/firecracker"
RESTORE_CAPS_SCRIPT="/etc/wrenn/restore-caps.sh"
# ── 1. Create system user ───────────────────────────────────────────────────
if id "${WRENN_USER}" &>/dev/null; then
echo "==> User '${WRENN_USER}' already exists, skipping creation."
else
echo "==> Creating system user '${WRENN_USER}'..."
useradd \
--system \
--no-create-home \
--home-dir "${WRENN_DIR}" \
--shell /bin/bash \
"${WRENN_USER}"
fi
# Add wrenn to kvm group for /dev/kvm access.
if getent group kvm &>/dev/null; then
usermod -aG kvm "${WRENN_USER}"
echo "==> Added '${WRENN_USER}' to 'kvm' group."
fi
# ── 2. Create directories with correct ownership ────────────────────────────
echo "==> Setting up directories..."
directories=(
"${WRENN_DIR}"
"${WRENN_DIR}/images"
"${WRENN_DIR}/kernels"
"${WRENN_DIR}/sandboxes"
"${WRENN_DIR}/snapshots"
"${WRENN_DIR}/logs"
"/run/netns"
)
for dir in "${directories[@]}"; do
mkdir -p "${dir}"
done
# Only chown wrenn-owned dirs (not /run/netns which is system-managed).
for dir in "${WRENN_DIR}" "${WRENN_DIR}/images" "${WRENN_DIR}/kernels" \
"${WRENN_DIR}/sandboxes" "${WRENN_DIR}/snapshots" "${WRENN_DIR}/logs"; do
chown "${WRENN_USER}:${WRENN_GROUP}" "${dir}"
chmod 750 "${dir}"
done
# ── 3. Set capabilities on binaries ─────────────────────────────────────────
#
# These capabilities replace full root access. The wrenn-agent binary gets
# exactly the capabilities it needs for:
#
# CAP_SYS_ADMIN — network namespaces (netns create/enter), mount namespaces
# (unshare -m), losetup, dmsetup, mount/umount
# CAP_NET_ADMIN — veth/TAP creation (netlink), iptables rules, IP forwarding,
# routing table manipulation
# CAP_NET_RAW — raw socket access (needed by iptables internally)
# CAP_SYS_PTRACE — reading /proc/self/ns/net (netns.Get)
# CAP_KILL — sending SIGTERM/SIGKILL to Firecracker processes
# CAP_DAC_OVERRIDE — accessing /dev/loop*, /dev/mapper/*, /dev/net/tun,
# /proc/sys/net/ipv4/ip_forward
# CAP_MKNOD — creating device nodes (dm-snapshot)
#
# The 'ep' suffix means Effective + Permitted (granted at exec time).
echo "==> Setting capabilities on wrenn-agent..."
if [[ ! -f "${AGENT_BIN}" ]]; then
echo "WARNING: ${AGENT_BIN} not found, skipping setcap. Install the binary first."
else
setcap \
cap_sys_admin,cap_net_admin,cap_net_raw,cap_sys_ptrace,cap_kill,cap_dac_override,cap_mknod+ep \
"${AGENT_BIN}"
echo " Capabilities set on ${AGENT_BIN}:"
getcap "${AGENT_BIN}"
fi
# Firecracker also needs capabilities when spawned by a non-root parent.
# CAP_NET_ADMIN is required for network device access inside the netns.
if [[ -f "${FC_BIN}" ]]; then
setcap cap_net_admin,cap_sys_admin,cap_dac_override+ep "${FC_BIN}"
echo " Capabilities set on ${FC_BIN}:"
getcap "${FC_BIN}"
fi
# ── Helper: resolve binary path and apply setcap ────────────────────────────
#
# Uses `command -v` to find the binary in PATH (handles /usr/bin vs /usr/sbin
# differences across distros), then `readlink -f` to resolve symlinks so that
# setcap hits the real inode (important for iptables-nft/alternatives).
setcap_binary() {
local name="$1" caps="$2"
local bin
bin=$(command -v "$name" 2>/dev/null) || {
echo " WARNING: ${name} not found in PATH, skipping."
return 0
}
bin=$(readlink -f "$bin")
setcap "$caps" "$bin"
echo " $(getcap "$bin")"
}
# The child binaries invoked by wrenn-agent (iptables, losetup, dmsetup, etc.)
# also need capabilities since they'll be exec'd by a non-root user.
echo "==> Setting capabilities on child binaries..."
setcap_binary iptables "cap_net_admin,cap_net_raw+ep"
setcap_binary iptables-save "cap_net_admin,cap_net_raw+ep"
setcap_binary ip "cap_sys_admin,cap_net_admin+ep"
setcap_binary sysctl "cap_net_admin+ep"
setcap_binary losetup "cap_sys_admin,cap_dac_override+ep"
setcap_binary blockdev "cap_sys_admin,cap_dac_override+ep"
setcap_binary dmsetup "cap_sys_admin,cap_dac_override,cap_mknod+ep"
setcap_binary e2fsck "cap_sys_admin,cap_dac_override+ep"
setcap_binary resize2fs "cap_sys_admin,cap_dac_override+ep"
setcap_binary dd "cap_dac_override+ep"
setcap_binary unshare "cap_sys_admin+ep"
setcap_binary mount "cap_sys_admin,cap_dac_override+ep"
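# Manual check (sketch, not run here). It prints the capabilities now attached
# to each child binary, resolved the same way setcap_binary resolves them.
# getcap prints nothing for a file without capabilities, so a missing binary in
# the output has no grant:
#   for b in iptables iptables-save ip sysctl losetup blockdev dmsetup \
#            e2fsck resize2fs dd unshare mount; do
#     getcap "$(readlink -f "$(command -v "$b")")"
#   done
# To withdraw a single grant, e.g. from dd:
#   setcap -r "$(readlink -f "$(command -v dd)")"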
# ── 4. Persist capabilities across package updates ──────────────────────────
#
# apt/dpkg overwrites binaries on package updates, which strips the xattr-based
# capabilities set by setcap. This installs:
# - /etc/wrenn/restore-caps.sh: re-applies setcap to all child binaries
# - /etc/apt/apt.conf.d/99-wrenn-setcap: apt post-invoke hook that calls it
echo "==> Installing capability restore hook..."
mkdir -p /etc/wrenn
cat > "${RESTORE_CAPS_SCRIPT}" << 'RESTORE'
#!/usr/bin/env bash
#
# restore-caps.sh — Re-apply Linux capabilities to wrenn child binaries.
# Called automatically by apt after package updates (see /etc/apt/apt.conf.d/99-wrenn-setcap).
# Can also be run manually: sudo /etc/wrenn/restore-caps.sh
set -euo pipefail
setcap_binary() {
local name="$1" caps="$2"
local bin
bin=$(command -v "$name" 2>/dev/null) || return 0
bin=$(readlink -f "$bin")
setcap "$caps" "$bin" 2>/dev/null || true
}
# wrenn-agent and firecracker (only if present — they aren't package-managed).
[[ -f /usr/local/bin/wrenn-agent ]] && \
setcap cap_sys_admin,cap_net_admin,cap_net_raw,cap_sys_ptrace,cap_kill,cap_dac_override,cap_mknod+ep \
/usr/local/bin/wrenn-agent 2>/dev/null || true
[[ -f /usr/local/bin/firecracker ]] && \
setcap cap_net_admin,cap_sys_admin,cap_dac_override+ep \
/usr/local/bin/firecracker 2>/dev/null || true
# Child binaries (these are the ones wiped by apt).
setcap_binary iptables "cap_net_admin,cap_net_raw+ep"
setcap_binary iptables-save "cap_net_admin,cap_net_raw+ep"
setcap_binary ip "cap_sys_admin,cap_net_admin+ep"
setcap_binary sysctl "cap_net_admin+ep"
setcap_binary losetup "cap_sys_admin,cap_dac_override+ep"
setcap_binary blockdev "cap_sys_admin,cap_dac_override+ep"
setcap_binary dmsetup "cap_sys_admin,cap_dac_override,cap_mknod+ep"
setcap_binary e2fsck "cap_sys_admin,cap_dac_override+ep"
setcap_binary resize2fs "cap_sys_admin,cap_dac_override+ep"
setcap_binary dd "cap_dac_override+ep"
setcap_binary unshare "cap_sys_admin+ep"
setcap_binary mount "cap_sys_admin,cap_dac_override+ep"
RESTORE
chmod 755 "${RESTORE_CAPS_SCRIPT}"
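if [[ -d /etc/apt/apt.conf.d ]]; then
cat > /etc/apt/apt.conf.d/99-wrenn-setcap << 'APT'
// Re-apply Linux capabilities to wrenn child binaries after any package update.
// Capabilities (xattr) are stripped when dpkg overwrites a binary.
DPkg::Post-Invoke { "/etc/wrenn/restore-caps.sh"; };
APT
echo " Installed ${RESTORE_CAPS_SCRIPT} and apt post-invoke hook."
else
echo " Installed ${RESTORE_CAPS_SCRIPT}; apt not found, so no post-invoke hook was written."
echo " Re-run it after any package update that replaces one of the child binaries."
fi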
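# Checking the hook by hand. This sketch assumes an apt-based host where `ip`
# comes from the iproute2 package, as on Debian and Ubuntu:
#   apt-config dump | grep -i wrenn          # hook should appear under DPkg::Post-Invoke
#   apt-get install --reinstall iproute2     # dpkg rewrites ip, dropping its xattr
#   getcap "$(readlink -f "$(command -v ip)")"
# The last command should still show cap_net_admin and cap_sys_admin, because
# dpkg ran restore-caps.sh after it finished.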
# ── 5. Device access ────────────────────────────────────────────────────────
#
# /dev/kvm — handled by kvm group membership above
# /dev/net/tun — needs to be accessible by wrenn user
echo "==> Configuring device access..."
# Ensure /dev/net/tun is accessible (udev rule for persistence across reboots).
cat > /etc/udev/rules.d/99-wrenn.rules << 'UDEV'
# Allow wrenn user access to TUN device for TAP networking.
SUBSYSTEM=="misc", KERNEL=="tun", GROUP="wrenn", MODE="0660"
UDEV
udevadm control --reload-rules 2>/dev/null || true
echo " Installed udev rule for /dev/net/tun."
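# `udevadm control --reload-rules` only reloads the rules. An existing
# /dev/net/tun keeps its current owner and mode until the next udev event for
# the device, such as loading tun or rebooting. Sketch for applying and checking
# the rule immediately:
#   udevadm trigger --subsystem-match=misc --sysname-match=tun
#   stat -c '%U:%G %a' /dev/net/tun    # should show root:wrenn 660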
# ── 6. Kernel modules ───────────────────────────────────────────────────────
echo "==> Ensuring kernel modules are loaded..."
modules=(dm_snapshot dm_mod loop tun)
for mod in "${modules[@]}"; do
if ! lsmod | grep -q "^${mod} "; then
modprobe "${mod}" 2>/dev/null && echo " Loaded ${mod}" || echo " WARNING: Could not load ${mod}"
else
echo " ${mod} already loaded."
fi
done
# Persist across reboots.
for mod in "${modules[@]}"; do
grep -qxF "${mod}" /etc/modules-load.d/wrenn.conf 2>/dev/null || echo "${mod}" >> /etc/modules-load.d/wrenn.conf
done
echo " Module persistence written to /etc/modules-load.d/wrenn.conf."
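# lsmod only lists loadable modules. On some kernels one or more of these
# (often loop) are built in, never appear in lsmod, and modprobe still succeeds
# silently for them. Sketch for telling the two cases apart by hand (note that
# the .ko file names use hyphens where lsmod uses underscores):
#   lsmod | grep -E '^(dm_snapshot|dm_mod|loop|tun) '
#   grep -E '/(dm-snapshot|dm-mod|loop|tun)\.ko' "/lib/modules/$(uname -r)/modules.builtin"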
# ── 7. Sudoers ──────────────────────────────────────────────────────────────
#
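# The wrenn user has no sudo grants. The absence of a grant is the cage. An
# explicit deny built from negation ("!ALL", or a "!root" Runas list) is weaker:
# sudoers(5) warns that subtracting commands from ALL is ineffective, and
# CVE-2019-14287 let "sudo -u#-1" bypass a "!root" Runas restriction.
# This file has no rules, so `sudo -l` shows nothing from it. It exists purely
# as documentation for operators reading /etc/sudoers.d/.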
echo "==> Writing sudoers drop-in..."
cat > /etc/sudoers.d/wrenn << 'SUDOERS'
# Wrenn system user — no sudo access permitted.
# All privilege is granted via Linux capabilities on specific binaries (setcap).
# This file contains no active rules. The absence of any grant is intentional
# and is the strongest way to deny escalation.
#
# Do not add rules here. If the wrenn user needs new privileges, use setcap
# on the specific binary instead.
SUDOERS
chmod 440 /etc/sudoers.d/wrenn
visudo -c -f /etc/sudoers.d/wrenn
echo " /etc/sudoers.d/wrenn installed and validated."
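# Manual check (sketch). This confirms that no other sudoers file grants the
# wrenn user anything. sudo should report that wrenn is not allowed to run sudo
# on this host:
#   sudo -l -U wrenn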
# ── 8. Systemd units ────────────────────────────────────────────────────────
echo "==> Writing systemd service files..."
cat > /etc/systemd/system/wrenn-agent.service << 'UNIT'
[Unit]
Description=Wrenn Host Agent
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=wrenn
Group=wrenn
EnvironmentFile=-/etc/wrenn/agent.env
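# The binary also carries these capabilities via setcap. AmbientCapabilities=
# grants the same set even if the setcap xattr is missing (e.g. after the
# binary is replaced). CapabilityBoundingSet= limits what the agent and every
# child it execs (including the setcap'd helpers) can ever hold.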
AmbientCapabilities=CAP_SYS_ADMIN CAP_NET_ADMIN CAP_NET_RAW CAP_SYS_PTRACE CAP_KILL CAP_DAC_OVERRIDE CAP_MKNOD
CapabilityBoundingSet=CAP_SYS_ADMIN CAP_NET_ADMIN CAP_NET_RAW CAP_SYS_PTRACE CAP_KILL CAP_DAC_OVERRIDE CAP_MKNOD
# IMPORTANT: must be false — child binaries (iptables, losetup, dmsetup, etc.)
# have their own file capabilities via setcap which must be honored at exec time.
NoNewPrivileges=false
# Enable IP forwarding before the agent starts. The "+" prefix runs this
# directive as root (bypassing User=wrenn) so it can write to procfs.
ExecStartPre=+/bin/sh -c 'sysctl -w net.ipv4.ip_forward=1'
ExecStart=/usr/local/bin/wrenn-agent --address ${WRENN_ADVERTISE_ADDR}
Restart=on-failure
RestartSec=5
# Resource limits: open file descriptors (Firecracker, loop devices, sockets)
# and the number of processes/threads owned by the wrenn UID.
LimitNOFILE=65536
LimitNPROC=4096
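# Hide /home, /root and /run/user from the agent. ProtectSystem= is not set, so
# the rest of the filesystem stays writable and ReadWritePaths= below does not
# restrict anything on its own. ReadOnlyPaths= does pin the Firecracker binary.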
ProtectHome=true
ReadWritePaths=/var/lib/wrenn /tmp /run/netns /dev/mapper
ReadOnlyPaths=/usr/local/bin/firecracker
[Install]
WantedBy=multi-user.target
UNIT
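# Checking what systemd will grant (sketch; run after daemon-reload):
#   systemctl show -p AmbientCapabilities -p CapabilityBoundingSet -p NoNewPrivileges wrenn-agent
# Both capability properties should list the same seven capabilities.
# NoNewPrivileges should read "no", so that the setcap'd child binaries still
# receive their grants at exec time.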
cat > /etc/systemd/system/wrenn-cp.service << 'UNIT'
[Unit]
Description=Wrenn Control Plane
After=network-online.target postgresql.service
Wants=network-online.target
[Service]
Type=simple
User=wrenn
Group=wrenn
EnvironmentFile=-/etc/wrenn/cp.env
# Control plane is fully unprivileged — no capabilities needed.
NoNewPrivileges=true
CapabilityBoundingSet=
ExecStart=/usr/local/bin/wrenn-cp
Restart=on-failure
RestartSec=5
ProtectHome=true
ProtectSystem=strict
ReadWritePaths=/tmp
[Install]
WantedBy=multi-user.target
UNIT
mkdir -p /etc/wrenn
touch /etc/wrenn/agent.env /etc/wrenn/cp.env
chmod 640 /etc/wrenn/agent.env /etc/wrenn/cp.env
chown "root:${WRENN_GROUP}" /etc/wrenn/agent.env /etc/wrenn/cp.env
systemctl daemon-reload
echo " wrenn-agent.service and wrenn-cp.service installed."
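# Optional lint and exposure review (sketch, not run here):
#   systemd-analyze verify /etc/systemd/system/wrenn-agent.service /etc/systemd/system/wrenn-cp.service
#   systemd-analyze security wrenn-agent wrenn-cp
# verify is expected to complain while /usr/local/bin/wrenn-agent or wrenn-cp
# is still missing. security prints an exposure score per unit (lower is safer).
# Given the settings above, wrenn-cp should score well below wrenn-agent.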
# ── Done ─────────────────────────────────────────────────────────────────────
echo ""
echo "=== Setup complete ==="
echo ""
echo "Next steps:"
echo " 1. Copy wrenn-agent and wrenn-cp binaries to /usr/local/bin/, then run /etc/wrenn/restore-caps.sh (copying a binary drops its setcap capabilities)"
echo " 2. Edit /etc/wrenn/agent.env with WRENN_CP_URL and WRENN_ADVERTISE_ADDR"
echo " 3. Edit /etc/wrenn/cp.env with DATABASE_URL and other control plane config"
echo " 4. systemctl enable --now wrenn-agent"
echo " 5. systemctl enable --now wrenn-cp"
echo ""
echo "Security summary:"
echo " - wrenn user: bash shell (for debugging), no home, no sudo (no grants in sudoers)"
echo " - wrenn-agent: runs as wrenn with 7 capabilities via setcap (not root)"
echo " - wrenn-cp: runs as wrenn with zero capabilities"
echo " - Capabilities auto-restored after apt upgrades via /etc/wrenn/restore-caps.sh"
echo ""