forked from wrenn/wrenn
Merge pull request 'Improved codebase to prepare for production' (#32) from chore/hardening into dev
Reviewed-on: wrenn/wrenn#32
@@ -1,3 +1,7 @@
# Shared (applies to both control plane and host agent)
WRENN_DIR=/var/lib/wrenn
LOG_LEVEL=info

# Database
DATABASE_URL=postgres://wrenn:wrenn@localhost:5432/wrenn?sslmode=disable

@@ -9,7 +13,6 @@ WRENN_CP_LISTEN_ADDR=:9725

# Host Agent
WRENN_HOST_LISTEN_ADDR=:50051
WRENN_DIR=/var/lib/wrenn
WRENN_HOST_INTERFACE=eth0
WRENN_CP_URL=http://localhost:9725
WRENN_DEFAULT_ROOTFS_SIZE=5Gi

20
CLAUDE.md
@@ -12,10 +12,10 @@ All commands go through the Makefile. Never use raw `go build` or `go run`.

```bash
make build # Build all binaries → builds/
make build-cp # Control plane only (builds frontend first)
make build-cp # Control plane only
make build-agent # Host agent only
make build-envd # envd static binary (verified statically linked)
make build-frontend # SvelteKit dashboard → internal/dashboard/static/
make build-frontend # SvelteKit dashboard → frontend/build/ (served by Caddy)

make dev # Full local dev: infra + migrate + control plane
make dev-infra # Start PostgreSQL + Prometheus + Grafana (Docker)
@@ -55,7 +55,7 @@ User SDK → HTTPS/WS → Control Plane → Connect RPC → Host Agent → HTTP/
| Binary | Module | Entry point | Runs as |
|--------|--------|-------------|---------|
| wrenn-cp | `git.omukk.dev/wrenn/wrenn` | `cmd/control-plane/main.go` | Unprivileged |
| wrenn-agent | `git.omukk.dev/wrenn/wrenn` | `cmd/host-agent/main.go` | Root (NET_ADMIN + /dev/kvm) |
| wrenn-agent | `git.omukk.dev/wrenn/wrenn` | `cmd/host-agent/main.go` | `wrenn` user with capabilities (SYS_ADMIN, NET_ADMIN, NET_RAW, SYS_PTRACE, KILL, DAC_OVERRIDE, MKNOD) via setcap; also accepts root |
| envd | `git.omukk.dev/wrenn/wrenn/envd` (standalone `envd/go.mod`) | `envd/main.go` | PID 1 inside guest VM |

envd is a **completely independent Go module**. It is never imported by the main module. The only connection is the protobuf contract. It compiles to a static binary baked into rootfs images.
@@ -64,7 +64,7 @@ envd is a **completely independent Go module**. It is never imported by the main

### Control Plane

**Internal packages:** `internal/api/`, `internal/dashboard/`, `internal/email/`
**Internal packages:** `internal/api/`, `internal/email/`

**Public packages (importable by cloud repo):** `pkg/config/`, `pkg/db/`, `pkg/auth/`, `pkg/auth/oauth/`, `pkg/scheduler/`, `pkg/lifecycle/`, `pkg/channels/`, `pkg/audit/`, `pkg/service/`, `pkg/events/`, `pkg/id/`, `pkg/validate/`

@@ -78,7 +78,7 @@ Startup (`cmd/control-plane/main.go`) is a thin wrapper: `cpserver.Run(cpserver.

- **API Server** (`internal/api/server.go`): chi router with middleware. Creates handler structs (`sandboxHandler`, `execHandler`, `filesHandler`, etc.) injected with `db.Queries` and the host agent Connect RPC client. Routes under `/v1/capsules/*`. Accepts `[]cpextension.Extension` — each extension's `RegisterRoutes()` is called after all core routes are registered.
- **Reconciler** (`internal/api/reconciler.go`): background goroutine (every 30s) that compares DB records against `agent.ListSandboxes()` RPC. Marks orphaned DB entries as "stopped".
- **Dashboard** (SvelteKit + Tailwind + Bits UI, statically built and embedded via `go:embed`, served as catch-all at root)
- **Dashboard** (SvelteKit + Tailwind + Bits UI, built to static files in `frontend/build/`, served by Caddy as a reverse proxy)
- **Database**: PostgreSQL via pgx/v5. Queries generated by sqlc from `db/queries/*.sql` → `pkg/db/`. Migrations in `db/migrations/` (goose, plain SQL). `db/migrations/embed.go` exposes `migrations.FS` so the cloud repo can run OSS migrations via `go:embed`.
- **Config** (`pkg/config/config.go`): purely environment variables (`DATABASE_URL`, `CP_LISTEN_ADDR`, `CP_HOST_AGENT_ADDR`), no YAML/file config.

@@ -86,7 +86,9 @@ Startup (`cmd/control-plane/main.go`) is a thin wrapper: `cpserver.Run(cpserver.

**Packages:** `internal/hostagent/`, `internal/sandbox/`, `internal/vm/`, `internal/network/`, `internal/devicemapper/`, `internal/envdclient/`, `internal/snapshot/`

Startup (`cmd/host-agent/main.go`) wires: root check → enable IP forwarding → clean up stale dm devices → `sandbox.Manager` (containing `vm.Manager` + `network.SlotAllocator` + `devicemapper.LoopRegistry`) → `hostagent.Server` (Connect RPC handler) → HTTP server.
**Production deployment:** `scripts/prepare-wrenn-user.sh` creates the `wrenn` system user, sets Linux capabilities (setcap) on wrenn-agent and all child binaries (iptables, losetup, dmsetup, etc.), installs an apt hook to restore capabilities after package updates, configures udev rules for `/dev/net/tun`, loads required kernel modules, and writes systemd unit files for both services. No sudo grants — all privilege is via capabilities.

Startup (`cmd/host-agent/main.go`) wires: root/capabilities check → enable IP forwarding → clean up stale dm devices → `sandbox.Manager` (containing `vm.Manager` + `network.SlotAllocator` + `devicemapper.LoopRegistry`) → `hostagent.Server` (Connect RPC handler) → HTTP server.

- **RPC Server** (`internal/hostagent/server.go`): implements `hostagentv1connect.HostAgentServiceHandler`. Thin wrapper — every method delegates to `sandbox.Manager`. Maps Connect error codes on return.
- **Sandbox Manager** (`internal/sandbox/manager.go`): the core orchestration layer. Maintains in-memory state in `boxes map[string]*sandboxState` (protected by `sync.RWMutex`). Each `sandboxState` holds a `models.Sandbox`, a `*network.Slot`, and an `*envdclient.Client`. Runs a TTL reaper (every 10s) that auto-destroys timed-out sandboxes.
@@ -113,8 +115,8 @@ Runs as PID 1 inside the microVM via `wrenn-init.sh` (mounts procfs/sysfs/dev, s
- **Package manager**: pnpm
- **Routing**: SvelteKit file-based routing under `frontend/src/routes/`
- **Routing layout**: `/login` and `/signup` at root, authenticated pages under `/dashboard/*` (e.g. `/dashboard/capsules`, `/dashboard/keys`)
- **Build output**: `frontend/build/` → copied to `internal/dashboard/static/` → embedded via `go:embed` into the control plane binary
- **Serving**: `internal/dashboard/dashboard.go` registers a `NotFound` catch-all SPA handler with fallback to `index.html`. API routes (`/v1/*`, `/openapi.yaml`, `/docs`) are registered first and take priority
- **Build output**: `frontend/build/` — static files served by Caddy
- **Serving**: Caddy reverse-proxies API requests to the control plane and serves the SvelteKit SPA directly. The control plane does not serve frontend assets.
- **Dev workflow**: `make dev-frontend` runs Vite dev server on port 5173 with HMR. API calls proxy to `http://localhost:8000`
- **Fonts**: Manrope (UI), Instrument Serif (headings), JetBrains Mono (code), Alice (brand wordmark) — all self-hosted via `@fontsource`
- **Dark mode**: class-based (`.dark` on `<html>`) with system preference detection + localStorage persistence
@@ -209,7 +211,7 @@ To add a new query: add it to the appropriate `.sql` file in `db/queries/` → `
- **TAP networking** (not vsock) for host-to-envd communication
- **Device-mapper snapshots** for rootfs CoW — shared read-only loop device per base template, per-sandbox sparse CoW file, Firecracker gets `/dev/mapper/wrenn-{id}`
- **PostgreSQL** via pgx/v5 + sqlc (type-safe query generation). Goose for migrations (plain SQL, up/down)
- **Dashboard**: SvelteKit (Svelte 5, adapter-static) + Tailwind CSS v4 + Bits UI. Built to static files, embedded into the Go binary via `go:embed`, served as catch-all at root
- **Dashboard**: SvelteKit (Svelte 5, adapter-static) + Tailwind CSS v4 + Bits UI. Built to static files in `frontend/build/`, served by Caddy (not embedded in the Go binary)
- **Lago** for billing (external service, not in this codebase)

## Coding Conventions

133
README.md
@@ -2,16 +2,16 @@

Secure infrastructure for AI

## Deployment

### Prerequisites
## Prerequisites

- Linux host with `/dev/kvm` access (bare metal or nested virt)
- Firecracker binary at `/usr/local/bin/firecracker`
- PostgreSQL
- Go 1.25+
- pnpm (for frontend)
- Docker (for dev infra and rootfs builds)

### Build
## Build

```bash
make build # outputs to builds/
@@ -19,30 +19,77 @@ make build # outputs to builds/

Produces three binaries: `wrenn-cp` (control plane), `wrenn-agent` (host agent), `envd` (guest agent).

### Host setup
## Host setup

The host agent machine needs:
The host agent needs a kernel, a minimal rootfs image, and working directories on the host machine.

```bash
# Kernel for guest VMs
mkdir -p /var/lib/wrenn/kernels
# Place a vmlinux kernel at /var/lib/wrenn/kernels/vmlinux
### Directory structure

# Rootfs images
mkdir -p /var/lib/wrenn/images
# Build or place .ext4 rootfs images (e.g., minimal.ext4)

# Sandbox working directory
mkdir -p /var/lib/wrenn/sandboxes

# Snapshots directory
mkdir -p /var/lib/wrenn/snapshots

# Enable IP forwarding
sysctl -w net.ipv4.ip_forward=1
```
/var/lib/wrenn/
├── kernels/
│ └── vmlinux # uncompressed Linux kernel (not bzImage)
├── images/
│ └── minimal/
│ └── rootfs.ext4 # base rootfs (all other templates snapshot from this)
├── sandboxes/ # per-sandbox CoW files (created at runtime)
└── snapshots/ # pause/hibernate snapshot files (created at runtime)
```

### Configure
Create the directories:

```bash
sudo mkdir -p /var/lib/wrenn/{kernels,images/minimal,sandboxes,snapshots}
```

### Kernel

Place an uncompressed `vmlinux` kernel at `/var/lib/wrenn/kernels/vmlinux`. Versioned kernels (`vmlinux-{semver}`) are also supported — the agent picks the latest by semver.

### Minimal rootfs

The minimal rootfs is the base image that all other templates (Python, Node, etc.) are built on top of via device-mapper snapshots. It must contain:

| Package | Why |
|---------|-----|
| `socat` | Bidirectional relay for port forwarding |
| `chrony` | Time sync from KVM PTP clock (`/dev/ptp0`) |
| `tini` | PID 1 zombie reaper (injected by build script, not apt) |
| `sudo` | User privilege management inside the guest |
| `wget` | HTTP fetching |
| `curl` | HTTP client |
| `ca-certificates` | TLS certificate verification |

**To build a rootfs from a Docker container:**

1. Create and configure a container with the required packages:
```bash
docker run -it --name wrenn-minimal debian:bookworm bash
# Inside the container:
apt update && apt install -y socat chrony sudo wget curl ca-certificates
exit
```

2. Export to a rootfs image (builds envd, injects wrenn-init + tini, shrinks to minimum size):
```bash
sudo bash scripts/rootfs-from-container.sh wrenn-minimal minimal
```

**To update an existing rootfs** after changing envd or `wrenn-init.sh`:

```bash
bash scripts/update-minimal-rootfs.sh
```

This rebuilds envd via `make build-envd` and copies the fresh binaries into the mounted rootfs image.

### IP forwarding

```bash
sudo sysctl -w net.ipv4.ip_forward=1
```

## Configure

Copy `.env.example` to `.env` and edit:

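The Kernel section above says versioned `vmlinux-{semver}` files are supported and that the agent picks the latest by semver. The sketch below shows one way that selection could work. It assumes the agent reads the file names in `/var/lib/wrenn/kernels` (for example with `os.ReadDir`) and compares plain `MAJOR.MINOR.PATCH` versions numerically. `pickKernel`, `parseSemver` and `semverLess` are hypothetical names. Falling back to an unversioned `vmlinux` is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSemver parses "MAJOR.MINOR.PATCH" with an optional leading "v".
// Pre-release and build suffixes are deliberately not handled in this sketch.
func parseSemver(s string) ([3]int, bool) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// semverLess reports whether a sorts before b, comparing fields numerically.
func semverLess(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// pickKernel returns the highest vmlinux-{semver} name. It falls back to a
// plain "vmlinux" when no versioned kernel is present (hypothetical helper).
func pickKernel(names []string) (string, error) {
	var (
		best     string
		bestVer  [3]int
		found    bool
		hasPlain bool
	)
	for _, n := range names {
		if n == "vmlinux" {
			hasPlain = true
			continue
		}
		rest, ok := strings.CutPrefix(n, "vmlinux-")
		if !ok {
			continue
		}
		if v, ok := parseSemver(rest); ok && (!found || semverLess(bestVer, v)) {
			best, bestVer, found = n, v, true
		}
	}
	if found {
		return best, nil
	}
	if hasPlain {
		return "vmlinux", nil
	}
	return "", fmt.Errorf("no vmlinux kernel found")
}

func main() {
	k, err := pickKernel([]string{"vmlinux", "vmlinux-6.1.9", "vmlinux-6.1.10", "vmlinux-5.10.200"})
	fmt.Println(k, err) // vmlinux-6.1.10 <nil>
}
```

The comparison has to be numeric. A plain string sort would rank `vmlinux-6.1.9` above `vmlinux-6.1.10`.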
@@ -59,25 +106,21 @@ WRENN_HOST_LISTEN_ADDR=:50051
WRENN_DIR=/var/lib/wrenn
```

### Run
## Development

```bash
# Apply database migrations
make migrate-up

# Start control plane
./builds/wrenn-cp
make dev # Start PostgreSQL (Docker), run migrations, start control plane
make dev-agent # Start host agent (separate terminal, sudo)
make dev-frontend # Vite dev server with HMR (port 5173)
make check # fmt + vet + lint + test
```

Control plane listens on `WRENN_CP_LISTEN_ADDR` (default `:8000`).

### Host registration

Hosts must be registered with the control plane before they can serve sandboxes.

1. **Create a host record** (via API or dashboard):
```bash
# As an admin (JWT auth)
curl -X POST http://localhost:8000/v1/hosts \
-H "Authorization: Bearer $JWT_TOKEN" \
-H "Content-Type: application/json" \
@@ -87,17 +130,16 @@ Hosts must be registered with the control plane before they can serve sandboxes.

2. **Start the host agent** with the registration token and its externally-reachable address:
```bash
sudo WRENN_CP_URL=http://cp-host:8000 \
sudo WRENN_CP_URL=http://localhost:8000 \
./builds/wrenn-agent \
--register <token-from-step-1> \
--address 10.0.1.5:50051
--address <host-ip>:50051
```
On first startup the agent sends its specs (arch, CPU, memory, disk) to the control plane, receives a long-lived host JWT, and saves it to `$WRENN_DIR/host-token`.

3. **Subsequent startups** don't need `--register` — the agent loads the saved JWT automatically:
```bash
sudo WRENN_CP_URL=http://cp-host:8000 \
./builds/wrenn-agent --address 10.0.1.5:50051
sudo ./builds/wrenn-agent --address <host-ip>:50051
```

4. **If registration fails** (e.g., network error after token was consumed), regenerate a token:
@@ -107,23 +149,6 @@ Hosts must be registered with the control plane before they can serve sandboxes.
```
Then restart the agent with the new token.

The agent sends heartbeats to the control plane every 30 seconds. Host agent listens on `WRENN_HOST_LISTEN_ADDR` (default `:50051`).

### Rootfs images

envd must be baked into every rootfs image. After building:

```bash
make build-envd
bash scripts/update-debug-rootfs.sh /var/lib/wrenn/images/minimal.ext4
```

## Development

```bash
make dev # Start PostgreSQL (Docker), run migrations, start control plane
make dev-agent # Start host agent (separate terminal, sudo)
make check # fmt + vet + lint + test
```
The agent sends heartbeats to the control plane every 30 seconds.

See `CLAUDE.md` for full architecture documentation.

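The README and CLAUDE.md hunks describe rootfs copy-on-write through device-mapper: a shared read-only loop device per base template, a sparse CoW file per sandbox, and a `/dev/mapper/wrenn-{id}` device handed to Firecracker. Below is a minimal sketch of the table line such a device could be created from, assuming the standard device-mapper `snapshot` target (`snapshot <origin> <COW device> <P|N> <chunk size>`, lengths in 512-byte sectors). `snapshotTable` is a hypothetical helper. The loop-device paths, the persistent `P` flag and the 8-sector chunk size are illustrative choices, not values taken from wrenn.

```go
package main

import "fmt"

// sectorSize is the unit device-mapper tables use for start and length.
const sectorSize = 512

// snapshotTable builds a dmsetup table line for a "snapshot" target.
// origin is the shared read-only base device and cow is the per-sandbox
// copy-on-write device. "P" asks for a persistent snapshot and 8 is the
// chunk size in sectors (4 KiB); both are illustrative assumptions.
func snapshotTable(sizeBytes int64, origin, cow string) (string, error) {
	if sizeBytes <= 0 || sizeBytes%sectorSize != 0 {
		return "", fmt.Errorf("size %d is not a positive multiple of %d", sizeBytes, sectorSize)
	}
	return fmt.Sprintf("0 %d snapshot %s %s P 8", sizeBytes/sectorSize, origin, cow), nil
}

func main() {
	// 5 GiB, matching WRENN_DEFAULT_ROOTFS_SIZE=5Gi from .env.example.
	table, err := snapshotTable(5<<30, "/dev/loop0", "/dev/loop1")
	if err != nil {
		panic(err)
	}
	fmt.Println(table) // 0 10485760 snapshot /dev/loop0 /dev/loop1 P 8
}
```

A line like this would typically be passed to `dmsetup create wrenn-<id> --table '<line>'`. Writes then go to the CoW device, while reads of untouched chunks fall through to the shared origin. That is why many sandboxes can share one base template.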
@@ -1,14 +1,18 @@
package main

import (
"bufio"
"context"
"crypto/tls"
"flag"
"fmt"
"log/slog"
"net/http"
"os"
"os/signal"
"path/filepath"
"strconv"
"strings"
"sync"
"syscall"
"time"
@@ -21,6 +25,7 @@ import (
"git.omukk.dev/wrenn/wrenn/internal/network"
"git.omukk.dev/wrenn/wrenn/internal/sandbox"
"git.omukk.dev/wrenn/wrenn/pkg/auth"
"git.omukk.dev/wrenn/wrenn/pkg/logging"
"git.omukk.dev/wrenn/wrenn/proto/hostagent/gen/hostagentv1connect"
)

@@ -38,18 +43,24 @@ func main() {
advertiseAddr := flag.String("address", "", "Externally-reachable address (ip:port) for this host agent")
flag.Parse()

slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
Level: slog.LevelDebug,
})))
rootDir := envOrDefault("WRENN_DIR", "/var/lib/wrenn")
cleanupLog := logging.Setup(filepath.Join(rootDir, "logs"), "host-agent")
defer cleanupLog()

if os.Geteuid() != 0 {
slog.Error("host agent must run as root")
if err := checkPrivileges(); err != nil {
slog.Error("insufficient privileges", "error", err)
os.Exit(1)
}

// Enable IP forwarding (required for NAT).
// Enable IP forwarding (required for NAT). The write may fail if running
// as non-root without DAC_OVERRIDE on this path — that's OK if the systemd
// unit's ExecStartPre already set it. We verify the value regardless.
if err := os.WriteFile("/proc/sys/net/ipv4/ip_forward", []byte("1"), 0644); err != nil {
slog.Warn("failed to enable ip_forward", "error", err)
slog.Warn("failed to enable ip_forward (may have been set by systemd unit)", "error", err)
}
if b, err := os.ReadFile("/proc/sys/net/ipv4/ip_forward"); err != nil || strings.TrimSpace(string(b)) != "1" {
slog.Error("ip_forward is not enabled — sandbox networking will be broken", "error", err)
os.Exit(1)
}

// Clean up stale resources from a previous crash.
@@ -57,7 +68,6 @@ func main() {
network.CleanupStaleNamespaces()

listenAddr := envOrDefault("WRENN_HOST_LISTEN_ADDR", ":50051")
rootDir := envOrDefault("WRENN_DIR", "/var/lib/wrenn")
cpURL := os.Getenv("WRENN_CP_URL")
credsFile := filepath.Join(rootDir, "host-credentials.json")

@@ -170,6 +180,7 @@ func main() {
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer shutdownCancel()
mgr.Shutdown(shutdownCtx)
sandbox.ShrinkMinimalImage(rootDir)
if err := httpServer.Shutdown(shutdownCtx); err != nil {
slog.Error("http server shutdown error", "error", err)
}
@@ -245,3 +256,63 @@ func envOrDefault(key, def string) string {
}
return def
}

// checkPrivileges verifies the process has the required Linux capabilities.
// Always reads CapEff — even for root — because a root process inside a
// restricted container (e.g. docker --cap-drop=all) may not have all caps.
func checkPrivileges() error {
capEff, err := readEffectiveCaps()
if err != nil {
return fmt.Errorf("read capabilities: %w", err)
}

// All capabilities required by the host agent at runtime.
required := []struct {
bit uint
name string
}{
{1, "CAP_DAC_OVERRIDE"}, // /dev/loop*, /dev/mapper/*, /dev/net/tun
{5, "CAP_KILL"}, // SIGTERM/SIGKILL to Firecracker processes
{12, "CAP_NET_ADMIN"}, // netlink, iptables, routing, TAP/veth
{13, "CAP_NET_RAW"}, // raw sockets (iptables)
{19, "CAP_SYS_PTRACE"}, // reading /proc/self/ns/net (netns.Get)
{21, "CAP_SYS_ADMIN"}, // netns, mount ns, losetup, dmsetup
{27, "CAP_MKNOD"}, // device-mapper node creation
}

var missing []string
for _, cap := range required {
if capEff&(1<<cap.bit) == 0 {
missing = append(missing, cap.name)
}
}

if len(missing) > 0 {
return fmt.Errorf("missing capabilities: %s — run as root or apply setcap to the binary",
strings.Join(missing, ", "))
}

return nil
}

// readEffectiveCaps parses the CapEff bitmask from /proc/self/status.
func readEffectiveCaps() (uint64, error) {
f, err := os.Open("/proc/self/status")
if err != nil {
return 0, err
}
defer f.Close()

scanner := bufio.NewScanner(f)
for scanner.Scan() {
line := scanner.Text()
if hexStr, ok := strings.CutPrefix(line, "CapEff:"); ok {
return strconv.ParseUint(strings.TrimSpace(hexStr), 16, 64)
}
}

if err := scanner.Err(); err != nil {
return 0, fmt.Errorf("read /proc/self/status: %w", err)
}
return 0, fmt.Errorf("CapEff not found in /proc/self/status")
}

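`checkPrivileges` reads the hex `CapEff` mask from `/proc/self/status` and tests one bit per required capability. The sketch below reproduces that logic on a string instead of the live `/proc` file, so the bit arithmetic can be checked in isolation. `capEffFromStatus` and `missingCaps` are hypothetical names, and the sample masks are illustrative. In the example, only `CAP_NET_ADMIN` (bit 12) and `CAP_NET_RAW` (bit 13) are set: `0x1000 | 0x2000 = 0x3000`.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// capEffFromStatus extracts the CapEff bitmask from the text of a
// /proc/<pid>/status file. It takes a string so it runs without /proc.
func capEffFromStatus(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		if hex, ok := strings.CutPrefix(sc.Text(), "CapEff:"); ok {
			return strconv.ParseUint(strings.TrimSpace(hex), 16, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("CapEff not found")
}

// missingCaps lists the required capabilities whose bit is clear in mask,
// in ascending bit order so the output is deterministic.
func missingCaps(mask uint64, required map[uint]string) []string {
	var missing []string
	for bit := uint(0); bit < 64; bit++ {
		if name, ok := required[bit]; ok && mask&(1<<bit) == 0 {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Same bit numbers as the required table in checkPrivileges.
	required := map[uint]string{
		1: "CAP_DAC_OVERRIDE", 5: "CAP_KILL", 12: "CAP_NET_ADMIN", 13: "CAP_NET_RAW",
		19: "CAP_SYS_PTRACE", 21: "CAP_SYS_ADMIN", 27: "CAP_MKNOD",
	}
	mask, err := capEffFromStatus("Name:\twrenn-agent\nCapEff:\t0000000000003000\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(missingCaps(mask, required))
	// [CAP_DAC_OVERRIDE CAP_KILL CAP_SYS_PTRACE CAP_SYS_ADMIN CAP_MKNOD]
}
```

When debugging a setcap install by hand, `capsh --decode=<hex>` from the same `libcap2-bin` package the setup script requires will print the names behind a mask.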
19
deploy/logrotate/wrenn
Normal file
@@ -0,0 +1,19 @@
/var/lib/wrenn/logs/control-plane.log
/var/lib/wrenn/logs/host-agent.log
{
daily
rotate 3
missingok
notifempty
dateext
dateformat -%Y-%m-%d
compress
delaycompress
sharedscripts
postrotate
# Signal the processes to reopen their log files.
# Use SIGHUP — both binaries handle it gracefully.
pkill -HUP -f wrenn-cp || true
pkill -HUP -f wrenn-agent || true
endscript
}
1
frontend/src/routes/admin/capsules/[id]/+page.js
Normal file
@@ -0,0 +1 @@
export const prerender = false;
@@ -28,6 +28,7 @@ var openapiYAML []byte
type Server struct {
router chi.Router
BuildSvc *service.BuildService
version string
}

// New constructs the chi router and registers all routes.
@@ -48,6 +49,7 @@ func New(
mailer email.Mailer,
extensions []cpextension.Extension,
sctx cpextension.ServerContext,
version string,
) *Server {
r := chi.NewRouter()
r.Use(requestLogger())
@@ -86,6 +88,12 @@ func New(
adminCapsules := newAdminCapsuleHandler(sandboxSvc, queries, pool, al)
meH := newMeHandler(queries, pgPool, rdb, jwtSecret, mailer, oauthRegistry, oauthRedirectURL, teamSvc)

// Health check.
r.Get("/health", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
fmt.Fprintf(w, `{"status":"ok","version":%q}`, version)
})

// OpenAPI spec and docs.
r.Get("/openapi.yaml", serveOpenAPI)
r.Get("/docs", serveDocs)
@@ -270,7 +278,7 @@ func New(
ext.RegisterRoutes(r, sctx)
}

return &Server{router: r, BuildSvc: buildSvc}
return &Server{router: r, BuildSvc: buildSvc, version: version}
}

// Handler returns the HTTP handler.

@@ -24,7 +24,7 @@ func (a *SlotAllocator) Allocate() (int, error) {
a.mu.Lock()
defer a.mu.Unlock()

for i := 1; i <= 65534; i++ {
for i := 1; i <= 32767; i++ {
if !a.inUse[i] {
a.inUse[i] = true
return i, nil

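This hunk lowers the allocator's upper bound from 65534 to 32767 slots. The diff does not say why. Below is a minimal sketch of the first-free allocation loop with the bound made explicit, so the exhaustion and reuse behavior can be exercised. It assumes `inUse` is a map keyed by slot number and that a matching release operation exists. `slotAllocator`, `newSlotAllocator`, `Release` and `errNoSlots` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNoSlots = errors.New("no free network slots")

// slotAllocator hands out the lowest free slot in [1, limit], mirroring the
// loop in the hunk. The limit field and Release method are assumptions.
type slotAllocator struct {
	mu    sync.Mutex
	limit int
	inUse map[int]bool
}

func newSlotAllocator(limit int) *slotAllocator {
	return &slotAllocator{limit: limit, inUse: make(map[int]bool)}
}

// Allocate scans linearly for the first unused slot.
func (a *slotAllocator) Allocate() (int, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for i := 1; i <= a.limit; i++ {
		if !a.inUse[i] {
			a.inUse[i] = true
			return i, nil
		}
	}
	return 0, errNoSlots
}

// Release makes a slot available again.
func (a *slotAllocator) Release(i int) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.inUse, i)
}

func main() {
	a := newSlotAllocator(2)
	s1, _ := a.Allocate()
	s2, _ := a.Allocate()
	_, err := a.Allocate()
	fmt.Println(s1, s2, err) // 1 2 no free network slots
	a.Release(1)
	s3, _ := a.Allocate()
	fmt.Println(s3) // 1
}
```

The bound caps concurrent sandboxes per host. A linear scan costs O(limit) per allocation in the worst case. That is cheap at this size, but a free list would make it constant-time if the limit ever grows.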
@@ -104,6 +104,37 @@ func ParseSizeToMB(s string) (int, error) {
}
}

// ShrinkMinimalImage shrinks the built-in minimal rootfs back to its minimum
// size using resize2fs -M. This is the inverse of EnsureImageSizes and should
// be called during graceful shutdown so the image is stored compactly on disk.
func ShrinkMinimalImage(wrennDir string) {
minimalRootfs := layout.TemplateRootfs(wrennDir, id.PlatformTeamID, id.MinimalTemplateID)
shrinkImage(minimalRootfs)
}

// shrinkImage shrinks a single rootfs image to its minimum size.
func shrinkImage(rootfs string) {
if _, err := os.Stat(rootfs); err != nil {
return
}

slog.Info("shrinking base image", "path", rootfs)

if out, err := exec.Command("e2fsck", "-fy", rootfs).CombinedOutput(); err != nil {
if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() > 1 {
slog.Warn("e2fsck before shrink failed", "path", rootfs, "output", string(out), "error", err)
return
}
}

if out, err := exec.Command("resize2fs", "-M", rootfs).CombinedOutput(); err != nil {
slog.Warn("resize2fs -M failed", "path", rootfs, "output", string(out), "error", err)
return
}

slog.Info("base image shrunk", "path", rootfs)
}

// expandImage expands a single rootfs image if it is smaller than targetBytes.
func expandImage(rootfs string, targetBytes int64, targetMB int) error {
info, err := os.Stat(rootfs)

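`shrinkImage` treats an `e2fsck` exit code above 1 as fatal. `e2fsck` reports its status as a bitmask: 0 means no errors, 1 means errors corrected, 2 means corrected but a reboot is advised, 4 means errors left uncorrected, 8 means an operational error, 16 a usage error, 32 cancellation, and 128 a shared-library error. So `> 1` lets the shrink proceed only on a clean or fully repaired image. Below is a minimal sketch that decodes such codes for log messages. `describeFsckExit` and `safeToResize` are hypothetical helpers, and the descriptions paraphrase the e2fsck(8) man page.

```go
package main

import (
	"fmt"
	"strings"
)

// fsckBits lists the e2fsck(8) exit status bits in ascending order.
var fsckBits = []struct {
	bit  int
	desc string
}{
	{1, "errors corrected"},
	{2, "errors corrected, reboot advised"},
	{4, "errors left uncorrected"},
	{8, "operational error"},
	{16, "usage or syntax error"},
	{32, "cancelled by user"},
	{128, "shared-library error"},
}

// describeFsckExit turns an e2fsck exit code into a readable summary.
func describeFsckExit(code int) string {
	if code == 0 {
		return "no errors"
	}
	var parts []string
	for _, b := range fsckBits {
		if code&b.bit != 0 {
			parts = append(parts, b.desc)
		}
	}
	return strings.Join(parts, "; ")
}

// safeToResize mirrors the hunk's rule: 0 and 1 mean the image is consistent
// (possibly after repairs); anything higher aborts the shrink.
func safeToResize(code int) bool { return code <= 1 }

func main() {
	for _, c := range []int{0, 1, 4, 12} {
		fmt.Printf("%d: %s (resize: %v)\n", c, describeFsckExit(c), safeToResize(c))
	}
}
```

Including the decoded text in the `slog.Warn` call would spare operators a trip to the man page when a shrink is skipped.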
@@ -14,6 +14,7 @@ type Config struct {
RedisURL string
ListenAddr string
JWTSecret string
WrennDir string // WRENN_DIR — base directory for wrenn data (logs, etc.)

// mTLS — CP→Agent channel. Both must be set to enable mTLS; omitting either
// disables cert issuance and leaves agent connections on plain HTTP (dev mode).
@@ -48,6 +49,7 @@ func Load() Config {
RedisURL: envOrDefault("REDIS_URL", "redis://localhost:6379/0"),
ListenAddr: envOrDefault("WRENN_CP_LISTEN_ADDR", ":8080"),
JWTSecret: os.Getenv("JWT_SECRET"),
WrennDir: envOrDefault("WRENN_DIR", "/var/lib/wrenn"),

CACert: os.Getenv("WRENN_CA_CERT"),
CAKey: os.Getenv("WRENN_CA_KEY"),

@@ -6,6 +6,7 @@ import (
"net/http"
"os"
"os/signal"
"path/filepath"
"strings"
"syscall"
"time"
@@ -22,6 +23,7 @@ import (
"git.omukk.dev/wrenn/wrenn/pkg/config"
"git.omukk.dev/wrenn/wrenn/pkg/db"
"git.omukk.dev/wrenn/wrenn/pkg/lifecycle"
"git.omukk.dev/wrenn/wrenn/pkg/logging"
"git.omukk.dev/wrenn/wrenn/pkg/scheduler"
)

@@ -39,11 +41,9 @@ func Run(opts ...Option) {
opt(o)
}

slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
Level: slog.LevelDebug,
})))

cfg := config.Load()
cleanupLog := logging.Setup(filepath.Join(cfg.WrennDir, "logs"), "control-plane")
defer cleanupLog()

if len(cfg.JWTSecret) < 32 {
slog.Error("JWT_SECRET must be at least 32 characters")
@@ -175,7 +175,7 @@ func Run(opts ...Option) {
}

// API server.
srv := api.New(queries, hostPool, hostScheduler, pool, rdb, []byte(cfg.JWTSecret), oauthRegistry, cfg.OAuthRedirectURL, ca, al, channelSvc, mailer, o.extensions, sctx)
srv := api.New(queries, hostPool, hostScheduler, pool, rdb, []byte(cfg.JWTSecret), oauthRegistry, cfg.OAuthRedirectURL, ca, al, channelSvc, mailer, o.extensions, sctx, o.version)

// Start template build workers (2 concurrent).
stopBuildWorkers := srv.BuildSvc.StartWorkers(ctx, 2)

135
pkg/logging/logging.go
Normal file
@@ -0,0 +1,135 @@
package logging

import (
"io"
"log/slog"
"os"
"os/signal"
"path/filepath"
"strings"
"sync"
"syscall"
)

// Setup configures the global slog logger with dual output (stderr + rotating
// log file). logsDir is the directory where log files are written. binaryName
// is used as the log filename (e.g. "control-plane" → "control-plane.log").
//
// If logsDir is empty or the directory cannot be created, Setup falls back to
// stderr-only logging and returns a no-op cleanup function.
//
// The returned cleanup function closes the log file and must be deferred.
// Setup also installs a SIGHUP handler that reopens the log file, allowing
// external log rotation tools (e.g. logrotate) to rotate files in place.
func Setup(logsDir, binaryName string) func() {
level := parseLevel(os.Getenv("LOG_LEVEL"))

if logsDir == "" {
slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
Level: level,
})))
return func() {}
}

if err := os.MkdirAll(logsDir, 0750); err != nil {
// Fall back to stderr-only; log the error so operators notice.
slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
Level: level,
})))
slog.Warn("file logging unavailable: failed to create log directory", "dir", logsDir, "error", err)
return func() {}
}

logPath := filepath.Join(logsDir, binaryName+".log")
rf, err := newReopenableFile(logPath)
if err != nil {
slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
Level: level,
})))
slog.Warn("file logging unavailable: failed to open log file", "path", logPath, "error", err)
return func() {}
}

mw := io.MultiWriter(os.Stderr, rf)
slog.SetDefault(slog.New(slog.NewTextHandler(mw, &slog.HandlerOptions{
Level: level,
})))

// SIGHUP reopens the log file so logrotate can rotate in place.
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGHUP)
go func() {
for range sigCh {
if err := rf.Reopen(); err != nil {
slog.Error("failed to reopen log file on SIGHUP", "path", logPath, "error", err)
} else {
slog.Info("log file reopened", "path", logPath)
}
}
}()

return func() {
signal.Stop(sigCh)
close(sigCh)
rf.Close()
}
}

func parseLevel(s string) slog.Level {
switch strings.ToLower(strings.TrimSpace(s)) {
case "debug":
return slog.LevelDebug
case "warn", "warning":
return slog.LevelWarn
case "error":
return slog.LevelError
default:
return slog.LevelInfo
}
}

// reopenableFile is an io.Writer backed by an *os.File that can be atomically
// reopened (for log rotation via SIGHUP). All operations are goroutine-safe.
type reopenableFile struct {
path string
mu sync.Mutex
f *os.File
}

func newReopenableFile(path string) (*reopenableFile, error) {
f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0640)
if err != nil {
return nil, err
}
return &reopenableFile{path: path, f: f}, nil
}

func (r *reopenableFile) Write(p []byte) (int, error) {
r.mu.Lock()
defer r.mu.Unlock()
return r.f.Write(p)
}

// Reopen closes the current file and opens a new one at the same path.
// This is the mechanism that makes logrotate's copytruncate-free rotation work:
// logrotate renames the old file, then sends SIGHUP, and the process opens a
// fresh file at the original path.
func (r *reopenableFile) Reopen() error {
r.mu.Lock()
defer r.mu.Unlock()
// Open the new file before closing the old one so a failed open doesn't
// leave the writer in a broken state with a closed fd.
f, err := os.OpenFile(r.path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0640)
if err != nil {
return err
}
r.f.Close()
r.f = f
return nil
}

func (r *reopenableFile) Close() error {
r.mu.Lock()
defer r.mu.Unlock()
return r.f.Close()
}
385
scripts/prepare-wrenn-user.sh
Executable file
385
scripts/prepare-wrenn-user.sh
Executable file
@ -0,0 +1,385 @@
#!/usr/bin/env bash
#
# prepare-wrenn-user.sh — Create the wrenn system user and configure minimal privileges.
#
# Creates a locked-down 'wrenn' system user that can run wrenn-agent and wrenn-cp
# with only the privileges they need. The agent binary gets Linux capabilities
# via setcap — no sudo is configured for the wrenn user at all. If an attacker
# compromises the wrenn user, they cannot escalate via sudo.
#
# What this script does:
#   1. Creates the 'wrenn' system user (bash shell for debugging, no home dir)
#   2. Creates required directories with correct ownership
#   3. Sets Linux capabilities on wrenn-agent, firecracker and the child binaries they exec
#   4. Installs an apt hook to restore capabilities after package updates
#   5. Installs a udev rule giving the wrenn group access to /dev/net/tun
#   6. Ensures required kernel modules are loaded and persisted across reboots
#   7. Installs a sudoers drop-in (comment-only, no grants — absence is the cage)
#   8. Writes systemd unit files for both wrenn-agent and wrenn-cp
#
# Usage:
#   sudo bash scripts/prepare-wrenn-user.sh
#
# Prerequisites:
#   - wrenn-agent binary at /usr/local/bin/wrenn-agent
#   - wrenn-cp binary at /usr/local/bin/wrenn-cp
#   - firecracker binary at /usr/local/bin/firecracker
#   - libcap2-bin installed (for setcap)

set -euo pipefail

# ── Guard ────────────────────────────────────────────────────────────────────

if [[ $EUID -ne 0 ]]; then
    echo "ERROR: This script must be run as root."
    exit 1
fi

# ── Configuration ────────────────────────────────────────────────────────────

WRENN_USER="wrenn"
WRENN_GROUP="wrenn"
WRENN_DIR="/var/lib/wrenn"
AGENT_BIN="/usr/local/bin/wrenn-agent"
CP_BIN="/usr/local/bin/wrenn-cp"
FC_BIN="/usr/local/bin/firecracker"
RESTORE_CAPS_SCRIPT="/etc/wrenn/restore-caps.sh"

# ── 1. Create system user ───────────────────────────────────────────────────

if id "${WRENN_USER}" &>/dev/null; then
    echo "==> User '${WRENN_USER}' already exists, skipping creation."
else
    echo "==> Creating system user '${WRENN_USER}'..."
    # --user-group creates the matching 'wrenn' group even where USERGROUPS_ENAB
    # is off in /etc/login.defs; later steps chown files to ${WRENN_GROUP}.
    useradd \
        --system \
        --user-group \
        --no-create-home \
        --home-dir "${WRENN_DIR}" \
        --shell /bin/bash \
        "${WRENN_USER}"
fi

# Add wrenn to kvm group for /dev/kvm access.
if getent group kvm &>/dev/null; then
    usermod -aG kvm "${WRENN_USER}"
    echo "==> Added '${WRENN_USER}' to 'kvm' group."
fi

# ── 2. Create directories with correct ownership ────────────────────────────

echo "==> Setting up directories..."

directories=(
    "${WRENN_DIR}"
    "${WRENN_DIR}/images"
    "${WRENN_DIR}/kernels"
    "${WRENN_DIR}/sandboxes"
    "${WRENN_DIR}/snapshots"
    "${WRENN_DIR}/logs"
    "/run/netns"
)

for dir in "${directories[@]}"; do
    mkdir -p "${dir}"
done

# Only chown wrenn-owned dirs (not /run/netns which is system-managed).
for dir in "${WRENN_DIR}" "${WRENN_DIR}/images" "${WRENN_DIR}/kernels" \
           "${WRENN_DIR}/sandboxes" "${WRENN_DIR}/snapshots" "${WRENN_DIR}/logs"; do
    chown "${WRENN_USER}:${WRENN_GROUP}" "${dir}"
    chmod 750 "${dir}"
done

# ── 3. Set capabilities on binaries ─────────────────────────────────────────
#
# These capabilities replace full root access. The wrenn-agent binary gets
# exactly the capabilities it needs for:
#
#   CAP_SYS_ADMIN    — network namespaces (netns create/enter), mount namespaces
#                      (unshare -m), losetup, dmsetup, mount/umount
#   CAP_NET_ADMIN    — veth/TAP creation (netlink), iptables rules, IP forwarding,
#                      routing table manipulation
#   CAP_NET_RAW      — raw socket access (needed by iptables internally)
#   CAP_SYS_PTRACE   — reading /proc/self/ns/net (netns.Get)
#   CAP_KILL         — sending SIGTERM/SIGKILL to Firecracker processes
#   CAP_DAC_OVERRIDE — accessing /dev/loop*, /dev/mapper/*, /dev/net/tun,
#                      /proc/sys/net/ipv4/ip_forward
#   CAP_MKNOD        — creating device nodes (dm-snapshot)
#
# The 'ep' suffix means Effective + Permitted (granted at exec time).

echo "==> Setting capabilities on wrenn-agent..."

if [[ ! -f "${AGENT_BIN}" ]]; then
    echo "WARNING: ${AGENT_BIN} not found, skipping setcap. Install it, then run ${RESTORE_CAPS_SCRIPT}."
else
    setcap \
        cap_sys_admin,cap_net_admin,cap_net_raw,cap_sys_ptrace,cap_kill,cap_dac_override,cap_mknod+ep \
        "${AGENT_BIN}"

    echo " Capabilities set on ${AGENT_BIN}:"
    getcap "${AGENT_BIN}"
fi

# Firecracker also needs capabilities when spawned by a non-root parent.
# CAP_NET_ADMIN is required for network device access inside the netns.
if [[ -f "${FC_BIN}" ]]; then
    setcap cap_net_admin,cap_sys_admin,cap_dac_override+ep "${FC_BIN}"
    echo " Capabilities set on ${FC_BIN}:"
    getcap "${FC_BIN}"
else
    echo "WARNING: ${FC_BIN} not found, skipping setcap. Install it, then run ${RESTORE_CAPS_SCRIPT}."
fi

# ── Helper: resolve binary path and apply setcap ────────────────────────────
#
# Uses `command -v` to find the binary in PATH (handles /usr/bin vs /usr/sbin
# differences across distros), then `readlink -f` to resolve symlinks so that
# setcap hits the real inode (important for iptables-nft/alternatives).

setcap_binary() {
    local name="$1" caps="$2"
    local bin
    bin=$(command -v "$name" 2>/dev/null) || {
        echo " WARNING: ${name} not found in PATH, skipping."
        return 0
    }
    bin=$(readlink -f "$bin")
    setcap "$caps" "$bin"
    echo " $(getcap "$bin")"
}

# The child binaries invoked by wrenn-agent (iptables, losetup, dmsetup, etc.)
# also need capabilities since they'll be exec'd by a non-root user. File
# capabilities travel with the binary, so every user who can run these
# binaries gets them, not only wrenn.
echo "==> Setting capabilities on child binaries..."

setcap_binary iptables "cap_net_admin,cap_net_raw+ep"
setcap_binary iptables-save "cap_net_admin,cap_net_raw+ep"
setcap_binary ip "cap_sys_admin,cap_net_admin+ep"
setcap_binary sysctl "cap_net_admin+ep"
setcap_binary losetup "cap_sys_admin,cap_dac_override+ep"
setcap_binary blockdev "cap_sys_admin,cap_dac_override+ep"
setcap_binary dmsetup "cap_sys_admin,cap_dac_override,cap_mknod+ep"
setcap_binary e2fsck "cap_sys_admin,cap_dac_override+ep"
setcap_binary resize2fs "cap_sys_admin,cap_dac_override+ep"
setcap_binary dd "cap_dac_override+ep"
setcap_binary unshare "cap_sys_admin+ep"
setcap_binary mount "cap_sys_admin,cap_dac_override+ep"

# ── 4. Persist capabilities across package updates ──────────────────────────
#
# apt/dpkg overwrites binaries on package updates, which strips the xattr-based
# capabilities set by setcap. This installs:
#   - /etc/wrenn/restore-caps.sh: re-applies setcap to wrenn-agent, firecracker
#     and all child binaries
#   - /etc/apt/apt.conf.d/99-wrenn-setcap: apt post-invoke hook that calls it

echo "==> Installing capability restore hook..."

mkdir -p /etc/wrenn

cat > "${RESTORE_CAPS_SCRIPT}" << 'RESTORE'
#!/usr/bin/env bash
#
# restore-caps.sh — Re-apply Linux capabilities to wrenn child binaries.
# Called automatically by apt after package updates (see /etc/apt/apt.conf.d/99-wrenn-setcap).
# Can also be run manually: sudo /etc/wrenn/restore-caps.sh

set -euo pipefail

setcap_binary() {
    local name="$1" caps="$2"
    local bin
    bin=$(command -v "$name" 2>/dev/null) || return 0
    bin=$(readlink -f "$bin")
    setcap "$caps" "$bin" 2>/dev/null || true
}

# wrenn-agent and firecracker (only if present — they aren't package-managed).
[[ -f /usr/local/bin/wrenn-agent ]] && \
    setcap cap_sys_admin,cap_net_admin,cap_net_raw,cap_sys_ptrace,cap_kill,cap_dac_override,cap_mknod+ep \
        /usr/local/bin/wrenn-agent 2>/dev/null || true
[[ -f /usr/local/bin/firecracker ]] && \
    setcap cap_net_admin,cap_sys_admin,cap_dac_override+ep \
        /usr/local/bin/firecracker 2>/dev/null || true

# Child binaries (these are the ones wiped by apt).
setcap_binary iptables "cap_net_admin,cap_net_raw+ep"
setcap_binary iptables-save "cap_net_admin,cap_net_raw+ep"
setcap_binary ip "cap_sys_admin,cap_net_admin+ep"
setcap_binary sysctl "cap_net_admin+ep"
setcap_binary losetup "cap_sys_admin,cap_dac_override+ep"
setcap_binary blockdev "cap_sys_admin,cap_dac_override+ep"
setcap_binary dmsetup "cap_sys_admin,cap_dac_override,cap_mknod+ep"
setcap_binary e2fsck "cap_sys_admin,cap_dac_override+ep"
setcap_binary resize2fs "cap_sys_admin,cap_dac_override+ep"
setcap_binary dd "cap_dac_override+ep"
setcap_binary unshare "cap_sys_admin+ep"
setcap_binary mount "cap_sys_admin,cap_dac_override+ep"
RESTORE

chmod 755 "${RESTORE_CAPS_SCRIPT}"

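# Only apt-based systems get the hook; with set -e, writing into a missing
# /etc/apt/apt.conf.d would otherwise abort the whole script on other distros.
if [[ -d /etc/apt/apt.conf.d ]]; then
    cat > /etc/apt/apt.conf.d/99-wrenn-setcap << 'APT'
// Re-apply Linux capabilities to wrenn child binaries after any package update.
// Capabilities (xattr) are stripped when dpkg overwrites a binary.
DPkg::Post-Invoke { "/etc/wrenn/restore-caps.sh"; };
APT
    echo " Installed ${RESTORE_CAPS_SCRIPT} and apt post-invoke hook."
else
    echo " Installed ${RESTORE_CAPS_SCRIPT}; no apt found, so run it manually after package upgrades."
fi
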
# ── 5. Device access ────────────────────────────────────────────────────────
#
# /dev/kvm     — handled by kvm group membership above
# /dev/net/tun — needs to be accessible by wrenn user

echo "==> Configuring device access..."

# Ensure /dev/net/tun is accessible (udev rule for persistence across reboots).
cat > /etc/udev/rules.d/99-wrenn.rules << 'UDEV'
# Allow wrenn user access to TUN device for TAP networking.
SUBSYSTEM=="misc", KERNEL=="tun", GROUP="wrenn", MODE="0660"
UDEV

udevadm control --reload-rules 2>/dev/null || true
echo " Installed udev rule for /dev/net/tun."

# ── 6. Kernel modules ───────────────────────────────────────────────────────

echo "==> Ensuring kernel modules are loaded..."

modules=(dm_snapshot dm_mod loop tun)
for mod in "${modules[@]}"; do
    # Match the full module name in /proc/modules (the file lsmod reads): a bare
    # "^tun" prefix would also match e.g. "tunnel4". Built-in modules are never
    # listed there; modprobe succeeds for them without loading anything.
    if ! grep -q "^${mod} " /proc/modules; then
        modprobe "${mod}" 2>/dev/null && echo " Loaded ${mod}" || echo " WARNING: Could not load ${mod}"
    else
        echo " ${mod} already loaded."
    fi
done

# Persist across reboots.
for mod in "${modules[@]}"; do
    grep -qxF "${mod}" /etc/modules-load.d/wrenn.conf 2>/dev/null || echo "${mod}" >> /etc/modules-load.d/wrenn.conf
done
echo " Module persistence written to /etc/modules-load.d/wrenn.conf."

# ── 7. Sudoers ──────────────────────────────────────────────────────────────
#
# The wrenn user has no sudo grants. The absence of a grant is the cage — an
# explicit "!ALL" deny is weaker due to known bypasses (CVE-2019-14287).
# This file exists purely as documentation for operators reading
# /etc/sudoers.d; it adds nothing to `sudo -l` output.

echo "==> Writing sudoers drop-in..."

cat > /etc/sudoers.d/wrenn << 'SUDOERS'
# Wrenn system user — no sudo access permitted.
# All privilege is granted via Linux capabilities on specific binaries (setcap).
# This file contains no active rules. The absence of any grant is intentional
# and is the strongest way to deny escalation.
#
# Do not add rules here. If the wrenn user needs new privileges, use setcap
# on the specific binary instead.
SUDOERS

chmod 440 /etc/sudoers.d/wrenn
visudo -c -f /etc/sudoers.d/wrenn
echo " /etc/sudoers.d/wrenn installed and validated."

# ── 8. Systemd units ────────────────────────────────────────────────────────

echo "==> Writing systemd service files..."

cat > /etc/systemd/system/wrenn-agent.service << 'UNIT'
[Unit]
Description=Wrenn Host Agent
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=wrenn
Group=wrenn
EnvironmentFile=-/etc/wrenn/agent.env

# CapabilityBoundingSet= is the ceiling for wrenn-agent and every process it
# spawns. AmbientCapabilities= grants the same set to the service process even
# if the setcap xattr on wrenn-agent is missing; when the xattr is present, the
# kernel clears the ambient set at exec and the file capabilities apply instead.
AmbientCapabilities=CAP_SYS_ADMIN CAP_NET_ADMIN CAP_NET_RAW CAP_SYS_PTRACE CAP_KILL CAP_DAC_OVERRIDE CAP_MKNOD
CapabilityBoundingSet=CAP_SYS_ADMIN CAP_NET_ADMIN CAP_NET_RAW CAP_SYS_PTRACE CAP_KILL CAP_DAC_OVERRIDE CAP_MKNOD

# IMPORTANT: must be false — child binaries (iptables, losetup, dmsetup, etc.)
# have their own file capabilities via setcap which must be honored at exec time.
NoNewPrivileges=false

# Enable IP forwarding before the agent starts. The "+" prefix runs this
# directive as root (bypassing User=wrenn) so it can write to procfs.
ExecStartPre=+/bin/sh -c 'sysctl -w net.ipv4.ip_forward=1'

ExecStart=/usr/local/bin/wrenn-agent --address ${WRENN_ADVERTISE_ADDR}

Restart=on-failure
RestartSec=5

# Resource limits: open files (Firecracker + loop devices + sockets) and processes.
LimitNOFILE=65536
LimitNPROC=4096

# Hide /home, /root and /run/user and keep the firecracker binary read-only.
# ReadWritePaths= lists where the agent writes; it only narrows access once
# ProtectSystem= is enabled too. The "-" prefix keeps a missing /run/netns
# (it lives on tmpfs, so it is gone after a reboot) from failing the unit.
ProtectHome=true
ReadWritePaths=/var/lib/wrenn /tmp -/run/netns /dev/mapper
ReadOnlyPaths=/usr/local/bin/firecracker

[Install]
WantedBy=multi-user.target
UNIT

cat > /etc/systemd/system/wrenn-cp.service << 'UNIT'
[Unit]
Description=Wrenn Control Plane
After=network-online.target postgresql.service
Wants=network-online.target

[Service]
Type=simple
User=wrenn
Group=wrenn
EnvironmentFile=-/etc/wrenn/cp.env

# Control plane is fully unprivileged — no capabilities needed.
NoNewPrivileges=true
CapabilityBoundingSet=

ExecStart=/usr/local/bin/wrenn-cp

Restart=on-failure
RestartSec=5

ProtectHome=true
ProtectSystem=strict
ReadWritePaths=/tmp

[Install]
WantedBy=multi-user.target
UNIT

mkdir -p /etc/wrenn
touch /etc/wrenn/agent.env /etc/wrenn/cp.env
chmod 640 /etc/wrenn/agent.env /etc/wrenn/cp.env
chown "root:${WRENN_GROUP}" /etc/wrenn/agent.env /etc/wrenn/cp.env

systemctl daemon-reload
echo " wrenn-agent.service and wrenn-cp.service installed."

# ── Done ─────────────────────────────────────────────────────────────────────

echo ""
echo "=== Setup complete ==="
echo ""
echo "Next steps:"
echo " 1. Make sure wrenn-agent, wrenn-cp and firecracker are in /usr/local/bin/; if any was installed after this script ran, run: sudo /etc/wrenn/restore-caps.sh"
echo " 2. Edit /etc/wrenn/agent.env with WRENN_CP_URL and WRENN_ADVERTISE_ADDR"
echo " 3. Edit /etc/wrenn/cp.env with DATABASE_URL and other control plane config"
echo " 4. systemctl enable --now wrenn-agent"
echo " 5. systemctl enable --now wrenn-cp"
echo ""
echo "Security summary:"
echo " - wrenn user: bash shell (for debugging), no home, no sudo (no grants in sudoers)"
echo " - wrenn-agent: runs as wrenn with 7 capabilities via setcap (not root)"
echo " - wrenn-cp: runs as wrenn with zero capabilities"
echo " - Capabilities auto-restored after apt upgrades via /etc/wrenn/restore-caps.sh"
echo ""