3 Commits

Author SHA1 Message Date
6014fa72cf Merge pull request 'Added a multi-distro system and asynchronized snapshot action' (#52) from fix/image-creation-and-maintenance into dev
Reviewed-on: #52
2026-05-22 20:07:18 +00:00
c164aadfc9 feat(templates): multi-distro system base images + paused-state snapshotting
- Replace single 'minimal' system template with four well-known distro
  base images: minimal-ubuntu (id 0), minimal-alpine (id 1), minimal-arch
  (id 2), minimal-fedora (id 3) — all platform-owned, reserved IDs 0–1024
- Add id.IsReservedTemplateID() guard and well-known ID constants
- Seed the four system base templates via a goose migration
- On-disk layout: images/teams/{b36(team)}/{b36(tid)}/rootfs.ext4
- Rebuild scripts (update-minimal-rootfs.sh, build-*.sh) handle all four
- rootfs-from-container.sh: fix usr-merged distro tini install, flatten
  IMAGE_NAME slashes for tar path
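
For illustration, the on-disk layout and reserved-ID guard above could be sketched as follows. This is a sketch under stated assumptions, not the repository's Go code: it assumes team and template IDs are 128-bit UUIDs rendered as zero-padded 25-character base36 strings (matching the all-zeros platform directory name), and that the reserved range 0–1024 is inclusive. The names `base36Id`, `isReservedTemplateId`, and `rootfsPath` are hypothetical.

```typescript
// Hypothetical helpers; the real guard is id.IsReservedTemplateID() on the Go side.
const BASE36_WIDTH = 25; // assumption: 128-bit value, zero-padded base36
const RESERVED_MAX = BigInt(1024); // assumption: reserved range is 0..1024 inclusive

// Convert a canonical UUID string into its 128-bit integer value.
function uuidToBigInt(uuid: string): bigint {
  const hex = uuid.replace(/-/g, '');
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) throw new Error(`invalid UUID: ${uuid}`);
  return BigInt('0x' + hex);
}

// Render the UUID as a fixed-width base36 string.
function base36Id(uuid: string): string {
  return uuidToBigInt(uuid).toString(36).padStart(BASE36_WIDTH, '0');
}

// True when the template ID falls inside the reserved system range.
function isReservedTemplateId(uuid: string): boolean {
  return uuidToBigInt(uuid) <= RESERVED_MAX;
}

// images/teams/{b36(team)}/{b36(tid)}/rootfs.ext4
function rootfsPath(teamId: string, templateId: string): string {
  return `images/teams/${base36Id(teamId)}/${base36Id(templateId)}/rootfs.ext4`;
}
```

With the all-zeros platform team and template ID 1 (minimal-alpine), `rootfsPath` yields a path whose team directory is 25 zeros and whose template directory is 24 zeros followed by `1`.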

Snapshotting:
- Allow snapshots from 'paused' state (not just 'running')
- Implement flattenPausedCow for paused sandboxes: re-attach dm-snapshot,
  flatten CoW, tear down; distinct dm name to avoid colliding with Resume
- copyMemorySnapshotFiles + linkOrCopyFile: hardlink CH memory artefacts
  with sparse-preserving cp --sparse=always fallback across filesystems
- Promote staging dir atomically with rename(2)
- Track origStatus through snapshotInBackground so badge returns to
  paused (not running) after paused snapshot
- Expand CleanupOrphanPauseDirs to clean up .stage- prefixes too
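
The hardlink-with-fallback and atomic-promote steps above can be sketched as below. This is a Node.js illustration only (the actual code is Go), assuming GNU `cp` is on `PATH` and that the staging and final directories share a filesystem so the rename is atomic. `linkOrCopyFile` mirrors the name in the commit message; `promoteStaged` and the `.stage-tmpl` directory name are hypothetical.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';
import { execFileSync } from 'node:child_process';

// Hard-link src to dst. If the two paths are on different filesystems (EXDEV),
// fall back to a sparse-preserving copy. Assumes GNU cp is available.
function linkOrCopyFile(src: string, dst: string): 'linked' | 'copied' {
  try {
    fs.linkSync(src, dst);
    return 'linked';
  } catch (err) {
    if ((err as { code?: string }).code !== 'EXDEV') throw err;
    execFileSync('cp', ['--sparse=always', src, dst]);
    return 'copied';
  }
}

// Publish a fully written staging directory with a single rename so readers
// never observe a half-populated snapshot directory.
function promoteStaged(stageDir: string, finalDir: string): void {
  fs.renameSync(stageDir, finalDir);
}

// Usage: stage a memory artefact, then promote the staging directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'snap-'));
const memoryFile = path.join(root, 'memory.bin');
fs.writeFileSync(memoryFile, 'state');
const stageDir = path.join(root, '.stage-tmpl');
fs.mkdirSync(stageDir);
const how = linkOrCopyFile(memoryFile, path.join(stageDir, 'memory.bin'));
promoteStaged(stageDir, path.join(root, 'tmpl'));
```

A leftover `.stage-` directory after a crash is therefore always safe to delete, which is what the expanded orphan cleanup relies on.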

Build service:
- Look up all base templates (including system ones) via DB query instead
  of hardcoding the minimal template path
- Insert a sandbox DB record for builder sandboxes
- Destroy builder sandbox + mark DB record stopped on finalize
- Default base template to minimal-ubuntu instead of minimal

Chores:
- Remove stale recipe files (code-runner-beta, jupyter test)
- Remove prepare-wrenn-user.sh (replaced by setup-host.sh)
- Remove old build-rootfs.sh / docker-to-rootfs.sh / per-template build.sh
- Update CLAUDE.md, README.md, envd-rs README with new template info
- Frontend: update admin templates page, admin capsule view, snapshot
  dialog to support paused-state snapshots
2026-05-23 01:58:51 +06:00
a01796a4c3 feat(sandbox): async snapshots with snapshotting state + lifecycle toasts
Snapshots now mirror the async pause flow. POST /v1/snapshots returns 202
with the capsule in a new "snapshotting" state and registers the template
in a background goroutine, instead of blocking the request while Cloud
Hypervisor pauses the VM. Users get clear feedback that the capsule is
busy rather than seeing it appear wedged.

- New "snapshotting" sandbox state: CAS running -> snapshotting -> running
- CreateSnapshot async via snapshotInBackground; the back-to-running CAS
  is gated so a racing destroy wins (no false "running" signal)
- serviceEventToCanonical maps snapshot completion + state-change events;
  state changes are published transiently (SSE only)
- HostMonitor reconciles stuck snapshotting: 15m grace while the VM is
  alive, error if the VM is gone
- Global lifecycle toasts (create/pause/resume/snapshot/destroy) driven by
  terminal system-actor SSE events, exactly one per operation
- openapi: status enum, 202 + Capsule responses, admin snapshot name optional
- Remove dead LogSnapshotCreate; host-callback failure during a snapshot
  marks the sandbox errored instead of removing the row
2026-05-22 21:36:46 +06:00
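
The gated `running -> snapshotting -> running` transition described in a01796a4c3 can be illustrated with a small compare-and-set sketch. It uses an in-memory map as a stand-in for the sandboxes table; how the real CAS is implemented (for example as a conditional database update) is an assumption, and `StatusStore`, `beginSnapshot`, and `finishSnapshot` are hypothetical names.

```typescript
type Status = 'running' | 'snapshotting' | 'stopping' | 'error';

// In-memory stand-in for the sandbox rows (assumption: the real store is the DB).
class StatusStore {
  private rows = new Map<string, Status>();

  set(id: string, status: Status): void {
    this.rows.set(id, status);
  }

  get(id: string): Status | undefined {
    return this.rows.get(id);
  }

  // Compare-and-set: write `to` only if the row is still in state `from`.
  cas(id: string, from: Status, to: Status): boolean {
    if (this.rows.get(id) !== from) return false;
    this.rows.set(id, to);
    return true;
  }
}

// Request path: claim the sandbox for snapshotting.
function beginSnapshot(store: StatusStore, id: string): boolean {
  return store.cas(id, 'running', 'snapshotting');
}

// Background path: return to running only if nothing else moved the row,
// so a racing destroy wins and no false "running" signal is emitted.
function finishSnapshot(store: StatusStore, id: string): boolean {
  return store.cas(id, 'snapshotting', 'running');
}
```

If a destroy sets the row to `stopping` while the snapshot is in flight, `finishSnapshot` returns `false` and leaves the row untouched.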
53 changed files with 1148 additions and 839 deletions

@ -97,7 +97,7 @@ Startup (`cmd/control-plane/main.go`) is a thin wrapper: `cpserver.Run(cpserver.
**Packages:** `internal/hostagent/`, `internal/sandbox/`, `internal/vm/`, `internal/network/`, `internal/devicemapper/`, `internal/envdclient/`, `internal/snapshot/`
**Production deployment:** `scripts/prepare-wrenn-user.sh` creates the `wrenn` system user, sets Linux capabilities (setcap) on wrenn-agent and all child binaries (iptables, losetup, dmsetup, etc.), installs an apt hook to restore capabilities after package updates, configures udev rules for `/dev/net/tun`, loads required kernel modules, and writes systemd unit files for both services. No sudo grants — all privilege is via capabilities.
**Production deployment:** `make setup-host` (→ `scripts/setup-host.sh`) prepares the host: creates the `wrenn` system user, sets Linux capabilities (setcap) on wrenn-agent and all child binaries (iptables, losetup, dmsetup, etc.), installs an apt hook to restore capabilities after package updates, configures udev rules for `/dev/net/tun`, and loads required kernel modules. No sudo grants — all privilege is via capabilities. `make install` then copies the binaries to `/usr/local/bin` and installs the systemd units from `deploy/systemd/`.
Startup (`cmd/host-agent/main.go`) wires: root/capabilities check → enable IP forwarding → clean up stale dm devices → `sandbox.Manager` (containing `vm.Manager` + `network.SlotAllocator` + `devicemapper.LoopRegistry`) → `hostagent.Server` (Connect RPC handler) → HTTP server.
@ -258,13 +258,14 @@ To add a new query: add it to the appropriate `.sql` file in `db/queries/` → `
## Rootfs & Guest Init
- **wrenn-init** (`images/wrenn-init.sh`): the PID 1 init script baked into every rootfs. Mounts virtual filesystems, sets hostname, writes `/etc/resolv.conf`, then execs envd.
- **Updating the rootfs** after changing envd or wrenn-init: `bash scripts/update-minimal-rootfs.sh`. This builds envd via `make build-envd` (Rust → static musl binary), mounts the rootfs image, copies in the new binaries, and unmounts. Defaults to `/var/lib/wrenn/images/minimal.ext4`.
- Rootfs images are minimal debootstrap — no systemd, no coreutils beyond busybox. Use `/bin/sh -c` for shell builtins inside the guest.
- **System base templates**: four built-in distro images: `minimal-ubuntu` (id 0, default), `minimal-alpine` (1), `minimal-arch` (2), and `minimal-fedora` (3), built via `images/build-{ubuntu,alpine,arch,fedora}.sh` (or `make images`). All platform-owned, protected from deletion (reserved IDs 0–1024). Same static envd + tini run on all four. Each has a `wrenn-user` with passwordless sudo.
- **Updating the rootfs** after changing envd or wrenn-init: `bash scripts/update-minimal-rootfs.sh`. Builds envd via `make build-envd` (Rust → static musl binary), then re-injects envd + wrenn-init + tini into all four system base images.
- Rootfs images are built from distro containers — no systemd (init is overridden to `wrenn-init`). Use `/bin/sh -c` for shell builtins inside the guest.
## Fixed Paths (on host machine)
- Kernel: `/var/lib/wrenn/kernels/vmlinux`
- Base rootfs images: `/var/lib/wrenn/images/{template}.ext4`
- Base rootfs images: `/var/lib/wrenn/images/teams/{base36(teamID)}/{base36(templateID)}/rootfs.ext4` (system templates use the platform team, base36 all-zeros)
- Sandbox clones: `/var/lib/wrenn/sandboxes/`
- Cloud Hypervisor: `/usr/local/bin/cloud-hypervisor`

@ -131,32 +131,24 @@ check: fmt vet lint test
# ═══════════════════════════════════════════════════
# Rootfs Images
# ═══════════════════════════════════════════════════
.PHONY: images image-minimal image-python image-node
.PHONY: images rootfs-ubuntu rootfs-alpine rootfs-arch rootfs-fedora
images: build-envd image-minimal image-python image-node
# Build all four system base rootfs images (ubuntu/alpine/arch/fedora). Each
# spawns a distro container, installs the required packages + wrenn-user, then
# exports to images/teams/<platform>/<id>/rootfs.ext4. Requires docker + sudo.
images: rootfs-ubuntu rootfs-alpine rootfs-arch rootfs-fedora
image-minimal:
sudo bash images/templates/minimal/build.sh
rootfs-ubuntu:
bash images/build-ubuntu.sh
image-python:
sudo bash images/templates/python312/build.sh
rootfs-alpine:
bash images/build-alpine.sh
image-node:
sudo bash images/templates/node20/build.sh
rootfs-arch:
bash images/build-arch.sh
# ═══════════════════════════════════════════════════
# Deployment
# ═══════════════════════════════════════════════════
.PHONY: setup-host install
setup-host:
sudo bash scripts/setup-host.sh
install: build
sudo cp $(BIN_DIR)/wrenn-cp /usr/local/bin/
sudo cp $(BIN_DIR)/wrenn-agent /usr/local/bin/
sudo cp deploy/systemd/*.service /etc/systemd/system/
sudo systemctl daemon-reload
rootfs-fedora:
bash images/build-fedora.sh
# ═══════════════════════════════════════════════════
# Clean

@ -22,7 +22,7 @@ Produces three binaries: `wrenn-cp` (control plane), `wrenn-agent` (host agent),
## Host setup
The host agent needs a kernel, a minimal rootfs image, and working directories on the host machine.
The host agent needs a kernel, the system base rootfs images, and working directories on the host machine.
### Directory structure
@ -31,59 +31,74 @@ The host agent needs a kernel, a minimal rootfs image, and working directories o
├── kernels/
│ └── vmlinux # uncompressed Linux kernel (not bzImage)
├── images/
│ └── minimal/
│ └── rootfs.ext4 # base rootfs (all other templates snapshot from this)
│ └── teams/
│ └── 0000000000000000000000000/ # platform team (base36 all-zeros)
│ ├── 0000000000000000000000000/rootfs.ext4 # minimal-ubuntu (id 0)
│ ├── 0000000000000000000000001/rootfs.ext4 # minimal-alpine (id 1)
│ ├── 0000000000000000000000002/rootfs.ext4 # minimal-arch (id 2)
│ └── 0000000000000000000000003/rootfs.ext4 # minimal-fedora (id 3)
├── sandboxes/ # per-sandbox CoW files (created at runtime)
└── snapshots/ # pause/hibernate snapshot files (created at runtime)
```
Create the directories:
Create the base directories (the per-template image dirs are created by the build scripts):
```bash
sudo mkdir -p /var/lib/wrenn/{kernels,images/minimal,sandboxes,snapshots}
sudo mkdir -p /var/lib/wrenn/{kernels,images,sandboxes,snapshots}
```
### Kernel
Place an uncompressed `vmlinux` kernel at `/var/lib/wrenn/kernels/vmlinux`. Versioned kernels (`vmlinux-{semver}`) are also supported — the agent picks the latest by semver.
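
As a rough illustration of "picks the latest by semver", the sketch below ranks file names of the form `vmlinux-{major}.{minor}.{patch}` numerically. It is not the agent's implementation: pre-release and build suffixes are ignored, and how the agent chooses between a plain `vmlinux` and versioned files is not specified here, so unversioned names are simply skipped. `parseKernelVersion` and `latestKernel` are hypothetical names.

```typescript
type Version = [number, number, number];

// Parse "vmlinux-1.2.3" into [1, 2, 3]; returns null for any other name.
function parseKernelVersion(name: string): Version | null {
  const m = /^vmlinux-(\d+)\.(\d+)\.(\d+)$/.exec(name);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// Numeric comparison, so 6.1.10 sorts above 6.1.9 (a string sort would not).
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Return the versioned kernel file name with the highest version, if any.
function latestKernel(names: string[]): string | null {
  let best: { name: string; version: Version } | null = null;
  for (const name of names) {
    const version = parseKernelVersion(name);
    if (!version) continue;
    if (!best || compareVersions(version, best.version) > 0) {
      best = { name, version };
    }
  }
  return best ? best.name : null;
}
```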
### Minimal rootfs
### System base rootfs images
The minimal rootfs is the base image that all other templates (Python, Node, etc.) are built on top of via device-mapper snapshots. It must contain:
There are four built-in **system base templates** — one per distro — that all other
templates snapshot from via device-mapper. They are platform-owned (visible to every
team) and protected from deletion (reserved template IDs 0–1024):
| Template | Distro | ID |
|----------|--------|----|
| `minimal-ubuntu` | `ubuntu:26.04` | 0 |
| `minimal-alpine` | `alpine:3.22` | 1 |
| `minimal-arch` | `archlinux:base` | 2 |
| `minimal-fedora` | `fedora:45` | 3 |
`minimal-ubuntu` is the default template for new sandboxes and builds. The same
statically-linked `envd` + `tini` run on all four regardless of the distro's libc
(glibc on Ubuntu/Arch/Fedora, musl on Alpine).
Each image contains these packages plus a `wrenn-user` account with passwordless `sudo`:
| Package | Why |
|---------|-----|
| `socat` | Bidirectional relay for port forwarding |
| `chrony` | Time sync from KVM PTP clock (`/dev/ptp0`) |
| `tini` | PID 1 zombie reaper (injected by build script, not apt) |
| `iproute2` (`iproute` on Fedora) | `ip` for guest network setup in `wrenn-init` |
| `tini` | PID 1 zombie reaper |
| `sudo` | User privilege management inside the guest |
| `wget` | HTTP fetching |
| `curl` | HTTP client |
| `ca-certificates` | TLS certificate verification |
| `git` | Version control |
**To build a rootfs from a Docker container:**
**To build all four images** (each spawns a distro container, installs the packages +
`wrenn-user`, builds `envd`, injects `wrenn-init` + `tini`, and exports to the
team-scoped path). Requires Docker + sudo:
1. Create and configure a container with the required packages:
```bash
docker run -it --name wrenn-minimal debian:bookworm bash
# Inside the container:
apt update && apt install -y socat chrony sudo wget curl ca-certificates
exit
```
```bash
make images
```
2. Export to a rootfs image (builds envd, injects wrenn-init + tini, shrinks to minimum size):
```bash
sudo bash scripts/rootfs-from-container.sh wrenn-minimal minimal
```
Or build a single distro: `make rootfs-ubuntu` / `rootfs-alpine` / `rootfs-arch` / `rootfs-fedora`.
**To update an existing rootfs** after changing envd or `wrenn-init.sh`:
**To update the images** after changing `envd` or `wrenn-init.sh` (rebuilds `envd` once,
then re-injects `envd` + `wrenn-init` + `tini` into every system base image):
```bash
bash scripts/update-minimal-rootfs.sh
```
This rebuilds envd via `make build-envd` and copies the fresh binaries into the mounted rootfs image.
### IP forwarding
```bash

@ -228,7 +228,7 @@ func main() {
// snapshotted state. User-initiated Pauses already running are
// awaited by PauseAll/Destroy's lifecycleMu serialization.
mgr.Shutdown(shutdownCtx)
sandbox.ShrinkMinimalImage(rootDir)
sandbox.ShrinkSystemImages(rootDir)
if err := httpServer.Shutdown(shutdownCtx); err != nil {
slog.Error("http server shutdown error", "error", err)
}

@ -0,0 +1,49 @@
-- +goose Up
-- Replace the old all-zeros "minimal" base template with the four system base
-- templates (ubuntu/alpine/arch/fedora). All are platform-owned (team_id
-- all-zeros) with reserved template IDs 0..3, default user wrenn-user.
--
-- Template IDs are well-known: the all-zeros UUID + low byte = {0,1,2,3}.
-- On disk each lives at images/teams/{base36(0)}/{base36(id)}/rootfs.ext4.
-- 0 → minimal-ubuntu (was "minimal").
UPDATE templates
SET name = 'minimal-ubuntu',
default_user = 'wrenn-user'
WHERE id = '00000000-0000-0000-0000-000000000000';
-- Seed the row if it did not already exist (fresh DBs).
INSERT INTO templates (id, name, type, vcpus, memory_mb, size_bytes, team_id, default_user)
VALUES ('00000000-0000-0000-0000-000000000000', 'minimal-ubuntu', 'base', 1, 512, 0,
'00000000-0000-0000-0000-000000000000', 'wrenn-user')
ON CONFLICT (id) DO NOTHING;
-- 1 → minimal-alpine, 2 → minimal-arch, 3 → minimal-fedora.
INSERT INTO templates (id, name, type, vcpus, memory_mb, size_bytes, team_id, default_user)
VALUES
('00000000-0000-0000-0000-000000000001', 'minimal-alpine', 'base', 1, 512, 0,
'00000000-0000-0000-0000-000000000000', 'wrenn-user'),
('00000000-0000-0000-0000-000000000002', 'minimal-arch', 'base', 1, 512, 0,
'00000000-0000-0000-0000-000000000000', 'wrenn-user'),
('00000000-0000-0000-0000-000000000003', 'minimal-fedora', 'base', 1, 512, 0,
'00000000-0000-0000-0000-000000000000', 'wrenn-user')
ON CONFLICT (id) DO NOTHING;
-- Point the sandboxes.template column default at the new default base template.
ALTER TABLE sandboxes ALTER COLUMN template SET DEFAULT 'minimal-ubuntu';
-- +goose Down
ALTER TABLE sandboxes ALTER COLUMN template SET DEFAULT 'minimal';
DELETE FROM templates WHERE id IN (
'00000000-0000-0000-0000-000000000001',
'00000000-0000-0000-0000-000000000002',
'00000000-0000-0000-0000-000000000003'
);
UPDATE templates
SET name = 'minimal',
default_user = 'root'
WHERE id = '00000000-0000-0000-0000-000000000000';

@ -2,7 +2,7 @@
name = "envd"
version = "0.3.0"
edition = "2024"
rust-version = "1.88"
rust-version = "1.95"
[dependencies]
# Async runtime

@ -128,13 +128,15 @@ src/
After building the static binary, copy it into the rootfs:
```bash
bash scripts/update-debug-rootfs.sh [rootfs_path]
bash scripts/update-minimal-rootfs.sh [rootfs_path]
```
Or manually:
With no argument it updates all four system base images; pass a path to target one.
Or manually (example path: the minimal-ubuntu image, platform team + template id 0):
```bash
sudo mount -o loop /var/lib/wrenn/images/minimal.ext4 /mnt
sudo cp target/x86_64-unknown-linux-musl/release/envd /mnt/usr/bin/envd
sudo mount -o loop /var/lib/wrenn/images/teams/0000000000000000000000000/0000000000000000000000000/rootfs.ext4 /mnt
sudo cp target/x86_64-unknown-linux-musl/release/envd /mnt/usr/local/bin/envd
sudo umount /mnt
```

@ -18,7 +18,9 @@ export async function destroyAdminCapsule(id: string): Promise<ApiResult<void>>
return apiFetch('DELETE', `/api/v1/admin/capsules/${id}`);
}
export async function snapshotAdminCapsule(id: string, name?: string): Promise<ApiResult<Snapshot>> {
// Async: returns 202 with the capsule now in the "snapshotting" state. The
// template lands later (watch template.snapshot.create or poll templates).
export async function snapshotAdminCapsule(id: string, name?: string): Promise<ApiResult<Capsule>> {
return apiFetch('POST', `/api/v1/admin/capsules/${id}/snapshot`, { name });
}
@ -35,6 +37,7 @@ export async function listPlatformTemplates(): Promise<ApiResult<Snapshot[]>> {
size_bytes: t.size_bytes,
created_at: t.created_at,
platform: true,
protected: t.protected,
}));
return { ok: true, data: snapshots };
}

@ -97,6 +97,8 @@ export type AdminTemplate = {
size_bytes: number;
team_id: string;
created_at: string;
/** True for built-in system base templates, which cannot be deleted. */
protected: boolean;
};
export async function listAdminTemplates(): Promise<ApiResult<AdminTemplate[]>> {

@ -8,6 +8,7 @@ export type CapsuleStatus =
| 'running'
| 'pausing'
| 'paused'
| 'snapshotting'
| 'resuming'
| 'stopping'
| 'hibernated'
@ -26,6 +27,7 @@ export const TRANSIENT_STATUSES: ReadonlySet<CapsuleStatus> = new Set([
'pending',
'starting',
'pausing',
'snapshotting',
'resuming',
'stopping'
]);
@ -88,9 +90,14 @@ export type Snapshot = {
size_bytes: number;
created_at: string;
platform: boolean;
/** True for built-in system base templates, which cannot be deleted. */
protected?: boolean;
};
export async function createSnapshot(capsuleId: string, name?: string): Promise<ApiResult<Snapshot>> {
// Snapshots are async: the call returns 202 with the capsule now in the
// "snapshotting" state. The resulting template arrives later via the
// template.snapshot.create SSE event (or by polling listSnapshots).
export async function createSnapshot(capsuleId: string, name?: string): Promise<ApiResult<Capsule>> {
return apiFetch('POST', '/api/v1/snapshots', { sandbox_id: capsuleId, name });
}
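
Because `createSnapshot` now returns before the template exists, a caller that does not consume SSE events has to poll. The following is a minimal polling sketch, assuming the caller supplied an explicit snapshot name (an auto-generated name would not be known up front). `waitForSnapshot` and its options are hypothetical, and `list` stands for an adapter that unwraps `listSnapshots` into a plain array.

```typescript
type SnapshotLike = { name: string };

type WaitOptions = {
  intervalMs?: number; // delay between polls
  maxAttempts?: number; // give up after this many polls
  sleep?: (ms: number) => Promise<void>; // injectable for tests
};

// Poll `list` until a snapshot with the given name appears or attempts run out.
async function waitForSnapshot(
  list: () => Promise<SnapshotLike[]>,
  name: string,
  opts: WaitOptions = {}
): Promise<SnapshotLike | null> {
  const intervalMs = opts.intervalMs ?? 2000;
  const maxAttempts = opts.maxAttempts ?? 30;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const found = (await list()).find((s) => s.name === name);
    if (found) return found;
    // Skip the final sleep: there is no poll after the last attempt.
    if (attempt + 1 < maxAttempts) await sleep(intervalMs);
  }
  return null;
}
```

Subscribing to the `template.snapshot.create` event avoids the polling delay; the loop above is only a fallback for clients without an event stream.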

@ -11,7 +11,7 @@
};
let { open, onclose, oncreated, templateSource = 'team' }: Props = $props();
let createForm = $state<CreateCapsuleParams>({ template: 'minimal', vcpus: 1, memory_mb: 512, timeout_sec: 0 });
let createForm = $state<CreateCapsuleParams>({ template: 'minimal-ubuntu', vcpus: 1, memory_mb: 512, timeout_sec: 0 });
let creating = $state(false);
let createError = $state<string | null>(null);
@ -120,8 +120,8 @@
const creator = templateSource === 'platform' ? createAdminCapsule : createCapsule;
const result = await creator(createForm);
if (result.ok) {
createForm = { template: 'minimal', vcpus: 1, memory_mb: 512, timeout_sec: 0 };
templateQuery = 'minimal';
createForm = { template: 'minimal-ubuntu', vcpus: 1, memory_mb: 512, timeout_sec: 0 };
templateQuery = 'minimal-ubuntu';
onclose();
oncreated?.(result.data);
} else {

@ -1,13 +1,38 @@
<script lang="ts">
import { createSnapshot } from '$lib/api/capsules';
import type { Snippet } from 'svelte';
import { createSnapshot, type Capsule } from '$lib/api/capsules';
import type { ApiResult } from '$lib/api/client';
type SnapshotFn = (capsuleId: string, name?: string) => Promise<ApiResult<Capsule>>;
type Props = {
open: boolean;
capsuleId: string;
onclose: () => void;
onsnapshot?: () => void;
onsnapshot?: (capsule: Capsule) => void;
title?: string;
label?: string;
placeholder?: string;
hint?: string;
confirmLabel?: string;
pendingLabel?: string;
snapshotFn?: SnapshotFn;
description?: Snippet;
};
let { open, capsuleId, onclose, onsnapshot }: Props = $props();
let {
open,
capsuleId,
onclose,
onsnapshot,
title = 'Capture snapshot',
label = 'Snapshot name',
placeholder = 'e.g. after-apt-install, pre-migration',
hint = 'Leave blank to use an auto-generated name.',
confirmLabel = 'Start snapshot',
pendingLabel = 'Starting...',
snapshotFn = createSnapshot,
description
}: Props = $props();
let snapshotName = $state('');
let snapshotting = $state(false);
@ -21,14 +46,14 @@
async function handleConfirm() {
snapshotting = true;
error = null;
const result = await createSnapshot(capsuleId, snapshotName.trim() || undefined);
const result = await snapshotFn(capsuleId, snapshotName.trim() || undefined);
if (!result.ok) {
error = result.error;
snapshotting = false;
return;
}
reset();
onsnapshot?.();
onsnapshot?.(result.data);
onclose();
snapshotting = false;
}
@ -41,6 +66,10 @@
}
</script>
{#snippet defaultDescription()}
<p class="text-ui text-[var(--color-text-tertiary)]">The capsule moves to a <span class="font-mono text-[var(--color-blue)]">snapshotting</span> state while its memory and disk are written to a new template, then returns to running. This runs in the background; you'll be notified when it completes.</p>
{/snippet}
{#if open}
<div class="fixed inset-0 z-50 flex items-center justify-center">
<!-- svelte-ignore a11y_no_static_element_interactions -->
@ -59,13 +88,13 @@
</svg>
</div>
<div>
<h2 class="font-serif text-heading text-[var(--color-text-bright)]">Capture snapshot</h2>
<h2 class="font-serif text-heading text-[var(--color-text-bright)]">{title}</h2>
<p class="mt-0.5 text-meta text-[var(--color-text-muted)] font-mono">{capsuleId}</p>
</div>
</div>
<div class="px-6 pt-5 pb-6 space-y-4">
<p class="text-ui text-[var(--color-text-tertiary)]">Live snapshot: the capsule briefly pauses, its memory + disk are written to a new template, then the capsule resumes — your session keeps running.</p>
{@render (description ?? defaultDescription)()}
{#if error}
<div class="rounded-[var(--radius-input)] border border-[var(--color-red)]/30 bg-[var(--color-red)]/5 px-3 py-2 text-meta text-[var(--color-red)]">
@ -75,7 +104,7 @@
<div>
<div class="mb-1.5 flex items-baseline justify-between">
<label class="text-label font-semibold uppercase tracking-[0.05em] text-[var(--color-text-tertiary)]" for="snapshot-name">Snapshot name</label>
<label class="text-label font-semibold uppercase tracking-[0.05em] text-[var(--color-text-tertiary)]" for="snapshot-name">{label}</label>
<span class="text-meta text-[var(--color-text-muted)]">optional</span>
</div>
<input
@ -84,10 +113,10 @@
bind:value={snapshotName}
disabled={snapshotting}
class="w-full rounded-[var(--radius-input)] border border-[var(--color-border)] bg-[var(--color-bg-4)] px-3 py-2 font-mono text-ui text-[var(--color-text-bright)] outline-none placeholder:text-[var(--color-text-muted)] transition-colors duration-150 focus:border-[var(--color-accent)] disabled:opacity-50"
placeholder="e.g. after-apt-install, pre-migration"
placeholder={placeholder}
onkeydown={(e) => { if (e.key === 'Enter' && !snapshotting) handleConfirm(); }}
/>
<p class="mt-1.5 text-meta text-[var(--color-text-muted)]">Leave blank to use an auto-generated name.</p>
<p class="mt-1.5 text-meta text-[var(--color-text-muted)]">{hint}</p>
</div>
<div class="flex justify-end gap-3 pt-1">
@ -107,9 +136,9 @@
<svg class="animate-spin" width="13" height="13" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
<path d="M21 12a9 9 0 1 1-6.219-8.56" />
</svg>
Capturing...
{pendingLabel}
{:else}
Capture snapshot
{confirmLabel}
{/if}
</button>
</div>

@ -0,0 +1,39 @@
import type { SSEEvent } from '$lib/api/events';
import { toast } from '$lib/toast.svelte';
// Terminal copy per lifecycle verb. Success and failure are paired so the two
// can never drift apart.
const VERBS: Record<string, { done: string; failed: string }> = {
'capsule.create': { done: 'Capsule created', failed: 'Capsule failed to start' },
'capsule.pause': { done: 'Capsule paused', failed: 'Capsule failed to pause' },
'capsule.resume': { done: 'Capsule resumed', failed: 'Capsule failed to resume' },
'capsule.destroy': { done: 'Capsule destroyed', failed: 'Capsule failed to destroy' }
};
/**
* Surfaces lifecycle outcomes as toasts. Only system-actor events with an
* outcome are terminal: the user-actor events published at request-accept time
* carry a premature outcome (the operation has only been accepted, not yet
* completed) and are skipped, so each operation toasts exactly once.
*/
export function lifecycleToast(event: SSEEvent): void {
if (event.actor?.type !== 'system' || !event.outcome) return;
if (event.event === 'template.snapshot.create') {
const name = event.resource?.id;
if (event.outcome === 'success') {
toast.success(name ? `Snapshot "${name}" captured` : 'Snapshot captured');
} else {
toast.error(event.error ? `Snapshot failed: ${event.error}` : 'Snapshot failed');
}
return;
}
const verb = VERBS[event.event];
if (!verb) return;
if (event.outcome === 'success') {
toast.success(verb.done);
} else {
toast.error(event.error ? `${verb.failed}: ${event.error}` : verb.failed);
}
}
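
To show why this gate produces exactly one toast per operation, here is a self-contained sketch that applies the same condition to a plausible pause sequence. `EventLike` is a minimal stand-in for `SSEEvent` whose field names follow the usage above; the `'failure'` outcome value and the exact three-event sequence are assumptions for illustration.

```typescript
// Minimal stand-in for SSEEvent (assumption: only the fields used by the gate).
type EventLike = {
  event: string;
  actor?: { type: string };
  outcome?: 'success' | 'failure';
};

// Same gate as lifecycleToast: only system-actor events that carry an outcome.
function isTerminal(e: EventLike): boolean {
  return e.actor?.type === 'system' && Boolean(e.outcome);
}

// Hypothetical event sequence for a single pause operation.
const pauseFlow: EventLike[] = [
  // Published when the request is accepted; the outcome is premature.
  { event: 'capsule.pause', actor: { type: 'user' }, outcome: 'success' },
  // Transient state change; carries no outcome.
  { event: 'capsule.pause', actor: { type: 'system' } },
  // Published when the operation actually completes.
  { event: 'capsule.pause', actor: { type: 'system' }, outcome: 'success' }
];

const toastCount = pauseFlow.filter(isTerminal).length;
```

Only the last event passes the gate, so `toastCount` is 1.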

@ -6,6 +6,7 @@
import FilesTab from '$lib/components/FilesTab.svelte';
import MetricsPanel from '$lib/components/MetricsPanel.svelte';
import DestroyDialog from '$lib/components/DestroyDialog.svelte';
import SnapshotDialog from '$lib/components/SnapshotDialog.svelte';
import CopyButton from '$lib/components/CopyButton.svelte';
import { toast } from '$lib/toast.svelte';
import {
@ -29,9 +30,6 @@
// Snapshot dialog
let showSnapshot = $state(false);
let snapshotName = $state('');
let snapshotting = $state(false);
let snapshotError = $state<string | null>(null);
const metricsAvailable = $derived(
capsule?.status === 'running' || capsule?.status === 'paused'
@ -58,28 +56,12 @@
capsuleLoading = false;
}
async function handleSnapshot() {
snapshotting = true;
snapshotError = null;
const result = await snapshotAdminCapsule(capsuleId, snapshotName.trim() || undefined);
if (result.ok) {
toast.success(`Snapshot "${result.data.name}" created`);
showSnapshot = false;
snapshotName = '';
// Capsule keeps running after a live snapshot; refresh local state.
void loadCapsule();
} else {
snapshotError = result.error;
}
snapshotting = false;
}
function statusColor(status: string): string {
switch (status) {
case 'running': return 'var(--color-accent)';
case 'paused': case 'hibernated': return 'var(--color-amber)';
case 'error': return 'var(--color-red)';
case 'pending': case 'starting': case 'resuming': case 'pausing': case 'stopping':
case 'pending': case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
return 'var(--color-blue)';
default: return 'var(--color-text-muted)';
}
@ -90,7 +72,7 @@
case 'running': return 'rgba(94,140,88,0.12)';
case 'paused': case 'hibernated': return 'rgba(212,167,60,0.12)';
case 'error': return 'rgba(207,129,114,0.12)';
case 'pending': case 'starting': case 'resuming': case 'pausing': case 'stopping':
case 'pending': case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
return 'rgba(90,159,212,0.12)';
default: return 'rgba(255,255,255,0.05)';
}
@ -101,7 +83,7 @@
case 'running': return 'rgba(94,140,88,0.3)';
case 'paused': case 'hibernated': return 'rgba(212,167,60,0.3)';
case 'error': return 'rgba(207,129,114,0.3)';
case 'pending': case 'starting': case 'resuming': case 'pausing': case 'stopping':
case 'pending': case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
return 'rgba(90,159,212,0.3)';
default: return 'rgba(255,255,255,0.08)';
}
@ -211,8 +193,7 @@
<div class="ml-auto flex items-center gap-2">
{#if canSnapshot}
<button
onclick={() => { showSnapshot = true; snapshotName = ''; snapshotError = null; }}
disabled={snapshotting}
onclick={() => { showSnapshot = true; }}
class="flex items-center gap-1.5 rounded-[var(--radius-button)] border border-[var(--color-accent)]/30 bg-[var(--color-accent)]/8 px-3 py-1.5 text-meta font-medium text-[var(--color-accent-bright)] transition-all duration-150 hover:bg-[var(--color-accent)]/15 hover:border-[var(--color-accent)]/50 disabled:opacity-50"
>
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.75" stroke-linecap="round" stroke-linejoin="round"><path d="M14.5 4h-5L7 7H2v13a2 2 0 002 2h16a2 2 0 002-2V7h-5l-2.5-3z" /><circle cx="12" cy="15" r="3" /></svg>
@ -270,83 +251,24 @@
</footer>
</main>
<!-- Snapshot dialog -->
{#if showSnapshot}
<div class="fixed inset-0 z-50 flex items-center justify-center">
<!-- svelte-ignore a11y_no_static_element_interactions -->
<div
class="absolute inset-0 bg-black/60"
onclick={() => { if (!snapshotting) showSnapshot = false; }}
onkeydown={(e) => { if (e.key === 'Escape' && !snapshotting) showSnapshot = false; }}
></div>
{#snippet adminSnapshotDescription()}
<p class="text-ui text-[var(--color-text-tertiary)]">The capsule moves to a <span class="font-mono text-[var(--color-blue)]">snapshotting</span> state while its memory and disk are written to a new platform template available to all teams, then returns to running. This runs in the background.</p>
{/snippet}
<div class="relative w-full max-w-[420px] rounded-[var(--radius-card)] border border-[var(--color-border-mid)] bg-[var(--color-bg-2)] overflow-hidden" style="animation: fadeUp 0.2s ease both; box-shadow: var(--shadow-dialog)">
<div class="flex items-center gap-4 border-b border-[var(--color-border)] bg-[var(--color-bg-3)] px-6 py-5">
<div class="flex h-10 w-10 shrink-0 items-center justify-center rounded-[var(--radius-input)] bg-[var(--color-accent)]/15 text-[var(--color-accent)] shadow-[0_0_12px_var(--color-accent-glow)]">
<svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.75" stroke-linecap="round" stroke-linejoin="round">
<path d="M14.5 4h-5L7 7H2v13a2 2 0 002 2h16a2 2 0 002-2V7h-5l-2.5-3z" />
<circle cx="12" cy="15" r="3" />
</svg>
</div>
<div>
<h2 class="font-serif text-heading text-[var(--color-text-bright)]">Snapshot as platform template</h2>
<p class="mt-0.5 text-meta text-[var(--color-text-muted)] font-mono">{capsuleId}</p>
</div>
</div>
<div class="px-6 pt-5 pb-6 space-y-4">
<p class="text-ui text-[var(--color-text-tertiary)]">Live snapshot: the capsule briefly pauses, its memory + disk are written to a new platform template available to all teams, then the capsule resumes — your session keeps running.</p>
{#if snapshotError}
<div class="rounded-[var(--radius-input)] border border-[var(--color-red)]/30 bg-[var(--color-red)]/5 px-3 py-2 text-meta text-[var(--color-red)]">
{snapshotError}
</div>
{/if}
<div>
<div class="mb-1.5 flex items-baseline justify-between">
<label class="text-label font-semibold uppercase tracking-[0.05em] text-[var(--color-text-tertiary)]" for="admin-snapshot-name">Template name</label>
<span class="text-meta text-[var(--color-text-muted)]">optional</span>
</div>
<input
id="admin-snapshot-name"
type="text"
bind:value={snapshotName}
disabled={snapshotting}
class="w-full rounded-[var(--radius-input)] border border-[var(--color-border)] bg-[var(--color-bg-4)] px-3 py-2 font-mono text-ui text-[var(--color-text-bright)] outline-none placeholder:text-[var(--color-text-muted)] transition-colors duration-150 focus:border-[var(--color-accent)] disabled:opacity-50"
placeholder="e.g. python-3.12, node-22-dev"
onkeydown={(e) => { if (e.key === 'Enter' && !snapshotting) handleSnapshot(); }}
/>
<p class="mt-1.5 text-meta text-[var(--color-text-muted)]">Leave blank for an auto-generated name. If the name already exists, it will be overwritten.</p>
</div>
<div class="flex justify-end gap-3 pt-1">
<button
onclick={() => { showSnapshot = false; }}
disabled={snapshotting}
class="rounded-[var(--radius-button)] border border-[var(--color-border)] px-4 py-2 text-ui text-[var(--color-text-secondary)] transition-colors duration-150 hover:border-[var(--color-border-mid)] hover:text-[var(--color-text-primary)] disabled:opacity-50"
>
Cancel
</button>
<button
onclick={handleSnapshot}
disabled={snapshotting}
class="flex items-center gap-2 rounded-[var(--radius-button)] bg-[var(--color-accent)] px-5 py-2 text-ui font-semibold text-white transition-all duration-150 hover:brightness-115 hover:-translate-y-px active:translate-y-0 disabled:opacity-50 disabled:hover:translate-y-0"
>
{#if snapshotting}
<svg class="animate-spin" width="13" height="13" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
<path d="M21 12a9 9 0 1 1-6.219-8.56" />
</svg>
Snapshotting...
{:else}
Snapshot
{/if}
</button>
</div>
</div>
</div>
</div>
{/if}
<SnapshotDialog
open={showSnapshot}
{capsuleId}
onclose={() => { showSnapshot = false; }}
onsnapshot={(updated) => { toast.success('Snapshot started'); capsule = updated; }}
snapshotFn={snapshotAdminCapsule}
title="Snapshot as platform template"
label="Template name"
placeholder="e.g. python-3.12, node-22-dev"
hint="Leave blank for an auto-generated name. Each snapshot needs a unique name."
confirmLabel="Snapshot"
pendingLabel="Snapshotting..."
description={adminSnapshotDescription}
/>
<DestroyDialog
open={showDestroy}

View File

@ -44,7 +44,7 @@
let showCreate = $state(false);
let createForm = $state({
name: '',
base_template: 'minimal',
base_template: 'minimal-ubuntu',
vcpus: 1,
memory_mb: 512,
recipe: '',
@ -80,7 +80,7 @@
const PLATFORM_TEAM_ID = 'team-0000000000000000000000000';
function canDeleteTemplate(tmpl: AdminTemplate): boolean {
if (tmpl.name === 'minimal') return false;
if (tmpl.protected) return false;
return tmpl.team_id === PLATFORM_TEAM_ID;
}
@ -140,7 +140,7 @@
const result = await createBuild({
name: createForm.name.trim(),
base_template: createForm.base_template.trim() || 'minimal',
base_template: createForm.base_template.trim() || 'minimal-ubuntu',
recipe: lines,
healthcheck: createForm.healthcheck.trim() || undefined,
vcpus: createForm.vcpus,
@ -152,7 +152,7 @@
if (result.ok) {
showCreate = false;
createForm = { name: '', base_template: 'minimal', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null };
createForm = { name: '', base_template: 'minimal-ubuntu', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null };
toast.success('Build queued');
goto(`/admin/templates/builds/${result.data.id}`);
} else {
@ -246,7 +246,7 @@
</p>
</div>
<button
onclick={() => { showCreate = true; createError = null; createForm = { name: '', base_template: 'minimal', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null }; }}
onclick={() => { showCreate = true; createError = null; createForm = { name: '', base_template: 'minimal-ubuntu', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null }; }}
class="group flex items-center gap-2.5 rounded-[var(--radius-button)] bg-[var(--color-accent)] px-5 py-2.5 text-ui font-semibold text-white shadow-sm transition-all duration-200 hover:shadow-[0_0_20px_var(--color-accent-glow-mid)] hover:brightness-115 hover:-translate-y-px active:translate-y-0"
>
<svg width="13" height="13" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round" class="transition-transform duration-200 group-hover:rotate-90"><line x1="12" y1="5" x2="12" y2="19"/><line x1="5" y1="12" x2="19" y2="12"/></svg>
@ -397,7 +397,7 @@
</p>
{#if type === 'templates'}
<button
onclick={() => { showCreate = true; createError = null; createForm = { name: '', base_template: 'minimal', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null }; }}
onclick={() => { showCreate = true; createError = null; createForm = { name: '', base_template: 'minimal-ubuntu', vcpus: 1, memory_mb: 512, recipe: '', healthcheck: '', skip_pre_post: false, run_as_root: false, archive: null }; }}
class="mt-6 flex items-center gap-2 rounded-[var(--radius-button)] border border-[var(--color-accent)]/30 bg-[var(--color-accent)]/10 px-4 py-2 text-ui font-medium text-[var(--color-accent-bright)] transition-all duration-200 hover:bg-[var(--color-accent)]/20 hover:border-[var(--color-accent)]/50"
>
<svg width="13" height="13" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round" stroke-linejoin="round"><line x1="12" y1="5" x2="12" y2="19"/><line x1="5" y1="12" x2="19" y2="12"/></svg>
@ -476,7 +476,7 @@
<button
onclick={() => { deleteTarget = tmpl; deleteError = null; }}
disabled={!canDeleteTemplate(tmpl)}
title={tmpl.name === 'minimal' ? 'The minimal template cannot be deleted' : !canDeleteTemplate(tmpl) ? 'Cannot delete templates owned by other teams' : undefined}
title={tmpl.protected ? 'System base templates cannot be deleted' : !canDeleteTemplate(tmpl) ? 'Cannot delete templates owned by other teams' : undefined}
class="rounded-[var(--radius-button)] px-3 py-1.5 text-meta transition-all duration-150 {canDeleteTemplate(tmpl)
? 'text-[var(--color-text-tertiary)] hover:bg-[var(--color-red)]/10 hover:text-[var(--color-red)]'
: 'text-[var(--color-text-muted)] cursor-not-allowed opacity-40'}"

View File

@ -2,7 +2,8 @@
import { onMount } from 'svelte';
import Sidebar from '$lib/components/Sidebar.svelte';
import Toaster from '$lib/components/Toaster.svelte';
import { startSSE, stopSSE } from '$lib/sse.svelte';
import { startSSE, stopSSE, subscribeSSE } from '$lib/sse.svelte';
import { lifecycleToast } from '$lib/lifecycle-toasts';
let { children } = $props();
let collapsed = $state(
@ -13,7 +14,13 @@
onMount(() => {
startSSE();
return () => stopSSE();
// Lifecycle toasts live at the layout so they fire regardless of which
// dashboard page is open (and survive navigation between them).
const unsubscribe = subscribeSSE(lifecycleToast);
return () => {
unsubscribe();
stopSSE();
};
});
</script>

View File

@ -395,7 +395,7 @@
</div>
{:else}
{#each filteredCapsules as capsule, i (capsule.id)}
{@const isTransient = ['starting', 'resuming', 'pausing', 'stopping'].includes(capsule.status)}
{@const isTransient = ['starting', 'resuming', 'pausing', 'snapshotting', 'stopping'].includes(capsule.status)}
{@const stripeColor = capsule.status === 'running' ? 'bg-[var(--color-accent)]' : (capsule.status === 'paused' || capsule.status === 'hibernated') ? 'bg-[var(--color-amber)]' : isTransient ? 'bg-[var(--color-blue)]' : 'bg-[var(--color-text-muted)]'}
<div
class="capsule-row relative grid grid-cols-[1.6fr_0.8fr_0.5fr_0.5fr_0.6fr_1fr_0.9fr] items-center overflow-hidden border-b border-[var(--color-border)] transition-colors duration-150 hover:bg-[var(--color-bg-3)] last:border-b-0 {newCapsuleId === capsule.id ? 'capsule-born' : ''}"

View File

@ -434,7 +434,7 @@
case 'running': return 'var(--color-accent)';
case 'paused': return 'var(--color-amber)';
case 'error': return 'var(--color-red)';
case 'starting': case 'resuming': case 'pausing': case 'stopping':
case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
return 'var(--color-blue)';
default: return 'var(--color-text-muted)';
}
@ -445,7 +445,7 @@
case 'running': return 'rgba(94,140,88,0.12)';
case 'paused': return 'rgba(212,167,60,0.12)';
case 'error': return 'rgba(207,129,114,0.12)';
case 'starting': case 'resuming': case 'pausing': case 'stopping':
case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
return 'rgba(90,159,212,0.12)';
default: return 'rgba(255,255,255,0.05)';
}
@ -456,7 +456,7 @@
case 'running': return 'rgba(94,140,88,0.3)';
case 'paused': return 'rgba(212,167,60,0.3)';
case 'error': return 'rgba(207,129,114,0.3)';
case 'starting': case 'resuming': case 'pausing': case 'stopping':
case 'starting': case 'resuming': case 'pausing': case 'snapshotting': case 'stopping':
return 'rgba(90,159,212,0.3)';
default: return 'rgba(255,255,255,0.08)';
}

17
images/build-alpine.sh Executable file
View File

@ -0,0 +1,17 @@
#!/usr/bin/env bash
#
# build-alpine.sh — Build the minimal-alpine system base rootfs (template id 1).
#
# Usage: bash images/build-alpine.sh
set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/build-common.sh"
# Alpine is musl-based: the static envd + static tini run fine. bash is added so
# wrenn-user has a familiar login shell; wrenn-init itself only needs /bin/sh.
PREP="set -e
apk add --no-cache socat chrony sudo wget curl ca-certificates git iproute2 tini bash
adduser -D -s /bin/bash wrenn-user
${WRENN_SUDOERS_SETUP}"
build_system_rootfs "alpine:3.22" 1 "${PREP}"

20
images/build-arch.sh Executable file
View File

@ -0,0 +1,20 @@
#!/usr/bin/env bash
#
# build-arch.sh — Build the minimal-arch system base rootfs (template id 2).
#
# Arch is rolling-release; archlinux:base is the minimal base group.
#
# Usage: bash images/build-arch.sh
set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/build-common.sh"
# tini is AUR-only on Arch (not in core/extra), so it is not installed here —
# rootfs-from-container.sh injects the static tini binary instead.
PREP="set -e
pacman -Sy --noconfirm --needed socat chrony sudo wget curl ca-certificates git iproute2 inetutils
useradd -m -s /bin/bash wrenn-user
${WRENN_SUDOERS_SETUP}
pacman -Scc --noconfirm || true"
build_system_rootfs "archlinux:base" 2 "${PREP}"

59
images/build-common.sh Executable file
View File

@ -0,0 +1,59 @@
#!/usr/bin/env bash
#
# build-common.sh — shared helpers for building the system base rootfs images.
#
# Sourced by images/build-{ubuntu,alpine,arch,fedora}.sh. Each caller defines
# the distro base image, reserved template ID, and the in-container prep snippet
# (install packages + create wrenn-user), then calls build_system_rootfs.
#
# The same statically-linked envd + tini run on every distro; the per-OS prep
# only differs in the package manager and the user-creation command.
set -euo pipefail
# base36(all-zeros UUID) = the platform team that owns every system base
# template. Must match id.PlatformTeamID / id.UUIDToBase36 on the Go side.
PLATFORM_TEAM_B36="0000000000000000000000000"
# WRENN_SUDOERS_SETUP grants wrenn-user passwordless sudo. Identical on every
# distro; appended to each prep snippet after the user is created.
WRENN_SUDOERS_SETUP='echo "wrenn-user ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/wrenn-user && chmod 0440 /etc/sudoers.d/wrenn-user'
# build_system_rootfs <base_image> <template_id_int> <prep_snippet>
#
# Spawns a throwaway container from base_image, runs prep_snippet inside it,
# then exports it to the system base template's on-disk path
# (images/teams/<platform>/<base36(id)>/rootfs.ext4) via rootfs-from-container.sh.
build_system_rootfs() {
local base_image="$1" template_id="$2" prep="$3"
local script_dir project_root container dest tmpl_b36
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
project_root="$(cd "${script_dir}/.." && pwd)"
container="wrenn-build-${template_id}-$$"
# base36(template_id). System IDs are single-digit (0-3), so base36 equals
# the decimal digit and the 25-char zero-padded decimal matches what
# id.UUIDToBase36 produces for these well-known IDs.
tmpl_b36="$(printf '%025d' "${template_id}")"
dest="teams/${PLATFORM_TEAM_B36}/${tmpl_b36}"
echo "==> Pulling ${base_image}..."
docker pull "${base_image}"
echo "==> Preparing container ${container}..."
docker rm -f "${container}" >/dev/null 2>&1 || true
# Arm cleanup before starting the container so a failed run still removes it.
# Expand the name into the trap now: it must survive after this function's
# locals go out of scope (set -u would error on a stale reference otherwise).
trap "docker rm -f '${container}' >/dev/null 2>&1 || true" EXIT
docker run --name "${container}" "${base_image}" /bin/sh -c "${prep}"
# Run the exporter as the normal user, NOT under sudo: it builds envd via
# `make build-envd` (needs cargo on the user's PATH) and uses sudo itself
# for the privileged mount/mkfs/copy steps.
echo "==> Exporting to images/${dest}/rootfs.ext4..."
bash "${project_root}/scripts/rootfs-from-container.sh" "${container}" "${dest}"
}

19
images/build-fedora.sh Executable file
View File

@ -0,0 +1,19 @@
#!/usr/bin/env bash
#
# build-fedora.sh — Build the minimal-fedora system base rootfs (template id 3).
#
# Usage: bash images/build-fedora.sh
set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/build-common.sh"
# Fedora's iproute package provides `ip` (no "2" suffix, unlike Debian/Arch).
PREP="set -e
# install_weak_deps=False keeps the image lean. The guest never runs systemd:
# PID 1 is wrenn-init -> tini -> envd.
dnf install -y --setopt=install_weak_deps=False socat chrony sudo wget curl ca-certificates git iproute hostname tini
useradd -m -s /bin/bash wrenn-user
${WRENN_SUDOERS_SETUP}
dnf clean all"
build_system_rootfs "fedora:45" 3 "${PREP}"

25
images/build-ubuntu.sh Executable file
View File

@ -0,0 +1,25 @@
#!/usr/bin/env bash
#
# build-ubuntu.sh — Build the minimal-ubuntu system base rootfs (template id 0).
#
# Usage: bash images/build-ubuntu.sh
set -euo pipefail
source "$(cd "$(dirname "$0")" && pwd)/build-common.sh"
PREP="set -e
export DEBIAN_FRONTEND=noninteractive
apt-get update
# --no-install-recommends keeps the image lean (avoids pulling systemd-adjacent
# recommends). The guest never runs systemd: PID 1 is wrenn-init -> tini -> envd.
apt-get install -y --no-install-recommends socat chrony sudo wget curl ca-certificates git iproute2 hostname tini
# Remove the stock 'ubuntu' user (uid 1000) shipped by the base image; it is
# replaced by wrenn-user. Also drop its cloud-init sudoers drop-in.
userdel -r ubuntu 2>/dev/null || true
rm -f /etc/sudoers.d/90-cloud-init-users
useradd -m -s /bin/bash wrenn-user
${WRENN_SUDOERS_SETUP}
apt-get clean
rm -rf /var/lib/apt/lists/*"
build_system_rootfs "ubuntu:26.04" 0 "${PREP}"

View File

@ -23,9 +23,11 @@ echo "+cpu +memory +io" > /sys/fs/cgroup/cgroup.subtree_control 2>/dev/null || t
{ echo 0 > /sys/block/vda/queue/write_zeroes_max_bytes; } 2>/dev/null || true
{ echo 0 > /sys/block/vda/queue/discard_max_bytes; } 2>/dev/null || true
# Set hostname and make it resolvable (sudo requires this).
hostname capsule
echo "127.0.0.1 capsule" >> /etc/hosts
# Set hostname and make it resolvable (sudo requires this). Use the kernel knob
# directly so we don't depend on the `hostname` binary, which is absent from
# minimal Arch/Fedora images. Guard so a failure never aborts init under set -e.
echo capsule > /proc/sys/kernel/hostname 2>/dev/null || hostname capsule 2>/dev/null || true
echo "127.0.0.1 capsule" >> /etc/hosts 2>/dev/null || true
# Configure networking if the kernel ip= boot arg did not already set it up.
if ! ip addr show eth0 2>/dev/null | grep -q "169.254.0.21"; then
@ -35,9 +37,14 @@ if ! ip addr show eth0 2>/dev/null | grep -q "169.254.0.21"; then
ip route add default via 169.254.0.22 2>/dev/null || true
fi
# Configure DNS resolver.
echo "nameserver 8.8.8.8" > /etc/resolv.conf
echo "nameserver 8.8.4.4" >> /etc/resolv.conf
# Configure DNS resolver. Drop any existing symlink first — on some distros
# (e.g. Fedora) /etc/resolv.conf is a dangling symlink into systemd-resolved,
# and writing through it would fail and abort init under set -e.
rm -f /etc/resolv.conf 2>/dev/null || true
{
echo "nameserver 8.8.8.8"
echo "nameserver 8.8.4.4"
} > /etc/resolv.conf 2>/dev/null || true
# Set a standard PATH so envd and all child processes can find common binaries.
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games

View File

@ -137,22 +137,15 @@ func (h *adminCapsuleHandler) Snapshot(w http.ResponseWriter, r *http.Request) {
}
}
tmpl, err := h.svc.CreateSnapshot(r.Context(), sandboxID, id.PlatformTeamID, req.Name)
ac := auth.MustFromContext(r.Context())
ac.TeamID = id.PlatformTeamID
name := req.Name
if err == nil {
name = tmpl.Name
}
h.audit.LogSnapshotCreate(r.Context(), ac, name, err)
sb, name, err := h.svc.CreateSnapshot(r.Context(), sandboxID, id.PlatformTeamID, req.Name)
if err != nil {
if name != "" {
h.audit.LogSnapshotDeleteSystem(r.Context(), id.PlatformTeamID, name, "cleanup_after_create_error", nil)
}
status, code, msg := serviceErrToHTTP(err)
writeError(w, status, code, msg)
return
}
ac := auth.MustFromContext(r.Context())
ac.TeamID = id.PlatformTeamID
h.audit.LogSnapshotCreateRequested(r.Context(), ac, name)
writeJSON(w, http.StatusCreated, templateToResponse(tmpl))
writeJSON(w, http.StatusAccepted, sandboxToResponse(sb))
}

View File

@ -246,6 +246,7 @@ func (h *buildHandler) ListTemplates(w http.ResponseWriter, r *http.Request) {
SizeBytes int64 `json:"size_bytes"`
TeamID string `json:"team_id"`
CreatedAt string `json:"created_at"`
Protected bool `json:"protected"`
}
resp := make([]templateResponse, len(templates))
@ -257,6 +258,7 @@ func (h *buildHandler) ListTemplates(w http.ResponseWriter, r *http.Request) {
MemoryMB: t.MemoryMb,
SizeBytes: t.SizeBytes,
TeamID: id.FormatTeamID(t.TeamID),
Protected: layout.IsSystemTemplate(t.TeamID, t.ID),
}
if t.CreatedAt.Valid {
resp[i].CreatedAt = t.CreatedAt.Time.Format(time.RFC3339)
@ -280,8 +282,8 @@ func (h *buildHandler) DeleteTemplate(w http.ResponseWriter, r *http.Request) {
writeError(w, http.StatusNotFound, "not_found", "template not found")
return
}
if layout.IsMinimal(tmpl.TeamID, tmpl.ID) {
writeError(w, http.StatusForbidden, "forbidden", "the minimal template cannot be deleted")
if layout.IsSystemTemplate(tmpl.TeamID, tmpl.ID) {
writeError(w, http.StatusForbidden, "forbidden", "system base templates cannot be deleted")
return
}

View File

@ -158,6 +158,11 @@ func (h *sandboxEventHandler) verbForFailure(ctx context.Context, sandboxID pgty
return events.CapsuleResume
case "pausing":
return events.CapsulePause
case "snapshotting":
// A snapshot pauses then resumes the VM; a host-side failure leaves the
// sandbox errored, not destroyed. Route through CapsuleCreate so the
// consumer's handleFailed marks it "error" rather than removing the row.
return events.CapsuleCreate
default:
return events.CapsuleDestroy
}

View File

@ -79,6 +79,7 @@ type snapshotResponse struct {
SizeBytes int64 `json:"size_bytes"`
CreatedAt string `json:"created_at"`
Platform bool `json:"platform"`
Protected bool `json:"protected"`
Metadata map[string]string `json:"metadata,omitempty"`
}
@ -88,6 +89,7 @@ func templateToResponse(t db.Template) snapshotResponse {
Type: t.Type,
SizeBytes: t.SizeBytes,
Platform: t.TeamID == id.PlatformTeamID,
Protected: layout.IsSystemTemplate(t.TeamID, t.ID),
}
if t.Vcpus != 0 {
resp.VCPUs = &t.Vcpus
@ -112,8 +114,8 @@ type createSnapshotRequest struct {
Name string `json:"name"`
}
// Create handles POST /v1/snapshots. Takes a live snapshot of a running
// sandbox and registers the result as a new template.
// Create handles POST /v1/snapshots. Snapshots a running or paused sandbox and
// registers the result as a new template.
func (h *snapshotHandler) Create(w http.ResponseWriter, r *http.Request) {
var req createSnapshotRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
@ -131,22 +133,18 @@ func (h *snapshotHandler) Create(w http.ResponseWriter, r *http.Request) {
}
ac := auth.MustFromContext(r.Context())
tmpl, err := h.sandboxSvc.CreateSnapshot(r.Context(), sandboxID, ac.TeamID, req.Name)
name := req.Name
if err == nil {
name = tmpl.Name
}
h.audit.LogSnapshotCreate(r.Context(), ac, name, err)
// Async: the sandbox enters the "snapshotting" state and returns to its
// original state (running or paused) once the snapshot finishes. The template
// is registered by a background goroutine; clients learn of the result via the
// SSE template.snapshot.create event (or by polling).
sb, name, err := h.sandboxSvc.CreateSnapshot(r.Context(), sandboxID, ac.TeamID, req.Name)
if err != nil {
if name != "" {
h.audit.LogSnapshotDeleteSystem(r.Context(), ac.TeamID, name, "cleanup_after_create_error", nil)
}
status, code, msg := serviceErrToHTTP(err)
writeError(w, status, code, msg)
return
}
h.audit.LogSnapshotCreateRequested(r.Context(), ac, name)
writeJSON(w, http.StatusCreated, templateToResponse(tmpl))
writeJSON(w, http.StatusAccepted, sandboxToResponse(sb))
}
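
Because the handler now answers 202 Accepted with a capsule instead of 201 Created with a template, callers have to change how they read the response. A minimal client-side sketch, run against a fake server: the capsule type, the "id" and "status" field names, the request body, and fakeServer are illustrative assumptions, not the project's actual client code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// capsule mirrors only the fields this sketch needs (assumed names).
type capsule struct {
	ID     string `json:"id"`
	Status string `json:"status"`
}

// requestSnapshot posts to /v1/snapshots and decodes the capsule from a 202.
// Any other status is treated as an error; the template is NOT in the body.
func requestSnapshot(baseURL, body string) (capsule, error) {
	var c capsule
	resp, err := http.Post(baseURL+"/v1/snapshots", "application/json", strings.NewReader(body))
	if err != nil {
		return c, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusAccepted {
		return c, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&c)
	return c, err
}

// fakeServer is a hypothetical stand-in for the API: it always accepts.
func fakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusAccepted)
		json.NewEncoder(w).Encode(capsule{ID: "cap-example", Status: "snapshotting"})
	}))
}

func main() {
	srv := fakeServer()
	defer srv.Close()
	c, err := requestSnapshot(srv.URL, `{"name":"python-3.12"}`)
	fmt.Println(c.Status, err)
}
```

After a successful call the client would wait for the template.snapshot.create SSE event or poll GET /v1/snapshots to learn whether the template was actually registered.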
// List handles GET /v1/snapshots.
@ -188,8 +186,8 @@ func (h *snapshotHandler) Delete(w http.ResponseWriter, r *http.Request) {
writeError(w, http.StatusForbidden, "forbidden", "platform templates cannot be deleted here")
return
}
if layout.IsMinimal(tmpl.TeamID, tmpl.ID) {
writeError(w, http.StatusForbidden, "forbidden", "the minimal template cannot be deleted")
if layout.IsSystemTemplate(tmpl.TeamID, tmpl.ID) {
writeError(w, http.StatusForbidden, "forbidden", "system base templates cannot be deleted")
return
}

View File

@ -32,6 +32,14 @@ const unreachableThreshold = 90 * time.Second
// that may not have registered the sandbox on the host agent yet.
const transientGracePeriod = 2 * time.Minute
// snapshotGracePeriod is the grace for a sandbox stuck in "snapshotting" while
// the VM is still alive on the host. Snapshots dump guest RAM and flatten the
// rootfs, which can run for minutes on large sandboxes, and the agent reports
// the VM as alive throughout — so we must not race the in-flight operation.
// It exceeds the background goroutine's 10-minute deadline, so reaching it
// means the control plane crashed mid-snapshot and the sandbox needs recovery.
const snapshotGracePeriod = 15 * time.Minute
// HostMonitor runs on a fixed interval and performs two duties:
//
// 1. Passive check: marks hosts whose last_heartbeat_at is stale as
@ -350,7 +358,7 @@ func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
transientSandboxes, err := m.db.ListSandboxesByHostAndStatus(ctx, db.ListSandboxesByHostAndStatusParams{
HostID: host.ID,
Column2: []string{"starting", "resuming", "pausing", "stopping"},
Column2: []string{"starting", "resuming", "pausing", "stopping", "snapshotting"},
})
if err != nil {
slog.Warn("host monitor: failed to list transient sandboxes", "host_id", id.FormatHostID(host.ID), "error", err)
@ -359,7 +367,7 @@ func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
for _, sb := range transientSandboxes {
sbIDStr := id.FormatSandboxID(sb.ID)
if _, ok := aliveStatus[sbIDStr]; ok {
if agentStatus, ok := aliveStatus[sbIDStr]; ok {
// Sandbox is alive on host — the background goroutine should
// finalize the transition. For starting/resuming, if the sandbox
// is alive it means creation/resume succeeded.
@ -370,6 +378,26 @@ func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
slog.Info("host monitor: promoted transient sandbox to running", "sandbox_id", sbIDStr, "from", sb.Status)
}
}
// A snapshot keeps the source sandbox alive throughout, so an alive
// sandbox does NOT mean the snapshot finished. Only recover it once
// it has been stuck past the snapshot grace period (i.e. the CP
// crashed mid-op). Recover to the sandbox's actual host-side status:
// a running sandbox is snapshotted live and stays running, but a
// paused sandbox is snapshotted from disk and must return to paused.
if sb.Status == "snapshotting" &&
sb.LastUpdated.Valid && time.Since(sb.LastUpdated.Time) >= snapshotGracePeriod {
recoverTo := agentStatus
if recoverTo != "running" && recoverTo != "paused" {
// Coerced/unknown agent label — default to running.
recoverTo = "running"
}
if _, err := m.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
ID: sb.ID, Status: "snapshotting", Status_2: recoverTo,
}); err == nil {
slog.Info("host monitor: recovered stuck snapshotting sandbox", "sandbox_id", sbIDStr, "to", recoverTo)
m.audit.LogSnapshotCreateSystem(ctx, sb.TeamID, sb.ID, "snapshot_recovered", nil)
}
}
continue
}
// Sandbox is not alive on host. If the transition is recent, give the
@ -390,6 +418,9 @@ func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
finalStatus = "paused"
case "stopping":
finalStatus = "stopped"
case "snapshotting":
// VM is gone but DB says snapshotting → the snapshot died with the VM.
finalStatus = "error"
}
fromStatus := sb.Status
if _, err := m.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
@ -405,6 +436,9 @@ func (m *HostMonitor) checkHost(ctx context.Context, host db.Host) {
case "pausing":
// Pause assumed to have succeeded host-side; emit success with inferred metadata.
m.audit.LogSandboxAutoPause(ctx, sb.TeamID, sb.ID, "transient_timeout_inferred", nil)
case "snapshotting":
// VM gone mid-snapshot; the sandbox is errored.
m.audit.LogSnapshotCreateSystem(ctx, sb.TeamID, sb.ID, "transient_timeout", inferredErr)
case "stopping":
m.audit.LogSandboxDestroySystem(ctx, sb.TeamID, sb.ID, "transient_timeout_inferred", nil)
}

View File

@ -1421,10 +1421,19 @@ paths:
- apiKeyAuth: []
- sessionAuth: []
description: |
Live snapshot: briefly pauses the capsule, writes its VM state +
memory + flattened rootfs to a new template directory, then resumes
the capsule. The source capsule keeps running after the snapshot;
the resulting template can be used to create new capsules.
Snapshot a capsule asynchronously. The call returns immediately with
the capsule in the `snapshotting` state; the capsule returns to its
original state when the snapshot completes. The capsule must be
`running` or `paused`.
A `running` capsule is snapshotted live: it briefly pauses while its VM
state + memory + flattened rootfs are written to a new template, then
resumes to `running`. A `paused` capsule is snapshotted directly from
its on-disk state without reviving the VM, and stays `paused`.
Because it is async, the response does NOT contain the template. Watch
for the `template.snapshot.create` SSE event (its `outcome` reports
success or failure) or poll `GET /v1/snapshots` to observe completion.
Snapshots are immutable: each call must use a fresh name. Re-using
an existing name returns 409 Conflict.
@ -1435,14 +1444,14 @@ paths:
schema:
$ref: "#/components/schemas/CreateSnapshotRequest"
responses:
"201":
description: Snapshot created
"202":
description: Snapshot accepted; capsule is now snapshotting
content:
application/json:
schema:
$ref: "#/components/schemas/Template"
$ref: "#/components/schemas/Capsule"
"409":
description: Name already exists or capsule not running
description: Name already exists, or capsule is not running or paused
content:
application/json:
schema:
@ -2813,7 +2822,7 @@ paths:
schema:
type: array
items:
$ref: "#/components/schemas/Template"
$ref: "#/components/schemas/AdminTemplate"
/v1/admin/templates/{name}:
delete:
@ -2989,6 +2998,10 @@ paths:
summary: Create snapshot from any capsule (admin)
operationId: adminCreateSnapshotFromCapsule
tags: [admin]
description: |
Snapshots a `running` or `paused` capsule into a platform template,
processed asynchronously (see `POST /v1/snapshots`). A running capsule
resumes to `running`; a paused capsule stays `paused`.
security:
- sessionAuth: []
parameters:
@ -2997,21 +3010,22 @@ paths:
required: true
schema: {type: string}
requestBody:
required: true
required: false
content:
application/json:
schema:
type: object
required: [name]
properties:
name: {type: string}
name:
type: string
description: Optional; an auto-generated name is used when omitted.
responses:
"201":
description: Snapshot created
"202":
description: Snapshot accepted; capsule is now snapshotting
content:
application/json:
schema:
$ref: "#/components/schemas/Template"
$ref: "#/components/schemas/Capsule"
/v1/admin/capsules/{id}/exec:
parameters:
@ -3506,7 +3520,7 @@ components:
properties:
template:
type: string
default: minimal
default: minimal-ubuntu
vcpus:
type: integer
default: 1
@ -3610,7 +3624,7 @@ components:
type: string
status:
type: string
enum: [pending, starting, running, pausing, paused, resuming, stopping, hibernated, stopped, missing, error]
enum: [pending, starting, running, pausing, paused, snapshotting, resuming, stopping, hibernated, stopped, missing, error]
template:
type: string
vcpus:
@ -3684,13 +3698,51 @@ components:
type: boolean
description: |
True when the template is platform-managed (visible to all teams,
e.g. the built-in `minimal` rootfs). False for team-owned
e.g. the built-in `minimal-ubuntu` rootfs). False for team-owned
snapshot templates.
protected:
type: boolean
description: |
True for built-in system base templates (minimal-ubuntu,
minimal-alpine, minimal-arch, minimal-fedora). Protected templates
cannot be deleted.
metadata:
type: object
additionalProperties: {type: string}
nullable: true
AdminTemplate:
type: object
description: |
Template as returned by the admin templates list. Unlike `Template`
(the team-facing snapshot shape), this includes the owning `team_id`
and omits `platform`/`metadata`.
properties:
name:
type: string
type:
type: string
enum: [base, snapshot]
vcpus:
type: integer
memory_mb:
type: integer
size_bytes:
type: integer
format: int64
team_id:
type: string
description: Owning team ID (formatted, e.g. `team-…`). Platform team for global templates.
created_at:
type: string
format: date-time
protected:
type: boolean
description: |
True for built-in system base templates (minimal-ubuntu,
minimal-alpine, minimal-arch, minimal-fedora). Protected templates
cannot be deleted.
ExecRequest:
type: object
required: [cmd]

View File

@ -266,7 +266,7 @@ func (c *SandboxEventConsumer) handleStopped(ctx context.Context, sandboxID pgty
// audit.Log writes the row only — it does NOT republish an event, which would
// loop back into this consumer. Do not switch to LogSandboxCreateSystem here.
func (c *SandboxEventConsumer) handleFailed(ctx context.Context, sandboxID pgtype.UUID, event events.Event) {
for _, fromStatus := range []string{"running", "starting", "pausing", "resuming"} {
for _, fromStatus := range []string{"running", "starting", "pausing", "resuming", "snapshotting"} {
if _, err := c.db.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: fromStatus, Status_2: "error",
}); err == nil {

View File

@ -83,7 +83,13 @@ func New(
sandboxSvc := &service.SandboxService{DB: queries, Pool: pool, Scheduler: sched}
sandboxSvc.PublishEvent = func(ctx context.Context, event service.SandboxStateEvent) {
if evt, ok := serviceEventToCanonical(event); ok {
eventPub.Publish(ctx, evt)
// State-change events are ephemeral UI signals — mirror them to the
// dashboard via Pub/Sub only, never to durable channel subscribers.
if evt.Event == events.CapsuleStateChanged {
eventPub.PublishTransient(ctx, evt)
} else {
eventPub.Publish(ctx, evt)
}
}
}
apiKeySvc := &service.APIKeyService{DB: queries}
@ -482,6 +488,39 @@ func serviceEventToCanonical(e service.SandboxStateEvent) (events.Event, bool) {
eventType = events.CapsuleCreate
outcome = events.OutcomeError
metadata = map[string]string{"reason": "create_failed"}
case "sandbox.snapshotted":
// Completion of an async snapshot. The resource is the template name,
// not the sandbox, so the dashboard's snapshot list refreshes.
return events.Event{
Event: events.SnapshotCreate,
Outcome: events.OutcomeSuccess,
Timestamp: events.Now(),
TeamID: e.TeamID,
Actor: events.SystemActor(),
Resource: events.Resource{ID: e.Metadata["name"], Type: "snapshot"},
}, true
case "sandbox.snapshot_failed":
return events.Event{
Event: events.SnapshotCreate,
Outcome: events.OutcomeError,
Timestamp: events.Now(),
TeamID: e.TeamID,
Actor: events.SystemActor(),
Resource: events.Resource{ID: e.Metadata["name"], Type: "snapshot"},
Metadata: map[string]string{"reason": "snapshot_failed"},
Error: e.Error,
}, true
case "sandbox.state_changed":
// Transient badge transition with no terminal verb of its own. Carries
// from/to in metadata; routed via Pub/Sub only by the caller.
return events.Event{
Event: events.CapsuleStateChanged,
Timestamp: events.Now(),
TeamID: e.TeamID,
Actor: events.SystemActor(),
Resource: events.Resource{ID: e.SandboxID, Type: "sandbox"},
Metadata: e.Metadata,
}, true
default:
return events.Event{}, false
}

View File

@ -15,20 +15,19 @@ import (
func timeNowNano() int64 { return time.Now().UnixNano() }
// IsMinimal reports whether the given team and template IDs represent the
// built-in "minimal" template (both all-zeros).
func IsMinimal(teamID, templateID pgtype.UUID) bool {
return teamID.Bytes == id.PlatformTeamID.Bytes && templateID.Bytes == id.MinimalTemplateID.Bytes
// IsSystemTemplate reports whether the given team and template IDs represent a
// built-in system base template (minimal-ubuntu / -alpine / -arch / -fedora):
// platform-owned with a template ID in the reserved range. System templates are
// protected from deletion.
func IsSystemTemplate(teamID, templateID pgtype.UUID) bool {
return teamID.Bytes == id.PlatformTeamID.Bytes && id.IsReservedTemplateID(templateID)
}
// TemplateDir returns the on-disk directory for a template.
// TemplateDir returns the on-disk directory for a template. Every template —
// including the built-in system base templates — lives under the teams tree:
//
// minimal (zeros, zeros): {wrennDir}/images/minimal
// all others: {wrennDir}/images/teams/{base36(teamID)}/{base36(templateID)}
// {wrennDir}/images/teams/{base36(teamID)}/{base36(templateID)}
func TemplateDir(wrennDir string, teamID, templateID pgtype.UUID) string {
if IsMinimal(teamID, templateID) {
return filepath.Join(wrennDir, "images", "minimal")
}
return filepath.Join(wrennDir, "images", "teams",
id.UUIDToBase36(teamID.Bytes),
id.UUIDToBase36(templateID.Bytes))


@ -9,7 +9,7 @@ import (
"git.omukk.dev/wrenn/wrenn/pkg/id"
)
func TestIsMinimal(t *testing.T) {
func TestIsSystemTemplate(t *testing.T) {
tests := []struct {
name string
teamID pgtype.UUID
@ -17,35 +17,41 @@ func TestIsMinimal(t *testing.T) {
want bool
}{
{
name: "both zeros",
name: "ubuntu (zeros, zeros)",
teamID: id.PlatformTeamID,
templateID: id.MinimalTemplateID,
templateID: id.UbuntuTemplateID,
want: true,
},
{
name: "non-zero team",
teamID: pgtype.UUID{Bytes: [16]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}, Valid: true},
templateID: id.MinimalTemplateID,
want: false,
},
{
name: "non-zero template",
name: "fedora (platform, id 3)",
teamID: id.PlatformTeamID,
templateID: pgtype.UUID{Bytes: [16]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}, Valid: true},
templateID: id.FedoraTemplateID,
want: true,
},
{
name: "platform, max reserved id",
teamID: id.PlatformTeamID,
templateID: pgtype.UUID{Bytes: [16]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x04, 0x00}, Valid: true}, // 1024
want: true,
},
{
name: "platform, above reserved range",
teamID: id.PlatformTeamID,
templateID: pgtype.UUID{Bytes: [16]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x04, 0x01}, Valid: true}, // 1025
want: false,
},
{
name: "both non-zero",
teamID: pgtype.UUID{Bytes: [16]byte{1}, Valid: true},
templateID: pgtype.UUID{Bytes: [16]byte{2}, Valid: true},
name: "non-platform team, reserved id",
teamID: pgtype.UUID{Bytes: [16]byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}, Valid: true},
templateID: id.UbuntuTemplateID,
want: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := IsMinimal(tt.teamID, tt.templateID); got != tt.want {
t.Errorf("IsMinimal() = %v, want %v", got, tt.want)
if got := IsSystemTemplate(tt.teamID, tt.templateID); got != tt.want {
t.Errorf("IsSystemTemplate() = %v, want %v", got, tt.want)
}
})
}
@ -54,9 +60,11 @@ func TestIsMinimal(t *testing.T) {
func TestTemplateDir(t *testing.T) {
wrennDir := "/var/lib/wrenn"
t.Run("minimal", func(t *testing.T) {
got := TemplateDir(wrennDir, id.PlatformTeamID, id.MinimalTemplateID)
want := filepath.Join(wrennDir, "images", "minimal")
t.Run("system base template (ubuntu) lives under teams", func(t *testing.T) {
got := TemplateDir(wrennDir, id.PlatformTeamID, id.UbuntuTemplateID)
want := filepath.Join(wrennDir, "images", "teams",
id.UUIDToBase36(id.PlatformTeamID.Bytes),
id.UUIDToBase36(id.UbuntuTemplateID.Bytes))
if got != want {
t.Errorf("TemplateDir() = %q, want %q", got, want)
}
@ -88,8 +96,11 @@ func TestTemplateDir(t *testing.T) {
func TestTemplateRootfs(t *testing.T) {
wrennDir := "/var/lib/wrenn"
got := TemplateRootfs(wrennDir, id.PlatformTeamID, id.MinimalTemplateID)
want := filepath.Join(wrennDir, "images", "minimal", "rootfs.ext4")
got := TemplateRootfs(wrennDir, id.PlatformTeamID, id.UbuntuTemplateID)
want := filepath.Join(wrennDir, "images", "teams",
id.UUIDToBase36(id.PlatformTeamID.Bytes),
id.UUIDToBase36(id.UbuntuTemplateID.Bytes),
"rootfs.ext4")
if got != want {
t.Errorf("TemplateRootfs() = %q, want %q", got, want)
}


@ -9,12 +9,13 @@ import (
type SandboxStatus string
const (
StatusPending SandboxStatus = "pending"
StatusRunning SandboxStatus = "running"
StatusPausing SandboxStatus = "pausing"
StatusPaused SandboxStatus = "paused"
StatusStopped SandboxStatus = "stopped"
StatusError SandboxStatus = "error"
StatusPending SandboxStatus = "pending"
StatusRunning SandboxStatus = "running"
StatusPausing SandboxStatus = "pausing"
StatusPaused SandboxStatus = "paused"
StatusSnapshotting SandboxStatus = "snapshotting"
StatusStopped SandboxStatus = "stopped"
StatusError SandboxStatus = "error"
)
// Sandbox holds all state for a running sandbox on this host.


@ -9,6 +9,8 @@ import (
"strconv"
"strings"
"github.com/jackc/pgx/v5/pgtype"
"git.omukk.dev/wrenn/wrenn/internal/layout"
"git.omukk.dev/wrenn/wrenn/pkg/id"
)
@ -29,13 +31,9 @@ func EnsureImageSizes(wrennDir string, targetMB int) error {
}
targetBytes := int64(targetMB) * 1024 * 1024
// Expand the built-in minimal image.
minimalRootfs := layout.TemplateRootfs(wrennDir, id.PlatformTeamID, id.MinimalTemplateID)
if err := expandImage(minimalRootfs, targetBytes, targetMB); err != nil {
return err
}
// Walk teams/{teamDir}/{templateDir}/rootfs.ext4 two levels deep.
// Walk teams/{teamDir}/{templateDir}/rootfs.ext4 two levels deep. The
// built-in system base templates live under teams/{base36(0)}/... so this
// covers them too.
teamsDir := layout.TeamsDir(wrennDir)
teamEntries, err := os.ReadDir(teamsDir)
if err != nil {
@ -104,12 +102,19 @@ func ParseSizeToMB(s string) (int, error) {
}
}
// ShrinkMinimalImage shrinks the built-in minimal rootfs back to its minimum
// size using resize2fs -M. This is the inverse of EnsureImageSizes and should
// be called during graceful shutdown so the image is stored compactly on disk.
func ShrinkMinimalImage(wrennDir string) {
minimalRootfs := layout.TemplateRootfs(wrennDir, id.PlatformTeamID, id.MinimalTemplateID)
shrinkImage(minimalRootfs)
// ShrinkSystemImages shrinks the built-in system base rootfs images back to
// their minimum size using resize2fs -M. This is the inverse of
// EnsureImageSizes and should be called during graceful shutdown so the images
// are stored compactly on disk.
func ShrinkSystemImages(wrennDir string) {
for _, tmplID := range []pgtype.UUID{
id.UbuntuTemplateID,
id.AlpineTemplateID,
id.ArchTemplateID,
id.FedoraTemplateID,
} {
shrinkImage(layout.TemplateRootfs(wrennDir, id.PlatformTeamID, tmplID))
}
}
// shrinkImage shrinks a single rootfs image to its minimum size.


@ -294,12 +294,12 @@ func (m *Manager) Create(
// Snapshot template? Route to the CH-restore path; the launcher manages
// its own resource lifecycle and registers the sandbox itself.
//
// The minimal base template never carries a memory snapshot; guarding
// here prevents a stray state.json (e.g. from a failed CreateSnapshot
// that mis-targeted minimal) from silently rerouting fresh boots into
// System base templates never carry a memory snapshot; guarding here
// prevents a stray state.json (e.g. from a failed CreateSnapshot that
// mis-targeted a base template) from silently rerouting fresh boots into
// the restore path with a confusing error downstream.
templateDir := layout.TemplateDir(m.cfg.WrennDir, teamID, templateID)
if !layout.IsMinimal(teamID, templateID) && layout.IsSnapshotTemplate(templateDir) {
if !layout.IsSystemTemplate(teamID, templateID) && layout.IsSnapshotTemplate(templateDir) {
return m.createFromSnapshotTemplate(ctx, sandboxID, teamID, templateID,
vcpus, memoryMB, timeoutSec, diskSizeMB, defaultUser, defaultEnv)
}


@ -32,6 +32,7 @@ import (
"fmt"
"log/slog"
"os"
"os/exec"
"path/filepath"
"strconv"
"strings"
@ -695,11 +696,12 @@ func (m *Manager) waitForMemoryLoader(ctx context.Context, sb *sandboxState) err
}
// CreateSnapshot writes a self-contained template snapshot to
// WRENN_DIR/images/teams/{teamID}/{templateID}/. The sandbox is briefly
// paused, the dm-snapshot is flattened into rootfs.ext4, CH writes the
// memory snapshot, then the sandbox is resumed.
// WRENN_DIR/images/teams/{teamID}/{templateID}/, then returns the total size
// (in bytes) of the artefacts written.
//
// Returns the total size (in bytes) of the artefacts written.
// A running sandbox is snapshotted live (briefly paused, memory dumped, rootfs
// flattened, then resumed). A paused sandbox is snapshotted straight from its
// on-disk pause artefacts without reviving the VM — it stays paused.
func (m *Manager) CreateSnapshot(ctx context.Context, sandboxID string, teamID, templateID pgtype.UUID, name string) (int64, error) {
sb, err := m.get(sandboxID)
if err != nil {
@ -709,10 +711,6 @@ func (m *Manager) CreateSnapshot(ctx context.Context, sandboxID string, teamID,
sb.lifecycleMu.Lock()
defer sb.lifecycleMu.Unlock()
if sb.Status != models.StatusRunning {
return 0, fmt.Errorf("%w: %s (status: %s)", ErrNotRunning, sandboxID, sb.Status)
}
// Refuse silent overwrites: every snapshot must land in a fresh
// templateID. Defends against caller bugs and concurrent CreateSnapshot
// races for the same destination. User-facing snapshot-name uniqueness
@ -722,6 +720,22 @@ func (m *Manager) CreateSnapshot(ctx context.Context, sandboxID string, teamID,
id.UUIDString(teamID), id.UUIDString(templateID))
}
switch sb.Status {
case models.StatusRunning:
return m.snapshotRunningToTemplate(ctx, sb, teamID, templateID, name)
case models.StatusPaused:
return m.snapshotPausedToTemplate(ctx, sb, teamID, templateID, name)
default:
return 0, fmt.Errorf("%w: %s (status: %s)", ErrNotRunning, sandboxID, sb.Status)
}
}
// snapshotRunningToTemplate takes a live snapshot of a running sandbox: pause
// CH, dump memory + flatten the rootfs into a staging dir, resume CH, then
// promote the staged template into place. The sandbox returns to running.
func (m *Manager) snapshotRunningToTemplate(ctx context.Context, sb *sandboxState, teamID, templateID pgtype.UUID, name string) (int64, error) {
sandboxID := sb.ID
// Same rationale as Pause: wait for the background memory loader so the
// resulting memory-ranges is self-contained when this sandbox itself was
// previously restored from an ondemand snapshot.
@ -821,6 +835,152 @@ func (m *Manager) CreateSnapshot(ctx context.Context, sandboxID string, teamID,
return size, nil
}
// snapshotPausedToTemplate builds a self-contained template from a paused
// sandbox's on-disk artefacts without reviving the VM. The pause snapshot
// already holds a self-contained CH memory image (Pause blocks on the memory
// loader before snapshotting), so we copy those memory files verbatim and
// flatten the persistent CoW into rootfs.ext4. The sandbox stays Paused.
func (m *Manager) snapshotPausedToTemplate(ctx context.Context, sb *sandboxState, teamID, templateID pgtype.UUID, name string) (int64, error) {
snapDir := layout.PauseSnapshotDir(m.cfg.WrennDir, sb.ID)
meta, err := readSnapshotMeta(snapDir)
if err != nil {
return 0, fmt.Errorf("load pause snapshot meta: %w", err)
}
dstDir := layout.TemplateDir(m.cfg.WrennDir, teamID, templateID)
stageDir := filepath.Join(layout.SandboxesDir(m.cfg.WrennDir),
fmt.Sprintf(".stage-%s-%d", sb.ID, time.Now().UnixNano()))
if err := os.MkdirAll(stageDir, 0o755); err != nil {
return 0, fmt.Errorf("mkdir stage dir: %w", err)
}
defer os.RemoveAll(stageDir)
// Flatten the persistent CoW into a standalone rootfs.ext4. The VM is down,
// so re-attach a throwaway dm-snapshot over the base image + CoW just long
// enough to read through it; the CoW file is left intact for a later Resume.
if err := m.flattenPausedCow(ctx, sb.ID, meta, filepath.Join(stageDir, "rootfs.ext4")); err != nil {
return 0, err
}
// Copy CH's memory snapshot files verbatim (state.json, config.json,
// memory-ranges, …) — everything except the CoW and the pause meta, which
// the template replaces with its own rootfs.ext4 and meta below.
if err := copyMemorySnapshotFiles(snapDir, stageDir); err != nil {
return 0, err
}
// Template meta: no SlotIndex (a template allocates a fresh slot per launch);
// SandboxDir + BaseTemplate carried forward so the restore path resolves the
// tmpfs disk path baked into CH's config.json.
tmplMeta := &snapshotMeta{
TemplateName: name,
TeamID: id.UUIDString(teamID),
TemplateID: id.UUIDString(templateID),
VCPUs: meta.VCPUs,
MemoryMB: meta.MemoryMB,
TimeoutSec: meta.TimeoutSec,
BaseTemplate: meta.BaseTemplate,
SandboxDir: meta.SandboxDir,
CreatedAt: time.Now(),
}
if err := writeSnapshotMeta(stageDir, tmplMeta); err != nil {
slog.Warn("template meta write failed", "id", sb.ID, "error", err)
}
if err := promoteSnapshotDir(stageDir, dstDir); err != nil {
return 0, fmt.Errorf("promote snapshot: %w", err)
}
size, err := snapshot.DirSize(dstDir, "")
if err != nil {
slog.Warn("snapshot size calc failed", "id", sb.ID, "error", err)
}
slog.Info("paused snapshot created",
"id", sb.ID,
"team_id", teamID,
"template_id", templateID,
"dir", dstDir,
"bytes", size,
)
return size, nil
}
// flattenPausedCow re-attaches a temporary dm-snapshot over a paused sandbox's
// base image + persistent CoW, flattens it into outPath, then tears the dm
// device down. The CoW file is preserved (RemoveSnapshot never deletes it) so a
// later Resume still works. A distinct dm name avoids colliding with the
// "wrenn-{id}" device a concurrent Resume would create — though lifecycleMu
// already serialises the two.
func (m *Manager) flattenPausedCow(ctx context.Context, sandboxID string, meta *snapshotMeta, outPath string) error {
originLoop, err := m.loops.Acquire(meta.BaseTemplate)
if err != nil {
return fmt.Errorf("acquire loop: %w", err)
}
defer m.loops.Release(meta.BaseTemplate)
originSize, err := devicemapper.OriginSizeBytes(originLoop)
if err != nil {
return fmt.Errorf("origin size: %w", err)
}
dmDev, err := devicemapper.RestoreSnapshot(ctx, "wrenn-flat-"+sandboxID, originLoop, meta.CowPath, originSize)
if err != nil {
return fmt.Errorf("restore dm-snapshot: %w", err)
}
defer func() {
if rerr := devicemapper.RemoveSnapshot(context.Background(), dmDev); rerr != nil {
slog.Warn("dm remove after paused flatten", "id", sandboxID, "error", rerr)
}
}()
if err := devicemapper.FlattenSnapshot(dmDev.DevicePath, outPath); err != nil {
return fmt.Errorf("flatten rootfs: %w", err)
}
return nil
}
// copyMemorySnapshotFiles copies every regular file from a pause snapshot dir
// into dstDir except the CoW and the wrenn meta — i.e. CH's own memory snapshot
// artefacts (state.json, config.json, memory-ranges, …). It hardlinks when the
// dirs share a filesystem (instant, preserves sparseness) and falls back to a
// sparse-preserving copy across filesystems. Pause never mutates these files in
// place — the next Pause writes a fresh dir and swaps — so a hardlink stays a
// valid, immutable view for the template.
func copyMemorySnapshotFiles(srcDir, dstDir string) error {
entries, err := os.ReadDir(srcDir)
if err != nil {
return fmt.Errorf("read pause dir: %w", err)
}
for _, e := range entries {
if e.IsDir() {
continue
}
name := e.Name()
if name == layout.SandboxCowName || name == snapshotMetaFile {
continue
}
if err := linkOrCopyFile(filepath.Join(srcDir, name), filepath.Join(dstDir, name)); err != nil {
return fmt.Errorf("copy %s: %w", name, err)
}
}
return nil
}
// linkOrCopyFile hardlinks from→to, falling back to a sparse-preserving copy
// when the link fails. The expected failure is EXDEV (the two paths live on
// different filesystems), but the error is not inspected, so any os.Link
// failure takes the fallback. A plain byte copy would materialise the zero
// pages punched out of memory-ranges, inflating a multi-GB snapshot to its
// full apparent size, so the fallback uses `cp --sparse=always`, which
// re-detects and re-punches the holes.
func linkOrCopyFile(from, to string) error {
if err := os.Link(from, to); err == nil {
return nil
}
if out, err := exec.Command("cp", "--sparse=always", from, to).CombinedOutput(); err != nil {
return fmt.Errorf("sparse copy: %s: %w", string(out), err)
}
return nil
}
// DeleteSnapshot removes a template snapshot directory. Refuses deletion
// while any in-memory sandbox is still derived from this template — even
// though Linux unlink lets the open loop device keep working, the agent
@ -983,9 +1143,10 @@ func (m *Manager) PauseAll(ctx context.Context) {
wg.Wait()
}
// CleanupOrphanPauseDirs removes leftover *.staging-* and *.trash-* dirs
// under sandboxes/ from any Pause that crashed before completing the swap.
// Safe to call at agent startup before any sandbox is created or restored.
// CleanupOrphanPauseDirs removes leftover *.staging-*, *.stage-*, and *.trash-*
// dirs under sandboxes/ from any Pause/snapshot/flatten that crashed before
// completing its swap or promote. Safe to call at agent startup before any
// sandbox is created or restored.
//
// Per-sandbox cleanup happens implicitly during Destroy (which removes the
// whole PauseSnapshotDir) — this function only handles agent-crash orphans.
@ -1001,7 +1162,12 @@ func CleanupOrphanPauseDirs(wrennDir string) {
continue
}
name := e.Name()
if !strings.Contains(name, ".staging-") && !strings.Contains(name, ".trash-") {
// ".stage-" is the prefix used by snapshot/flatten staging dirs;
// ".staging-" + ".trash-" are used by Pause's swap. (".stage-" is not a
// substring of ".staging-", so all three need an explicit check.)
if !strings.Contains(name, ".stage-") &&
!strings.Contains(name, ".staging-") &&
!strings.Contains(name, ".trash-") {
continue
}
path := filepath.Join(sandboxesDir, name)


@ -390,16 +390,25 @@ func (l *AuditLogger) LogSandboxStateChanged(ctx context.Context, teamID, sandbo
// --- Snapshot events (scope: team) ---
func (l *AuditLogger) LogSnapshotCreate(ctx context.Context, ac auth.AuthContext, name string, err error) {
l.Log(ctx, newEntry(ac, ac.TeamID, "team", "snapshot", name, "create", auditStatusFor(err, "success"), mergeMeta(nil, err)))
l.publish(ctx, events.Event{
Event: events.SnapshotCreate,
Outcome: outcomeFromErr(err),
Timestamp: events.Now(),
TeamID: id.FormatTeamID(ac.TeamID),
Actor: actorToEvent(ac),
Resource: events.Resource{ID: name, Type: "snapshot"},
Error: errString(err),
// LogSnapshotCreateRequested records that a user requested an async snapshot.
// It writes the user-attributed audit row only — the terminal success/failure
// event is published later by the background goroutine (system actor). Mirrors
// the accept-time audit pattern used by LogSandboxPause.
func (l *AuditLogger) LogSnapshotCreateRequested(ctx context.Context, ac auth.AuthContext, name string) {
l.Log(ctx, newEntry(ac, ac.TeamID, "team", "snapshot", name, "create", "success", nil))
}
// LogSnapshotCreateSystem records a system-actor snapshot transition inferred
// by a reconciler (e.g. the HostMonitor recovering or failing a sandbox stuck
// in "snapshotting"). It writes an audit row only and does NOT publish a
// SnapshotCreate event: the reconciler has no template name, and emitting one
// would surface a spurious "snapshot captured/failed" toast.
func (l *AuditLogger) LogSnapshotCreateSystem(ctx context.Context, teamID, sandboxID pgtype.UUID, reason string, err error) {
l.Log(ctx, Entry{
TeamID: teamID, ActorType: "system",
ResourceType: "sandbox", ResourceID: id.FormatSandboxID(sandboxID),
Action: "snapshot", Scope: "team", Status: auditStatusFor(err, "info"),
Metadata: mergeMeta(map[string]any{"reason": reason}, err),
})
}


@ -2,6 +2,7 @@ package id
import (
"crypto/rand"
"encoding/binary"
"encoding/hex"
"fmt"
"math/big"
@ -156,10 +157,37 @@ func ParseChannelID(s string) (pgtype.UUID, error) { return parseUUID(PrefixCh
// (e.g. base templates, shared infrastructure).
var PlatformTeamID = pgtype.UUID{Bytes: [16]byte{}, Valid: true}
// MinimalTemplateID is the all-zeros UUID sentinel for the built-in "minimal"
// template. When both team_id and template_id are zero, the host agent uses
// the minimal rootfs at WRENN_DIR/images/minimal/.
var MinimalTemplateID = pgtype.UUID{Bytes: [16]byte{}, Valid: true}
// SystemTemplateMaxID is the highest template ID reserved for built-in system
// base templates. Template IDs in [0, SystemTemplateMaxID] under the platform
// team are protected: they cannot be deleted and live at the well-known
// teams/{base36(0)}/{base36(id)} on-disk paths.
const SystemTemplateMaxID = 1024
// templateID returns the all-zeros UUID with its low 64 bits set to n. Used to
// mint the well-known IDs for the built-in system base templates.
func templateID(n uint64) pgtype.UUID {
var b [16]byte
binary.BigEndian.PutUint64(b[8:], n)
return pgtype.UUID{Bytes: b, Valid: true}
}
// Well-known system base template IDs (platform team). The on-disk rootfs for
// each lives at WRENN_DIR/images/teams/{base36(PlatformTeamID)}/{base36(id)}/.
var (
UbuntuTemplateID = templateID(0) // minimal-ubuntu (replaces the old "minimal")
AlpineTemplateID = templateID(1) // minimal-alpine
ArchTemplateID = templateID(2) // minimal-arch
FedoraTemplateID = templateID(3) // minimal-fedora
)
// IsReservedTemplateID reports whether t falls in the reserved system template
// ID range [0, SystemTemplateMaxID] (i.e. the top 64 bits are zero and the
// bottom 64 bits are <= SystemTemplateMaxID).
func IsReservedTemplateID(t pgtype.UUID) bool {
hi := binary.BigEndian.Uint64(t.Bytes[:8])
lo := binary.BigEndian.Uint64(t.Bytes[8:])
return hi == 0 && lo <= SystemTemplateMaxID
}
// UUIDString converts a pgtype.UUID to a standard hyphenated UUID string
// (e.g., "6ba7b810-9dad-11d1-80b4-00c04fd430c8"). Used for RPC wire format.
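The reserved-range check can be exercised in isolation. This is a minimal sketch that uses plain `[16]byte` values in place of `pgtype.UUID`; `lowID` and `isReserved` are illustrative names that mirror `templateID` and `IsReservedTemplateID` from the diff.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// systemTemplateMaxID mirrors SystemTemplateMaxID from the diff.
const systemTemplateMaxID = 1024

// lowID returns an all-zeros 16-byte ID with its low 64 bits set to n.
func lowID(n uint64) [16]byte {
	var b [16]byte
	binary.BigEndian.PutUint64(b[8:], n)
	return b
}

// isReserved reports whether the ID is in [0, systemTemplateMaxID]: the
// high 64 bits must be zero and the low 64 bits must not exceed the limit.
func isReserved(b [16]byte) bool {
	hi := binary.BigEndian.Uint64(b[:8])
	lo := binary.BigEndian.Uint64(b[8:])
	return hi == 0 && lo <= systemTemplateMaxID
}

func main() {
	for _, n := range []uint64{0, 3, 1024, 1025} {
		fmt.Printf("id %d reserved=%v\n", n, isReserved(lowID(n)))
	}
	// A small low half is not enough if any high bit is set.
	var highSet [16]byte
	highSet[0] = 1
	fmt.Println("high bits set reserved =", isReserved(highSet))
}
```

Checking the high half separately matters because a random UUID with a small low half would otherwise be mistaken for a system template.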


@ -106,7 +106,7 @@ func (s *BuildService) takeArchive(buildID string) []byte {
// Create inserts a new build record and enqueues it to Redis.
func (s *BuildService) Create(ctx context.Context, p BuildCreateParams) (db.TemplateBuild, error) {
if p.BaseTemplate == "" {
p.BaseTemplate = "minimal"
p.BaseTemplate = "minimal-ubuntu"
}
if p.VCPUs <= 0 {
p.VCPUs = 1
@ -447,17 +447,15 @@ func (s *BuildService) provisionBuildSandbox(
sandboxIDStr := id.FormatSandboxID(sandboxID)
log.Info("provisioning build sandbox", "sandbox_id", sandboxIDStr, "host_id", id.FormatHostID(host.ID))
baseTeamID := id.PlatformTeamID
baseTemplateID := id.MinimalTemplateID
if build.BaseTemplate != "minimal" {
baseTmpl, err := s.DB.GetPlatformTemplateByName(ctx, build.BaseTemplate)
if err != nil {
s.failBuild(ctx, buildID, fmt.Sprintf("base template %q not found: %v", build.BaseTemplate, err))
return nil, "", nil, err
}
baseTeamID = baseTmpl.TeamID
baseTemplateID = baseTmpl.ID
// All base templates — including the built-in system ones — are
// platform-owned rows, so resolve the path from the DB record.
baseTmpl, err := s.DB.GetPlatformTemplateByName(ctx, build.BaseTemplate)
if err != nil {
s.failBuild(ctx, buildID, fmt.Sprintf("base template %q not found: %v", build.BaseTemplate, err))
return nil, "", nil, err
}
baseTeamID := baseTmpl.TeamID
baseTemplateID := baseTmpl.ID
resp, err := agent.CreateSandbox(ctx, connect.NewRequest(&pb.CreateSandboxRequest{
SandboxId: sandboxIDStr,
@ -481,6 +479,23 @@ func (s *BuildService) provisionBuildSandbox(
HostID: host.ID,
})
if _, err := s.DB.InsertSandbox(ctx, db.InsertSandboxParams{
ID: sandboxID,
TeamID: id.PlatformTeamID,
HostID: host.ID,
Template: build.BaseTemplate,
Status: "running",
Vcpus: build.Vcpus,
MemoryMb: build.MemoryMb,
TimeoutSec: 0,
DiskSizeMb: 5120,
TemplateID: baseTemplateID,
TemplateTeamID: baseTeamID,
Metadata: []byte("{}"),
}); err != nil {
log.Warn("failed to insert builder sandbox record", "error", err)
}
archive := s.takeArchive(buildIDStr)
if len(archive) > 0 {
if err := s.uploadAndExtractArchive(ctx, agent, sandboxIDStr, archive, buildIDStr); err != nil {
@ -602,6 +617,7 @@ func (s *BuildService) finalizeBuild(
}
s.publishStatus(ctx, buildID, "success", build.TotalSteps, build.TotalSteps, "")
s.destroySandbox(ctx, agent, sandboxIDStr)
log.Info("template build completed successfully", "name", build.Name)
}
@ -796,6 +812,13 @@ func (s *BuildService) destroySandbox(_ context.Context, agent buildAgentClient,
})); err != nil {
slog.Warn("failed to destroy build sandbox", "sandbox_id", sandboxIDStr, "error", err)
}
if sbID, err := id.ParseSandboxID(sandboxIDStr); err == nil {
if _, err := s.DB.UpdateSandboxStatus(ctx, db.UpdateSandboxStatusParams{
ID: sbID, Status: "stopped",
}); err != nil {
slog.Warn("failed to mark builder sandbox stopped", "sandbox_id", sandboxIDStr, "error", err)
}
}
}
// fetchSandboxEnv executes the 'env' command inside the specified sandbox via


@ -121,7 +121,7 @@ type hostagentClient = interface {
// sandbox event to the Redis stream when the operation completes.
func (s *SandboxService) Create(ctx context.Context, p SandboxCreateParams) (db.Sandbox, error) {
if p.Template == "" {
p.Template = "minimal"
p.Template = "minimal-ubuntu"
}
if err := validate.SafeName(p.Template); err != nil {
return db.Sandbox{}, fmt.Errorf("invalid template name: %w", err)
@ -137,26 +137,23 @@ func (s *SandboxService) Create(ctx context.Context, p SandboxCreateParams) (db.
}
p.TimeoutSec = clampTimeout(p.TimeoutSec)
// Resolve template name → (teamID, templateID).
templateTeamID := id.PlatformTeamID
templateID := id.MinimalTemplateID
var templateDefaultUser string
// Resolve template name → (teamID, templateID). System base templates are
// platform-owned rows like any other, so the lookup handles them too (the
// query also matches platform templates for any team).
tmpl, err := s.DB.GetTemplateByTeam(ctx, db.GetTemplateByTeamParams{Name: p.Template, TeamID: p.TeamID})
if err != nil {
return db.Sandbox{}, fmt.Errorf("template %q not found: %w", p.Template, err)
}
templateTeamID := tmpl.TeamID
templateID := tmpl.ID
templateDefaultUser := tmpl.DefaultUser
var templateDefaultEnv map[string]string
if p.Template != "minimal" {
tmpl, err := s.DB.GetTemplateByTeam(ctx, db.GetTemplateByTeamParams{Name: p.Template, TeamID: p.TeamID})
if err != nil {
return db.Sandbox{}, fmt.Errorf("template %q not found: %w", p.Template, err)
}
templateTeamID = tmpl.TeamID
templateID = tmpl.ID
templateDefaultUser = tmpl.DefaultUser
if len(tmpl.DefaultEnv) > 0 {
_ = json.Unmarshal(tmpl.DefaultEnv, &templateDefaultEnv)
}
if tmpl.Type == "snapshot" {
p.VCPUs = tmpl.Vcpus
p.MemoryMB = tmpl.MemoryMb
}
if len(tmpl.DefaultEnv) > 0 {
_ = json.Unmarshal(tmpl.DefaultEnv, &templateDefaultEnv)
}
if tmpl.Type == "snapshot" {
p.VCPUs = tmpl.Vcpus
p.MemoryMB = tmpl.MemoryMb
}
if !p.TeamID.Valid {
@ -461,59 +458,140 @@ func (s *SandboxService) resumeInBackground(
})
}
// CreateSnapshot takes a live snapshot of a running sandbox, publishing
// the result as a new template owned by the sandbox's team. Returns the
// inserted template record.
func (s *SandboxService) CreateSnapshot(ctx context.Context, sandboxID, teamID pgtype.UUID, name string) (db.Template, error) {
// CreateSnapshot asynchronously snapshots a running or paused sandbox,
// publishing the result as a new template owned by the sandbox's team. The DB
// CAS from the sandbox's current status to "snapshotting" is the authoritative
// gate against concurrent Pause/Snapshot/Destroy calls; if it loses, no agent
// RPC fires. A running sandbox is snapshotted live (CH briefly paused, then
// resumed); a paused sandbox is snapshotted from its on-disk artefacts without
// reviving the VM. Either way the sandbox returns to its original status on
// completion. Returns the sandbox (now "snapshotting") and the resolved name.
func (s *SandboxService) CreateSnapshot(ctx context.Context, sandboxID, teamID pgtype.UUID, name string) (db.Sandbox, string, error) {
sb, err := s.DB.GetSandboxByTeam(ctx, db.GetSandboxByTeamParams{ID: sandboxID, TeamID: teamID})
if err != nil {
return db.Template{}, fmt.Errorf("sandbox not found: %w", err)
return db.Sandbox{}, "", fmt.Errorf("sandbox not found: %w", err)
}
if sb.Status != "running" {
return db.Template{}, fmt.Errorf("sandbox is not running (status: %s)", sb.Status)
if sb.Status != "running" && sb.Status != "paused" {
return db.Sandbox{}, "", fmt.Errorf("sandbox is not running or paused (status: %s)", sb.Status)
}
origStatus := sb.Status
if name == "" {
name = id.NewSnapshotName()
}
if err := validate.SafeName(name); err != nil {
return db.Template{}, fmt.Errorf("invalid name: %w", err)
return db.Sandbox{}, "", fmt.Errorf("invalid name: %w", err)
}
// Reject duplicate names up front so we don't pause the VM and dump memory
// only to fail on the template insert at the very end.
if _, err := s.DB.GetTemplateByTeam(ctx, db.GetTemplateByTeamParams{Name: name, TeamID: teamID}); err == nil {
return db.Sandbox{}, "", fmt.Errorf("conflict: a snapshot named %q already exists", name)
}
if _, err := s.DB.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: origStatus, Status_2: "snapshotting",
}); err != nil {
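// sb.Status was read before the CAS and equals origStatus, so it cannot
// describe the state that made the CAS miss.
return db.Sandbox{}, "", fmt.Errorf("sandbox not in %s state (status changed concurrently)", origStatus)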
}
agent, err := s.agentForHost(ctx, sb.HostID)
if err != nil {
return db.Template{}, err
// Roll back the CAS so the sandbox isn't stuck in "snapshotting".
if _, rerr := s.DB.UpdateSandboxStatusIf(ctx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: "snapshotting", Status_2: origStatus,
}); rerr != nil {
slog.Warn("failed to roll back snapshotting→"+origStatus, "id", id.FormatSandboxID(sandboxID), "error", rerr)
}
return db.Sandbox{}, "", err
}
sandboxIDStr := id.FormatSandboxID(sandboxID)
hostIDStr := id.FormatHostID(sb.HostID)
teamIDStr := id.FormatTeamID(sb.TeamID)
// Notify other clients that the badge moved to "snapshotting".
s.publishStateChanged(ctx, sandboxIDStr, teamIDStr, hostIDStr, origStatus, "snapshotting")
go s.snapshotInBackground(sandboxID, sandboxIDStr, hostIDStr, teamIDStr, teamID, agent, name, origStatus, sb.Vcpus, sb.MemoryMb)
sb.Status = "snapshotting"
return sb, name, nil
}
func (s *SandboxService) snapshotInBackground(
sandboxID pgtype.UUID, sandboxIDStr, hostIDStr, teamIDStr string, teamID pgtype.UUID,
agent hostagentClient, name, origStatus string, vcpus, memoryMB int32,
) {
bgCtx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
defer cancel()
newTemplateID := id.NewSandboxID() // any random UUID
templateUUID := pgtype.UUID{Bytes: newTemplateID.Bytes, Valid: true}
resp, err := agent.CreateSnapshot(ctx, connect.NewRequest(&pb.CreateSnapshotRequest{
SandboxId: id.FormatSandboxID(sandboxID),
resp, err := agent.CreateSnapshot(bgCtx, connect.NewRequest(&pb.CreateSnapshotRequest{
SandboxId: sandboxIDStr,
Name: name,
TeamId: id.UUIDString(teamID),
TemplateId: id.UUIDString(templateUUID),
}))
if err != nil {
return db.Template{}, fmt.Errorf("agent snapshot: %w", err)
// Either way, the host-side op is done; return the badge to its original
// status (running for a live snapshot, paused for an on-disk one). Use a CAS
// so a concurrent Destroy (which sets "stopping") wins: if the CAS misses,
// the sandbox is no longer ours and we must NOT announce its old status. The
// snapshot itself is still valid and is registered below — a snapshot
// template outlives its source sandbox.
if _, derr := s.DB.UpdateSandboxStatusIf(bgCtx, db.UpdateSandboxStatusIfParams{
ID: sandboxID, Status: "snapshotting", Status_2: origStatus,
}); derr != nil {
slog.Warn("snapshotting→"+origStatus+" CAS missed (sandbox moved on); skipping state signal", "sandbox_id", sandboxIDStr, "error", derr)
} else {
s.publishStateChanged(bgCtx, sandboxIDStr, teamIDStr, hostIDStr, "snapshotting", origStatus)
}
tmpl, err := s.DB.InsertTemplate(ctx, db.InsertTemplateParams{
if err != nil {
slog.Warn("background snapshot failed", "sandbox_id", sandboxIDStr, "error", err)
s.publishEvent(bgCtx, SandboxStateEvent{
Event: "sandbox.snapshot_failed", SandboxID: sandboxIDStr, TeamID: teamIDStr, HostID: hostIDStr,
Metadata: map[string]string{"name": name}, Error: err.Error(), Timestamp: time.Now().Unix(),
})
return
}
if _, err := s.DB.InsertTemplate(bgCtx, db.InsertTemplateParams{
ID: templateUUID,
Name: name,
Type: "snapshot",
Vcpus: sb.Vcpus,
MemoryMb: sb.MemoryMb,
Vcpus: vcpus,
MemoryMb: memoryMB,
SizeBytes: resp.Msg.SizeBytes,
TeamID: teamID,
DefaultUser: "",
DefaultEnv: []byte("{}"),
Metadata: []byte("{}"),
})
if err != nil {
return db.Template{}, fmt.Errorf("insert template: %w", err)
}); err != nil {
slog.Warn("failed to insert snapshot template", "sandbox_id", sandboxIDStr, "name", name, "error", err)
s.publishEvent(bgCtx, SandboxStateEvent{
Event: "sandbox.snapshot_failed", SandboxID: sandboxIDStr, TeamID: teamIDStr, HostID: hostIDStr,
Metadata: map[string]string{"name": name}, Error: "failed to register snapshot", Timestamp: time.Now().Unix(),
})
return
}
return tmpl, nil
s.publishEvent(bgCtx, SandboxStateEvent{
Event: "sandbox.snapshotted", SandboxID: sandboxIDStr, TeamID: teamIDStr, HostID: hostIDStr,
Metadata: map[string]string{"name": name}, Timestamp: time.Now().Unix(),
})
}
// publishStateChanged emits a transient sandbox.state_changed event so the
// dashboard flips the status badge during a transition that has no terminal
// lifecycle verb of its own (e.g. the snapshotting round-trip).
func (s *SandboxService) publishStateChanged(ctx context.Context, sandboxIDStr, teamIDStr, hostIDStr, from, to string) {
s.publishEvent(ctx, SandboxStateEvent{
Event: "sandbox.state_changed", SandboxID: sandboxIDStr, TeamID: teamIDStr, HostID: hostIDStr,
Metadata: map[string]string{"from": from, "to": to}, Timestamp: time.Now().Unix(),
})
}
// Destroy stops a sandbox asynchronously. Pre-marks the DB status as


@ -1,393 +0,0 @@
#!/usr/bin/env bash
#
# prepare-wrenn-user.sh — Create the wrenn system user and configure minimal privileges.
#
# Creates a locked-down 'wrenn' system user that can run wrenn-agent and wrenn-cp
# with only the privileges they need. The agent binary gets Linux capabilities
# via setcap — no sudo is configured for the wrenn user at all. If an attacker
# compromises the wrenn user, they cannot escalate via sudo.
#
# What this script does:
# 1. Creates the 'wrenn' system user (bash shell for debugging, no home dir)
# 2. Creates required directories with correct ownership
# 3. Sets Linux capabilities on wrenn-agent, cloud-hypervisor, and all child binaries
# 4. Installs an apt hook to restore capabilities after package updates
# 5. Installs a udev rule granting the wrenn group access to /dev/net/tun
# 6. Ensures required kernel modules are loaded now and on every boot
# 7. Installs a sudoers drop-in (comment-only, no grants; absence is the cage)
# 8. Writes systemd unit files for both wrenn-agent and wrenn-cp
#
# Usage:
# sudo bash scripts/prepare-wrenn-user.sh
#
# Prerequisites:
# - wrenn-agent binary at /usr/local/bin/wrenn-agent
# - wrenn-cp binary at /usr/local/bin/wrenn-cp
# - cloud-hypervisor binary at /usr/local/bin/cloud-hypervisor
# - libcap2-bin installed (for setcap)
set -euo pipefail
# ── Guard ────────────────────────────────────────────────────────────────────
if [[ $EUID -ne 0 ]]; then
echo "ERROR: This script must be run as root."
exit 1
fi
# ── Configuration ────────────────────────────────────────────────────────────
WRENN_USER="wrenn"
WRENN_GROUP="wrenn"
WRENN_DIR="/var/lib/wrenn"
AGENT_BIN="/usr/local/bin/wrenn-agent"
CP_BIN="/usr/local/bin/wrenn-cp"
CH_BIN="/usr/local/bin/cloud-hypervisor"
RESTORE_CAPS_SCRIPT="/etc/wrenn/restore-caps.sh"
# ── 1. Create system user ───────────────────────────────────────────────────
if id "${WRENN_USER}" &>/dev/null; then
echo "==> User '${WRENN_USER}' already exists, skipping creation."
else
echo "==> Creating system user '${WRENN_USER}'..."
useradd \
--system \
--no-create-home \
--home-dir "${WRENN_DIR}" \
--shell /bin/bash \
"${WRENN_USER}"
fi
# Add wrenn to kvm group for /dev/kvm access.
if getent group kvm &>/dev/null; then
usermod -aG kvm "${WRENN_USER}"
echo "==> Added '${WRENN_USER}' to 'kvm' group."
fi
# ── 2. Create directories with correct ownership ────────────────────────────
echo "==> Setting up directories..."
directories=(
"${WRENN_DIR}"
"${WRENN_DIR}/images"
"${WRENN_DIR}/kernels"
"${WRENN_DIR}/sandboxes"
"${WRENN_DIR}/snapshots"
"${WRENN_DIR}/logs"
"/run/netns"
)
for dir in "${directories[@]}"; do
mkdir -p "${dir}"
done
# Only chown wrenn-owned dirs (not /run/netns which is system-managed).
for dir in "${WRENN_DIR}" "${WRENN_DIR}/images" "${WRENN_DIR}/kernels" \
"${WRENN_DIR}/sandboxes" "${WRENN_DIR}/snapshots" "${WRENN_DIR}/logs"; do
chown "${WRENN_USER}:${WRENN_GROUP}" "${dir}"
chmod 750 "${dir}"
done
# ── 3. Set capabilities on binaries ─────────────────────────────────────────
#
# These capabilities replace full root access. The wrenn-agent binary gets
# exactly the capabilities it needs for:
#
# CAP_SYS_ADMIN — network namespaces (netns create/enter), mount namespaces
# (unshare -m), losetup, dmsetup, mount/umount
# CAP_NET_ADMIN — veth/TAP creation (netlink), iptables rules, IP forwarding,
# routing table manipulation
# CAP_NET_RAW — raw socket access (needed by iptables internally)
# CAP_SYS_PTRACE — reading /proc/self/ns/net (netns.Get)
# CAP_KILL — sending SIGTERM/SIGKILL to Cloud Hypervisor processes
# CAP_DAC_OVERRIDE — accessing /dev/loop*, /dev/mapper/*, /dev/net/tun,
# /proc/sys/net/ipv4/ip_forward
# CAP_MKNOD — creating device nodes (dm-snapshot)
#
# The 'ep' suffix means Effective + Permitted (granted at exec time).
echo "==> Setting capabilities on wrenn-agent..."
if [[ ! -f "${AGENT_BIN}" ]]; then
echo "WARNING: ${AGENT_BIN} not found, skipping setcap. Install the binary first."
else
setcap \
cap_sys_admin,cap_net_admin,cap_net_raw,cap_sys_ptrace,cap_kill,cap_dac_override,cap_mknod+ep \
"${AGENT_BIN}"
echo " Capabilities set on ${AGENT_BIN}:"
getcap "${AGENT_BIN}"
fi
# Cloud Hypervisor also needs capabilities when spawned by a non-root parent.
# CAP_NET_ADMIN is required for network device access inside the netns.
if [[ -f "${CH_BIN}" ]]; then
setcap cap_net_admin,cap_sys_admin,cap_dac_override+ep "${CH_BIN}"
echo " Capabilities set on ${CH_BIN}:"
getcap "${CH_BIN}"
fi
# ── Helper: resolve binary path and apply setcap ────────────────────────────
#
# Uses `command -v` to find the binary in PATH (handles /usr/bin vs /usr/sbin
# differences across distros), then `readlink -f` to resolve symlinks so that
# setcap hits the real inode (important for iptables-nft/alternatives).
setcap_binary() {
local name="$1" caps="$2"
local bin
bin=$(command -v "$name" 2>/dev/null) || {
echo " WARNING: ${name} not found in PATH, skipping."
return 0
}
bin=$(readlink -f "$bin")
setcap "$caps" "$bin"
echo " $(getcap "$bin")"
}
# The child binaries invoked by wrenn-agent (iptables, losetup, dmsetup, etc.)
# also need capabilities since they'll be exec'd by a non-root user.
echo "==> Setting capabilities on child binaries..."
setcap_binary iptables "cap_net_admin,cap_net_raw+ep"
setcap_binary iptables-save "cap_net_admin,cap_net_raw+ep"
setcap_binary ip "cap_sys_admin,cap_net_admin+ep"
setcap_binary sysctl "cap_net_admin+ep"
setcap_binary losetup "cap_sys_admin,cap_dac_override+ep"
setcap_binary blockdev "cap_sys_admin,cap_dac_override+ep"
setcap_binary dmsetup "cap_sys_admin,cap_dac_override,cap_mknod+ep"
setcap_binary e2fsck "cap_sys_admin,cap_dac_override+ep"
setcap_binary resize2fs "cap_sys_admin,cap_dac_override+ep"
setcap_binary dd "cap_dac_override+ep"
setcap_binary unshare "cap_sys_admin+ep"
setcap_binary mount "cap_sys_admin,cap_dac_override+ep"
# ── 4. Persist capabilities across package updates ──────────────────────────
#
# apt/dpkg overwrites binaries on package updates, which strips the xattr-based
# capabilities set by setcap. This installs:
# - /etc/wrenn/restore-caps.sh: re-applies setcap to all child binaries
# - /etc/apt/apt.conf.d/99-wrenn-setcap: apt post-invoke hook that calls it
echo "==> Installing capability restore hook..."
mkdir -p /etc/wrenn
cat > "${RESTORE_CAPS_SCRIPT}" << 'RESTORE'
#!/usr/bin/env bash
#
# restore-caps.sh — Re-apply Linux capabilities to wrenn child binaries.
# Called automatically by apt after package updates (see /etc/apt/apt.conf.d/99-wrenn-setcap).
# Can also be run manually: sudo /etc/wrenn/restore-caps.sh
set -euo pipefail
setcap_binary() {
local name="$1" caps="$2"
local bin
bin=$(command -v "$name" 2>/dev/null) || return 0
bin=$(readlink -f "$bin")
setcap "$caps" "$bin" 2>/dev/null || true
}
# wrenn-agent and cloud-hypervisor (only if present — they aren't package-managed).
[[ -f /usr/local/bin/wrenn-agent ]] && \
setcap cap_sys_admin,cap_net_admin,cap_net_raw,cap_sys_ptrace,cap_kill,cap_dac_override,cap_mknod+ep \
/usr/local/bin/wrenn-agent 2>/dev/null || true
[[ -f /usr/local/bin/cloud-hypervisor ]] && \
setcap cap_net_admin,cap_sys_admin,cap_dac_override+ep \
/usr/local/bin/cloud-hypervisor 2>/dev/null || true
# Child binaries (these are the ones wiped by apt).
setcap_binary iptables "cap_net_admin,cap_net_raw+ep"
setcap_binary iptables-save "cap_net_admin,cap_net_raw+ep"
setcap_binary ip "cap_sys_admin,cap_net_admin+ep"
setcap_binary sysctl "cap_net_admin+ep"
setcap_binary losetup "cap_sys_admin,cap_dac_override+ep"
setcap_binary blockdev "cap_sys_admin,cap_dac_override+ep"
setcap_binary dmsetup "cap_sys_admin,cap_dac_override,cap_mknod+ep"
setcap_binary e2fsck "cap_sys_admin,cap_dac_override+ep"
setcap_binary resize2fs "cap_sys_admin,cap_dac_override+ep"
setcap_binary dd "cap_dac_override+ep"
setcap_binary unshare "cap_sys_admin+ep"
setcap_binary mount "cap_sys_admin,cap_dac_override+ep"
RESTORE
chmod 755 "${RESTORE_CAPS_SCRIPT}"
cat > /etc/apt/apt.conf.d/99-wrenn-setcap << 'APT'
// Re-apply Linux capabilities to wrenn child binaries after any package update.
// Capabilities (xattr) are stripped when dpkg overwrites a binary.
DPkg::Post-Invoke { "/etc/wrenn/restore-caps.sh"; };
APT
echo " Installed ${RESTORE_CAPS_SCRIPT} and apt post-invoke hook."
# ── 5. Device access ────────────────────────────────────────────────────────
#
# /dev/kvm — handled by kvm group membership above
# /dev/net/tun — needs to be accessible by wrenn user
echo "==> Configuring device access..."
# Ensure /dev/net/tun is accessible (udev rule for persistence across reboots).
cat > /etc/udev/rules.d/99-wrenn.rules << 'UDEV'
# Allow wrenn user access to TUN device for TAP networking.
SUBSYSTEM=="misc", KERNEL=="tun", GROUP="wrenn", MODE="0660"
UDEV
udevadm control --reload-rules 2>/dev/null || true
echo " Installed udev rule for /dev/net/tun."
# ── 6. Kernel modules ───────────────────────────────────────────────────────
echo "==> Ensuring kernel modules are loaded..."
modules=(dm_snapshot dm_mod loop tun)
for mod in "${modules[@]}"; do
# Match the whole module name: a bare prefix match would treat e.g. "tunnel4" as "tun".
if ! lsmod | grep -q "^${mod}[[:space:]]"; then
modprobe "${mod}" 2>/dev/null && echo " Loaded ${mod}" || echo " WARNING: Could not load ${mod}"
else
echo " ${mod} already loaded."
fi
done
# Persist across reboots.
for mod in "${modules[@]}"; do
grep -qxF "${mod}" /etc/modules-load.d/wrenn.conf 2>/dev/null || echo "${mod}" >> /etc/modules-load.d/wrenn.conf
done
echo " Module persistence written to /etc/modules-load.d/wrenn.conf."
# ── 7. Sudoers ──────────────────────────────────────────────────────────────
#
# The wrenn user has no sudo grants. The absence of a grant is the cage — an
# explicit "!ALL" deny is weaker due to known bypasses (CVE-2019-14287).
# This file exists purely as documentation for operators running `sudo -l`.
echo "==> Writing sudoers drop-in..."
cat > /etc/sudoers.d/wrenn << 'SUDOERS'
# Wrenn system user — no sudo access permitted.
# All privilege is granted via Linux capabilities on specific binaries (setcap).
# This file contains no active rules. The absence of any grant is intentional
# and is the strongest way to deny escalation.
#
# Do not add rules here. If the wrenn user needs new privileges, use setcap
# on the specific binary instead.
SUDOERS
chmod 440 /etc/sudoers.d/wrenn
visudo -c -f /etc/sudoers.d/wrenn
echo " /etc/sudoers.d/wrenn installed and validated."
# ── 8. Systemd units ────────────────────────────────────────────────────────
echo "==> Writing systemd service files..."
cat > /etc/systemd/system/wrenn-agent.service << 'UNIT'
[Unit]
Description=Wrenn Host Agent
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=wrenn
Group=wrenn
EnvironmentFile=-/etc/wrenn/agent.env
# The binary has capabilities set via setcap. These systemd directives ensure
# the capabilities are inherited into the process at exec time.
AmbientCapabilities=CAP_SYS_ADMIN CAP_NET_ADMIN CAP_NET_RAW CAP_SYS_PTRACE CAP_KILL CAP_DAC_OVERRIDE CAP_MKNOD
CapabilityBoundingSet=CAP_SYS_ADMIN CAP_NET_ADMIN CAP_NET_RAW CAP_SYS_PTRACE CAP_KILL CAP_DAC_OVERRIDE CAP_MKNOD
# IMPORTANT: must be false — child binaries (iptables, losetup, dmsetup, etc.)
# have their own file capabilities via setcap which must be honored at exec time.
NoNewPrivileges=false
# Enable IP forwarding before the agent starts. The "+" prefix runs this
# directive as root (bypassing User=wrenn) so it can write to procfs.
ExecStartPre=+/bin/sh -c 'sysctl -w net.ipv4.ip_forward=1'
ExecStart=/usr/local/bin/wrenn-agent --address ${WRENN_ADVERTISE_ADDR}
Restart=on-failure
RestartSec=5
# File descriptor limits (Cloud Hypervisor + loop devices + sockets).
LimitNOFILE=65536
LimitNPROC=4096
# IO priority + cgroup weight. Large-VM snapshot writes (CH memfile dump,
# zero-page hole punching, dm-snapshot flatten) can saturate a single-disk
# host and starve sshd/journal reads. Best-effort scheduling class +
# below-default cgroup weight lets latency-sensitive workloads keep up.
IOSchedulingClass=best-effort
IOSchedulingPriority=5
IOWeight=50
# Protect host filesystem — only allow access to what's needed.
ProtectHome=true
ReadWritePaths=/var/lib/wrenn /tmp /run/netns /dev/mapper
ReadOnlyPaths=/usr/local/bin/cloud-hypervisor
[Install]
WantedBy=multi-user.target
UNIT
cat > /etc/systemd/system/wrenn-cp.service << 'UNIT'
[Unit]
Description=Wrenn Control Plane
After=network-online.target postgresql.service
Wants=network-online.target
[Service]
Type=simple
User=wrenn
Group=wrenn
EnvironmentFile=-/etc/wrenn/cp.env
# Control plane is fully unprivileged — no capabilities needed.
NoNewPrivileges=true
CapabilityBoundingSet=
ExecStart=/usr/local/bin/wrenn-cp
Restart=on-failure
RestartSec=5
ProtectHome=true
ProtectSystem=strict
ReadWritePaths=/tmp
[Install]
WantedBy=multi-user.target
UNIT
mkdir -p /etc/wrenn
touch /etc/wrenn/agent.env /etc/wrenn/cp.env
chmod 640 /etc/wrenn/agent.env /etc/wrenn/cp.env
chown root:${WRENN_GROUP} /etc/wrenn/agent.env /etc/wrenn/cp.env
systemctl daemon-reload
echo " wrenn-agent.service and wrenn-cp.service installed."
# ── Done ─────────────────────────────────────────────────────────────────────
echo ""
echo "=== Setup complete ==="
echo ""
echo "Next steps:"
echo " 1. Copy wrenn-agent and wrenn-cp binaries to /usr/local/bin/ if missing, then re-run this script to apply capabilities"
echo " 2. Edit /etc/wrenn/agent.env with WRENN_CP_URL and WRENN_ADVERTISE_ADDR"
echo " 3. Edit /etc/wrenn/cp.env with DATABASE_URL and other control plane config"
echo " 4. systemctl enable --now wrenn-agent"
echo " 5. systemctl enable --now wrenn-cp"
echo ""
echo "Security summary:"
echo " - wrenn user: bash shell (for debugging), no home, no sudo (no grants in sudoers)"
echo " - wrenn-agent: runs as wrenn with 7 capabilities via setcap (not root)"
echo " - wrenn-cp: runs as wrenn with zero capabilities"
echo " - Capabilities auto-restored after apt upgrades via /etc/wrenn/restore-caps.sh"
echo ""
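The module persistence loop in section 6 depends on an idempotent append: a line is written only if an identical line is not already in the file, so re-running the script does not duplicate entries. A minimal sketch of that pattern follows. The helper name `ensure_line` is hypothetical (the script inlines this logic), and a temporary file stands in for /etc/modules-load.d/wrenn.conf.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
# Hypothetical helper for illustration; not part of the original script.
ensure_line() {
    local file="$1" line="$2"
    # -q: no output, -x: match the whole line, -F: fixed string (no regex).
    # A missing file makes grep fail, which falls through to the append.
    grep -qxF -- "${line}" "${file}" 2>/dev/null || printf '%s\n' "${line}" >> "${file}"
}

conf="$(mktemp)"   # stand-in for the real modules-load.d file

# Run the loop twice; the second pass should add nothing.
for pass in 1 2; do
    for mod in dm_snapshot dm_mod loop tun; do
        ensure_line "${conf}" "${mod}"
    done
done

cat "${conf}"
rm -f "${conf}"
```

The `-x` flag matters here: without it, `loop` would already count as present if the file contained a longer name that merely includes it.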


@ -38,7 +38,9 @@ IMAGE_NAME="$2"
OUTPUT_DIR="${WRENN_IMAGES_PATH}/${IMAGE_NAME}"
OUTPUT_FILE="${OUTPUT_DIR}/rootfs.ext4"
MOUNT_DIR="/tmp/wrenn-rootfs-build"
TAR_FILE="/tmp/wrenn-rootfs-export-${IMAGE_NAME}.tar"
# IMAGE_NAME may contain slashes (e.g. teams/<team>/<id>); flatten them so the
# temp tar is a single file in /tmp rather than a path into a missing dir.
TAR_FILE="/tmp/wrenn-rootfs-export-${IMAGE_NAME//\//_}.tar"
# Verify the container exists.
if ! docker inspect "${CONTAINER}" > /dev/null 2>&1; then
@ -121,16 +123,24 @@ if [ -z "${TINI_BIN}" ]; then
aarch64) TINI_ARCH="arm64" ;;
*) echo "ERROR: Unsupported architecture: ${ARCH}"; exit 1 ;;
esac
# Use the statically linked tini so the binary runs regardless of the
# guest's libc (glibc on Ubuntu/Arch/Fedora, musl on Alpine).
TINI_VERSION="v0.19.0"
TINI_URL="https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"
TINI_TMP="/tmp/tini-${TINI_ARCH}"
echo " Downloading tini ${TINI_VERSION} (${TINI_ARCH})..."
TINI_URL="https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-static-${TINI_ARCH}"
TINI_TMP="/tmp/tini-static-${TINI_ARCH}"
echo " Downloading tini ${TINI_VERSION} static (${TINI_ARCH})..."
curl -fsSL "${TINI_URL}" -o "${TINI_TMP}"
chmod +x "${TINI_TMP}"
TINI_BIN="${TINI_TMP}"
fi
sudo mkdir -p "${MOUNT_DIR}/sbin"
sudo cp "${TINI_BIN}" "${MOUNT_DIR}/sbin/tini"
# On usr-merged distros (e.g. Fedora) /sbin is a symlink to /usr/bin, so a tini
# already at /usr/bin/tini IS /sbin/tini — copying onto itself errors. Skip then.
if [ "${TINI_BIN}" -ef "${MOUNT_DIR}/sbin/tini" ]; then
echo " tini already at /sbin/tini (usr-merged); skipping copy"
else
sudo cp "${TINI_BIN}" "${MOUNT_DIR}/sbin/tini"
fi
sudo chmod 755 "${MOUNT_DIR}/sbin/tini"
# Step 6: Verify injected binaries and required container packages.
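The two fixes in this file can be tried in isolation: flattening slashes in IMAGE_NAME with a parameter expansion, and using the `-ef` test to detect that source and destination are the same file on a usr-merged layout. The sketch below is illustrative only. The function names `flatten_name` and `same_file` are hypothetical, and the usr-merged tree is simulated in a temporary directory rather than a mounted rootfs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# flatten_name NAME: replace every "/" with "_" so the result is a single
# path component. Hypothetical helper; the script inlines the expansion.
flatten_name() {
    local name="$1"
    printf '%s\n' "${name//\//_}"
}

# same_file A B: succeeds when both paths resolve to the same device and
# inode, which is what the shell's -ef test checks.
same_file() {
    [ "$1" -ef "$2" ]
}

flatten_name "teams/example-team/example-id"

# Simulate a usr-merged layout where /sbin is a symlink to usr/bin.
root="$(mktemp -d)"
mkdir -p "${root}/usr/bin"
ln -s usr/bin "${root}/sbin"
touch "${root}/usr/bin/tini"

if same_file "${root}/usr/bin/tini" "${root}/sbin/tini"; then
    echo "tini already at /sbin/tini (usr-merged); skipping copy"
else
    cp "${root}/usr/bin/tini" "${root}/sbin/tini"
fi
rm -rf "${root}"
```

Comparing with `-ef` rather than comparing path strings is the reason the check works: the two paths differ textually but resolve to one inode through the symlink.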


@ -1,32 +1,46 @@
#!/usr/bin/env bash
#
# update-debug-rootfs.sh — Build envd and inject it (plus wrenn-init + tini) into the debug rootfs.
# update-minimal-rootfs.sh — Rebuild envd and inject it (plus wrenn-init + tini)
# into the system base rootfs images.
#
# This script:
# 1. Builds a fresh envd static binary via make
# 2. Mounts the rootfs image
# 3. Copies envd, wrenn-init, and tini into the image
# 4. Unmounts cleanly
# 1. Builds a fresh envd static binary via make (once)
# 2. For each system base rootfs (ubuntu/alpine/arch/fedora): mounts it,
# copies envd + wrenn-init + tini in, and unmounts cleanly
#
# Usage:
# bash scripts/update-debug-rootfs.sh [rootfs_path]
# bash scripts/update-minimal-rootfs.sh [rootfs_path]
#
# Defaults to /var/lib/wrenn/images/minimal/rootfs.ext4
# With no argument it updates all four system base rootfs images under
# ${WRENN_DIR}/images/teams/<platform>/<id>/rootfs.ext4
# With a path argument it updates only that single rootfs.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
WRENN_DIR="${WRENN_DIR:-/var/lib/wrenn}"
ROOTFS="${1:-${WRENN_DIR}/images/minimal/rootfs.ext4}"
MOUNT_DIR="/tmp/wrenn-rootfs-update"
if [ ! -f "${ROOTFS}" ]; then
echo "ERROR: Rootfs not found at ${ROOTFS}"
exit 1
# base36(all-zeros UUID) = platform team that owns every system base template.
PLATFORM_TEAM_B36="0000000000000000000000000"
# System base template IDs (well-known reserved IDs 0..3). Single-digit IDs, so
# the 25-char base36 string is just the zero-padded decimal.
SYSTEM_TEMPLATE_IDS=(0 1 2 3)
# Resolve which rootfs images to update.
ROOTFS_LIST=()
if [ $# -ge 1 ]; then
ROOTFS_LIST=("$1")
else
for tid in "${SYSTEM_TEMPLATE_IDS[@]}"; do
tmpl_b36="$(printf '%025d' "${tid}")"
ROOTFS_LIST+=("${WRENN_DIR}/images/teams/${PLATFORM_TEAM_B36}/${tmpl_b36}/rootfs.ext4")
done
fi
# Step 1: Build envd.
# Step 1: Build envd (once).
echo "==> Building envd..."
cd "${PROJECT_ROOT}"
make build-envd
@ -42,64 +56,84 @@ if ! ldd "${ENVD_BIN}" | grep -q "statically linked"; then
exit 1
fi
# Step 2: Mount the rootfs.
echo "==> Mounting rootfs at ${MOUNT_DIR}..."
mkdir -p "${MOUNT_DIR}"
sudo mount -o loop,rw "${ROOTFS}" "${MOUNT_DIR}"
cleanup() {
echo "==> Unmounting rootfs..."
sudo umount "${MOUNT_DIR}" 2>/dev/null || true
rmdir "${MOUNT_DIR}" 2>/dev/null || true
}
trap cleanup EXIT
# Step 3: Copy files into rootfs.
echo "==> Installing envd..."
sudo mkdir -p "${MOUNT_DIR}/usr/local/bin"
sudo cp "${ENVD_BIN}" "${MOUNT_DIR}/usr/local/bin/envd"
sudo chmod 755 "${MOUNT_DIR}/usr/local/bin/envd"
echo "==> Installing wrenn-init..."
sudo cp "${PROJECT_ROOT}/images/wrenn-init.sh" "${MOUNT_DIR}/usr/local/bin/wrenn-init"
sudo chmod 755 "${MOUNT_DIR}/usr/local/bin/wrenn-init"
echo "==> Installing tini..."
TINI_BIN=""
# 1. Already in the rootfs?
for p in "${MOUNT_DIR}/usr/bin/tini" "${MOUNT_DIR}/sbin/tini" "${MOUNT_DIR}/usr/local/bin/tini"; do
if [ -f "$p" ]; then TINI_BIN="$p"; break; fi
done
# 2. Available on the host?
if [ -z "${TINI_BIN}" ]; then
for p in /usr/bin/tini /usr/local/bin/tini /sbin/tini; do
if [ -f "$p" ]; then TINI_BIN="$p"; break; fi
# resolve_tini ROOTFS_MOUNT — echo a path to a tini binary suitable for the
# mounted rootfs. Prefers one already in the image, then a static download.
resolve_tini() {
local mount_dir="$1" p tini_arch arch
for p in "${mount_dir}/usr/bin/tini" "${mount_dir}/sbin/tini" "${mount_dir}/usr/local/bin/tini"; do
if [ -f "$p" ]; then echo "$p"; return; fi
done
fi
# 3. Download from GitHub releases.
if [ -z "${TINI_BIN}" ]; then
ARCH="$(uname -m)"
case "${ARCH}" in
x86_64) TINI_ARCH="amd64" ;;
aarch64) TINI_ARCH="arm64" ;;
*) echo "ERROR: Unsupported architecture: ${ARCH}"; exit 1 ;;
arch="$(uname -m)"
case "${arch}" in
x86_64) tini_arch="amd64" ;;
aarch64) tini_arch="arm64" ;;
*) echo "ERROR: Unsupported architecture: ${arch}" >&2; exit 1 ;;
esac
TINI_VERSION="v0.19.0"
TINI_URL="https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"
TINI_TMP="/tmp/tini-${TINI_ARCH}"
echo " Downloading tini ${TINI_VERSION} (${TINI_ARCH})..."
curl -fsSL "${TINI_URL}" -o "${TINI_TMP}"
chmod +x "${TINI_TMP}"
TINI_BIN="${TINI_TMP}"
# Static tini runs under any libc (glibc or musl).
local tmp="/tmp/tini-static-${tini_arch}"
if [ ! -f "${tmp}" ]; then
echo " Downloading tini v0.19.0 static (${tini_arch})..." >&2
curl -fsSL "https://github.com/krallin/tini/releases/download/v0.19.0/tini-static-${tini_arch}" -o "${tmp}"
chmod +x "${tmp}"
fi
echo "${tmp}"
}
# inject_rootfs ROOTFS — mount, copy guest binaries in, unmount.
inject_rootfs() {
local rootfs="$1" tini_bin
echo ""
echo "==> Updating ${rootfs}"
mkdir -p "${MOUNT_DIR}"
sudo mount -o loop,rw "${rootfs}" "${MOUNT_DIR}"
local mounted=1
cleanup_mount() {
if [ "${mounted}" = "1" ]; then
sudo umount "${MOUNT_DIR}" 2>/dev/null || true
rmdir "${MOUNT_DIR}" 2>/dev/null || true
mounted=0
fi
}
trap cleanup_mount RETURN
sudo mkdir -p "${MOUNT_DIR}/usr/local/bin"
sudo cp "${ENVD_BIN}" "${MOUNT_DIR}/usr/local/bin/envd"
sudo chmod 755 "${MOUNT_DIR}/usr/local/bin/envd"
sudo cp "${PROJECT_ROOT}/images/wrenn-init.sh" "${MOUNT_DIR}/usr/local/bin/wrenn-init"
sudo chmod 755 "${MOUNT_DIR}/usr/local/bin/wrenn-init"
tini_bin="$(resolve_tini "${MOUNT_DIR}")"
sudo mkdir -p "${MOUNT_DIR}/sbin"
# On usr-merged distros (e.g. Fedora) /sbin -> /usr/bin, so a tini already at
# /usr/bin/tini IS /sbin/tini — copying onto itself errors. Skip then.
if [ "${tini_bin}" -ef "${MOUNT_DIR}/sbin/tini" ]; then
echo " tini already at /sbin/tini (usr-merged); skipping copy"
else
sudo cp "${tini_bin}" "${MOUNT_DIR}/sbin/tini"
fi
sudo chmod 755 "${MOUNT_DIR}/sbin/tini"
ls -la "${MOUNT_DIR}/usr/local/bin/envd" "${MOUNT_DIR}/usr/local/bin/wrenn-init" "${MOUNT_DIR}/sbin/tini"
cleanup_mount
}
# Step 2: Update each rootfs that exists.
UPDATED=0
for rootfs in "${ROOTFS_LIST[@]}"; do
if [ ! -f "${rootfs}" ]; then
echo "==> Skipping (not found): ${rootfs}"
continue
fi
inject_rootfs "${rootfs}"
UPDATED=$((UPDATED + 1))
done
echo ""
if [ "${UPDATED}" -eq 0 ]; then
echo "==> No rootfs images updated. Build them first with: make images"
exit 1
fi
sudo mkdir -p "${MOUNT_DIR}/sbin"
sudo cp "${TINI_BIN}" "${MOUNT_DIR}/sbin/tini"
sudo chmod 755 "${MOUNT_DIR}/sbin/tini"
# Step 4: Verify.
echo ""
echo "==> Installed files:"
ls -la "${MOUNT_DIR}/usr/local/bin/envd" "${MOUNT_DIR}/usr/local/bin/wrenn-init" "${MOUNT_DIR}/sbin/tini"
echo ""
echo "==> Done. Rootfs updated: ${ROOTFS}"
echo "==> Done. Updated ${UPDATED} rootfs image(s)."
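The path construction for the system base images relies on a shortcut: for IDs 0 through 9, the base36 digit and the decimal digit are the same character, so zero-padding the decimal value to 25 characters yields the base36 directory name. The sketch below makes that assumption explicit by rejecting anything else. The function name `system_rootfs_path` is hypothetical and does not appear in the script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# base36 of the all-zeros UUID, as defined in the script above.
PLATFORM_TEAM_B36="0000000000000000000000000"

# system_rootfs_path WRENN_DIR ID: print the rootfs path for a system base
# template. Hypothetical helper; only valid for single-digit IDs, where the
# base36 and decimal representations coincide.
system_rootfs_path() {
    local wrenn_dir="$1" tid="$2"
    if ! [[ "${tid}" =~ ^[0-9]$ ]]; then
        echo "ERROR: only single-digit template IDs are supported, got: ${tid}" >&2
        return 1
    fi
    printf '%s/images/teams/%s/%025d/rootfs.ext4\n' \
        "${wrenn_dir}" "${PLATFORM_TEAM_B36}" "${tid}"
}

for tid in 0 1 2 3; do
    system_rootfs_path "${WRENN_DIR:-/var/lib/wrenn}" "${tid}"
done
```

An ID of 10 or above would need a real base36 conversion (10 is `a` in base36), which is why the guard fails loudly instead of emitting a path that would not match the on-disk layout.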